Amy Guy

Raw Blog

Monday, July 08, 2013

[Notes] Manfred Hauswirth at #SSSW2013




Streams: Any time-dependent data / changes over time.

Has done a paper about P2P stuff.

Data silo - "natural enemy of SW scientists"

Massive exponential growth of global data.

Still have to integrate dynamic data with static data.
Multiway joins are the dominant operator.  Need to be efficient.
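
Rough sketch of what joining dynamic with static data can look like, not the speaker's implementation. Assumptions: the sensor ids, the room names and the `WindowJoin` class are all made up for illustration, and it only does a two-way join (a multiway join would chain several such lookups).

```python
from collections import deque

# Static data: hypothetical sensor -> room facts, indexed once up front.
STATIC_LOCATIONS = {"s1": "kitchen", "s2": "lab"}


class WindowJoin:
    """Keep the last `size` stream readings and join them with static data."""

    def __init__(self, static_index, size):
        self.static_index = static_index
        self.window = deque(maxlen=size)  # oldest readings drop out automatically

    def push(self, sensor, value):
        self.window.append((sensor, value))

    def results(self):
        # Hash join: each reading in the window looks up its sensor
        # in the static index; readings with no match are dropped.
        return [
            (self.static_index[sensor], value)
            for sensor, value in self.window
            if sensor in self.static_index
        ]


join = WindowJoin(STATIC_LOCATIONS, size=2)
for reading in [("s1", 20.5), ("s2", 19.0), ("s3", 30.0), ("s1", 21.0)]:
    join.push(*reading)
joined = join.results()
```

The static side is indexed once, so each new reading costs one dictionary lookup. That is the kind of efficiency the join needs when readings arrive continuously.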

Everything/body is a sensor.

Various research challenges:

  • Query framework.
  • Efficient evaluation algorithm.
  • Optimise queries.
  • Organisation of data.

CoAP ~= http for sensors.

Stuff about sensor networks and context - useful for Michael.

  • Common abstraction levels for understanding.
  • SSN-XG ontology
    • Application: SPITFIRE
  • You can buy a sensor off the shelf that runs a binary RDF store and can be queried.  So it is possible to use SW tech with resource-constrained devices.
  • RESTful sensor interfaces stuff being standardised - CoRE, CoAP.
  • Linked Stream Model
  • CQELS-QL (extension to SPARQL 1.1; already legacy)

Rewrite query to split out static and dynamic parts - lots of overhead.
But need to optimise between these.
Neither existing stream processing systems nor existing databases could be efficient enough.
So they built their own LD stream processing system.  (Optimised and adopted existing database stuff).

HyperWave - didn't succeed.  Didn't listen to customers and wasn't open source (license fees).
But it was better than hypertext was back in the day.
Performance important for success/uptake.

Just putting it on cloud infrastructure doesn't mean it scales.

  • Need to parallelize algorithm.
  • Took it to a point where adding more hardware did help.
  • Problems!  Inconsistent results, engines don't support all query patterns... very early, don't fully understand yet.
  • Long way to go.  How to prove what is a correct result?
  • Needs to be easy to use - dumb it down.
    • Linked Stream Middleware (available):
      • Flights, _live trains_ - SPARQL endpoint!, traffic cams.
      • SuperStreamCollider.org
      • Current Tomcat problem with Twitter streams.
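
Sketch of one way to parallelize stream evaluation: partition readings by key so each worker sees all the data for its keys. Assumptions: the function names, the per-sensor average and the sensor ids are illustrative only and not from the talk. Python threads are used just to show the structure; they will not give a real speed-up for CPU-bound work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(readings, n_workers):
    """Route each reading to a partition by a stable hash of its sensor id."""
    parts = defaultdict(list)
    for sensor, value in readings:
        parts[crc32(sensor.encode()) % n_workers].append((sensor, value))
    return parts


def average_per_sensor(readings):
    """Work done independently by one worker on its own partition."""
    sums = defaultdict(lambda: [0.0, 0])
    for sensor, value in readings:
        sums[sensor][0] += value
        sums[sensor][1] += 1
    return {s: total / count for s, (total, count) in sums.items()}


def parallel_average(readings, n_workers=2):
    parts = partition(readings, n_workers)
    merged = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Partitions hold disjoint sensors, so partial results never clash.
        for partial in pool.map(average_per_sensor, parts.values()):
            merged.update(partial)
    return merged


averages = parallel_average([("s1", 10.0), ("s2", 4.0), ("s1", 20.0)])
```

The result only matches the single-worker answer because every reading for a given sensor lands in the same partition. Splitting on anything else would give the kind of inconsistent results noted above.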

To do?

  • Scalability
  • Stream reasoning (only processing, pattern matching, so far.  Want to infer conclusions).
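
Toy illustration of the difference between pattern matching and inferring conclusions: derive extra type facts for the items in a window from a subclass hierarchy. Assumptions: the class names, the `SUBCLASS_OF` table and both functions are invented for this sketch. A real stream reasoner would do far more than this.

```python
# Hypothetical static schema: class -> direct superclass.
SUBCLASS_OF = {"TemperatureSensor": "Sensor", "Sensor": "Device"}


def superclasses(cls, hierarchy):
    """Walk up the hierarchy, stopping at the top or on a cycle."""
    seen = []
    while cls in hierarchy and hierarchy[cls] not in seen:
        cls = hierarchy[cls]
        seen.append(cls)
    return seen


def infer_types(window, hierarchy):
    """Return the stated (subject, class) facts plus the inferred ones."""
    inferred = set()
    for subject, cls in window:
        inferred.add((subject, cls))
        for parent in superclasses(cls, hierarchy):
            inferred.add((subject, parent))
    return inferred


facts = infer_types([("s1", "TemperatureSensor")], SUBCLASS_OF)
```

A query for all `Device` readings would now match `s1`, even though the stream never said so directly.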

World is:
... uncertain, fuzzy, contradictory.
So combine statistics and logic.
Hard to scale logical reasoning, so use statistics to shoot in the right direction.

Privacy?

  • Build systems! Can't do thought experiments about the Web.

Don't get hung up on approaches / labels.