Amy Guy

Raw Blog

Wednesday, January 16, 2013

[Notes] 1st International Open Data Dialogue (Day one)

December 5th, 2012.

Notes as I scrawled them.  See here for a proper review.  Purple text marks calls to action for myself.

Dr. Philipp Mueller - Openness as a means, not an end
I missed the opening keynote.  His slides are here.

Dr. Wolfgang Both - One Year Open Data Portal Berlin

Open cities EU - Nov '10 - Apr '13.
Amsterdam, Barcelona, Berlin, Helsinki, Paris, Rome.
Berlin responsible for open data working group; several working groups (OD is just one).


Knowledge Society Open Data working group
Guidebook for cities, 30 pages so far.

Open Data Berlin
Portal Sept '11
Press conference Feb '12
Short term: Political agenda, budget, working group.
Mid term: Harmonize data formats.
Long term: Legal framework (Berlin can't decide laws by itself, for whole EU).

Open Data Day May '11, '12 and '13 in prep.

WG open traffic hack (29th Nov '12)
- 150 programmers with transport data.

Portal stats
Are users interested?  Peak at start.  Other peaks for hacks, Apps4D contest (Nov '11)
Possibility for feedback - questions, advice, ideas.
100 datasets.

WG 2012
- formats and metadata
- licensing and user rules
- education for staff (lectures, this is new for many working in public sector)
- organising and processing
WG 2013
- evaluation of OD studies
- Recommendations
- Exchange with other cities.

Datasets were volunteered, not selected
- But are looked at for quality, machine readable, looking for wide range of topics of interest to public.
- Want open, transparent process for publishing.
- Communicate with media as well as community.

- Licensing approach was heuristic... no legal advice available because it hadn't been done before.  Many possibilities; for opening data for individuals, CC was familiar to the Internet community and requires crediting the data's origin (CC-BY).  Some smaller datasets are licensed for non-commercial use only.  Discussion still ongoing.
Knows other cities will follow / copy their example whatever they do!

Jan Schallabok - Right to Freedom of Information on Enterprises

Call for open enterprise data.

Scenarios, set a timeline:
2015 - personal search
2016 - data disasters (identity hack)
2017 - pictures omnipresent, know all about everyone becomes normal
2019 - Google Glass on market

If there's no data on you, things don't work (eg. personalised advertising)
Society down the drain if it didn't open data (Switzerland in the story didn't open data, so a Swiss woman moving to Germany couldn't settle in easily).

Moving away from clear facts towards probabilistic.
Google Translate fed by open (input) data, but algorithms aren't open.
Siri - enriches dataset from Web (he said Google search?)
OpenStreetMap (counter example)
If there was more data, everyone would use OSM paradigm (eg. government).

Harm businesses?
..maybe.  But more damage in the long run?

Need to make businesses move away from using people's data.  Like, we work for Facebook.  Our data, not theirs.

Data protection:
by law, data subject has rights to know logic involved
as long as it doesn't affect trade secrets

Privacy implications
'Anonymous' data can be used to identify people.  See AOL search database fail.
When can datasets go public?
Weather data can be personal data (no time for example..)
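One way to make the AOL point concrete is a k-anonymity check (a standard technique, not one named in the talk): count how many records share each combination of quasi-identifiers; a combination seen only once singles out one person even with names stripped. A minimal sketch; the records and field names are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the smallest group size over all quasi-identifier combinations.

    A result of 1 means at least one record is unique on those fields,
    so it could be re-identified by joining with other data.
    """
    groups = Counter(tuple(r[f] for f in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical "anonymised" records: no names, but postcode, birth year and sex remain.
records = [
    {"postcode": "10115", "birth_year": 1980, "sex": "F", "query": "flats in Mitte"},
    {"postcode": "10115", "birth_year": 1980, "sex": "F", "query": "bike repair"},
    {"postcode": "10117", "birth_year": 1955, "sex": "M", "query": "local clinic"},
]

print(k_anonymity(records, ["postcode", "birth_year", "sex"]))  # 1: the third record is unique
```

Publishing would only be safer once k is comfortably above 1 for every combination an attacker could plausibly know.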

Michael Hörz - Open Data in Local Journalism

Journalists expect everything
- spending
- political decisions
  (district levels, searchable)
- quality (schools, pollution, food)
- real time sensors (air quality, traffic, energy)

- Open Data Paris (loads, on a map)
- Locrating (school performance on map, UK)
- Chicago bike crash reports (map) (sort by injury, date, day; data all open from Chicago Transport Authority, in a nice format).
- LA Times LAFD (fire dept.) response times.

- Airplane noise map (journalists had a PDF, eugh).  One of the first Berlin interactive visualisations.
- Berlin election.  Was real time.  Down to the polling station.
- Berlin bicycle accidents '11 - came from massive PDF (3,800+ cases)

- Wishes for xls or csv... wants directly processable.  WHY NOT RDF?!
- ..or APIs.  WHY NOT LD?!
Reality = PDFs, requests ignored, data incomplete or hidden.  Hard to get for journalists.
All datasets are interesting and should be out there.  In Berlin often only one or two districts are available, which is no good.
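What "directly processable" buys a journalist: with a CSV, a per-district count is a few lines of code instead of copying numbers out of a PDF. A minimal sketch, assuming a hypothetical accident file with `date`, `district` and `injury` columns (not the real Berlin data, which came as a PDF), using only the standard library:

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape a city might publish.
SAMPLE = """date,district,injury
2011-03-02,Mitte,minor
2011-03-05,Pankow,serious
2011-04-11,Mitte,serious
"""


def accidents_per_district(text):
    """Count rows per district from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row["district"] for row in reader)


print(accidents_per_district(SAMPLE))
```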


It's not always straightforward just to release data - need priorities; raw data/API documentation isn't always available straight away.

Why PDFs?  They don't know any better.  Need to make people aware.

Consequences of public seeing data they're not used to?  Panic?  Or activism?  Pressure politicians for change.  Empowers people.

Is there resistance to making data available (eg. Italy - data there but useless)?  Maybe, or maybe they just don't realise [it's useless].

Prof. Felix Sasaki - Linked Open Data @ W3C-Vocabularies, Working Groups, Usage Scenarios

== first half of MASWS. 

New work on LD 

Media fragments - spec finalised 
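Media Fragments URI 1.0 addresses part of a media resource through the URI fragment, e.g. `talk.ogv#t=10,20` for seconds 10 to 20. A minimal sketch that parses only the temporal dimension in plain seconds (with or without the `npt:` prefix); it deliberately ignores the spec's hh:mm:ss, SMPTE and clock-time forms and the spatial `xywh` dimension, and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds from a '#t=' media fragment, or None.

    end is None when the fragment runs to the end of the media;
    a missing start defaults to 0. Only plain-seconds NPT values are handled.
    """
    fragment = urlsplit(uri).fragment
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name != "t":
            continue
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_s, _, end_s = value.partition(",")
        start = float(start_s) if start_s else 0.0
        end = float(end_s) if end_s else None
        return start, end
    return None


print(parse_temporal_fragment("http://example.com/talk.ogv#t=10,20"))  # (10.0, 20.0)
```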

Ontology for Media Resources (DC for video and audio?) 

Internationalization Tag Set 2.0

SW core is stable, so work with vocabs now.  Need interoperability.  Decide:
- Syntax
- Microdata not necessarily for SEO
- very basic schemas with increasing numbers of more specialised extensions.
Discussion at

Application scenarios.

Organisation ontology
- Membership and reporting structure, location information, organisational history
- Interoperable organisations
- 'Final call' stage - nearly done.  Need feedback.

DCAT (interoperability between data catalogues)
- Uses FOAF, DC, SKOS
- draft
- Namespace neutrality
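What a DCAT catalogue entry looks like in practice: a dataset typed `dcat:Dataset`, described with Dublin Core terms, pointing at a `dcat:Distribution` that can be downloaded. A minimal sketch that renders one such entry as Turtle; the `dcat:`/`dct:` terms are real vocabulary terms, but the function and all URIs and titles are hypothetical, and values are inserted without escaping.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def dcat_turtle(dataset_uri, title, license_uri, download_url):
    """Render one dataset with a single downloadable distribution as Turtle.

    Illustrative only: no escaping, so a title containing quotes
    would produce invalid Turtle.
    """
    return "\n".join([
        f"@prefix dcat: <{DCAT}> .",
        f"@prefix dct: <{DCT}> .",
        "",
        f"<{dataset_uri}> a dcat:Dataset ;",
        f'    dct:title "{title}" ;',
        f"    dct:license <{license_uri}> ;",
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
        f"        dcat:downloadURL <{download_url}>",
        "    ] .",
    ])


print(dcat_turtle(
    "http://example.org/dataset/bike-accidents-2011",   # hypothetical
    "Bicycle accidents 2011",
    "http://creativecommons.org/licenses/by/3.0/",
    "http://example.org/files/bike-accidents-2011.csv",  # hypothetical
))
```

Because every catalogue uses the same terms, a harvester can read entries from different portals without per-portal mapping, which is the interoperability DCAT is after.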

Language graph of the Web is cool.

Tomáš Knap - Tracking Data Provenance of the Published (Linked) Open Data
watch film

Defines provenance and agents, artifacts, processes.  Provenance is useful for data integration: which source is right/recent etc.

How to cite.  Vocabularies:
- PROV-O (almost W3C final, w3/ns/prov)
- VoID (datasets, w3/TR/void)
- FOAF, DC
ODCleanStore
- provenance-aware storage, processing, querying
- Write rules/queries using web front end.  Certain automation from inserting ontologies.  Still manual work.
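How provenance answers "which is right/recent": once every source carries PROV-O statements, choosing between conflicting inputs becomes a query. A minimal sketch, assuming provenance is kept as plain (subject, predicate, object) tuples rather than in ODCleanStore or a triplestore; the property names `prov:wasDerivedFrom`, `prov:wasAttributedTo` and `prov:generatedAtTime` are real PROV-O terms, while the `ex:` resources and dates are hypothetical.

```python
from datetime import datetime

# Hypothetical provenance triples (prov: = http://www.w3.org/ns/prov#).
triples = [
    ("ex:merged", "prov:wasDerivedFrom", "ex:sourceA"),
    ("ex:merged", "prov:wasDerivedFrom", "ex:sourceB"),
    ("ex:sourceA", "prov:wasAttributedTo", "ex:cityPortal"),
    ("ex:sourceB", "prov:wasAttributedTo", "ex:statisticsOffice"),
    ("ex:sourceA", "prov:generatedAtTime", "2012-06-01T00:00:00"),
    ("ex:sourceB", "prov:generatedAtTime", "2012-11-15T00:00:00"),
]


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def generated_at(entity):
    """Parse the first prov:generatedAtTime value of an entity."""
    return datetime.fromisoformat(objects(entity, "prov:generatedAtTime")[0])


def most_recent_source(entity):
    """Among the entities `entity` was derived from, return the newest one."""
    return max(objects(entity, "prov:wasDerivedFrom"), key=generated_at)


newest = most_recent_source("ex:merged")
print(newest, objects(newest, "prov:wasAttributedTo"))  # ex:sourceB ['ex:statisticsOffice']
```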

LOD2 WP9a - EU project LD tools

Maria Magdalena Theisen - Open Data and Big Data

Big Data - have to ask questions to understand what questions to ask. Consists of volume, velocity, variety. Can't say that all open data is big data, and vice versa.

Some BD from external social media, disaster information, sensors, smart meters.  Lots of things bringing data to process.

Facebook had the largest data collection by 2010.

Open and Big - Eye on Earth
- air watch (over 1k stations in Europe providing live data)
- noise watch
- water watch (static historical data)
- can rate quality of data and give attributes

Cloud computing is an enabler for BD and OD
- Don't have to manage servers to provide data.
- Flexibility and scalability
- Interoperability with existing infrastructure
- Easy access to data
- Development platform (Azure) - can enter an app with open data into marketplace.  Lots of examples.

Is Azure marketplace integrated with CKAN? - No.

Evanela Lapi - Building Sustainable Open Data Platforms

Understand stakeholders

* Consumers
  * Developers
  * Citizens
    * Less technical, can use open data to help with life
  * Journalists, scientists, researchers
  * First two more critical
  * Disseminate data
  * Need open, standards-based, non-proprietary formats.  Easy to download/browse/search/redistribute/share.

* Publishers
  * Provide transparency
  * Want a cost-effective, easy solution platform
  * Public sector has lots of data not online - because it's hard to publish?
  * Lots of friction, fragmentation

Socrata - end to end, custom solution.  Many implementations in US, Kenya.


Integrated, loosely-coupled - existing SW, eg. CKAN + Drupal
Fraunhofer OD platform is Java (Amsterdam uses)

Open Cities
- open innovation (see last time this was mentioned)
- on Github - get feedback from use.

Virtuoso triplestore + Liferay CMS + CKAN catalogue
(Java wrappers for REST APIs)
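The wrappers sit on top of CKAN's REST ("Action") API. A minimal sketch in Python rather than the Java the platform uses, assuming CKAN's version 3 `package_search` action and its `success`/`result`/`results` response envelope; the portal URL and the sample response are hypothetical, and the actual HTTP call (e.g. with `urllib.request.urlopen`) is left out so the parsing can be checked offline.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search request URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError("CKAN reported an error")
    return [pkg["title"] for pkg in body["result"]["results"]]


print(package_search_url("http://data.example.org/", "traffic", rows=5))
# http://data.example.org/api/3/action/package_search?q=traffic&rows=5

# Hypothetical, trimmed response in the shape CKAN returns.
sample = ('{"success": true, "result": {"count": 1, "results": '
          '[{"name": "traffic-counts", "title": "Traffic counts"}]}}')
print(dataset_titles(sample))  # ['Traffic counts']
```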

User roles:
  • Data owner
    • Publish
    • Maintain
    • Bulk upload
  • Platform user
    • Query
    • Discuss
    • Search
    • Browse
    • Download
    • Propose new

- 137 datasets in 18 categories
- 22 datasets

It's a good start, but still not enough - why?
  • Too much manual work, redundancy across different platforms.
    • Modernise environment - by modular, high level stuff?  (I think that's what she said)
"Germany isn't much into OD yet.."

Oliver Adamczak - Big Data for Smarter Cities

Leaders must innovate to exceed citizen expectations.
Functionality of BD - use variety and volume to innovate.
Vision - do things you haven't thought of before.

I should do a survey of open data about arts/media?  It's all about gov/science.

IBM BD platform

Hadoop to store
- low cost (open source)
- scalable
- easy to load data - don't have to care about structure until afterwards
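"Don't have to care about structure until afterwards" is usually called schema-on-read. A toy, single-machine sketch of the idea (no Hadoop involved; the sensor record fields are hypothetical): raw lines are stored exactly as they arrive, and a structure is imposed only when a query reads them, so malformed or differently shaped records are skipped at read time rather than rejected at load time.

```python
import json

raw_store = []  # stands in for files in HDFS: lines kept exactly as loaded


def load(line):
    """Loading does no validation at all."""
    raw_store.append(line)


def read_with_schema(fields):
    """Apply a schema at query time: yield only records that have the fields."""
    for line in raw_store:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable line: ignored by this query, but still stored
        if isinstance(record, dict) and all(f in record for f in fields):
            yield {f: record[f] for f in fields}


load('{"sensor": "s1", "pm10": 21}')
load('not json at all')
load('{"sensor": "s2", "noise_db": 64}')

print(list(read_with_schema(["sensor", "pm10"])))  # [{'sensor': 's1', 'pm10': 21}]
```

A different query can ask for `["sensor", "noise_db"]` over the same stored lines without reloading anything.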

Text analytics to read PDFs etc. and extract data with context.

Streaming data is important
- Not for repo, just use/analyse and discard.
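"Use/analyse and discard" means each reading updates a running result and is then dropped, so memory stays constant however long the stream runs. A minimal sketch, assuming a hypothetical stream of numeric sensor readings, keeping a running mean (updated incrementally) and maximum:

```python
def running_stats(readings):
    """Consume readings one at a time; keep only count, mean and max."""
    count, mean, peak = 0, 0.0, float("-inf")
    for value in readings:  # each value is seen once, never stored
        count += 1
        mean += (value - mean) / count  # incremental mean update
        peak = max(peak, value)
        yield count, mean, peak


# A generator stands in for a live feed (e.g. air-quality sensors).
feed = (v for v in [40, 42, 38, 60])
for count, mean, peak in running_stats(feed):
    print(count, round(mean, 2), peak)
```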