Amy Guy

Raw Blog

Wednesday, January 16, 2013

1st International Open Data Dialogue, Berlin, 5-6 December

Read my complete notes from day one, and complete notes from day two.

The 1st International Open Data Dialogue in Berlin in December was broadly a discussion about real-world applications of Open Data.  Lots of practice, less theory.  Despite this (or perhaps because of this, now I think about it) it wasn't as technical as I expected.  Felix Sasaki [1] talked about some basic technicalities of Linked Data and the Semantic Web, kind of the first things you'd learn if you were studying it in a structured way, and I heard a lot of people afterwards complaining that that had been too technical.

Importantly, there was a real message of getting things done at this event, and plenty of evidence that a world built on Open Data is not an idealistic pipe dream, but a reality right now.  Challenges are being articulated, solutions are being created, and problems are being overcome.

I stress this particularly because a couple of sceptics who weren't at the conference tweeted things along the lines of "Sounds like your conference is a bunch of idealist hippies preaching to the choir…"  A genuine concern, but what's really exciting is that this definitely wasn't the case.  It was instead a bunch of realist technologists with the expertise and influence to actively overcome barriers to improving the world.

Open Data is about social change and empowerment.  It is about accountability of organisations with massive influence over the lives of ordinary people.  It is not about an abandonment of personal privacy, or everybody knowing everything about everyone else.

It should go without saying (yet it still needs to be said) that it is not appropriate to blindly make all data available to everyone about every aspect of everybody's life.  But what if you had access to all of the data anyone had ever collected about your life?  Think about purchase history (shop loyalty cards, travel tickets), online activities (searches, browsing history, social networking).  All this stuff is being stored anyway, all over the place.  Often by organisations who fully intend to profit from it, presumably with your unwitting consent.  They went to the trouble of collecting it, but you went to the trouble of providing it.  It's your data too.  What could you do with it (or hire a software developer to do with it)?  Then imagine you had access to the same data from everyone in your town, aggregated and anonymised, and visualised in a nice way.  Maybe you could team up with your neighbours for cheaper bulk food purchases?  Maybe you'd realise that others had similar hobbies or problems nearby, and could form special interest or support groups?  Reduce costs by sharing transport to similar destinations (or just have some company on the journey)?

There's so much potential within data that's already held.

The UK government's Midata initiative is a massive step in the right direction [3] toward compelling commercial enterprises to hand over machine-readable datasets to consumers upon request.

In Slovakia and Kenya (and possibly others, but these were the ones that came up), there is a constitutional right to data held by the government.  Not without loopholes and other problems, of course [5, 2].

One of the obvious problems is convincing large organisations that hold lots of data (like commercial enterprises and governments) of the circumstances in which it would be in everybody's best interest to release (some of) it.  Reasons they don't include a lack of understanding of the benefits; disproportionate assessment of risks; aversion to change; a lack of technical expertise and infrastructure; "data hugging syndrome" [2]; licensing issues; outdated business models.

Nigel Shadbolt's experience says that large organisations who open data always see benefits.  It's always worth the effort.  When the data is there, suddenly developers start doing things with it; applications appear, many unexpected, and usually free.  He stressed that it's important to have a stockpile of success stories in case you need to convince someone in charge of the value of Open Data, and his favourite one was the publication of MRSA rates in hospitals (resulting in sharing of good practice, and an 85% reduction in MRSA over two years).  See a list at the end of this post for all of the success stories I came across over the course of the two days.

There were lots of discussions about the users or audiences of Open Data, and the various different roles people can have.  Most consumers of Open Data are developers, and 'ordinary people' see the data via an application.  Many won't know (or care) about the source of the data that powers the app, even if it is about them.  Many will, and trust must be built for people to see the value that such apps could bring to their day-to-day lives.  Ideally, releasing a dataset would be part of an ecosystem, rather than a one-time thing.  Data providers should value consumer feedback, and commit to good quality, up-to-date data.  Rufus Pollock wonders why every dataset doesn't have a public issue tracker, and notes that poor quality data creates wasted time, especially at hack events [4].

A successful Open Data world needs partnership between the public, media and organisations.  All of these parties need educating on appropriate combinations of the realistic potential of Open Data, and the technicalities of releasing and using it.  Michael Hörz [6] discussed the journalist perspective on Open Data; they're desperate for data about everything, and often manage to get hold of it.  But they find themselves begging for spreadsheets or CSV files, because what they get given are PDFs.  Eugh!  Yet they're not asking for Linked Data formats?  Which means, presumably, that after they've been through the trouble of extracting data from PDFs, they're putting it in a spreadsheet or something, and there's still a whole level of usefulness missing.  And I assume that's because they don't know otherwise, or perhaps don't have the resources to learn even if they're aware of the possibilities.  Similar sorts of reasons that they're being given PDFs by organisations in the first place.
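To make that missing level of usefulness a bit more concrete, here is a minimal sketch of lifting spreadsheet-style CSV rows into Linked Data triples (N-Triples syntax), using only the Python standard library.  It is an illustration under stated assumptions, not anything presented at the event: the `http://example.org/` namespace, the function name `csv_to_ntriples` and the sample columns are all made up, and every value is emitted as a plain string literal rather than being mapped to a proper vocabulary or datatype.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, invented for this illustration only.
BASE = "http://example.org/"


def csv_to_ntriples(csv_text, id_column):
    """Turn each CSV row into N-Triples lines.

    Each row becomes one subject (named after its id_column value) and
    each remaining non-empty column becomes one predicate/literal pair.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id column, overflow cells (key None) and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            # Escape the characters N-Triples does not allow raw in literals.
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up sample data standing in for a spreadsheet export.
sample = "id,name,beds\nh1,Example Hospital,120\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

Once every row has its own URI like this, other datasets can point at the same thing by name, which is the step a bare spreadsheet can't offer.  A real conversion would reuse existing vocabularies for the predicates instead of inventing one per column header.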

So awareness and easily digestible educational resources (how about …) need to be promoted.

Now then, about those success stories...  This list includes data publishing projects, groups and apps that have been built on Open Data.

That'll do for now.  Lots of the portals and competitions have links to app examples etc.  There's lots to explore.

Finally, I highlighted in my notes quite a lot of things that I need to find out more about.  A lot of them are technology or platforms for publishing or sharing Open Data, and various standards or studies I need to read in more detail.

I have a couple of questions to ponder on, too:

There's a massive focus around hacks (more often than not one-off events) as a way of using and promoting Open Data.  What other ways are there?  What will the path to a deeper integration of Open Data in society look like?

There are lots of datasets and vocabularies about public services and society, as well as science and education.  What arts, culture and media datasets are out there?  (And what has been done with them?)  Ooh, or online social interactions?  Maybe I'll do a survey.

[1] Prof. Dr. Felix Sasaki, keynote: "Linked Open Data @ W3C-Vocabularies, Working Groups, Usage Scenarios."
[2] Prof. Dr. Simon M. Onywere, talk: "The Kenya Open Data Incubator Project – Outreach to Research Community."
[3] via Nigel Shadbolt
[4] Dr. Rufus Pollock, keynote: "Open Data, Building the Ecosystem."
[5] Peter Hanečák, talk: "Open Data and Open Government Partnership in Slovakia."
[6] Michael Hörz, talk: "Open Data in Local Journalism: An Excel file?"