Amy Guy

Raw Blog

Monday, January 28, 2013

Week in review: brainstorm

21st - 27th January

Brainstormed about how to work with amateur digital media creators, mostly with the aim of starting to put together a vocabulary for representing various digital media creation processes, collaboration dynamics and audience engagement.  More coherent thoughts on that next week, I should think.

Lightning-talked about the Berlin Open Data Dialogue at an OKFN meetup in the National Library of Scotland on Thursday 24th.

Talked about ontologies for sensor data at Ontologies with a View.

Thursday, January 24, 2013

OKFN Meetup #4

Which was hosted by the National Library of Scotland. (Information).

I reported on the 1st International Open Data Dialogue in Berlin that I'd been to in December, but then had to immediately leave, so I don't have any notes on the rest of the talks.

Sunday, January 20, 2013

Week in review: ILW hack, mostly

14th January - 20th January

Further discussed ILW hack with various parties, and released a website with information for anyone interested in being involved, and registration for students.

I went to the first Machine Learning and Pattern Recognition class.  It looks interesting, but might be a bit maths heavy... I might concentrate on learning the maths now, and take the class next year, instead of killing myself with both.

Caught up on all my blogging!

Wednesday, January 16, 2013

Week in review: Networking

7th January - 13th January

I was the subject of Paolo's procedural task labelling pilot experiment.  It was quite complicated.  I don't know if I did a very good job.

I went to TechMeetup and networked like crazy.

  • I talked to Andy Hyde about ALISS getting involved with the ILW hack, and Edinburgh OD scene in general.
  • And a few other people about supporting the ILW hack in various ways.
  • I talked to Felix Gilfedder of Popcorn Horror about novel ways of engaging with creative digital media communities (PhD related!)
  • I briefly met Russell, with whom I hope to discuss digital video metadata (PhD related!)
  • One other thing which isn't related to Open Data or my PhD but is very exciting nonetheless.
I went to a Design Informatics talk about the Internet of Things.

Week in review: 6 month plan

31st December - 6th January 2013

Wrote a 6 month plan.  It's nice to have solid targets.  Here it is in calendar form.

Got the flu.

Week in review: Open Data in the community

17th December - 23rd December

Mostly things not strictly PhD related...

18th - Initial meeting about organising a hack for undergraduates during Innovative Learning Week in February
19th - Discussed how local small scale community groups can benefit from Open Data, and how they could get involved in the ILW hack, with Freda O'Bryne and Euan Jackson.
        - Also discussed these things with Ewan, and started on a spreadsheet of decent apps that use Open Data, and links to their data sources.

Digital Methods as Mainstream Methodology, London, 7 December

(Write-up about this event by the organisers here).

DMMM was a conference aimed at sociologists and anthropologists and the like, so, having never studied these disciplines in any way, I was worried I'd have no idea what was going on.

Fortunately everyone was friendly, and everyone's research was really interesting, relevant and mostly made sense to me.  You can read all of the notes I took here.

Humanities researchers are using and gathering digital data in lots of interesting and unique ways.  Using social media and other digital methods to engage with study participants (Jo Belcher, Lorenza Antonucci, Eve Stirling); sentiment analysis (Mike Thelwell); examining archives; image use in online interviews (Emma Hutchinson); e-focus groups (Ibrar Bhatt); digital records (reflections) of a creative arts process (Carole Kirk); crowd-sourcing of commercial ideas (Temitayo Abinusawa); avatars and virtual interaction spaces like SecondLife (Evelyn McElhinney); brilliant playful use of hacking to disrupt discussions about online learning (Jeremy Knox on MOOCs).

I talked about digital media on the Semantic Web, with as much of a sociology swing as I could give it given my limited expertise in that domain.  My slides, beautifully illustrated by Chloe Dungate (available for hire!  Academic slides starting at £2 a drawing!  Loves topics she doesn't understand so she can be as outrageously creative as possible!), are here.  You can see my talk notes there, too.

danah boyd, whose work I've followed more or less since my undergraduate, teleconferenced in to give a really interesting keynote called "Making Sense of Teen Life: Strategies for Capturing Ethnographic Data in a Networked Era."  She discussed working with young people for the last few years to examine their use of social networks (mostly MySpace), and all of the challenges and considerations that came up along the way.  She was surprised a lot.

The open discussion at the end raised a lot of questions about ethics.  It was implied at one point that the content of tweets or YouTube comments is ripe for the picking with no strings attached because they're already in the public space.  It's definitely not that simple.

There's also a danger of humanities researchers being out of touch with modern techniques and best practices.  Commercial research is sometimes way ahead, but there's no communication between each end of the spectrum, so methods get developed and optimised unnecessarily.

A lot of people had experiences indicating that digital methods in humanities are often not taken seriously.  Supervisors, ethics committees, funding bodies, who have only worked with traditional methods can struggle to see the legitimacy of results gathered by digital means.  On the other hand, certain levels of ignorance can sometimes work to the researcher's advantage in terms of being allowed to get stuff done with minimal red tape (because authorities don't know what questions to ask).

Personally I was interested by this general feeling of novelty about digital methods.  Having been in computing for almost my whole academic career (not to mention a child of the Web), a lot of things were being critically and confusedly discussed that I just take for granted.  Things like the validity of friendships that exist entirely online, and feelings expressed through short-lived text alone.  I think arts and humanities researchers who really want to get to grips with digital methods as legitimate research tools should consider orchestrating placements alongside technical researchers and immersing themselves in a world where the main options are all digital by default.

[Notes] Digital Methods as Mainstream Methodology

December 7th, 2012.

Notes as I scribbled them.  Read a proper review.

Peter Webster - Digital Resources for Social Scientists at the British Library
British Library research methods guide under social sciences.
Web about individuals and organisations is fragile, things disappear. Most site owners ignore requests for permission to archive stuff. New legal deposit legislation next year will allow scraping the Web and archiving everything without worrying about infringing copyright. Using this data is restricted; library premises or print one copy for non-commercial use. Can't use an item simultaneously with any other user. (regulation is for print, derp)

Full text search for internet archive... open question, very complicated. Consultation going on about this. Legislation is very restrictive; need to look at data derived from dataset and how that can be made available.

Mike Thelwell - Sentiment Analysis for the Social Web
Sentiment is seen as peripheral, and often ignored, but actually it's core. Emotional reaction to tea or coffee. SentiStrength - detect strength of +ve and -ve sentiment in short text. Takes into account that text might not be grammatically correct. In social media, sentiment expressed in different ways (eg. emoticons, deliberate misspellings to embed sentiment: haaaapppyyyyyy). List of +ve and -ve term stems and strengths from -5 to 4. People disagree about sentiment surprisingly much. Something lots of people tweet about must arouse sentiment. But for big events, surge of quantity of tweets, but not surge in +ve or -ve sentiment. Lots of sentiment is implicit.

YT comments are easy to get so good source of social data. No ethical concerns about getting permission to analyse because it's already public. (hmm, no?)

Longer the text, less well it works. But does have a long text mode, with slightly different scoring.

Used by Yahoo question answering system to work out best people to give answers. Companies use it for product reactions.

Would like to see these techniques applied to smaller scale case studies. The most focused data stream still has loads of junk like what they had for breakfast.

Why are 1 and -1 neutral instead of 0? Because psychology - two scales, not one.
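
To make the two-scale idea concrete, here is a minimal sketch of dual-scale scoring. This is not SentiStrength itself: the mini-lexicon, its strengths and the "boost elongated words by one step" rule are all assumptions made up for illustration. It only shows why "no sentiment" comes out as (1, -1) rather than 0, and how a deliberate misspelling like haaaapppyyyyyy can still be matched.

```python
import re

# Hypothetical mini-lexicon: word -> strength. Values are invented for
# illustration only and are not taken from SentiStrength.
LEXICON = {"happy": 3, "love": 4, "great": 3, "sad": -3, "hate": -4, "awful": -4}


def squash_repeats(word):
    """Collapse runs of identical letters to one: 'haaappyyy' -> 'hapy'."""
    return re.sub(r"(.)\1+", r"\1", word)


# Index the lexicon by squashed form so 'happy' and 'haaapppyyy' both match.
SQUASHED = {squash_repeats(word): strength for word, strength in LEXICON.items()}


def is_elongated(word):
    """True if any letter appears three or more times in a row."""
    return re.search(r"(.)\1\1", word) is not None


def score(text):
    """Return (positive, negative): positive in 1..5, negative in -1..-5."""
    positive, negative = 1, -1  # 1 and -1 mean "no sentiment detected"
    for word in re.findall(r"[a-z]+", text.lower()):
        strength = SQUASHED.get(squash_repeats(word))
        if strength is None:
            continue
        if is_elongated(word):
            # Assumption: deliberate lengthening boosts strength by one step.
            strength += 1 if strength > 0 else -1
        if strength > 0:
            positive = max(positive, min(strength, 5))
        else:
            negative = min(negative, max(strength, -5))
    return positive, negative
```

Because the two scales are tracked separately, a text can be strongly positive and strongly negative at once, for example `score("I am haaaapppyyyyyy but I hate Mondays")`.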

Jo Belcher - mixing traditional and digital methods to research hidden carers.

Is online support for parents and carers socially patterned like social media use? Needs to reach people who don't use the Internet, too, for comparisons. People respond in the medium they were first contacted. Treat online and offline results differently for analysis.

Lorenza Antonucci - Using social media in different phases of research process

Digital methods allow you to do something different. A lot of focus on big data and secondary digital data (twitter). Not about collecting own data, but using data already available on social media.

Might be a problem fitting secondary data into existing theoretical frameworks. So need to collect. She's looking at real vs virtual identities. Not possible to do follow up interviews just over facebook. DM help to select people. Also cross-national research. Age/generational aspects.

Jeremy Knox - MOOCs

Is learning simply the consumption of information online?
Web enabled sensors. GPS to record where he goes within MOOC - physical location and digital space together (cool).
Locations tweet when he's in the MOOC.
RFID system that allows office books to tweet content.
Experiments to disrupt and criticise. Playful methods to think about the MOOC in a different way. Assemblage of human, technology and place; learning might be post-human.
Final outcome of research? Not sure.. a way to provoke thinking in what is often a closed area.
How is learning affected by physical space?

Sue Thomas - technobiophilia

O'Reilly's topsoil metaphor is cool.
Five categories about how people talk about online experiences.
Lots of nature metaphors. Metaphors aren't deliberate. We bring nature into computing because we innately want it to be there (biophilia)

Carole Kirk - Digital reflection: A method for arts practice-led research?

Questions and methods come from practical. Tacit knowledge. Capturing creative processes.
How to leave a trace of an action for reflection?
Digital methods can help.
Not a complete record, but a trace. Digital technologies that involve a high level of manipulation stimulate greater reflection. Only archives - metadata, feedback & discussion. Visible record of reflection. Process of creating records doesn't replace the practice itself. Might trigger embodied memory. Help to articulate fleeting things.
In arts, practice-led research is more about creating digital data.

Emma Hutchinson - Asynchronous online interviews and image elicitation

Async, like email or forum.
Complement interview with photos, but not used much online yet.
Identity performance of online gamers.
Images help with articulation. Lots to talk about.
Photos that do/don't get uploaded to facebook, why / why not?

Eve Stirling - Facebook profile as research tool

Undergraduate transition to university.
Lots of HE happens on fb. Looking at the every day.
Digital and physical spaces.
Personal fb is not academic, and hidden. Twitter is academic.
Ethnography is about understanding every day culture and developing trust and rapport with participants.
Fb friends are linked to study. Intrinsically linked. Does becoming fb friend need a disclaimer? Informed consent? Personal and professional lives blurring. After study - delete friends?

Me!  Digital Media on the Semantic Web.  
My slides are here.

danah boyd - Making Sense of Teen Life: Strategies for Capturing Ethnographic Data in a Networked Era

Understanding social networks before there were sns
The rise and fall of myspace.
How much can be made sense of from a distance? Engaging with own friends not like working with young people. How radically difficult it is to interpret what she sees. Young people better at encoding the information they make available, because of adult surveillance. Just because she can see their content, doesn't mean she knows what's going on.

Observes offline too. Adults help to recruit youths with different perspectives.
Thought about recruiting online but stopped. Not a good norm to start.
Make sense of online things with offline interviews (ethical things considered, parents/friends nearby).
Don't begin the conversation with online material; need them to feel comfortable first; usually an hour into the interview.

How to coordinate data? Serious challenge. Blogging about things as she's thinking through them; trying to make sense of them in a public way.
Thinking out loud, can be corrected and challenged during sense-making process. Not just experts, but her participants too.
Controversial piece about shift from myspace to fb. Got picked up overnight and got over 10k responses. Most people frustrated and angry and didn't understand where she was coming from. People came forward with quantitative data that helped. Adults attacked her for being racist; young people responded with their stories.

How public to make the young people? Don't expose them, no real names unless they're already public figures. Never quotes online material exactly so people can't search to identify the young people. Visibility has consequences people don't expect or understand. Can make people more vulnerable by making them visible.

The young people could choose to make themselves visible through the process (some do).
Speak for them or help them speak?
Public in the media to make young people's voices heard as much as possible. Never publishes in a closed access journal.
Collaboration is in sense-making, not writing. More in intervention-driven projects.
Generally don't want to, but a few exceptions - published two papers with teenagers.

Tension between MS and research?
No, MSR is academic institution. Lots of freedom.
Teenagers expect her to fix xbox. External perception is more confusing than internal.
Can make the case for open access in a way that lots of university scholars can't.

Problems with paraphrasing quotes from websites?
Tries to make quotes more common, found everywhere.
Her ethic of not making people more vulnerable is worth more than exact quotes. Helps that she doesn't rely entirely on online data.

Says she's not good at articulating her methods.

Ibrar Bhatt - e-focus groups and e-interviews

Separate summer project on student experience for postgraduate research students, distance learners, part time students globally.
Needed in depth focus group without them being there.
How involved do the students feel in activities in the School of Education?

Different doing focus groups online - affordances and challenges.
Used Adobe Connect. Facilitator echoes some questions.
Multidimensional focus group. Participants could also discuss with each other, and with researcher, over personal chat.
Guidelines - rehearsal, drop-in session, beforehand - recording of everything; can integrate their video into a transcription.

Temitayo Abinusawa - Social Networking and innovation

Technical background. How organisations use IT to promote activities.
Social networking - Internet was loads of words, chaos. Make sense of the words, then you can innovate. Can create products and services to meet needs. Good ideas that need funding. People discuss ideas online. Organisations are looking for new ideas too. Organisation can search the Web for ideas to create innovation.

Outcomes: organisations can create more for less.
Feedback is consumer interaction that takes place on the Internet. Dell IdeaStorm - turn your ideas into reality.

Openness is important, not exploitation. (Seems to be reward focussed, rather than transparent exchange culture, query).
People don't read t&c so think they're being exploited.

Evelyn McElhinney - Social virtual worlds: a new place for the avatar researcher

Focus groups in Second Life with avatars collecting, sat in chairs like IRL.
Most people aren't roleplaying, their avatars are just themselves.

Closing Discussion


Commercial research is sometimes ahead with digital methods practices. Things happening one end aren't being noticed by other end. Need to communicate and look out for each other. Mixed methods research.

Digital methods need legitimising to be taken seriously. Sometimes people not knowing much about it can help to get stuff done.

1st International Open Data Dialogue, Berlin, 5-6 December

Read my complete notes from day one, and complete notes from day two.

The 1st International Open Data Dialogue in Berlin in December was broadly a discussion about real-world applications of Open Data.  Lots of practice, less theory.  Despite this (or perhaps because of this, now I think about it) it wasn't as technical as I expected.  Felix Sasaki [1] talked about some basic technicalities of Linked Data and the Semantic Web, kind of the first things you'd learn if you were studying it in a structured way, and I heard a lot of people afterwards complaining that that had been too technical.

Importantly, there was a real message of getting things done at this event, and plenty of evidence that a world built on Open Data is not an idealistic pipe dream, but a reality right now.  Challenges are being articulated, and solutions are being created, and problems are being overcome.

I stress this particularly because a couple of sceptics who weren't at the conference tweeted things along the lines of "Sounds like your conference is a bunch of idealist hippies preaching to the choir…"  A genuine concern, but what's really exciting is that this definitely wasn't the case.  It was instead a bunch of realist technologists with the expertise and influence to actively overcome barriers to improving the world.

Open Data is about social change and empowerment.  It is about accountability of organisations with massive influence over the lives of ordinary people.  It is not about an abandonment of personal privacy, or everybody knowing everything about everyone else.

It should go without saying (yet it still needs to be said) that it is not appropriate to blindly make all data available to everyone about every aspect of everybody's life.  But what if you had access to all of the data anyone had ever collected about your life?  Think about purchase history (shop loyalty cards, travel tickets), online activities (searches, browsing history, social networking).  All this stuff is being stored anyway, all over the place.  Often by organisations who fully intend to profit from it, presumably with your unwitting consent.  They went to the trouble of collecting it, but you went to the trouble of providing it.  It's your data too.  What could you do with it (or hire a software developer to do with it)?  Then imagine you had access to the same data from everyone in your town, aggregated and anonymised, and visualised in a nice way.  Maybe you could team up with your neighbours for cheaper bulk food purchases?  Maybe you'd realise that others had similar hobbies or problems nearby, and could form special interest or support groups?  Reduce costs by sharing transport to similar destinations (or just have some company on the journey)?

There's so much potential within data that's already held.

The UK government's Midata initiative is a massive step in the right direction [3] toward compelling commercial enterprises to hand over machine-readable datasets to consumers upon request.

In Slovakia and Kenya (and possibly others, but these were the ones that came up), there is a constitutional right to data held by the government.  Not without loopholes and other problems, of course [5, 2].

One of the obvious problems is convincing large organisations that hold lots of data (like commercial enterprise and governments) of the circumstances in which it would be in everybody's best interest to release (some of) it.  Reasons they don't include a lack of understanding of the benefits; disproportionate assessment of risks; aversion to change; a lack of technical expertise and infrastructure; "data hugging syndrome" [2]; licensing issues; outdated business models.

Nigel Shadbolt's experience says that large organisations who open data always see benefits.  It's always worth the effort.  When the data is there, suddenly developers start doing things with it; applications appear, many unexpected, and usually free.  He stressed that it's important to have a stockpile of success stories in case you need to convince someone in charge of the value of Open Data, and his favourite one was the publication of MRSA rates in hospitals (resulting in sharing of good practice, and an 85% reduction in MRSA over two years).  See a list at the end of this post for all of the success stories I came across over the course of the two days.

There were lots of discussions about the users or audiences of Open Data, and the various different roles people can have.  Most consumers of Open Data are developers, and 'ordinary people' see the data via an application.  Many won't know (or care) about the source of the data that powers the app, even if it is about them.  Many will, and trust must be built for people to see the value that such apps could bring to their day to day lives.  Ideally, releasing a dataset would be part of an ecosystem, rather than a one-time thing.  Data providers should value consumer feedback, and commit to good quality, up-to-date data.  Rufus Pollock wonders why every dataset doesn't have a public issue tracker, and notes that poor quality data creates wasted time, especially at hack events [4].

A successful Open Data world needs partnership between the public, media and organisations.  All of these parties need educating on appropriate combinations of the realistic potential of Open Data, and the technicalities of releasing and using it.  Michael Hörz [6] discussed the journalist perspective on Open Data; they're desperate for data about everything, and often manage to get hold of it.  But they find themselves begging for spreadsheets or CSV files, because what they get given are PDFs.  Eugh!  Yet they're not asking for Linked Data formats?  Which means, presumably, that after they've been through the trouble of extracting data from PDFs, they're putting it in a spreadsheet or something, and there's still a whole level of usefulness missing.  And I assume that's because they don't know otherwise, or perhaps don't have the resources to learn even if they're aware of the possibilities.  Similar sorts of reasons that they're being given PDFs by organisations in the first place.

So awareness, and easily digestible educational resources, need to be promoted.
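
To give a rough idea of the step from a spreadsheet to something Linked Data shaped, here is a minimal sketch that turns CSV rows into N-Triples using only the standard library. The `http://example.org/` namespace, the column names and the sample rows are all hypothetical placeholders, and a real conversion would reuse an existing vocabulary and proper identifiers rather than one made-up property per column.

```python
import csv
import io

# Hypothetical namespace: example.org is a placeholder, not a real vocabulary.
BASE = "http://example.org/"

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = """name,year,count
Northside,2011,40
Northside,2012,6
"""


def csv_to_ntriples(csv_text, base=BASE):
    """Turn each CSV row into N-Triples lines, one triple per cell.

    Assumes column names are simple tokens that are safe to use in a URI.
    """
    triples = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        subject = f"<{base}record/{index}>"
        for column, value in row.items():
            # Escape backslashes and quotes so the literal stays well formed.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{base}prop/{column}> "{escaped}" .')
    return triples
```

Calling `csv_to_ntriples(CSV_TEXT)` gives one statement per cell, each naming its row and its column explicitly, which is the part a plain spreadsheet leaves implicit.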

Now then, about those success stories...  This list includes data publishing projects, groups and apps that have been built on Open Data.

That'll do for now.  Lots of the portals and competitions have links to app examples etc.  There's lots to explore.

Finally, I highlighted in my notes quite a lot of things that I need to find out more about.  A lot of them are technology or platforms for publishing or sharing Open Data, and various standards or studies I need to read in more detail.

I have a couple of questions to ponder on, too:

There's a massive focus around hacks (more often than not one off events) as a way of using and promoting Open Data.  What other ways are there?  What will the path to a deeper integration of Open Data in society look like?

There are lots of datasets and vocabularies about public services and society, as well as science and education.  What arts, culture and media datasets are out there?  (And what has been done with them?)  Ooh, or online social interactions?  Maybe I'll do a survey.

[1] Prof. Dr. Felix Sasaki, keynote: "Linked Open Data @ W3C-Vocabularies, Working Groups, Usage Scenarios."
[2] Prof. Dr. Simon M. Onywere, talk: "The Kenya Open Data Incubator Project – Outreach to Research Community."
[3] via Nigel Shadbolt
[4] Dr. Rufus Pollock, keynote: "Open Data, Building the Ecosystem"
[5] Peter Hanečák, talk: "Open Data and Open Government Partnership in Slovakia."
[6] Michael Hörz, talk: "Open Data in Local Journalism: An Excel file?"

[Notes] 1st International Open Data Dialogue (Day two)

December 6th, 2012.

Notes as I scrawled them.  Read a proper review.  Purple is calls to action for myself.

Rufus Pollock - Open Data, Building the Ecosystem

Open API is a contradiction.  
An API is not open (though still valuable), and not the same as bulk open data.
Should we open governments?  So much is hidden.
Ultimately a lot of it is about in/justice (debts paid off by younger generations).

Open spending
- only 30% of departments are up to date - can keep checking who's updating their stuff.
- but sometimes more up to date than government's own records - they use it themselves.

Getting the data is only the beginning.  Need a platform.

Getting citizens, journalists and government to work together.

Why doesn't every dataset have a public issue tracker?  This is what he'd like to see most.

The process is important.

Digital doesn't run out.  That's why open makes so much sense.

FourSquare uses OSM.

Average consumers won't see raw open data - but via products and services.

Next?  Building communities.  Takes time to nurture.

We'll go from hackdays to deep integration.  Toy vs. core datasets.

Geodata is mature.  Data analysis, programming, training.
- Teachathons better than hackathons.

Should have access to all our own data - travel (Oyster), shopping (loyalty cards), social.
- Would like to control who uses it.
- Compare with others (aggregate).  Only companies can do that.
- Some people will choose to publish/open their own data.  Even if it's a few people, still lots of data.

Open Data = Platform; !Commodity
- Build on it rather than sell it (and let others build on it)
- Already seeing people building companies/making money on this principle.

Ecosystem.  Need feedback.  Poor quality data creates wasted time, especially at hacks.

Datagov census dashboard - who has what out.
There's no gov. data that's come back cleaned up from the community (even though people are almost certainly cleaning this data up).

Be patient.  Needed to wait for pressure to build up (why it's happening now).  Low investment.

Way to fund production of gov. data (or combination of):
- Charge users
- Taxpayer revenue
- Charge creators (most attractive)
  eg. Companies Register data; more efficient to charge people registering companies a bit extra, people won't be deterred from registering a company because of this.  Basically, add to existing fees for things people will pay anyway.

Open data saves lives!  Heart surgery data caused improvements.

MiData (more from Nigel)
Supermarkets not very good, but some big companies involved.

Prof. Esteve Almirall - Reinventing Cities - Open Innovation in the Public Sector

Competition now is about innovation, not money.

Adopt a Hydrant (Boston)

Peter Hanecak - Open Data in Slovakia

2012 govt signed open government partnership. 

Laws already align with Open Data principles. 

Everything must be public, unless stated otherwise (eg. according to a specific law, like army secrets. Must be clearly defined, no rubbish excuses for not publishing something). 

Open licences (GPL, CC) are not recognised, because there's no signed paper. Currently they're campaigning to fix this. 

Existing laws do not fully apply to regional governments, only to state government. 

Data that is available isn't in great formats. Lots of things are ignored or misunderstood.

Achievements so far:
- Datastores and apps.
- App competitions, conferences. Restart Slovensko.

- Spread the word 
- Advance principles on regional levels 
- Major release of OD by gov 
- develop OS publication platform for OD 
- incorporate results into other projects

They have a standard published for state government.

Ivonne Jansen-Dings - Code4EU and Apps for Amsterdam

Technology is a key component, but impact on citizens is central. 
Taking linked OD out of academia and applying it to 'real life'.
Tourist one. FairPhone. 

Collaboration between government, coders and citizens. Can't do this top down. Creating something together, that has to evolve naturally. There isn't a formula.

Apps for Amsterdam is a platform for people to talk about OD; creating it and working with it. 
  • Lots of reasons why coders participate. 
    • Hobbies
    • Solve local problems
    • Network, get new business
    • Ideas start during hack events/workshops. 
  • Past two years gone from 22 - 130+ datasets. 
  • Quality of data is essential for good apps. 
  • Mostly about getting the government to see the necessity of open data. 
  • All sorts of working groups arising with municipalities to solve specific regional issues using OD. 
  • Helping people who have made/are making apps, to help them move forward. 
  • Lots of people are stuck because they need certain data. 
  • Creating ideas for apps is a good place to start with deciding which data to open up. 
  • Essential for app developers is also getting paid. Necessary to help people evolve apps. So a4A helps to connect developers to companies, NGOs, etc. for whom the app is relevant, so they can work out a way to work together.
  • They don't have much government data about spending and stuff, working on that.

Apps for Democracy (Apps Voor Democratie) 
  • Can see which parties and people collaborated with each other on getting motions put in and passed etc. Now integrated into actual parliament website. 
  • PolitFutures. Stock exchange around politicians.
  • Currently hiring developers to create solutions for problems they see within municipalities. 
  • Changing government from within. Civic innovators.
  • Lots of people come with problems, not ideas for apps.
Not about the apps, about the social change.
Local issues can have solutions that have global value.

Lena-Sophie Mueller - Open Government is more than Open Data

Stuttgart local people had issues with train station and there was a huge protest. Planning happened in a black box, people didn't understand why decisions had been made. 
(Ed trams)

Same as ACTA. People negotiated about it for 7 years, but it was not transparent.

Need transparency to face obstacles of 21st century. 

IT-Planungsrat say open govt needs to be a focus point, but will focus on open data first. 
  • Open government though, is more than just open data. 
  • It's hard to keep information secret now. Things get leaked. To counter leaks, they put it online too, so it's a trusted source. Can publish accurate new versions. 
  • What's needed is a Diff, so information needs to be in machine-readable formats so documents can be compared. 
  • Then politicians can be asked specific questions about changes.
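
A minimal sketch of the kind of diff meant here, using Python's standard `difflib` on two plain-text versions of a document. The two draft texts and their labels are hypothetical, and this only works because the text is machine-readable in the first place; a scanned PDF would need extracting first.

```python
import difflib

# Hypothetical document versions, invented for illustration only.
DRAFT_V1 = "Article 1: Budget is 10 million.\nArticle 2: Plans are published yearly."
DRAFT_V2 = "Article 1: Budget is 12 million.\nArticle 2: Plans are published yearly."


def changed_lines(old_text, new_text):
    """Return the lines removed ('-') or added ('+') between two versions."""
    diff = list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="draft_v1",
        tofile="draft_v2",
        lineterm="",
        n=0,  # no unchanged context lines, just the changes
    ))
    # The first two entries are the '---'/'+++' file headers; '@@' lines
    # mark hunk positions. Keep only the actual removals and additions.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

`changed_lines(DRAFT_V1, DRAFT_V2)` returns just the old and new wording of Article 1, which is the specific change someone could then ask about.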
Open gov means taking contribution of people as valuable. 
  • US crowdsourcing patent information for making decisions about patents. 
  • Open Budgets - people participate in deciding how budget is spent.
  • Governments and administrations collaborate. 
  • PledgeBank ("a site to get things done")
Lots of administrative workers aren't used to working with data. 
  • Need help with extracting data and organisation and processes and technology. 
  • Change management. 
  • It's going to take some time, but it's worth making the journey.
Convince organisations that it's okay to open data (sans personal information). 
  • Lots is digitised already, and it's expensive to digitise paper. 
  • Smaller municipalities are mostly paper. 
  • eGovernment projects are important for open data.
Status of open government in a country? The thing Rufus mentioned. 
There are initiatives that measure freedom of information laws, but it's very difficult to measure openness. Need to develop a measurement that could be used internationally (big complicated project). 
Web Foundation published an index a few months ago. 

What does she use?  Usually use CKAN.

Christoph Lutz - Open Data and Social Media

Social media readiness in Hamburg. 
Many insights from social media can be transferred to open data. 
Social media adoption and readiness can take place on several levels 
- organisational level 
- individual level 
- societal level

This project looks mainly at organisational and individual. 

  • different agencies, like culture and financial services; regional agencies. 
  • structure, leadership and culture

  • drivers and barriers to social media adoption: cognitive (know how) or affective (acceptingness of technology, concerns)
Some agencies in Hamburg have started initiatives (like fb, twitter). Some were recalled, but in general there is political support and Hamburg is more advanced than other cities. 

Their research project: 
  • conduct interviews with employees who had contact with social media. 
    • 8 people 
    • analysed successful vs unsuccessful social media use 
    • different levels of responsibility, and different agencies 
  • case studies 
  • soon a big quantitative survey
Organisational factors: 
  • political support (crucial) 
  • leadership support (some people high up in hierarchy aren't familiar with technologies) 
  • autonomy and trust (people can experiment, be proactive) 
  • structures of organisations (complicated, different motives and experiences in different departments, hard to coordinate) 
  • processes (hierarchy and bureaucracy; you need fast feedback for social media, which contrasts with usual way of work) 
  • resource (stressed most in interviews; employees don't have time at work to administer social media accounts, sometimes IT resources)
Individual factors: 
  • age (younger people more interested) 
  • affinity (how much people enjoy working with IT; intrinsic motivation) 
  • experience 
  • social capital 
  • concerns (privacy, security, technostress)
Identifiable strategies: 
  • avoid resistance (make projects appear small, non-invasive, simple) 
  • externalise project (work with other organisations, avoid bureaucracy)
Engage people in open data via social media.

Objective of Hamburg project? Mainly about representing administration to the citizens, and providing feedback to people. Later about engaging people in conversations and participation.

How to make use of social media in administration? How to get public offices to produce open data? The quantitative questionnaire will help show how open people are to social media.

Simon M. Onywere - Outcome of the Kenya Open Data Consultative Forum - Kenya’s Strategy to Make Government Data available to Communities

Increase transparency and accountability of government. 
Help people make decisions. 
Support economic development in the country. 
Supported by World Bank.

In Kenyan Bill of Rights, citizens have a right to the data held by the government. 

Media plays a very important role.

KODI - Code4Kenya:
  • Lack of certain data
  • lack of metadata
  • limited search
  • data duplication. 
  • Need for better analysis and visualisation tools. 
These observations led to holding a stakeholders' forum.
  • People interested in solving problems that face the country.
  • (MANY issues; things Europe was facing 100+ years ago, but no data about what's going wrong, what the impacts are likely to be). 
  • Interested in building a platform, but the content is important, and what is it supposed to help us do? 
  • Issues about data collection. 
  • In the forum, ended up talking about issues that face the country, rather than the data.
  • data hugging syndrome: 'this data is mine' 
  • Lack of Freedom of Information Act 
  • Slow digitization 
  • Lack of trust and low culture of openness
Demand for open repository of all Kenyan PhD and Masters theses. Needs metadata, and needs to be widely accessible. Will help with research, and help people understand research that has been done that might be able to solve problems. Help avoid double research.

Not really free information, because most public sectors spend money to get information.

Project requires various degrees of collaboration. Countries in Europe that are one or two steps ahead. Need training for socio-economic transformation.

Are the licensing issues being sorted out? There's a comprehensive statement on the website (so a custom license?). 
Citizens will hold government accountable for information provided, so they (gov) are worried about data being inaccurate. 
Letting research community use the data can help get useful feedback about what is wrong - you don't know if things are wrong until you try to use the data. 
Up to people to give the government the correct situation on the ground, because gov't is not always right.

Nigel Shadbolt - Finding the Value in Open Data

Local data matters. 

Open Data Institute - build economic value in a serious way. 
2005 AKTive PSI was early beginnings with getting various bodies imagining opening their data. Nobody would really give them the data at that point, but were curious about what could happen. 
Reported to parliament in 2007 - said it was exciting and had great promise, but nothing else. 
Activism, top-level political will and committed individuals needed to move things on.

Two years is not a long time for really disruptive movements like open data. Natural lifecycle to processes. 
Early enthusiasm, but wall of people who question impact, value, if it's worth the effort. We can see now that it's always worth the effort, and doesn't cost much.

Government is becoming more comfortable with this stuff. 
"Can we put a government website up that says 'beta' in the right hand corner, and not be ridiculed?" 
Gov't got used to the idea of agile web development; of not knowing how the system would be finished when it's started.

Suddenly, applications appear. Many unexpected. 
Gov't efforts would be more expensive and less effective.

Virtuous cycle: open licences - open standards - open source - open data - open participation.

CC isn't for everyone... companies worried about giving rights away. In the UK there's an Open Government Licence, developed by government lawyers (to make officials feel more comfortable, and understand conditions).

Open Data market needs a steady stream of successes. Always have a story the person in charge can understand. 
OD is abstract principle. 
He uses MRSA. They began to publish infection rates in hospitals. Two years later, it's down by 85%. Worst hospitals can look at ones that are doing better. People started to ask questions; simple procedures implemented (sunlight, disinfectant).
We have to understand that companies are in the business of making money, and public services are in the business of providing efficient services.

Transport... Companies think it's more valuable to keep hold of the data than to have people on the transport - lolwat?

Visualising data highlights issues.

Quality over quantity. 
Routine to publish certain sorts of information, but there's other stuff that's really important. 
Needs to be found easily. 
Data portal needs lots of metadata (quality of content, what kind of links, how much). 
Every public dataset should rate itself on the 5* score card.
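A rough sketch of how a dataset could rate itself, assuming the 5* score card is the usual five-star open data scheme (open licence, structured, non-proprietary format, URIs, linked to other data). The function and argument names are made up for illustration.

```python
def five_star_rating(open_licence, structured, open_format, uses_uris, linked):
    """Count stars cumulatively: each star requires all the ones below it.

    open_licence: available on the web under an open licence (1 star)
    structured:   machine-readable structured data, e.g. a spreadsheet (2 stars)
    open_format:  non-proprietary format, e.g. CSV (3 stars)
    uses_uris:    things are identified with URIs (4 stars)
    linked:       links out to other people's data (5 stars)
    """
    stars = 0
    for criterion in (open_licence, structured, open_format, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars

# A scanned PDF under an open licence gets one star; a CSV gets three.
print(five_star_rating(True, False, False, False, False))
print(five_star_rating(True, True, True, False, False))
```

Without an open licence the rating stays at zero, however nice the format is.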

Open Data business models: 
It isn't enough just to publish. 
Need to build demand for data that you're supplying. 
If the data is poor or gets turned off, people will let you know. 
Data Marketplace. 
OD Apps (people think this naturally). 

Innovate - economic benefits for host, sponsor and developers. 
Developers innovate on behalf of companies. 
Build and maintain trust. 
Prove that you're doing good things, eg. where materials come from, or effects on environment. Various different kinds of open APIs.

Open Data needs a balanced and broad ecosystem. 
Not just gov'ts. Businesses are beginning to, citizens might eventually (think about social networks). 
Lots of varieties of open data AND closed data (some just cannot be released). Or personal data that only the owner can/should have access to. 
It's much richer than just "everything's open and we need to work out a way to monetise it".

We will increasingly become aware that we can collect our own data. 
So why can't we get the data that other people collect about us? Why don't you have access electronically to every receipt - and what would that world look like? Switching suppliers, teaming up with neighbors for shopping. 
Energy providers in the UK (three of them, other three will do soon) give access to all their raw data.
It's hard to get data out of companies, but those who do see real benefits. Telephone companies in the UK are seeing increasing data exchange between them and consumers.
Products and processes that we'll see emerge from this are exciting.

What's the mix between open data and personal data (midata)? 
Government midata. Most people don't claim cold weather allowance. Costs a lot to get credits moved around. Open data meeting midata would benefit this. Same in health area.

Open Data Institute - 
Leading the creation of the open data ecosystem. 
Trying to improve public supply. 
Training people to produce and publish open data. 
Incubating companies. 
Work with public bodies, big corporates, small startups, trying to find values in datasets. 
After 8 weeks, 4 companies working in their space. 
Locatable is a transport API provider. High quality access to all open transport data. 
Mastodon - green cloud computing options. 

It isn't all sweetness and light. Have to demonstrate tangible benefits to keep progress going. Have to give companies/politicians good reasons why this is better.

Huge amount of capital value is based on "I know something you don't know". 
As information becomes abundant, the landscape will be changed. You can't rely on knowing something any more, you have to provide something extra quality. 
Drive innovation and improvement in service delivery. 
With heavy investment in acquiring certain information, why should they share it? 
Evolutionary arms race means that someone else will find a way to collect that data more cheaply. How long can they sit on their monopoly?

Governments are not here to become revenue generating businesses, but to provide public services.
In the UK, the Office for National Statistics gross value added figures are from 2010.
Hasn't seen any examples yet that cost more than the benefit that's gained. 
Hospitals, traffic data. These studies need doing more carefully than they have.

[Notes] 1st International Open Data Dialogue (Day one)

December 5th, 2012.

Notes as I scrawled them.  See here for a proper review.  Purple is calls to action for myself.

Dr. Philipp Mueller - Openness as a means, not an end
I missed the opening keynote.  His slides are here.

Dr. Wolfgang Both - One Year Open Data Portal Berlin

Open cities EU - Nov '10 - Apr '13.
Amsterdam, Barcelona, Berlin, Helsinki, Paris, Rome.
Berlin responsible for open data working group; several working groups (OD is just one).


Knowledge Society Open Data working group
Guidebook for cities, 30 pages so far.

Open Data Berlin
Portal Sept '11
Press conference Feb '12
Short term: Political agenda, budget, working group.
Mid term: Harmonize data formats.
Long term: Legal framework (Berlin can't decide laws by itself, for whole EU).

Open Data Day May '11, '12 and '13 in prep.

WG open traffic hack (29th Nov '12)
- 150 programmers with transport data.

Portal stats
Are users interested?  Peak at start.  Other peaks for hacks, Apps4D contest (Nov '11)
Possibility for feedback - questions, advice, ideas.
100 datasets;

WG 2012
- formats and metadata
- licensing and user rules
- education for staff (lectures, this is new for many working in public sector)
- organising and processing
WG 2013
- evaluation of OD studies
- Recommendations
- Exchange with other cities.

Datasets were volunteered, not selected
- But are looked at for quality, machine readable, looking for wide range of topics of interest to public.
- Want open, transparent process for publishing.
- Communicate with media as well as community.

- Heuristic... no legal advice available because it hasn't been done before.  Many possibilities; for opening data for individuals, CC was familiar to Internet community, includes origin data (CC-BY).  Some smaller datasets are licensed for non-commercial usage.  Discussion still ongoing.
Knows other cities will follow / copy their example whatever they do!

Jan Schallabok - Right to Freedom of Information on Enterprises

Call for open enterprise data.

Scenarios, set a timeline:
2015 - personal search
2016 - data disasters (identity hack)
2017 - pictures omnipresent, know all about everyone becomes normal
2019 - Google Glass on market

If there's no data on you, things don't work (eg. personalised advertising)
Society down the drain if it didn't open data (Switzerland in the story didn't open data, so Swiss woman moving to Germany couldn't settle in easily).

Moving away from clear facts towards probabilistic.
Google Translate fed by open (input) data, but algorithms aren't open.
Siri - enriches dataset from Web (he said Google search?)
OpenStreetMap (counter example)
If there was more data, everyone would use OSM paradigm (eg. government).

Harm businesses?
..maybe.  But more damage in the long run?

Need to make businesses move away from using people's data.  Like, we work for Facebook.  Our data, not theirs.

Data protection:
by law, data subject has rights to know logic involved
as long as it doesn't affect trade secrets

Privacy implications
'Anonymous' data can be used to identify people.  See AOL search database fail.
When can datasets go public?
Weather data can be personal data (no time for example..)

Michael Hörz - Open Data in Local Journalism

Journalists expect everything
- spending
- political decisions
  (district levels, searchable)
- quality (schools, pollution, food)
- real time sensors (air quality, traffic, energy)

- Open Data Paris (loads, on a map)
- Locrating (school performance on map, UK)
- Chicago bike crash reports (map) (sort by injury, date, day; data all open from Chicago Transport Authority, in a nice format).
- LA Times LAFD (fire dept.) response times.

- Airplane noise map, (Journalists had a PDF, eugh).  One of the first Berlin interactive visualisations.
- Berlin election.  Was real time.  Down to the polling station.
- Berlin bicycle accidents '11 - came from massive PDF (3,800+ cases)

- Wishes for xls or csv... wants directly processable.  WHY NOT RDF?!
- ..or APIs.  WHY NOT LD?!
Reality = PDFs, requests ignored, data incomplete or hidden.  Hard to get for journalists.
All datasets are interesting and should be out there.  In Berlin often only one or two districts are available, which is no good.
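To show what "directly processable" buys a journalist, here is a small sketch using Python's standard `csv` module. The rows and column names (`district`, `date`, `injury`) are entirely invented and are not the real Berlin accident data; the same count from a PDF would need manual extraction first.

```python
import csv
import io
from collections import Counter

# Invented sample rows; column names are hypothetical.
SAMPLE_CSV = """district,date,injury
Mitte,2011-03-02,minor
Mitte,2011-05-17,serious
Pankow,2011-06-01,minor
"""

def accidents_per_district(csv_text):
    """Count rows per district straight from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["district"] for row in reader)

counts = accidents_per_district(SAMPLE_CSV)
for district, n in counts.most_common():
    print(district, n)
```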


It's not always straightforward just to release data - need priorities; raw data/API documentation isn't always available straight away.

Why PDFs?  They don't know any better.  Need to make people aware.

Consequences of public seeing data they're not used to?  Panic?  Or activism?  Pressure politicians for change.  Empowers people.

Is there resistance to making data available (eg. Italy - data there but useless)?  Maybe, or maybe they just don't realise [it's useless].

Prof. Felix Sasaki - Linked Open Data @ W3C-Vocabularies, Working Groups, Usage Scenarios

== first half of MASWS. 

New work on LD 

Media fragments - spec finalised 

Ontology for Media Resources (DC for video and audio?) 

Internationalization Tag Set 2.0

SW core is stable, so work with vocabs now. Need interoperability. Decide: - Syntax - Microdata not necessarily for SEO - Very basic schemas with increasing numbers of more specialised extensions. Discussion at - 

Application scenarios.

Organisation ontology - Membership and reporting structure, location information, organisational history - Interoperable organisations - 'Final call' stage - nearly done. Need feedback.

DCAT (interoperability between data catalogues) - Uses FOAF, DC, SKOS draft Namespace neutrality -

Language graph of the Web is cool.

Tomáš Knap - Tracking Data Provenance of the Published (Linked) Open Data
watch film

Defines provenance and agents, artifacts, processes. Provenance useful for data integration. Which is right/recent etc. 

How to cite. Vocabularies: PROV-O (almost w3c final, w3/ns/prov), VoID (datasets w3/TR/void), FOAF, DC. ODCleanStore - (prov aware storage, processing, querying) - Write rules/queries using web front end. Certain automation from inserting ontologies. Still manual work.

LOD2 WP9a - EU project LD tools

Maria Magdalena Theisen - Open Data and Big Data

Big Data - have to ask questions to understand what questions to ask. Consists of volume, velocity, variety. Can't say that all open data is big data, and vice versa.

Some BD from external social media, disaster information, sensors, smart meter. Lots of things bringing data to process.

Facebook had the largest data collection by 2010.

Open and Big - Eye on Earth - air watch (over 1k stations in Europe providing live data), noise watch, water watch (static historical data) - can rate quality of data and give attributes

Cloud computing is an enabler for B and OD - Don't have to manage servers to provide data. - Flexibility and scalability - Interoperability with existing infrastructure - Easy access to data - Development platform (Azure) - can enter an app with open data into marketplace. Lots of examples.

Is Azure marketplace integrated with ckan? - No.

Evanela Lapi - Building Sustainable Open Data Platforms

Understand stakeholders

* Consumers
* Developers
* Citizens
* Less technical, can use open data to help with life

* Journalists, scientists, researchers
* First two more critical
* disseminate data
* Need open, standards-based, non-proprietary formats.  Easy to download/browse/search/redistribute/share.

* Publishers
* Provide transparency
* Want a cost-effective, easy solution platform
* Public sector has lots of data not online - because it's hard to publish?
* lots of friction, fragmentation

Socrata - end to end, custom solution.  Many implementations in US, Kenya.


Integrated, loosely-coupled - existing SW, eg. CKAN + Drupal
Fraunhofer OD platform is Java (Amsterdam uses)

Open Cities
- open innovation (see last time this was mentioned)
- on Github - get feedback from use.

Virtuoso triplestore + Liferay CMS + CKAN catalogue
(Java wrappers for REST APIs)

User roles:
  • Data owner
    • Publish
    • Maintain
    • Bulk upload
  • Platform user
    • Query
    • Discuss
    • Search
    • Browse
    • Download
    • Propose new - 137 datasets in 18 categories - 22 datasets

It's a good start, but still not enough - why?
  • Too much manual work, redundancy across different platforms.
    • Modernise environment - by modular, high level stuff?  (I think that's what she said)
"Germany isn't much into OD yet.."

Oliver Adamczak - Big Data for Smarter Cities

Leaders must innovate to exceed citizen expectations.
Functionality of BD - use variety and volume to innovate.
Vision - do things you haven't thought of before.

I should do a survey of open data about arts/media?  It's all about gov/science.

IBM BD platform

Hadoop to store
- low cost (open source)
- scalable
- easy to load data - don't have to care about structure until afterwards

Text analytics to read PDFs etc. and extract data with context.

Streaming data is important
- Not for repo, just use/analyse and discard.
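A tiny sketch of the use/analyse/discard idea in plain Python (not IBM's platform, and not any real streaming API). `running_mean` is a hypothetical name, and the readings are made-up numbers. Each value is folded into a count and a total and then dropped, so nothing is kept in a repository.

```python
def running_mean(readings):
    """Yield the mean so far after each reading, keeping only a count and a total."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count  # the reading itself is not stored anywhere

# iter() stands in for a live feed, e.g. sensor values arriving one at a time.
for mean in running_mean(iter([2, 4, 6])):
    print(mean)
```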

Monday, January 14, 2013

Dynamic Web Design

For the second year, I'm tutoring PHP, MySQL, HTML, CSS and JavaScript to MSc Design and Digital Media students for the Dynamic Web Design course in Edinburgh College of Art.

Sometimes this makes me feel like a wizard.

Digital Media Studio Project (DMSP)

I'm supervising (created the brief, overseeing and guiding a group of 6 MSc students, grading) a Digital Media Studio Project in the Edinburgh College of Art.

I challenged my lot to use responsive web design techniques in a unique and inventive way.  You can see what they're up to here.

Wednesday, January 09, 2013

Morrissquirrel (crochet)

As a farewell present to Beth, who was vacating sunny Scotland for the harsh and unforgiving shores of the US, I made a squirrel.  But not just any squirrel.

A Morrissquirrel.

That's Morrissey-squirrel.  Don't question it.

I largely followed this pattern for the normal squirrel parts, then improvised to Moz it up.