Sneak peek at data.gov.uk

Are you a coder in the UK? Do you fancy tinkering around with government data for the potential good of the public?

Here’s an early Christmas present.

Visit data.gov.uk and it will bounce you on to a Google Group. Request to join the group and introduce yourself. You might get access to the developer preview of the site – like I just did!

This might be old news to some, as it seems to have gone live on 30th September according to the site blog. I guess I assumed it was for “special” people, so that’s a lesson learned. If you see an interesting door, knock on it.

I’m still finding my way around. There are 113 datasets including census data, ASBOs, air quality, crime, fear of crime, work, health, motoring, (un)employment, police and it goes on.

I think at least some of this data has already been available in different places. But having it linked to from one place is a good idea.

Pick a subject and there are always people out there who are cleverer than you. That’s another lesson, or rather, reminder I get from the web. Whatever you think about the UK government, inviting people to build things with a resource like this is at least a way of acting on that lesson.

The open invite encompasses not only a central website, but actually having exportable data formats and clear conditions for data re-use. I am not a lawyer but the terms and conditions seem fairly clear. Generally I don’t like the phrase “Crown Copyright” if it’s something that was gathered using public money, but it’s apparently there to ensure attribution and accurate use of the data. I thought copyright couldn’t be applied to numerical data? Perhaps someone out there can explain.

The preview is for load testing purposes, according to the welcome email:

Since the appointment of Sir Tim Berners-Lee and Professor Nigel Shadbolt in June we have been working on how to pull together a single point of access for government data. We’ve talked to a range of people, and looked at what others have done, and over the summer we have built a first version, with a combination of open source and re-using of existing facilities such as CKAN.

We would now like your help over the next weeks and months to make it better – and more useful for you as developers. We would like feedback on how the site should work, what developer support facilities and tools there would be useful to you, and what further data should be freed up for re-use.

We’d also be interested in your ideas on how the data can be used – and if you can build some more great applications with what is available now this will help the drive to free up more over the months ahead.

At this preview stage, we are manually approving membership requests so that we check how the load on our server scales as we ramp up our service.

I’m currently looking at census data for Welsh language ability, collected during the 2001 census. It might lead to ideas for our Hacio’r Iaith event.

Thoughts for Wales’ new Cross-Party Digital Group at the National Assembly

I went to a public meeting at the Assembly buildings in Cardiff last night, which was a chance to meet Wales’ new Cross-Party Digital Group and have a discussion to answer the question:

“How can we make better use of new media and digital technology to engage with the people of Wales?”.

The members of the group are the Assembly members Alun Cairns, Peter Black, Alun Davies and Bethan Jenkins. Not all of them could actually make it but as an intro conversation with q&a it was worth attending.

Foomandoonian has blogged last night’s line-up (with representatives from Google UK, Oxfam GB, MessageSpace and chaired by Rory Cellan-Jones). He’s also blogged some of his highlights of what the guests said.

Below are some of my thoughts. I did raise my hand and ask a question about monitoring blogs and other social media. I also filled in a feedback form (on good old paper). Maybe I can explain/expand on them here. I’m offering them from my perspective and as someone who works with the web and is interested in seeing Wales do well.

Getting attendees at the meetings. I get the impression there will be more of these meetings. “Engagement” is a popular word to drop in, but how can we actually do this? Well, for what it’s worth I only heard about the meeting because a friend emailed the details to me. Otherwise I would have missed it. So next time please put a page on the Assembly site about this meeting. Then it can be found by Google and we can send it around by email, blogs, Twitter, Facebook and all the other various networks. Whatever existing publicity there was worked well because there were about 80 people in attendance, give or take. This was described as a good “turn-out”. It seemed that most of these people were middle-aged white men in suits. It’s just an observation; there were exceptions of course and it’s actually a good start. But not to have an open page (none that I could find anyway) on the web about a digital Wales meeting is missing a trick. Get a bigger room next time because we’ll be spreading the word!

The meeting itself needs to be more open. The meeting is already somewhat open because people are blogging about it and some people were using Twitter with the hashtag #digitalwales. Please record it next time. Just the audio will probably be fine, with a roving mic for questions. Document the whole thing. That is engagement because you can upload it somewhere and make the best use of public money. It’s a public meeting, so make it as public as possible! There’s no reason why time and space should prevent people from at least hearing meetings anymore. Someone in Pen Llŷn will thank you. I recently wrote about recording meetings on my Native blog.

Bill it as open. This is a public meeting. If people want a private discussion with the AMs involved, there are already lots of ways (email, phone, face-to-face). So just make it clear to attendees that things will be recorded and for the benefit of everyone in Wales.

Let’s have a discussion about open data in general. The USA house their open public data at data.gov and the UK are not too far behind with data.gov.uk (not launched yet but currently on a developer preview). On the feedback form I suggested mySociety as guests for a future meeting because they’re probably the UK’s leading experts on making tools that use political and other data to benefit the public. Good to see that this is what they’re planning. There are almost definitely opportunities for Wales in open public data. By that I mean business opportunities that create employment and projects which help communities – as well as ways to understand the viewpoints, hold politicians to account and run a proper democracy.

But we would like Assembly data in standardised formats please. The online transcriptions of AM speeches are a bit disorganised at the moment. If the Assembly exists to serve Wales, then one way to achieve that is to make them machine-readable. Ideally this would be XML format, but it doesn’t actually matter as long as it’s consistent all the way and the original language and translation are clearly indicated (English and Cymraeg). Then all kinds of things become possible. A good example is the volunteer project They Work For You, which has a search engine for parliamentary discussions and related functions. It has UK parliament, the Scottish Parliament and the Northern Ireland Assembly – but it’s missing the Welsh Assembly. I’ve written about the need for They Work For You to index Welsh Assembly discussions before. It’s been discussed on a mySociety mailing list and we welcome all coders! But the main point is NOT to raise the specific issue of They Work For You because it’s a volunteer project and only one of many possible applications. The point is making it as easy as possible for citizens to use the data.
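To make “machine-readable” a bit more concrete, here is a minimal sketch in Python. The XML layout is entirely hypothetical (the element and attribute names are my own invention, not anything the Assembly publishes). It only illustrates the two things asked for above: a consistent structure, and each block of text clearly marked with its language and whether it is the original or a translation.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript format: the element names, attribute names and
# speakers are invented for illustration and are not an Assembly standard.
SAMPLE = """\
<transcript>
  <speech speaker="Example AM">
    <text lang="cy" role="original">Diolch, Lywydd.</text>
    <text lang="en" role="translation">Thank you, Presiding Officer.</text>
  </speech>
  <speech speaker="Another AM">
    <text lang="en" role="original">I welcome the statement.</text>
  </speech>
</transcript>
"""


def speeches(xml_text, lang):
    """Return (speaker, text, role) for every speech available in `lang`."""
    root = ET.fromstring(xml_text)
    found = []
    for speech in root.iter("speech"):
        for text in speech.findall("text"):
            if text.get("lang") == lang:
                found.append((speech.get("speaker"), text.text, text.get("role")))
    return found


if __name__ == "__main__":
    # List everything readable in English, noting what is a translation.
    for speaker, text, role in speeches(SAMPLE, "en"):
        print(f"{speaker} [{role}]: {text}")
```

Once the data looks something like this, a search engine, a word count per AM or a bilingual side-by-side view are all a few lines of code away, whichever format is finally chosen.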

Good broadband access. There was some discussion of this last night. I don’t know much about the situation elsewhere in Wales, other than that it’s important. Broadband is infrastructure, like railways. In the past, the railways moved coal and steel. Now we also move information, at much higher speeds. As with any infrastructure, it requires good usage – there is no magical transformation. But it does increase the possible ways people can communicate, learn and work.

Let us see the political process. So much of the discussion at the Assembly and the Assembly Government is private and it doesn’t need to be. I suspect it’s private because people tend to rely too much on email and waste opportunities to “engage”. The question for ministers and staff should always be: does this NEED to be private? If yes, then use a private method like email. If no (could even be the majority of cases) then quickly upload/publish it somehow (blog or wiki or some other tool). Now email the link to people. Thanks, you just opened up the political process! Don’t spin or polish the posts. We’ll vote for you if you’re honest and you communicate.

Some quick notes on tools. Posterous is a good tool because you blog by sending it an email. It’s not the only one with this feature but it’s quick, easy and free of charge. Facebook is OK here but be careful. By default your personal profile is not open – it’s halfway between private and public. My suggestion here would be upload/publish on an open platform (blog etc.) then post a link to Facebook for your friends and supporters.

I could have emailed my thoughts here to somebody. But I blogged them instead. Now anyone can email the link to anyone – or link from anywhere. It doesn’t mean they will but it allows it. People can also find it via search. I’d like to see this model in action.

Your blog posts don’t have to be as long as mine! Preferably they would have a name and a face next to them, not a logo.

Search is key. An AM should probably monitor (or have someone monitor) mentions of their name and issues they care about. Google Alerts are OK, but RSS is probably better. It’s not “ego searching” to look for your name. It’s… that engagement we keep talking about.

Thanks for reading, comments are open.

Blogging about Welsh politics

I’m going to be writing more about politics on this blog.

My interest is how politics might relate to technology, business and “ordinary” people in the UK – with a particular emphasis on Wales.

As a personal rule I try and stay away from the various personalities and day-to-day machinations, allegiances, squabblings, who wore what clothes and so on.

More generally, I’m not even a party political blogger.

Some of those things can be important (and entertaining), but they’re not what I specialise in. If you want to read that stuff it is available online.

I’ll carry on writing about the stuff I otherwise write about. Quixotic Quisling is deliberately an “anti-brand” which can contain anything I want for the next x years. Sometimes things converge into sense as you go along, if you know what I mean.

Don’t hold me to ANY of these things either. Any or all of them might change at some point. It’s my blog.

Now I’ve got the disclaimers out the way, on with the next post!

Welsh Assembly Government bundles of RSS feeds

The Welsh Assembly Government generates a lot of its own news.

The news is available as separate RSS feeds for 22 different topics, which is good. Actually, double that because there are 22 in English and the same 22 in Cymraeg.

This week I wanted to subscribe to a complete feed of everything, but I couldn’t find one listed on the site, in either language – which is not so good. So I made two feeds myself with Yahoo Pipes.

Welsh Assembly Government RSS feed, every topic (English)

Llywodraeth Cynulliad Cymru porthiant RSS, pob pwnc (Cymraeg)

Let me know if you do anything with these feeds. Anything at all. Even if it’s just a word cloud or something.

Unfortunately, on that note, they’re not complete feeds, just headlines with a one-line description. (That’s all I’m getting from the 22 original feeds.) That’s fine for subscribing in your feed reader; it’s just an extra click per item to reach the full web page. But if you want to do anything else it’s restrictive.

You could probably make a more advanced pair of feeds which included the full page data from the site. Clone and modify my English pipe source and Cymraeg pipe source if you want.
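If you would rather not use Pipes at all, the bundling itself is simple enough to sketch in Python. This is not how my pipes are built internally, just the same idea under some assumptions: each topic feed is plain RSS 2.0, every item has a link and a pubDate, and two items with the same link are the same story. The two sample feeds and their example.org links are stand-ins, and in practice you would fetch the real feed URLs instead.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny stand-in feeds; the real topic feeds would be fetched over HTTP.
FEED_A = """<rss version="2.0"><channel><title>Health</title>
<item><title>Health item</title><link>http://example.org/a1</link>
<pubDate>Mon, 07 Dec 2009 10:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Transport</title>
<item><title>Transport item</title><link>http://example.org/b1</link>
<pubDate>Tue, 08 Dec 2009 09:00:00 GMT</pubDate></item>
<item><title>Health item</title><link>http://example.org/a1</link>
<pubDate>Mon, 07 Dec 2009 10:00:00 GMT</pubDate></item>
</channel></rss>"""


def merge_feeds(feed_texts):
    """Combine the items of several RSS 2.0 feeds, newest first, no duplicates."""
    seen_links = set()
    items = []
    for feed_text in feed_texts:
        channel = ET.fromstring(feed_text).find("channel")
        for item in channel.findall("item"):
            link = item.findtext("link")
            if link in seen_links:
                continue  # the same story filed under two topics
            seen_links.add(link)
            items.append({
                "title": item.findtext("title"),
                "link": link,
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            })
    return sorted(items, key=lambda entry: entry["published"], reverse=True)


if __name__ == "__main__":
    for entry in merge_feeds([FEED_A, FEED_B]):
        print(entry["published"].date(), entry["title"], entry["link"])
```

The merged list still only contains headlines and links, so the restriction described above applies here too. A fuller version would follow each link and pull in the page content.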

Hacking or faking a wiki history for good purposes

I want to utterly hack the wiki format because I don’t think it’s been fully explored.

I’d like wiki software into which I can manually insert fake edits. I’d like to write the history in arbitrary order and set the dates myself. (Usually the dates are automatic.) I love the history!

Why? The history is a really useful way of representing the progression of a document.

Here’s one application. Lots of documents change and it might be useful to show their development in this fashion. In the UK, bills are discussed in parliament, they are edited then they sometimes become acts which are the basis of law. Very few normal people actually follow the process. A wiki-style history might help their understanding.

There’s a similar process at the Welsh Assembly and we certainly need help understanding what happens there.

There are also famous documents like the US Constitution which might be fun or historically interesting to represent in a wiki fashion. Imagine being able to see prohibition as a literal 18th amendment to the wiki and it being repealed by the 21st amendment.

As well as the democracy stuff, there might be journalistic applications of something like this. Representing important documents in different time-based ways.

This idea strikes me as somewhat “obvious”. (It was inspired by a comment in a video interview with Matt Mullenweg about open source.) Has it been done before?

I might have a go, as there are many open source wiki software systems. For instance, MediaWiki or DokuWiki could be adapted to do this. There are also document comparison programs, so maybe I just need to do it as a set of documents which can be compared.
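Here is roughly what I have in mind, as a small Python sketch rather than a real wiki. The class and method names are made up for this post. The only real ingredients are that the date of each revision is set by hand, revisions can be inserted in any order, and the standard difflib module does the document comparison. The revision text is placeholder wording, not the actual text of any amendment.

```python
import difflib
from dataclasses import dataclass
from datetime import date


@dataclass
class Revision:
    when: date   # set by hand, not taken from the clock
    author: str
    text: str


class BackdatedHistory:
    """A page history whose revisions can be inserted in any order."""

    def __init__(self):
        self._revisions = []

    def insert(self, when, author, text):
        self._revisions.append(Revision(when, author, text))
        # Keep the history chronological regardless of insertion order.
        self._revisions.sort(key=lambda rev: rev.when)

    def revisions(self):
        return list(self._revisions)

    def diff(self, index):
        """Unified diff between revision `index` and the one before it."""
        older = self._revisions[index - 1].text if index > 0 else ""
        newer = self._revisions[index].text
        return list(difflib.unified_diff(
            older.splitlines(), newer.splitlines(),
            fromfile="previous", tofile="current", lineterm=""))


if __name__ == "__main__":
    history = BackdatedHistory()
    # Written out of order on purpose; the dates are the ratification dates.
    history.insert(date(1933, 12, 5), "21st Amendment",
                   "Alcohol\nProhibition repealed")
    history.insert(date(1919, 1, 16), "18th Amendment",
                   "Alcohol\nProhibition in force")
    for line in history.diff(1):
        print(line)
```

The history page of a real wiki is essentially this list plus a diff view, which is why adapting existing wiki software (or just keeping a folder of dated documents) both seem plausible.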

Maybe this intersects with what Google Wave can do, I haven’t tried it yet.

I use Google Docs every day now and it’s obvious that that has borrowed heavily from wikis. I’d struggle to go back to emailing attachments back and forth.

Google have two products called wiki – SearchWiki and Sidewiki – and neither of them is really a wiki! But Google Docs are proper wikis. If you haven’t tried Google Docs, try it.

I’m thinking of other documents that change over time, which could be wikified. Like chessboards and images of your dog’s face.

My own face is a wiki edited by time. My body is a wiki, edited by beer and curry.

The evolving blog: things that resemble blogging

This loosely follows on from the previous post about Twitter being a variant of blogging. Incidentally, normal service on this blog may be resumed at some point or possibly never. Anyway.

Sometimes I think almost EVERY form of publishing in social media can be considered a form of blogging. Is everything here blogging?

On Flickr, for example, you upload images which have dates and tags. YouTube and other video sharing sites let you upload video, again with dates and tags. There are subscription options in these too – you add people on Flickr and you subscribe to channels in YouTube. There are variants on other video sites. These “content services” also have feeds of course. They don’t look exactly like blogs but I’m saying the default view you get is incidental to this concept of them being about blogging. Of course, the default display of a blog is incidental. You could take feeds or content from any blog or set of blogs and display them in aggregate in a multitude of ways. The point is, all are about time-based publishing which is essentially all a blog is.

Facebook is like a huge group blog. The newest thing is at the top. Posting a status or whatever is obviously like doing a blog post, but almost everything else you do is subscription. Clicking Like for something is subscription. Writing a comment on a post is a form of subscription. Becoming a fan of a page is subscription. Responding to an event is subscription. And of course, adding a friend is a subscription. It can only be two-way, symmetrical. I tell people Facebook is weirder than blogging and Twitter because of the privacy stuff. There’s a grey area between private and public, but let’s forget about those aspects for now. Facebook is a huge group blog. The things that are slightly annoying on Facebook are the non-bloggy things, mainly the private inboxes. There’s your inbox for requests and your inbox for direct messages. Another thing, if you don’t respond to an event you are automatically subscribed to receive direct messages about that event. That’s annoying because automatic subscription to anything is not bloggy.

Stretching this even further – and this is highly provisional now – maybe a wiki page can be considered a form of blog. The time-based element is most apparent if you look at the history page. This page shows all the edits that have taken place. It looks like a blog, except that instead of different posts it’s the same post being refined over time by multiple authors. And of course there’s a feed of this history too.

Or, the other way around, maybe a blog can be considered a history for its AUTHOR. The author is a biological wiki changing over time! Changes are occurring in the author’s mind and each post is a snapshot in time. So each blog post is a wiki edit. Or at least an indication of one. (If you comment on my blog, I will read it and you will edit me slightly. And the potential future of the blog will change. Have fun.)

Starting an open content service like Twitter, YouTube or Facebook looks like so much fun. I would do it differently to those guys, natch. If I were starting such a service I would look at blogging in detail for which features I could borrow. This often happens subconsciously as people have absorbed the customs and features of blogging. Maybe I could start by adapting an old UNIX command.

I’m abstracting features of software here. When I studied Computer Science, I went to a lecture about “computing in the real world” delivered by a software consultant. He said that he’d been asked to work with a prison for their database of inmates. Should they pay to develop an expensive new database system for the prison, from scratch? In a stroke of inspiration, he suggested they just adapt an existing hotel booking system. A prison is a hotel, except if you’re staying you can’t decide when you’re going to leave. On an abstract level, that’s the only functional difference. Inmates are guests.

That observation has always stuck with me and I’ve always tried to look at problems in a similar way.

Of course, not everything is blogging. Now go and eat your tea.

The evolving blog: Twitter as microblogging

Veteran blogger Meg Pickard wrote an insightful post last month about how the adoption of Twitter has mirrored that of blogging before it.

Twitter the company never describe their service as “microblogging”. That’s a smart move from the viewpoint of marketing the service to people who might have preconceived ideas about blogging. But mainly, it probably helps each user and the communities represented to be unconstrained and perhaps more creative in the way they actually use it as a medium.

Twitter feels like blogging at reduced friction. Each tweet (blog post) is tiny and you can type it quickly, on the go. They are also quicker to read than macroblog posts.

So Twitter could be fairly accurately described as microblogging. Some of the Twitter observations Pickard makes are accelerated in comparison to blogging.

People write more posts (tweets) than on a long-form macroblog – in my experience. The “half-life” of conversations is reduced. There’s probably a whole bunch of research someone could do on that if they wanted. (And I’m not talking about the paper where they dismissed 40% of Twitter as “babble”. I think that totally missed the point.)

So I wanted to expand on Pickard’s post and draw more connections between blogging and Twitter, between macroblogging and microblogging if you will. Some of this will apply to Identi.ca and other microblogging services. But I think Twitter’s larger user base makes it a bigger playground for this stuff.

The post
Let’s start with the obvious. A tweet is a blog post. Your tweets are organised by time, with newest at the top. Apart from that you can write anything you like. Same, same.

Following
Following is subscribing. Again, there’s less friction on Twitter because it happens in fewer clicks.

The client
Your Twitter client is your feed reader. The default web client is just a web-based feed reader. You get everyone you’re following aggregated together. But it can also be set to a single blog (a single person’s Twitter timeline).

URL and feeds
Your blog has an HTML version and it also has an RSS or Atom feed. Twitter feels like it has feeds but they’re invisible, they’re simulated by API calls. What I mean is, when you click Follow you’re not made aware of what happened in the background, it’s a black box. Whereas when reading blogs there is a URL to a feed which you subscribe to. (Although every Twitter account has a bona fide RSS feed as well.) (Also, because Twitter and other services have emphasised real time, there are efforts to make blog feeds real time. Twitter, in turn, is influencing technologies that were established before.)

Replies
Replies on Twitter are like blog pingbacks. They notify @someone that you made a response to their post. But unlike blogs, the “pingback” of a Twitter reply is not visible to onlookers reading the original tweet.

Tags and categories
The counterpart of blog post metadata – tags and categories – is the Twitter hashtag, which was deliberately introduced by a user and then popularised. The Hashtags website is what Technorati is for macroblogs (or rather used to be).

Retweet
Retweets, usually written as “RT @someone” or “via @someone”, are ostensibly about acknowledging a source. They’re a somewhat strange byproduct of Twitter’s lack of a quick way to link to, and read, another tweet. For programmers, it’s analogous to passing by value instead of passing by reference. They’re not native to Twitter at the time of writing.
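For the non-programmers, the value versus reference analogy can be shown in a few lines of Python. This is a toy model with a plain dictionary standing in for the service. None of it is Twitter’s actual API, and the function names are hypothetical.

```python
def retweet_by_value(store, tweet_id, retweeter):
    """Copy the text into a brand-new tweet, as 'RT @someone' does."""
    source = store[tweet_id]
    new_id = max(store) + 1
    store[new_id] = {
        "author": retweeter,
        "text": f"RT @{source['author']} {source['text']}",
    }
    return new_id


def retweet_by_reference(store, tweet_id, retweeter):
    """Store only a pointer to the original tweet."""
    new_id = max(store) + 1
    store[new_id] = {"author": retweeter, "ref": tweet_id}
    return new_id


def render(store, tweet_id):
    """Return the text a reader would see for a tweet."""
    tweet = store[tweet_id]
    if "ref" in tweet:
        return render(store, tweet["ref"])  # follow the pointer
    return tweet["text"]


if __name__ == "__main__":
    store = {1: {"author": "alice", "text": "Helo byd"}}
    copied = retweet_by_value(store, 1, "bob")
    linked = retweet_by_reference(store, 1, "carol")

    # Pretend the original is corrected later.
    store[1]["text"] = "Helo byd!"

    print(render(store, copied))  # the copy keeps the old wording
    print(render(store, linked))  # the reference follows the original
```

The copied retweet is frozen at the moment it was made and has lost its connection to the source, which is exactly the strangeness of “RT @someone”. A retweet by reference always shows whatever the original currently says.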

Suggested user list
When someone joins Twitter now, the site suggests accounts for them to follow. This helps new users to get started and see how it’s being used. But it also offers a huge boost and arguably an unfair advantage to companies and individuals represented. It’s an editorial decision made by Twitter staff, one of the very few such decisions on a service which is mostly neutral – which to some “feels” wrong. There’s no equivalent on the blogosphere, which is sustained by a network and not hosted by a single provider. If Twitter the company want to be seen as fair, maybe they should behave like the blogosphere.

Blogrolls
In the early years of blogging, a blogger would have a “blogroll”, which is a list of links to their favourite blogs. These seem to have faded in importance and usage as blogging has become popular. But during the growth of the new medium, they were useful for people navigating the blogosphere and finding other bloggers to subscribe to. Blogrolls were also about giving kudos and link juice. The earliest form of blogroll I have noticed on Twitter is the #followfriday tag, where people suggest accounts worth following.

Twitter list feature (new!)
The new Twitter list feature is a bit like a blogroll. It can be seen as a public endorsement of certain accounts and also a way of giving kudos. You can have up to 20 different lists, e.g. colleagues, bands, journalists, people in my hometown – which is similar to blogrolls that have categories. With Twitter, the emphasis seems to be on usefulness to the compiler of the lists, with the openness and kudos as byproducts. Like blogrolls, the lists help to grow the network by helping people navigate. Twitter lists can also be likened to OPML files, which are bundles of links to RSS feeds. In other words, an OPML file is a blogroll in a file.

Besides, Twitter has always had lists. Each account has a grand list of all the people you’re following and it’s public. So the list of people you’re following is a blogroll. Albeit massive and context-blobby.
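The OPML comparison can be made concrete with a short sketch. Assuming you have a list of (name, feed URL) pairs, which is all a blogroll or a Twitter list really is, this Python function writes them out as an OPML 2.0 document that a feed reader could import. The function name and the example.org addresses are placeholders of mine.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML 2.0 document from (name, feed_url) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One <outline> per feed; type, text and xmlUrl are the usual
        # attributes for an RSS subscription in OPML.
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    blogroll = [
        ("Example blog", "http://example.org/feed.xml"),
        ("Another blog", "http://example.org/other.rss"),
    ]
    print(build_opml("People in my hometown", blogroll))
```

Swap the feed URLs for account names and you have more or less described a Twitter list.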

I think I’ve talked about Twitter as microblogging in enough detail now.

The origins of words, with Sioned Stryd-Cludydd

Mostly, what comes from the mouth of Janet Street-Porter is total bum gravy. This is no exception.

“We had a Welsh-speaking budgie. My mother missed Wales very much. I don’t feel Welsh at all. There’s no Welsh words for anything modern.”

Street-Porter is one of those people who enjoy a level of media coverage disproportionate to their level of ability or insight. (Incidentally people like this are certainly not worth protesting against, don’t waste your time. Maybe a quick throwaway blog post though…)

It did make me think how someone can really struggle if they attempt to pass comment on things they know very little about. And I figured, it’s at least a good chance for me to learn more Welsh words.

So if you have any good modern words, feel free to comment. And together let’s make a page on the INTERNET!

Modern means anything of, relating to, or characteristic of the present or the immediate past. But no use being ultra-strict about it.

Here are some modern Welsh words, each with an English translation.

ailgoedwigo (reafforestation)
ailgylchu (recycling)
amser real (real time)
biohinsoddeg (bioclimatology)
blogiwr (blogger)
chwyddo mewn (zoom in)
cludadwyedd data (data portability)
cnewyllyn (kernel)
cronfa ddata (database)
cyfalaf menter (venture capital)
cyfieithu peirianyddol (machine translation)
cywasgu data (data compression)
datganoli (devolution)
diagram Venn (Venn diagram)
dirwasgiad (recession)
gallu i ryngweithredu (interoperability)
gwefan (website)
meddalwedd (software)
porthiant RSS (RSS feed)
rhesymeg Boole (Boolean logic)
sebon dogfennol (docusoap)
siocled (chocolate)
system weithredu (operating system)
teledu (television)
tewdra (obesity)
troseddwr rhyfel (war criminal)
unben ffasgaidd (fascist dictator)
weldiad bôn (butt weld)

You might sometimes notice the Latin root of some words. Welsh has incorporated words from Latin for many, many centuries, just as English has done with Latin, Greek and French. Seemingly “civilised” Welsh words, particularly certain legal concepts which might be assumed to derive from Latin, can often date from pre-Roman times. Read John Davies’s A History of Wales!

Globalisation can sometimes result in many different languages all adopting the same, or a similar, word for something. I’m thinking of “chocolate” in different languages, as well as “blog”, “wiki” and so on.

I heard that teledu was the result of a magazine competition to find a suitable word when it was a new technology (is this true?). It’s based on darlledu (broadcast). Of course, the English word “television” was mocked when it emerged for being half-Greek and half-Latin. And I now mock modern attempts to coin English words like “staycation”, which just catch on anyway.

New Welsh words are frequently invented of course, just as English ones are.

Language is, in the words of George Orwell, “an instrument which we shape for our own purposes”.

Sources: Geiriadur BBC, Geiriadur Llanbedr, Termiadur

What is Hacio’r Iaith?

Hacio’r Iaith is a new and exciting event where we will explore how technology applies to, around and through the Welsh language. That means idea sharing, APIs, mash-ups, localisation, machine translation and so on. The event will be part hack day and part BarCamp (both are well established templates for events worldwide). There will be stuff for beginners as well as geeks. Our pencilled date for Hacio’r Iaith is Saturday 30th January 2010 in Aberystwyth, which is to be confirmed. (I’ll update this post if that changes.) Entry will be very cheap. In Welsh, “yr iaith” means “the language” and “Hacio’r Iaith” means “hack the language”.