Map i Gymru: building an OpenStreetMap in Welsh

I’ve cross-posted this post on the Open Data Institute Cardiff blog as well. Thanks to David Wyn Williams for his invaluable help on the post.

The draft map

Have a peek at this map of Wales, with place names in Welsh.

https://openstreetmap.cymru

Many people have never seen place names in Welsh such as Aberteifi, Treffynnon or Aberdaugleddau on an online map – or indeed any map.

These names have been used for many generations until the present day, from conversations to road signs to media. The Welsh-language Wikipedia, known to its users as Wicipedia Cymraeg, has articles bearing these names.

Nevertheless, they are not usually offered or recognised by the well-known proprietary map providers.

In order to build a map in Wales’ own language we at the project have drawn from freely licensed OpenStreetMap data, server software, and documentation. These are all the work of many contributors around the world, and to these people we are very grateful. We are also very thankful to the Welsh Language Unit of the Welsh Government who have funded this early work.

Building on the map

This is a draft map running on a prototype server. It gives you the ability to pan and zoom. As the developer on this project I am very pleased with the results so far.

I will introduce another feature very soon – the ability to embed this map on any website.

Nevertheless you might spot omissions or glitches while it’s being developed, and some big areas for functional improvement.

As I write this we have received a bundle of very useful place name data from the office of the Welsh Language Commissioner, which is itself the fruit of years of dedicated work. This is comprehensive down to the level of villages, and licensed under OGL.

Improving the data

This section contains background if you are interested in improving OpenStreetMap place names and other data.

Imports of the OSM data happen automatically overnight. Some pre-rendering of map tiles is also done, to speed things up.

The ideal OSM data set for place names in Welsh would have a name:cy tag for every single item. We are not there yet.

In the meantime my system uses name:cy tags and some name tags.

name:cy has highest precedence. If you want to add a definitive name in Welsh to anything, edit the map on osm.org and add a name:cy tag. You will need to create a user account if you don’t already have one. Provided your submission is accepted by the community this will guarantee its inclusion on the next nightly update.

Many name:cy tags already exist.

The challenge with the existing data is that some names that we want to use are currently only available from the name tag. That is, many places do not have a name:cy tag.

Understandably OSM contributors haven’t tended to add an identical name:cy tag for Morfa Nefyn, Abersoch, and hundreds of other villages and places.

I’ve tried rendering different versions of the map using different criteria. Enabling all name tags somewhat ruins the ethos and magic of having a map in Welsh. Then huge tracts of Wales vanished when I removed the name tags again!

So I have set the system to use name for these types of places only:

  • ‘village’
  • ‘hamlet’
  • ‘town’
  • ‘island’
  • ‘neighbourhood’
  • ‘square’
  • ‘farm’
  • ‘isolated_dwelling’
  • ‘locality’

For other elements I also have a white list and black list, e.g. ‘Ysgol’, ‘Capel’ and ‘Eglwys’ are on the white list, among others. We will tend to want names containing those words.

name:cy currently overrides all of this however. Do please add name:cy tags via osm.org if you spot errors or gaps, and they will also be available to other projects around the world.
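The precedence rules above can be sketched roughly as follows. This is a minimal Python sketch, not the project's actual code: it assumes the place types listed above are values of the OSM `place` tag, the function name `choose_label` is made up for illustration, and the white list holds only the three words mentioned in this post (the black list is left empty because no entries are given here).

```python
# Place types for which the plain `name` tag is trusted (list from the post).
# Assumption: these are values of the OSM `place` tag.
NAME_FALLBACK_PLACES = {
    "village", "hamlet", "town", "island", "neighbourhood",
    "square", "farm", "isolated_dwelling", "locality",
}

# Hypothetical excerpts: the post names only these three white-list words
# and gives no black-list entries.
WHITE_LIST = ("Ysgol", "Capel", "Eglwys")
BLACK_LIST = ()


def choose_label(tags):
    """Return the label to render for an OSM element, or None to hide it."""
    # 1. name:cy always wins.
    name_cy = tags.get("name:cy")
    if name_cy:
        return name_cy

    name = tags.get("name")
    if not name:
        return None

    # 2. For the listed place types, fall back to the plain name tag.
    if tags.get("place") in NAME_FALLBACK_PLACES:
        return name

    # 3. For everything else, consult the black list, then the white list.
    words = name.split()
    if any(word in BLACK_LIST for word in words):
        return None
    if any(word in WHITE_LIST for word in words):
        return name
    return None
```

With these rules a town tagged with both names shows its `name:cy`, a village with only a `name` tag still appears, and an ordinary street with only an English `name` is left unlabelled.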

Use and applications in the near future

What you see now is just one possible app that uses the underlying map infrastructure to show a map of Wales.

Having a map like this introduces many exciting possibilities in:

  • learning
  • exploration
  • navigation
  • play
  • research
  • communication

Cof y Cwmwd: wiki website about Uwchgwyrfai’s history

Here’s an item in Welsh for the TV programme Heno about Cof y Cwmwd, a new multi-author website about the Uwchgwyrfai area.

The purpose of the site is to collect and share historical information about the area, its institutions and people.

As web developer I have been working with Canolfan Hanes Uwchgwyrfai on this site, which is powered by MediaWiki server software.

People are now contributing pics and articles to the site.

Activity has increased today during the first Golygathon (Editathon) on the wiki, a community event to stimulate contributions. The Golygathon is headed by Jason Evans of the National Library of Wales, who is the Wicipediwr Preswyl (Wikipedian in Residence) and knows a great deal about growing wikis!

It’ll be very interesting to see how projects like Wikipedia and the new website Cof y Cwmwd share content between them in the future.

Pic of the Golygathon by Jason Evans

Bilingual websites and multilingual projects in WordPress

WordPress translation system listing various languages

People often ask me about the best way to create and manage a truly bilingual or multilingual website. This is a common need in many contexts around the world.

Usually any given website, or section of a website, is on a spectrum of multilingual availability.

For example – and to take an extreme case – Wikipedia’s various language projects all have the same underlying software but are maintained completely separately by their communities, albeit with a certain amount of adaptation and translation flowing between them.

At the other end of the spectrum is a public body which publishes two or more language versions of every piece of content.

Translation is sometimes seen as a way of fulfilling this requirement, increasingly with a translation memory. That said, translation is certainly not the only way and may not always be the best way.

Somewhere in between is a sort of hybrid website which publishes all ‘publicity’ material multilingually but gives the blog(s) and ‘human voice’ content for each language its own independent life. Let me know if you’ve seen an example of this being done successfully though.

My personal record, if I can put it that way, is a website serving four languages – German, Norwegian, English and Dutch, for a European theatre project which I co-developed in WordPress for a client in London some years ago.

Many models and forms of multilingualism are technically possible and implemented around the world.

It often pains me to see organisations offering a below-par experience of multilingualism. Multilingualism should be core to any discussion of user experience, and is worth investing in to get right. There are plenty of examples of excellent practice, and there is help available!

Outside the world of organisations I’ve just added some new functionality and a new work section to my own website, morris.cymru.

This website originally started under another name in 2008 for various musings and thoughts. Over time I’ve switched languages a bit (English and Welsh), and also gradually had a need to share more work-oriented projects.

I have retained my nine years of blog post archives, and have added code and settings to recover gracefully when a historical blog post is available in one language and not another.
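The idea of recovering gracefully can be illustrated with a small sketch. This is not the code running on my site (which is WordPress, so PHP); it is a hypothetical Python function, `pick_version`, that assumes each post is stored as a mapping from language code to text.

```python
def pick_version(translations, requested, site_languages=("cy", "en")):
    """Return (language, text) for a post, preferring the requested language.

    `translations` maps language codes to post text. If the requested
    language is missing or empty, fall back to the first site language
    that does have content, so the reader sees the post rather than an error.
    """
    if translations.get(requested):
        return requested, translations[requested]
    for lang in site_languages:
        if translations.get(lang):
            return lang, translations[lang]
    return None, None
```

A theme could use the returned language code to show a short notice that the post is only available in the other language.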

Here is the technical background. I use WordPress.org. Regardless of the multilingual plugin you use you need language files for WordPress core, the theme, plugins as well as text for widgets, menus, categories, and more. The QTranslate X plugin, which is in my opinion currently the best (apart from the erroneous use of flags for languages), automates much of this searching for language files, when they are available. This plugin does require a whole load of configuration.

Please contact me if you’d like to discuss help on this!

@Wicipedia, an automatic Twitter bot

Check out @wicipedia which is an automatic Twitter bot I thoroughly enjoyed making.

This work was commissioned by Wikimedia UK as a means of increasing engagement with Wicipedia Cymraeg, the Welsh-language version of Wikipedia.

It is an automated Twitter account sharing:

  • ‘on this day’ historical events,
  • links to recent new articles within certain criteria of ‘interestingness’,
  • ideas for articles that anybody can create about notable women in order to contribute to fair gender representation on Wicipedia.

You’ll see all of these types of tweets on the account itself.

If you’re curious the account is run by a custom PHP script which runs on a server and performs calls to the Wikipedia API and Twitter API.

Björk pic by deep schismic (CC BY)

Scheduling tweets in bulk in advance (my way)

I’ve created my own system to tweet a row from this spreadsheet every day.

The name of the Twitter account is @fideobobdydd, a way of sharing high quality videos in Welsh from various genres. Social media like Twitter have the potential to find audiences for videos and vice-versa, when algorithmic search and recommendation sometimes (feel like they) put content in Welsh at a disadvantage. This is what I’d like to investigate, anyway.

At the moment it tweets once per day at a set time. It’s a proof of concept which could easily be extended or adapted.

Why create a system? I’ve tried Hootsuite, Buffer and similar systems but on the whole these are too cumbersome for me. On the spreadsheet I can see half a month and move things around quickly. Working with others is easy because the spreadsheet is on Google Drive.

I doubt that Hootsuite as a company is losing sleep over it. It’s just my homebaked solution to a particular problem. 🙂

It would be possible to add other sources rather than relying on manual input of videos. At the moment if there is a gap on any given day, there is no tweet. I could create a long list of videos to post randomly as well as the spreadsheet list, or syndicate videos from a list of favourites or YouTube playlist, and so on. Of course other platforms like Facebook Video are also possible.

Here’s some technical info. This is a PHP script which talks to the Twitter API. Rather than use the Google Drive API I have done a speedier implementation of retrieving the content as CSV.
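The row-per-day logic can be sketched like this. It is an illustrative Python version, not the PHP script itself, and it leaves out both the CSV download and the call to the Twitter API. The column names `date`, `text` and `url` are assumptions, since the spreadsheet layout isn't described here.

```python
import csv
import io
from datetime import date


def row_for_day(csv_text, day):
    """Return the spreadsheet row scheduled for `day`, or None if there is a gap.

    Assumes a header row with `date` (YYYY-MM-DD), `text` and `url` columns.
    """
    wanted = day.isoformat()
    for row in csv.DictReader(io.StringIO(csv_text)):
        scheduled = (row.get("date") or "").strip()
        text = (row.get("text") or "").strip()
        if scheduled == wanted and text:
            return row
    return None  # nothing scheduled: no tweet that day


def tweet_text(row):
    """Join the row's text and optional link into the tweet body."""
    parts = [(row.get("text") or "").strip(), (row.get("url") or "").strip()]
    return " ".join(part for part in parts if part)
```

A daily scheduled job would call `row_for_day` with today's date and post the result only if a row comes back, which matches the behaviour of skipping days that have a gap.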

Diolch i Nwdls am y (cy)syniad gwreiddiol o Fideo Bob Dydd.

New domain name for my site

I’ve just changed my domain name here to morris.cymru

Every link to the old domain name (quixoticquisling.com) should forward automatically to the equivalent link on the new site.

Please let me know if you see any problems.

Rough notes on using social media in one’s second language

My emphasis on this blog has changed over the years. It’s interesting to read back over old posts where I documented my progress with Welsh. Later on I was pretty uninhibited about blogging through the medium of Welsh on here, as a means of practising and as a method of seeing more stuff in Welsh online. Although very beneficial for me I guess it’s uncommon to practise like that in public. There were/are quite a few non-standard grammatical formations in my posts as well. Or, in other words, mistakes.

A few people have asked me recently about my experiences using social media in Welsh as a second language – especially blogging. Someone was asking me today about the experience and challenges as part of her research project.

So here’s a copy of some notes I sent as I figured they might be of interest to people who read this blog.

In hindsight it took me a while to get to a standard where I thought I had anything to say. There are blogs out there where people practise the very basics – which is obviously fine – but I think I wanted to do something more expressive. I think writing to be understood (which was an aim) is a challenge. There were a whole load of things to accomplish before even considering that as an option. I think I considered it for a fair while before actually doing it.

That said, as opposed to blogging, tweeting in Welsh was something I started quite early on I think. You could liken it to a child gaining confidence in learning to speak (or walk, etc.).

The first thing I tried on a computer was emailing in Welsh – even just greetings and valediction around an email in English. I made loads of mistakes with that but it was a key learning experience.

One thing to mention is that Facebook is not always the friendliest place for practising Welsh. It’s common to receive comments from ‘friends’ who are not comfortable with seeing Welsh being used. For example people have said things like ‘did a cat walk on your keyboard?’, ‘that’s easy for you to say’ and also some quite blatant expressions of disdain/displeasure at seeing Welsh being used. It’s funny how very few people say ‘I don’t understand your message, would you be able to give me a translation or a summary in English please?’, which is surely a more courteous way of answering. I’ve heard that these attitudes cause problems for people who are learning, particularly the more timid. I’m a pretty confident person but even I sometimes feel a bit gun-shy about using Welsh on Facebook. The interface is irrelevant here – it’s about your existing friend group and their expectations. Maybe this point counters the hypothesis that fluent Welsh users are judgmental about informal Welsh and bratiaith. That is, in my experience it has been the non-Welsh speakers, by contrast, who cause problems.

Twitter is better for confidence because it’s not predominantly based on offline relationships for many people. There is a lot of freedom and variation in the way it’s used. There is more of a sense that people can experiment, be individual and also that those who don’t appreciate it should just unfollow. And then blogging offers the best feeling of a space you control yourself where anything goes, at least in my experience.

These notes are incomplete and are based on personal experiences rather than data.

Global Voices: imaging the Welsh-language web

I’ve written an article for Global Voices about the Welsh-language web:

[…] Recent research presented by the BBC at a media conference shows that of the time spent on the web by the average Welsh speaker, only 1% is on Welsh language content. We can assume that most of the remaining 99% of the time is taken up by English-language content. There are several factors behind these percentages, which form a contemporary story of linguistic domain loss.

Although the Welsh language web is large from an individual user’s perspective, it has relatively few resources available when compared with other languages. Even the Basque language, which statistically is in a comparable situation to Welsh in its homeland, is much more privileged on the web in its number and diversity of established websites and levels of participation.

Wales has comparatively fewer institutions that would view an increase in quality web content as an important part of their mission. There is perhaps an excessive reliance on voluntary efforts. Yet the cumulative amount of spare time at the disposal of volunteers, what the American writer Clay Shirky refers to as cognitive surplus, is also small. […]

Read more.

Standing on Isaac Newton’s shoulders without a copyright licence

Cambridge Digital Library launched an online collection of Isaac Newton’s papers this week. All I saw was praise for the project and lots of it. I think that’s appropriate, it’s a good project. But what about the rights?

As far as I can see, the Library are the custodians of the papers as precious three-century-old artefacts and that’s it. Newton’s intellectual legacy, ethically and legally, belongs to everyone. You don’t have to ask or pay to adapt a Shakespeare script (say) and perform it. You can do a low-budget production in a village hall or you can make a multi-million pound movie out of it. You could combine it with a poem from Dafydd ap Gwilym, if you want to. That’s the beauty of the public domain. It belongs to nobody – and everybody.

The same freedom applies to Isaac Newton’s works – or should. But I went to Principia and clicked the download image link and a licensing notice popped up. I don’t care if it’s Creative Commons, that’s not public domain.

I’m sharing the email below because I’m not a legal expert and maybe somebody out there can help me understand this. It bothers me when institutions seemingly ignore the public domain and try to enclose it. It’s a pity that the public domain can’t muster a front as big as that of the content industries.

Dear Cambridge University Library

Congratulations on your successful launch of the Cambridge Digital Library and the digital versions of Isaac Newton’s papers.

While I am grateful and pleased that you have released Newton’s works online, I note that you appear to claim ownership of copyright in the scanned images of these works. I also note that you are attempting to apply a Creative Commons licence which appears to contravene copyright law, as would any licence applied to work in the public domain.

I quote from the terms and conditions of your website:

The University is the owner or the licensee of all intellectual property rights in the site and in the material appearing on the site. The material includes but is not limited to works such as images, artwork, text, data, files, audio/visual clips, illustrations, designs and documentation (the ‘Content’) and the collection, arrangement and assembly of the Content. Those works are protected by copyright laws and treaties around the world. All such rights are reserved.

Copyright law began with the Statute of Anne towards the end of Newton’s life. It has now been superseded by the Copyright, Designs and Patents Act 1988 which states that written works (with very few exceptions) pass into the public domain 70 years from the year of the author’s death. In his case if we were to apply current copyright law the works would have passed into public domain on 1st January 1798.

I am sure you would not want to be seen to be attempting to restrict the use of works in the public domain, as if that were possible. If so please could you reword these statements on your website to clarify humanity’s common inheritance of Newton’s legacy.

Many thanks

Carl Morris

I’m waiting for a reply from the librarians.

If you were to re-use one of Newton’s papers – say, print up a bunch of t-shirts with his writing on and sell them – I don’t think there would be enough to make a case if the Library were to dispute it. It seems they just took some scans of the pages without adding anything novel. I don’t want to be an arse and say I’m ungrateful for all the work they did in preserving the documents and digitising them. But let’s be clearer about the rights and freedoms here. Correct me if I’m wrong.

And if someone trying to own Newton’s stuff doesn’t sound bizarre enough for you, have a look at the Crown’s perpetual ownership of the King James Bible.

UPDATE 22/12/2011: So, more people are talking about this…

UPDATE 6/2/2012: No reply to my email from Cambridge Digital Library as yet.