Friday, July 13, 2007

Proposal: localisation, sure, but enable languages first!

The Pan African L10n project held a conference in Morocco in February. I learned about it through a blog I read today. I am really happy with the reported progress: more and more languages are being supported. I am, however, coming more and more to the conclusion that there is a need for a stage before actual localisation, one that provides a service to the bilingual speakers of a language.

At this stage, the support of a language is very much an all-or-nothing affair: either there is a localisation or there is nothing. This is not how it needs to be. When a language is known to exist, the lowest level of support for that language is the acknowledgement that it exists. This is currently not done, and I think it is a missed opportunity.

The first thing to consider is what languages and linguistic entities exist, and how you support them. This is a surprisingly complex question. Languages are recognised in the ISO 639 standard. There are several versions of the standard, and not all languages have a script that is supported in Unicode. Even when a script is supported in Unicode, it does not mean that an associated font is available for a language. The consequence of these two points is that only a subset of languages can be supported on a computer. On the other hand, the currently recognised versions of ISO 639 do not recognise orthographies, dialects, or other entities that make a difference to how documents are to be supported.
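The "supportable subset" idea can be sketched as a simple filter. This is a minimal, illustrative model: the entries and availability flags below are assumptions made up for the example, not data from any registry (ISO 639 reserves the codes qaa–qtz for local use, which is why one appears here).

```python
# Illustrative sketch of the "supportable subset" described above.
# Each entry maps an ISO 639 code to (name, ISO 15924 script code,
# script supported in Unicode?, font available?). The flags are
# assumptions for the example, not authoritative data.
languages = {
    "nap": ("Neapolitan", "Latn", True, True),
    "nqo": ("N'Ko", "Nkoo", True, False),            # font assumed unavailable
    "qaa": ("A locally defined entity", None, False, False),  # no script
}

def supportable(entry):
    """A language can be supported on a computer only if it has a
    script, the script is in Unicode, and a font is available."""
    name, script, in_unicode, has_font = entry
    return script is not None and in_unicode and has_font

subset = sorted(code for code, entry in languages.items() if supportable(entry))
print(subset)  # only "nap" clears every hurdle in this example
```

The point of the sketch is that each hurdle (script, Unicode support, font) shrinks the subset independently, which is exactly why a research effort is needed to decide what can safely be supported.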

This is not an issue that the organisations that develop and localise software want to tackle. For them this is a distraction. Deciding what linguistic entities can be supported is best addressed by one organisation that exists to deal with issues like these. The World Language Documentation Centre (WLDC) is that organisation. Through its association with Geolang, and because its board consists of experts in many of the relevant fields, it is already in a prime position to do the research that goes into the development of ISO 639-6.

With the WLDC and Geolang able to provide researched and verified information about the linguistic entities that can safely be supported, it is then up to the applications to at least acknowledge their existence and allow a user to create content in that language. As more information becomes available, spell checkers specific to that linguistic entity can be added. In this way, slowly but surely, the functionality grows without the need to first localise the application.
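Acknowledging a language can be as lightweight as letting users tag their content with its ISO 639 code ("nap" is the code for Neapolitan). A minimal sketch, assuming HTML output and the standard `lang` attribute; the helper function is hypothetical:

```python
# Sketch: marking content with an ISO 639 language code -- the minimal
# acknowledgement that a language exists. No localisation of the
# application itself is required for this to work.

def tag_html(text, lang_code):
    """Wrap text in an HTML paragraph carrying its language code."""
    return f'<p lang="{lang_code}">{text}</p>'

# "Comme staje?" is a Neapolitan greeting, used here as sample text.
print(tag_html("Comme staje?", "nap"))
# -> <p lang="nap">Comme staje?</p>
```

Once content is tagged this way, downstream tools (search, spell checking, text-to-speech) can treat it correctly, even before any interface is translated.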

In a way this is a solution for a "chicken and egg" problem. This problem is solved when you think of it in an evolutionary way. First there was the egg, the support of the language, and then the chicken evolved, the localisation of the application.

Thanks,
GerardM

2 comments:

Dwayne said...

I'm not sure if I missed something in this entry as you seem to be proposing the thing that happened in Morocco.

To be honest, you make this much more complicated than it really is. It is time consuming and you need experts, but most people do not have any of the problems that you highlight.

The tragedy is when these simple steps are not followed and technology fails language.

But these are steps that need to happen. I see them as localisation enablers: without them you cannot localise. They are not moving in minority languages because it is only recently that things like Open Source began to provide technology for communities that are not economically powerful, whereas the localisation industry is about money.

There are many points of reference with experts that you ignore: the Unicode Consortium, the CLDR and others. These are some of the organisations that are allowing people to improve support for their languages.

GerardM said...

Much of the localisation industry is about money, and for many languages there is no money. Consider that ISO 639-3 already includes some 7,000 languages. It is therefore not feasible to expect that all languages will be localised any time soon.

As for these other organisations, you may find that we have people from many of them on the WLDC board. The point of having them all together is that it gives us a single point where issues can be addressed.

As to the Morocco conference, the first time I heard about it was in today's blog. In contrast to what you write, there are many people who suffer from the problems that I highlighted; the Neapolitan language, for instance, is not known to exist in much of the software base. When people can indicate that a text is Neapolitan, the content can be marked as being Neapolitan. This is existing functionality; it does not need changes in the software, and it really is a precursor to localisation.

Thanks,
GerardM