You can use these questions to help kick off this discussion thread:
- What are the specific technical obstacles that can be addressed in order to better represent more languages online?
- What is “locale,” how is it used, and why is it important for promoting underrepresented languages online?
- How do I create a custom font or keyboard for my language?
- How can open source software localization be used to support the online use of underrepresented languages? How do I start translating?
- How can we develop the appropriate technical terminology? How can we create more dictionaries and spell checkers in underrepresented languages?
- How do we get our languages into Google, Facebook, Twitter?
Share your experiences, thoughts, ideas and questions by adding a comment below or replying to an existing comment!
For many languages, orthography can be a major barrier, as setting up computer systems to write special characters can require quite a bit of customizing. However, the Mayan languages we work with (Belize, Guatemala, and Mexico) use only characters easily reproducible with any standard keyboard without any necessary customization and very limited use of 'hot keys' or special key strokes.
In the history of Mayan language standardization, the choice of a Latin orthography was deliberate, and I think it shows the forward thinking of the Academy of Mayan Languages' approach, because it has meant that we don't have to deal at all with issues of fonts, keyboards, and orthography in Mayan languages.
We've been able to jump directly to thinking about software localizations, and customization solutions, which has been a great time saver.
The basic technical obstacle facing Vagahau Niue (Niue Language) is that we don’t have formal translations for technical words or terms especially for ICT.
To get our Vagahau Niue online, our people will need to translate these technical terms. The project we are planning will involve the global Niue community via the internet. Once the initial worldwide translation is done, the final word or term will be decided by our ‘matua’, the gatekeepers of our Vagahau Niue. From there we hope to use the translations for creating spell checkers, translating software, general use, etc.
The other obstacle we face is that only 1,200+ people reside on the island itself with more than 20,000 living permanently in New Zealand and all over the world. So we need to work together with everyone to capture our language in its true form.
There are already some of our people who use social media, especially Facebook, to express themselves in the Niue language. Some of this work can be seen at www.facebook.com/vagahau.niue, alongside the work of others striving to revive interest in our Vagahau Niue among our Niue people.
We are lucky that our Vagahau Niue uses some of the English alphabet, though it also includes macrons. This makes things easier, but translating a single English word can sometimes require a whole sentence.
So one of our main focuses is to organise all Vagahau Niue advocates and practitioners to work collaboratively with one another. They are all doing a great job, and they can continue to do what they do best. Our work involves using the Internet and ICT as a catalyst for preserving Vagahau Niue. It is our hope that each individual and group will share their work, as we all have one goal: Vagahau Niue Preservation (Niue Language Preservation).
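On the macron point above: even when a keyboard has no dedicated macron vowels, they can often be typed as a plain vowel followed by a combining macron (U+0304) and then normalized into the precomposed character. A small sketch using Python's standard `unicodedata` module:

```python
import unicodedata

# "a" + COMBINING MACRON (U+0304) is two code points; NFC normalization
# composes them into the single precomposed character "ā" (U+0101).
typed = "a\u0304"
composed = unicodedata.normalize("NFC", typed)

print(len(typed), len(composed))   # 2 1
print(composed == "\u0101")        # True
```

Software that normalizes input this way lets users enter macron vowels without a dedicated keyboard layout, though a proper layout is still friendlier for everyday use.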
For a long time, the Cherokee Nation used its own font, which has since been made Unicode-compatible. There were many different fonts for the Cherokee syllabary, which created a lot of problems because you could never guarantee that the person trying to read your text had the same font you did (in fact, it was more likely that they didn't). I don't know a lot about Unicode or the development of our Unicode-compatible font, but that has been one thing that has really helped in allowing the use of the syllabary online. Also, whereas we used to have to use third-party software to generate Cherokee syllabary from English keyboards, we now have an agreement with Apple to include it as part of their operating system.
One major consideration for typing in Cherokee has been how to go from the English characters on a standard American keyboard to Cherokee syllabary. We've settled upon two ways. In the first, the keys are simply re-mapped entirely, using the shift key to switch between sets of characters (because there are 85 syllabary characters, one keyboard layout is not enough). This is probably more fitting in the long run, as it is a uniquely Cherokee way to type that does not rely on the Roman alphabet as an intermediary and does not treat the Cherokee syllabary as a representation of English syllables. The downside is that it requires the user to re-learn to type, and it is probably less ideal for second-language learners whose first language IS English. Hence, the second method maps syllabary characters to sequences of keystrokes: "ga" for "Ꭶ," "tla" for "Ꮭ," and so forth. In a way, this is similar to a solution developed for typing Japanese hiragana and katakana on an English keyboard. Again, though, it may not be intuitive for a young first-language Cherokee speaker.
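The second method above - typing "ga" to get "Ꭶ" - amounts to a longest-match transliterator. Here is a minimal sketch in Python (hypothetical and illustrative only, not the actual Cherokee Nation software; only the two mappings mentioned above are included):

```python
# Map romanized key sequences to syllabary characters. A real table
# would cover all 85 characters; this sketch uses only the two examples.
SEQUENCES = {
    "ga": "Ꭶ",
    "tla": "Ꮭ",
}

def to_syllabary(text: str) -> str:
    """Greedily match the longest romanized sequence at each position."""
    keys = sorted(SEQUENCES, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(SEQUENCES[key])
                i += len(key)
                break
        else:  # no sequence matched; pass the character through unchanged
            out.append(text[i])
            i += 1
    return "".join(out)

print(to_syllabary("gatla"))  # ᎦᏝ
```

With a full table, the longest-match ordering matters: it keeps a sequence like "tla" from being misread as shorter sequences starting with "t".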
A third way to represent Cherokee online has been the development of a texting menu that can be used on mobile devices such as iPhones & iPads. (No support for Android yet, but it looks like it's on its way.) I believe the most commonly typed characters pop up in a menu and they can be changed depending on what characters are required. This video (while quite long) gives an explanation of some of the developments that have been happening.
Literacy and problems of writing your language on a computer are obvious problems but on the whole, my experience is that people very quickly adapt (mangle, as some would have it) the way their own language is written to suit the format, usually by abandoning all diacritics, abbreviating stuff. Well, txt spk, you know what I mean ;)
I think the problem, at least for anyone who isn't a teenager, is that they generally don't consider using their language online, for a variety of reasons: often because it never occurred to them that someone might be out there willing to talk back in their language, because they're of a generation which prefers face-to-face communication, and because of a general fear of unknown technology. It's relatively easy to say to a teenager "here's something called Twitter" and 10 minutes later they're sending junk messages. It's much harder to get someone - thinking of my own mum here - who's 50-something and still prefers to send letters halfway round the world or use the phone to even consider WHY she'd want to use it, never mind anything beyond that.
I think we need to be smarter about it: not just letting people, especially older people, find out for themselves but, without being patronising, introducing them to new stuff like that. Putting it the other way round, if someone gave me a canoe and a paddle and told me to go to the other island to get some eggs, I'd be totally out of my depth. Twitter can be just as scary :(
I liked your metaphor of Twitter being as scary as paddling a canoe!
Now when it comes to writing a language - even if it's the language you are most familiar with and feel most comfortable in - that often still is an unfamiliar and uncomfortable medium. On the other hand, the technology is there to input language onto the internet orally. So, if written Wikipedia articles can be transferred into spoken form (cf. http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spoken_Wikipedia), shouldn't it be possible to start a new text in the oral medium first? That should then be possible even in under-represented languages without a standard writing system. I'm not aware of any programmes undertaking this on a regular and widespread basis, but technologically it should be possible, and imho it's one of the necessary steps to significantly increase the online use of an under-represented language, namely by potentially engaging the majority of speakers of an often predominantly oral society.
I'd love to hear of any language representation on the web which starts with oral input rather than the written medium.
Oliver, you make some great points. I think we view the computer as a primarily text-based medium because that is how we were exposed to it initially. However, that does not need to be the case for everyone.
I think of my grandfather, now 98. He did learn to use a computer as a senior citizen, starting from text. When he was about 90, he said to me, "I think I'm ready to learn email now," so I had to teach him the basics for getting online, sending and receiving messages. His first email was written by someone who grew up sending telegrams - a message to my uncle saying, "Steve. I now have email. Dad." Once I persuaded him that he was not paying by the letter, he found email to be a very useful way of keeping in touch with his social network, until his eyesight deteriorated to the point where reading a computer screen became too difficult for long letters. Now, however, he uses Skype on a regular basis. All he really needs is to be able to see the photo icon of the person he is calling, and the green "phone" icon next to their name. Text does not matter, and for what he is doing the language does not much matter either. Once he is connected to the person on the other end, he can speak English or Yiddish or any other language, without any concern for the language of his technology platform.
Now that I think about it, that's really not so much of a miracle. I'm old enough to remember telephones that had exactly 11 input options - the numbers 0 through 9, which were invoked by stopping a dial at a particular point in its rotation (kind of like an iPod click wheel), and an engage/disengage switch that was invoked by putting an integrated mouthpiece/earpiece on or off a cradle (similar to opening and closing the lid of a laptop). Once you picked up the mouthpiece and dialed your number, the phone was completely agnostic about whether you spoke English or Swahili or Maya.
I certainly think it is vital that the technology be localized to as many languages as possible. People use ICT for many reasons other than phone or video calls, and of course my grandfather benefits from being able to read the Skype and Windows menus in a language that he understands (even though it is difficult for him). However, inasmuch as the technology is simply a shell, people can communicate in any language once they are inside that shell (as long as they have the fonts and keyboards for typing, or the microphones and webcams and network access for talking). Perhaps one direction in which we can take this discussion is visualizing ways in which technology can be made independent of language - not localization as such, but delexicalization. If the technology shell does not rely on text, then it becomes much less of a barrier for people who speak any language to jump in and use it for their own purposes, without the initial necessity of localization projects. Such an endeavor would not work in many circumstances, e.g. you could not produce a spreadsheet program that did not rely on text, but it could be a goal for simple applications such as the 11-function telephone of old, or a camera, or a stripped-down email interface.
PS - If my contribution to this dialogue is delayed tomorrow, it will be due to my grandfather's weekly Skype video chat with his great-granddaughter, an event that forms a central focus of his week. December 6 will mark the day when my grandfather and my daughter will have cumulatively spent 100 years on this planet - a century in which we've seen remarkable transformations, such as those that enable a great-grandpa in the US to watch his great-granddaughter grow up via video chat from the other side of an ocean. What will it take until such an opportunity is available to great-grandparents in the Tanzanian villages where Oliver spends so much of his time?
Thanks, Martin, not least for the inspiring example of your grandfather communicating with your daughter.
To me, the independence of technology from a particular language has been reached sufficiently with the likes of Skype. I am probably more worried about the ephemeral nature of the spoken word, i.e. speakers of under-represented languages can use that technology, but these uses serve temporary communication purposes "only" (I put this in quotes as obviously, that is of immeasurable value to the individuals involved, like your grandfather and daughter) while their language remains under-represented. If, however, we could develop standard procedures to archive spoken words on the web like we do with written text, then even these oral communications could be preserved and re-heard. I don't mean that necessarily for private conversations (too reminiscent of big brother) but for cultural items like proverb collections, or the recollection of what life was like in a Tanzanian village several decades ago, or which cultural practices were still practised then (like iron smelting or now-lost ceramic techniques). And if in a private conversation your grandfather wanted to give an historical account in Yiddish, wouldn't it be great if there was a simple button in Skype to say e.g. "start archiving" which would then store the spoken words in an online repository?
If something like that existed already, I'd love to know!
I think the whole idea of 'representing' a language online needs to be investigated a little. We must take into account that while Internet connectivity worldwide has grown exponentially, still only 30.2% of the total global population was Internet-connected as of 2009, according to http://www.internetworldstats.com/stats.htm. Of this, North America, Australasia and Europe enjoy Internet penetration rates of 58% and above, while the rest of the world - Latin America, the Middle East, Africa and Asia - lags at 37% and far below. This is commonly called the digital divide.
While it is short-sighted to analyse Internet penetration geographically by continent or by some imaginary North/South divide (take North vs South Korea - the former is rated among the 20 countries with the least Internet availability while the latter ranks no. 13 globally, with a whopping 77% of the population connected), it does beg the question: how inclusive is the Internet to the cultural and linguistic diversity of humanity as a whole? I would venture that at present the Internet itself - as a medium that still excludes large parts of the human population - may be a barrier to better representing languages.
The fact that many languages are 'under-represented' may not be due only to the fact that they have not got up to speed with ICT lingo, or that new media technologies have not yet been customised into over 6,000 distinct languages (not to mention the growing number of pidgins and hybrids). It also reflects very real socio-economic inequalities in the world today. So before we get too inspired by philanthropic notions of sending geeky breeds of ICT anthropologists into remote communities with endangered languages to try and get them to start blogging in !Gan!ne, Lae or Urmi, we need to ask which technologies can best serve these communities' most essential needs while helping to keep their languages alive.
Here is where open-source mobile platforms like FrontlineSMS or Freedom Fone can play an important role. With more than 90% of the world population currently covered by mobile networks, taking advantage of both the text and audio elements of mobile technology provides a means to address literacy and language barriers while also interacting with audiences that wouldn't otherwise have access to other media - let alone the internet. As Martin pointed out above, the phone is a medium that is completely neutral as to whether you speak Swahili, Maya or Urdu. It is an interactive medium, independent of language, that finds a natural habitat in societies that are still largely oral-based. Also, with speech-to-text technology developing fast, phones will play an ever-increasing role in bridging literacy and language barriers.
We need to remember that all technology must exist to serve our growing and diverse needs as human beings, not vice versa. And on a planet currently facing imminent environmental and ecological crises, what is more necessary than the ability to communicate through mediums that include rather than exclude people from the global conversation?
Thank you for sharing these great tools, Nico! It's great to have you included in this conversation. I would be interested in reading your thoughts on the challenges that communities might face in implementing these tools for themselves. Would it be difficult for someone that does not speak the languages the FrontlineSMS has been translated into, to set up the system to work for their community? Also, are you familiar with any examples in which these tools have been used by communities with the intention of promoting and preserving their languages? It would be great to hear about these tools in use in this context.
On the Frontline SMS website, I noticed: The current version of FrontlineSMS (1.6) comes with on-screen language support for English, Arabic, Azerbaijani, Bengali, German, Spanish, Finnish, French, Hindi, Indonesian, Khmer, Portuguese, Russian, Swahili and Chinese. This is great! Are there plans to localize the software in other languages? Are there plans to localize the FreedomFone system?
The questions in the last paragraph are very good ones! It is necessary - even urgent - today to get applications and platforms translated into more African languages, to allow them to be online like other languages.
Another challenge is how to support those who want to do this work. You know, in some African countries it's difficult, as I have already said, for people to work as volunteers. Even if we want to develop ideas, it's not easy to get other people on board because they are not paid. So a system must be found to support people who really want to work in this field!
Here is one of the great advantages of this online meeting place for exchanging experiences. I'm really very interested in FrontlineSMS and in Freedom Fone. I would like to download them to use for important needs here in Mali, but I need help to do it. I would also like to know if they are free and how they really work.
These solutions are free - except that, in my experience, they are technically difficult to get up and running. You need some IT support. I am actually not that person in our organization (I am more the face-to-face guy), so I can't answer any specific questions. But perhaps some of the others can?
I know that Erik Sundelof from our group has essentially developed a customized solution (which you can view at www.hablaguate.com). He might be able to share some pointers with you.
This is an interesting question about our languages, and in answering it we mustn't skip over the key points.
To get a language online, the users first have to be able to speak and write the language and know how to use the equipment involved (a computer or telephone).
For some languages, like Bambara, a specific keyboard layout is necessary to allow typing of the special letters (ɲ, ŋ, ɛ, ɔ).
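Happily, all four of those letters are already in Unicode, so the obstacle is purely the keyboard layout, not the encoding. A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

# The special Bambara letters are ordinary Unicode code points; any
# Unicode-aware application can store and display them. Only the
# keyboard layout for typing them still needs to be provided.
for ch in "ɲŋɛɔ":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# → U+0272 LATIN SMALL LETTER N WITH LEFT HOOK
# → U+014B LATIN SMALL LETTER ENG
# → U+025B LATIN SMALL LETTER OPEN E
# → U+0254 LATIN SMALL LETTER OPEN O
```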
For more activities, an Internet connection is necessary before we can reach the biggest focal point: translation of applications and platforms into these languages. Writing on Facebook, Twitter and other platforms in our languages, while the interfaces themselves remain in French, English or other dominant languages, is not sufficient! We can do it because we understand those languages, but what about someone who speaks neither English nor French? Can they use Facebook or Twitter without being shown how to log in and log out? Not possible.
So one of the biggest challenges we have to resolve is to do everything we can to get these platforms translated into our languages. That's why I appreciate what Kevin is doing for languages on Twitter.
It goes without saying that first and foremost an internet connection is needed for these groups. Then it may be necessary to involve individuals who speak that language as well as one that is already well established online, to start creating an online presence. Despite knowing and working in other languages, many people, even after years immersed in another culture, still appreciate listening to, reading and speaking their mother tongue. Thus both community engagement and IT engagement are needed to get under-represented languages online.
As Bfrey pointed out, Unicode is absolutely VITAL to using your language with technology. If your language’s character set is not represented in the Unicode code range, you will have major issues using email, posting on the web, texting, etc. I work in the Cherokee Language Technology Program at Cherokee Nation Education Services. Our department has collaborated closely with companies such as Apple, Google, and Facebook to have our language represented in their products. Since all these companies, as well as most software developers, support Unicode, it was important that the Cherokee character set be supported by Unicode itself.
If your language uses the Latin character set with various accent marks, it should have no problem being supported by most technologies; it’s just a matter of learning how to type those particular characters on your computer. The Cherokee Nation applied for inclusion in the Unicode code range in the mid-90s. It took about 5 years to get approval. Once that happened, though, our language really took off, beginning with Apple’s Mac OS X 10.3, which began including a Cherokee Unicode font and a Cherokee keyboard.
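A concrete way to see what Unicode inclusion buys you: once a script is in the standard, every Unicode-aware program can recognize its characters without custom fonts or encoding tricks. The Cherokee block occupies U+13A0-U+13FF, and the syllabary characters mentioned earlier in this thread can be identified by code point and name in Python:

```python
import unicodedata

# Cherokee syllabary characters live in the Unicode block U+13A0-U+13FF;
# any Unicode-aware program can identify them without a custom font hack.
for ch in ("Ꭶ", "Ꮭ"):
    in_block = 0x13A0 <= ord(ch) <= 0x13FF
    print(f"U+{ord(ch):04X} in Cherokee block: {in_block}")
    print(unicodedata.name(ch))
```

Before standardization, the same bytes would have meant different glyphs depending on which custom font the reader happened to have installed.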
I agree 100%. Unicode really allowed the Cherokee language to take off. It showed tremendous foresight on their part. I would say that if your language does not use Latin script then Unicode is a vital first step in using technology in your language.
Locating a code/locale: Languages and their dialects have been assigned codes following the ISO-639 standard: http://en.wikipedia.org/wiki/ISO_639.
A handy resource generated through community input is online in English, French and Arabic – Friedel Wolff, Effecting Change Through Localisation, African Network for Localization, 2010.
The ANLoc front page displays an overview, with links to the different tools recently developed:
Besides the three initial versions of this ANLoc manual, translations into Spanish and Portuguese have been undertaken by volunteers. On pages 13-14, there is a short description of "codes" and "locales" as identifiers in systems designed to handle different languages. Languages like English, Russian, German and French usually have a two-letter code: en, ru, de, fr. Others have a three-letter code. These codes can be expanded with country codes, if necessary - for example, English in South Africa (en_ZA), New Zealand (en_NZ) or Canada (en_CA). Codes identify languages, and they largely determine the creation of locales. A locale supplies a corpus of data that the software needs to know about a language: the standard way of writing numbers, showing times and dates, and so on. It's a powerful formatting tool. If a locale is not in the system, the language is most likely not supported by it. But software programs go about this differently, and with experience the differences will become obvious to localizers. Sometimes they may choose a system like Linux to create their own locale.
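To make the "locale as a corpus of formatting data" idea concrete, here is a deliberately simplified sketch in Python. The record fields are invented for illustration; real systems draw on much richer data (e.g. CLDR/ICU, covering plural rules, collation, calendars and more), and real code would use a library rather than hand-rolled formatting:

```python
# Hypothetical, much-simplified locale records for illustration only.
LOCALES = {
    "en_US": {"decimal_sep": ".", "group_sep": ","},
    "de_DE": {"decimal_sep": ",", "group_sep": "."},
}

def format_number(value: float, locale_code: str) -> str:
    """Format a number with the separators the locale prescribes."""
    rec = LOCALES[locale_code]
    whole, frac = f"{value:.2f}".split(".")
    groups = []
    while whole:                      # split the integer part into
        groups.insert(0, whole[-3:])  # groups of three digits
        whole = whole[:-3]
    return rec["group_sep"].join(groups) + rec["decimal_sep"] + frac

print(format_number(1234567.5, "en_US"))  # 1,234,567.50
print(format_number(1234567.5, "de_DE"))  # 1.234.567,50
```

If no such record exists for a language, software simply cannot render numbers, dates or times in its conventions - which is why creating a locale is a prerequisite for real support.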
Localizers are those who translate but do more - they also adapt many non-lexical, contextual and pragmatic components to the appropriate cultural settings. They have to deal with images, sounds, accents, colors, etc. that may or may not be appropriate to a particular context. However, localizers quickly bump into other issues, like "codes" and "locales". Underrepresented languages may be confronted with a confusing locale landscape. Let's take my own language, Songhay. It is considered a set of languages, and each dialect has been given a language code - up to twelve (12) codes. This is no problem as such; one can even find more if one looks closely enough. But our question is: how does all this help or hinder people who want to create tools for resource-starved languages? In my own experience, the system in place has been a great hindrance and a frustrating wall erected by (to be honest, I have no clear idea!) - let's say the "system".
Sunday, November 13, 2011: I was sitting with a representative of WikiMedia, and he was extremely excited about getting me over the first hurdle of establishing a "place" for Songhay articles. We created an account, easily enough, and he entered the locale code "son", which my team created for Songhay in 2008 and under which we have been localizing a dozen programs so far. We got an alarm: this is impossible because "son" has been accepted only as a "collection of languages", so one cannot establish it as a locale. We could always choose one of the twelve existing codes. We both left Berlin and the Mozilla meeting without moving an inch forward. This has been the case for our OpenOffice project too. It was first initiated under "son", but recently the constraints on software providers to "comply" with ISO-639 coding have been getting heavier - coding that is vaguely related to field research, but much less to anything that could justify such rigidity. We know the alternative of "switching" to dialectal codes, but we disagree profoundly with the principle.
For one, the language of localization for Songhay is not just any of the 12 or X dialects. It is one of the standardized forms in Mali or Niger, for example, which structurally follow the same patterns. We are not claiming that they are identical. But the Songhay-Zarma units in Mali and Niger have worked closely together for decades; they have published common textbooks with similar titles and texts, and they also consult with smaller groups in Burkina Faso, for example. Since 2006, we have had a common mailing list and collaborate to produce tools and digitize content for easy access, regardless of the dialect in which it is done at this stage. It may be a naive, "nonscientific" opinion on a technical issue, but why not let people who have nothing to begin with pool their meager resources to generate as much content and as many computing tools as possible, instead of forcing this kind of "babelization" or "balkanization" at ground zero? If the WikiMedia site were done under "son", we could invite people from all Songhay-speaking regions to submit articles, publish them, compare them, and maybe sketch out a more realistic map of "interintelligibility" than the ones held up as authoritative to date. As far as Songhay goes, we face barriers inherent to a system put in place while most of the world's languages lacked any native representative around the table. This is aggravated by the lack of interest of national language policymakers in such linguistic determinants and the related technical nitty-gritty, and it risks fossilizing as outdated research findings become entrenched through the convenience they offer to system managers. To take interintelligibility, for example, we know that this is a very dynamic process that cannot be studied once and for all, especially with growing cities, interacting youths and new interactive media carrying news and entertainment back and forth.
Labov showed this for America, and we know that dialectal leveling happens a great deal, especially if we consider two major Songhay-speaking cities like Gao in northeastern Mali and Niamey, the capital of Niger: about 400 km/250 miles apart, linked by tarred road on both sides since 2007. This is just one example, but a meaningful discussion about the pertinence of the prevailing coding system needs to proceed from a dynamic outlook, one which encourages new research and even challenges contenders to create a more productive system. As it is - without ever forgetting the extremely valuable work done in documenting languages in a systematic way - there are cases in which the "beauty" of the system in the eyes of the dialectologist is a chain, or a straitjacket, for the localizer.
Thanks, Mohomodou, for making us aware of this barrier.
I have forwarded your contribution to Wikimedia's Language Committee of which I am a member, together with a request to allow locales for macro-languages. I'm not sure whether that is the way to go about finding a solution but I thought I can at least try. Hopefully, other LangCom members will enter this dialogue - and together, we'll beat "the system" :)
Oliver – thank you for this prompt follow-up. Very much appreciated! I hope we will have some good news to cheer about this pretty soon. And should the committee members have questions and requests, we will be more than happy to engage a dialogue with them and hopefully come to a pragmatic settlement in this particular case.
Thanks, Kevin, for the explanation of the technicality. I found this documentation page. So, who would we have to apply to for changing ISO-code:son from Collective into a macro-language - SIL?
Here is the form for change requests.
Thank you Kevin, for this reminder. It points to important technical details.
By "collection of languages" I meant the term "collective languages". In any case, reading the elaborate descriptions of "macrolanguages" and the rather sparse passages on "collective languages", it becomes clear that more research is needed to disentangle the parallel categories between which we are thrown back and forth. Songhay figures among the "collectives" under both "son" (Songhay) and "ssa" (Nilo-Saharan). Reading both entries, though, one is struck by familiar incongruities. The report on the fundamental issue of "interintelligibility" under "son" is more anecdotal than rigorously grounded. It refers to dialects spoken in my hometown (Gao, Mali) and in Niamey, Niger, where I lived for a year in the late 1980s and to which I returned in 2008.
The point is not to get further enmeshed in specifics, but the blunt statement that the dialect of Gao is unintelligible to speakers of Zarma (Niger) is enough to raise an alarm. I never spoke anything but the Gao idiom to Zarma speakers, and I never needed an interpreter. Then, turning to "ssa" and the old controversies about Greenberg's "Nilo-Saharan" hypothesis, we see the same pattern: code assignments taken as holy writ are actually based on unsettled, if not problematic, hypotheses. As I said in my original posting, having read many of the arguments for the different turns in the rich regime of classification to which Songhay has been subjected, more than anything there needs to be fresh research involving - this time - language experts who are also native speakers, as well as resource persons with experience in policymaking and cross-border collaboration. What I am hinting at is that it won't be enough to fit Songhay into a category. If we are going to aim for credible categories, then the "fuzzy" descriptors have to be reviewed, fine-tuned, and made more accurate. If need be, they should be undone. Maybe this is quixotic at this stage, but I don't see how we can avoid it. It will be a challenge to us too, as native experts, to present our own positions not just as "native" intuition but as an element of empirical evidence. In any case, this is a reflection we initiated, and this exchange is stimulating enough for a follow-up on our mailing list.
I agree 100% that more research is needed. The language codes and macrolanguages are the work of SIL and while they represent a remarkable achievement, much more input from local language communities is needed, especially on questions of mutual intelligibility. Some good news is that I've reported a few errors to the Ethnologue editors in the past, and they've always been very receptive and responsive to those reports.
From the documentation at sil.org:
From everything you've described to me over the years, Mohomodou, this description would fit the Songhay languages. As I understand it, you are using a single written form of the language for your translation of Firefox, and the 8(?) Songhay languages are at least as closely related as the different varieties of Arabic. That sounds like a macrolanguage to me, and if so it would be worthwhile making this case to SIL.
Without understanding any of the details, if the written form you're using represents an earnest attempt at standardization across varieties, you might additionally argue for a new ISO 639-3 individual language code for "Standard Songhay"; this is how it works for Arabic (macrolanguage "ara" encompassing around 30 varieties, including "Standard Arabic" with code "arb") and Malay (macrolanguage "msa", 37 varieties, including "Standard Malay" with code "zsm"). Again, maybe this is nonsense, just suggesting some possibilities.
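The macrolanguage relationship described here is essentially a lookup from an individual code to its umbrella code. A minimal sketch of that structure, assuming only a small illustrative sample of member codes (the full official SIL lists are much longer):

```python
# Illustrative sketch, not official SIL data: an ISO 639-3 macrolanguage
# is an umbrella code covering several individual language codes.
MACROLANGUAGES = {
    "ara": {  # Arabic
        "name": "Arabic",
        "members": ["arb", "arz", "apd"],  # arb = Standard Arabic, plus regional varieties
    },
    "msa": {  # Malay
        "name": "Malay",
        "members": ["zsm", "zlm", "min"],  # zsm = Standard Malay
    },
}

def macrolanguage_of(code):
    """Return the macrolanguage code covering an individual code, if any."""
    for macro, info in MACROLANGUAGES.items():
        if code in info["members"]:
            return macro
    return None

print(macrolanguage_of("arb"))  # -> ara
print(macrolanguage_of("zsm"))  # -> msa
```

Under this model, a hypothetical "Standard Songhay" code would be one more member under a "son" macrolanguage entry.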
I'm a big believer in standards (see other threads in this discussion for the huge benefits reaped by the Cherokee Nation and others from the mere existence of the Unicode standard), but it's important that these standards reflect reality and don't stand in the way of people trying to get things done!
Kevin, Mohomodou, thanks for the link to the change request form, and for the reminder that more research is needed, especially input from Songhay-speaking researchers.
Would you, Mohomodou, tend more towards code:son as macrolanguage or towards a new code for Standard Songhay? And would you be the main contact for writing up what is required for the change request? I'll be contacting a couple of colleagues who work or have worked in Songhay and direct them your way.
If one outcome of this dialogue is the facilitation of a recognised Standard Songhay locale, I'll be very happy indeed!
Kevin: I followed the link to change requests. It's very helpful of you once again. I agree with you that even when contending with problematic criteria, it is essential to safeguard standards. Let me add that when we had the last localization workshop together in Gao, the issue of a unique writing system came up again. Much has been done to align the two main standardized forms adopted in Niger and Mali, but I myself see the near future with two systems that one can push toward greater convergence, so that one can learn to read both with relative ease. Something like: 1) "to read", "ka caw" -> "a reading" ->> "the reading" - "cawo" (in Zarma) and "cawoo" (Songhay/Sŋ); 2) "market", "habu" (Z), "hebu" (Sŋ) --> "the market", "habo" (Z), "heboo" (Sŋ). The two language units collaborate, but they keep to their national writing preferences besides the dialectal differences: "habu" and "hebu", for example, or "sanni" (language) and "senni". They are different forms, but the patterns are also predictable in many cases. In this regard, I don't think one can speak of a single standard (written) Songhay. As I always say, our discussion should not overlook the extent of variation across Songhay idioms. But variations like ã, õ, ẽ, ĩ, ũ in Mali vs. the tilde (~) below a, o, e, i, u in Niger do the same thing: mark more or less rare cases of nasalization. For the keyboard layout, we just accommodate both for now until the convergence happens.
Oliver: I am willing to be the main contact for a focused exchange on this issue. Mostly for convenience, I can receive and relay the information so that we can move forward. I also want other colleagues, who have been invited to follow the exchange, to have their say. We will be happy to pursue it with an open mind and in the interest of finding a reasonable solution. For example, if the acceptance of Songhay as macrolanguage allows us to localize tools like OpenOffice or create a WikiMedia under "son", then that's what we are looking for. Again, I appreciate your push for a quick breakthrough. I hope to see something coming from this momentum.
Thanks, Mohomodou, for being willing to be a contact point.
It doesn't look like the colleagues I've written to are going to react before this dialogue is over. So, I'm going to keep you posted by email.
Looking forward to continued interaction!
One way of validating the Songhay code with the Wikimedia Foundation is creating a test version in the wiki incubator. Actually, a placeholder already exists for son:Songhay. Follow the help manual to create the Songhay test wikipedia under ISO code 639-2:son. I was told that that is acceptable. Best of success!
Thank you, Oliver: we'll make good use of these hints to move forward. I guess it's the first step I took with the WikiMedia colleague in Berlin. So we can build on it. I also wrote him yesterday with the link to this dialogue.
"As far as Songhay goes, we face barriers inherent to a system put in place while most of the world's languages lacked any native representative around the table. It is aggravated by the lack of interest of national language policymakers in such linguistic determinants and related technical nitty-gritty, and it risks fossilizing because of outdated research findings becoming entrenched with the convenience they offer to system managers."
This comment reminded me of the quite arbitrary way Google has prioritized its language localization efforts. The current ISO categorization of languages falls under two principal nomenclatures: the 2-letter code languages and the 3-letter code languages. The 2-letter codification was the original system, used until they quickly realized that there are not enough 2-letter combinations to accommodate all the languages of the world. So they started identifying additional languages with 3-letter codes.
Tech companies such as Google interpret the 2-letter coded languages as being the most important, widely used, and therefore highest-priority languages. Hence they have localized their products in the majority of these languages and have largely excluded the 3-letter coded languages so far. In reality, the 2-letter code languages are simply those that were first categorized, and contain for obvious reasons a disproportionate number of Indo-European languages compared to other language families. For example, Corsican, an Indo-European language of tens of thousands of speakers, has a 2-letter code, while Cebuano, an Austronesian language of over 20 million speakers, does not. To this day, one can search on Google in the Corsican language but not in Cebuano.
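The two-tier system described above can be made concrete with a small sketch. The standard convention (codified in BCP 47) is to tag a language with its 2-letter ISO 639-1 code when one exists, and to fall back to the 3-letter code otherwise; the speaker counts below are rough figures for illustration only:

```python
# Sketch of the ISO two-tier code system: some languages have both a
# 2-letter (ISO 639-1) and a 3-letter (ISO 639-3) code, others only
# the 3-letter one. Speaker counts are rough illustrations.
LANGUAGES = [
    # (name, iso639_1, iso639_3, approx_speakers)
    ("Corsican", "co",  "cos", 150_000),
    ("Cebuano",  None,  "ceb", 20_000_000),
    ("French",   "fr",  "fra", 80_000_000),
]

def language_tag(iso639_1, iso639_3):
    """BCP 47 convention: use the 2-letter code when one exists,
    otherwise the 3-letter code."""
    return iso639_1 or iso639_3

for name, two, three, speakers in LANGUAGES:
    print(f"{name}: tag={language_tag(two, three)}, speakers ~{speakers:,}")
```

As the sketch shows, having a 2-letter code says nothing about speaker numbers; it mostly reflects which languages were catalogued first.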
This is an example of how convention and poorly understood or poorly considered technicalities can pose major barriers to opening up the technological/cyber medium to underrepresented languages.
I think like the other practitioners here, I could talk all day about this, so this is an overview of my thoughts. Happy to answer questions and to engage further.
Commercialisation is the biggest obstacle for Māori being online and even spoken. For too many years, commercial organisations basically held Māori to ransom with high prices to purchase basic, incomplete dictionaries and editing and writing tools. Prior to UTF-8, we had special fonts and soft keyboards that were so expensive that only large organisations could afford them. Then other basic language tools became commercial and out of the reach of most speakers. After about a decade, most tools are now freely offered and utilised by anyone who wants them. Though the same commercial entities still manage to sell to organisations.
We discourage the use of transliterations and opt in favour of creating new terminology based on traditional thoughts, objects and values. For example the Māori word for a vehicle such as a car is waka. Waka is the traditional word for the main form of ocean transport a canoe or boat. For other words such as Internet the word is ipurangi gained by combining ipu (vessel or bottle) and rangi (sky) together.
Another option is to look to other languages that are similar and see what words they have used. For Māori, we have the whole Pacific area of languages to consider.
Again, this should be community driven and the results shared. If the results are kept hidden and not shared, then multiple words for one definition will occur, as has happened for Māori. Some new Māori words now have over 20 English translations.
For localisation projects, we have Windows 7, Office 2010, Moodle, Google, parts of Android, and Skype. The localisation projects are often led by a university that champions the project, and by individuals with a passion. But in my experience there are a great many open source developers who are willing to assist you. Much software today will allow you to localise and then distribute the files yourself. Skype is one example; Microsoft Office can be localised via Visual Basic and macros.
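For anyone starting on this kind of work, many open source projects (Moodle among those mentioned above, and much other software) store their translations in gettext ".po" catalogs, which are plain text and easy to edit. A minimal sketch of the format, using illustrative Māori translations (the msgstr values here are examples, not an official glossary):

```python
# A tiny sketch of the gettext ".po" catalog format used by many open
# source localisation projects: each entry pairs a source string (msgid)
# with its translation (msgstr).
import re

PO_SOURCE = '''
msgid "File"
msgstr "Kōnae"

msgid "Edit"
msgstr "Whakatika"
'''

def parse_po(text):
    """Very small PO parser: maps msgid -> msgstr.
    Ignores comments, plurals, and multi-line strings for simplicity."""
    pairs = re.findall(r'msgid "(.*)"\s*\nmsgstr "(.*)"', text)
    return dict(pairs)

catalog = parse_po(PO_SOURCE)
print(catalog["File"])  # -> Kōnae
```

Real projects use the gettext toolchain (msgfmt, Pootle, etc.) rather than a hand-rolled parser, but the editable artifact a translator touches is just this kind of text file.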
Google is open to be localised and is in my opinion the most important localisation project as Google is so widely used. I think if IT savvy people see their language as the Google interface then it will encourage them to use their language and is a clear signal that their language is alive and cool to use.
The ability to create custom keyboard layouts is now plentiful. Though Microsoft offers a Māori locale, I find it to be riddled with bugs, so I choose to use a Tavultesoft custom keyboard layout with the same keys as Microsoft.
Spell-checking tools are simple enough to create for Microsoft products without a great deal of technical skill, as they are for several browsers, including Internet Explorer. The help files will often mention how to create a custom speller. For open source there is Aspell, from Kevin Scannell, who is also a practitioner here.
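The core of a wordlist-based spell checker of the kind mentioned above (the basic idea behind Aspell and Hunspell dictionaries) is just a word list plus a way to suggest near misses. A minimal sketch, using a few sample Māori words as a stand-in word list:

```python
# Sketch of a wordlist spell checker: membership test plus suggestions
# ranked by edit distance. The word list is a tiny illustrative sample.
WORDLIST = {"kia", "ora", "whānau", "aroha"}

def check(word):
    """A word is 'correct' if it appears in the word list."""
    return word.lower() in WORDLIST

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggestions(word, max_distance=1):
    """Suggest list words within a small number of edits of the input."""
    word = word.lower()
    out = [w for w in WORDLIST
           if abs(len(w) - len(word)) <= max_distance
           and edit_distance(word, w) <= max_distance]
    return sorted(out)

print(check("ora"))        # -> True
print(suggestions("oro"))  # -> ['ora']
```

Production spell checkers add affix rules so one dictionary entry covers many inflected forms, which matters a lot for morphologically rich languages, but a plain word list like this is a workable starting point.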
Finally, a comprehensive and publicly available database of all words and definitions should be made available so that the whole community has access to it. This will further increase the likelihood of
Promotion needs to be at the grassroots level, in my opinion. If the youth are using Facebook, Twitter, Google+, etc., then maybe consider language groups that only use your language, and localisation in the areas they use. But make it cool and applicable. YouTube videos in your language, with incentives to get wide participation, are also key to getting minority languages online.
Thanks, Karaitiana! It would be great to hear about how the Māori community is using Facebook, Twitter, Google+, etc. How are you and others working to strengthen this online community? Is there enthusiasm at the grassroots level for this kind of online engagement? I look forward to reading more about your experiences and ideas!
Kia Ora Karaitiana,
Thank you so much for all your excellent work with the Māori language. I have a soft spot for the Māori language - my wife is Māori (Ngāpuhi, Ngāti Whātua, Ngāti Hine). I am particularly interested in your work to create new Māori words:
My work on Taualanga Tufungalea / Tongan WordsWorld Facebook Group also favors creating new terminology based on traditional Moana Nui (Pacific) concepts, practices, and objects. I am very much interested in collaborating with you and other Moana Nui language advocates (like Emani Fakaotimanava-Lui from Niue and Keola Donaghy from Hawai'i) in creating/coining new terminologies based on our common Moana Nui culture. Maybe we should start a Facebook Group or website together. Let me know what you think. I hope Emani and Keola are reading this post (Is there a way for me to tag them?). One great resource we have online is Pollex - Polynesian Lexicon Project Online. I think a collaboration from individuals representing all 30 plus Moana (Polynesian) languages will be great.
Yes, any work would be good, but let's decide what the best tool is. Will this be FB, Wiki, Pootle, etc., or maybe during this dialogue we'll pick up some ideas as a base and develop from there? Or perhaps one of our friends from this dialogue may already have done such work, and we could adopt similar concepts and adapt them to our Polynesian world.
The concept I am looking at is to list the technical terms on a website and notify all Niue people online to go and add their own translation of the best Niue word. Once it's completed, it will be given back to the local community and signed off by the "matua", our old folks who are known language experts. This final copy then becomes the base from which we will move into translating software, websites, spell checkers, etc.
We are in the best place right now as a lot of work has been done already and our work would be complementing those. The people who have shared here have shown good initiatives that we can also adopt for our work and if possible we could also learn from them how to do it efficiently.
On this note, at the last PacINET 2011, held in Pago Pago, American Samoa, which is the annual get-together for PICISOC (the Pacific Islands Chapter of the Internet Society), a discussion was held at the AGM to form a Special Interest Group (SIG) for Local Content. Maybe you can contribute there, as it may also work alongside what you already do for your work on Taualanga Tufungalea.
"Google is open to be localised and is in my opinion the most important localisation project as Google is so widely used. I think if IT savvy people see their language as the Google interface then it will encourage them to use their language and is a clear signal that their language is alive and cool to use."
I highly agree that getting access to a language on Google is a very potent and symbolic step for the language's reputation and tech promise. However, is Google really open to being localized? In my brief research, it seems that Google has stopped adding new languages to its search interface, having included most of the 2-letter code ISO 639-1 languages and seemingly being unwilling to delve much into the 3-letter ISO 639 languages. Hence, if you go to the Google In Your Language (GIYL) FAQ page, they say they are no longer adding any languages at this time. GIYL is the principal tool by which the public can help translate Google products into different languages. If a language is not listed on GIYL, then it will not likely become available as an interface language of Google's products like Search, Gmail, etc., because the public is simply not given the opportunity to help in its translation.
The exception to this rule is if Google, in partnership with a university or organization for example, privately takes the initiative to localize a Google product into a new language, even if it's not publicly listed in GIYL. This has been done for several African and other languages. The question I have for this forum is -- how to best go about starting such a partnership project?
As josenavarro mentioned in another post in this discussion group, the exclusion of the major Philippine languages from Google is having a detrimental impact on the status of these languages and their users. From those who have had experience working with Google on localizing its products into minority/marginalized languages (especially those that do not have 2-letter ISO codes), we would really appreciate tips as to how we might go about bringing Google Search to the other Philippine languages! Whom shall we engage at Google, and how? Have a bunch of universities, literary organizations, and other language stakeholders in the Philippines send general letters to the regional Google office requesting more languages and offering help towards such a goal?
Reading all the contributions here has certainly been positive. Our people have been doing various projects to assist with the preservation of our Niue language but this dialogue has opened up other avenues. It is so great to know of so many other initiatives that have been undertaken by people in their own work for their language. Some of these tools will be adopted by us as a basis to further the work of our people. So for now, thank you all and I look forward to a lot more discussions and reading all of work that you guys are championing out there.
Has anyone done work in developing apps/games using language/themes like traditional legends, historical figures, heros, etc of a country?
An iPad app for Ojibwe was released this week. Unfortunately, the interface is presented in English; however, there appears to be a syllabic version available. Friends who have tried it say the volume is too low, but we are happy for this development and expect it to improve.
In the future I would like to work on creating animations based on our clan based teachings/stories.
Thanks a lot for sharing your experiences, and for showing the way for under-represented and oppressed languages on the Internet and the new media. Btw, I favor the use of the term "oppressed" for my language, because language policy in the country which rules the territory where my language is spoken (the Philippines, which rules the Kapampangan [pam] area) oppresses, in various ways, my language and actively causes it not to be used. My language is forbidden in many schools, and actively discouraged in the media and public places, and this is causing a widespread language shift towards the national language (Tagalog, which has been renamed "Filipino").
The advent of the Internet and computers should have been an equalizer, but it is instead intensifying the marginalization of my language, and of all languages in the Philippines other than Tagalog. For instance, Tagalog has been made the default language of Google Philippines (in contrast to, say, Yahoo! Philippines), as well as of other media like Blogspot or Blogger (for its part, Facebook now has a Tagalog version, even while it ignores calls for versions in other Philippine languages). In the meantime, Google, while forcing the Tagalog version all over the Philippines, denies interfaces for non-Tagalog languages such as Cebuano and Ilocano (the second and third biggest languages in the Philippines; my native Kapampangan is seventh), making people look up to Tagalog even more, and down on their native tongues.
Forgive me for the negative tone of my post, but at this point, the new technology, while it holds great potential for reversing the situation, holds only "potential". The godlike status accorded to the national language in my country is intensifying language shift at a time when, as I said, the media might be doing otherwise.
This is also true for other media like online translators, which are again available for Tagalog (or "Filipino" as they now call it) and not in other Philippine languages. In the meantime, in other media, such as mobile phones, predictive texting is available, again for Tagalog, and unavailable for all other Philippine tongues. Forgive me for what sounds like carping, but that is the most dominant situation that I can share. I will greatly appreciate help towards reversing the situation, which is reinforcing the neocolonial treatment being received by my people in school, the media, and Philippine society in general.
Thank you for sharing these challenges, Jose. This sentiment has been raised by a few others in the dialogue, including Boukary. In his comment, he wrote:
It is a serious challenge - what do practitioners do when policy is not simply un-supportive, but working against you in your efforts to promote and preserve languages? It will be great to hear from others on this challenge/question!
I strongly believe that innovation in language policy always begins at the grass roots and with the private sector. Policy setters, like a Ministry of Education, for example, are reactionary and slow-moving beasts. But amazing things can happen without having policy in place, things which are prerequisite for pushing policy in the right direction.
In Guatemala, for example, it took decades of organizing to get policy concessions on language rights, and most of those concessions are still just paper promises. Bilingual education happens, but it is still mostly terrible, and mostly subtractive bilingualism. We have a semi-governmental Academy of Mayan Languages, but it is mostly undercut and underfunded by the government.
Nevertheless, great things are starting to happen, even online. Major resources have been non-governmental organizations, and also academic institutions from outside the country (anthropologists and linguists from the USA and Europe). Are there similar private sector resources in your context that you could hook into?
I get the impression that this is more a matter of implementation at local or regional level than the actual language policy at national level. Most UN member states have signed the 1993 UN Declaration on the Rights of Persons Belonging to National or Ethnic, Religious or Linguistic Minorities (it was easier to locate this at a US university than at the UN website!) - and some have revised their official language policy accordingly. Still, that doesn't mean that things necessarily changed for the better for minority languages. At times, local administrators act out of ignorance. At least in those cases, I have found it helpful to have a copy either of the UN resolution or of the official language policy at hand in order to show them (i.e. in the Tanzanian context I carried a Swahili version with me) and discuss it. Not always successful but in a couple of villages, it opened the door for the local official at least to tolerate the conducting of literacy activities in the mother tongue.
And in cases where an official "policy is working against you", you can at least show the local administrator that it is in defiance of international agreements. Who knows? Maybe, you can win an ally.
One thing I found really inspiring was a pamphlet published by Darell Kipp of the Piegan Institute (http://www.pieganinstitute.org/pieganindex.html) that listed a series of "do's" & "don'ts" for revitalizing your language. Although I can't remember all of them, my favorite one was "never ask for permission." The point was, if you wait for permission before beginning to revitalize your language, you're stuck on someone else's timeline & you're playing someone else's waiting game. Do whatever you can *now,* even if it's very minimal, and you're making progress. If you have to meet in someone's house & have lessons because you don't have permission to build a school, then do that. If you have a school but don't have funding, ask for volunteers. (Worst case scenario, you become the volunteer.) The very minimum you have to have is speakers & learners. If the speakers are gathered together in a strong, tight-knit social network, that helps a lot. If the learners are young enough to be able to learn as a first language, that helps a lot. But even if you have to do something like partner a master speaker with a second-language learner as in Leanne Hinton's Master-Apprentice Program (http://www.amazon.com/Keep-Your-Language-Alive-One/dp/1890771422) that's also helpful. Your only goal really is that the language gets transferred from person to person. It has to do that and it has to be done in a meaningful way that ensures it will stick & be transferred again. Involving government organizations can be awfully nice and supportive, but generally it's something that comes much later, after the movement has risen from the bottom up. Of course, as Oliver has pointed out, sometimes de facto support is already there - you just have to enact some plan to call attention to it.
I like the spirit of Ben's kick-along perspective. There are all sorts of (mixed) feelings involved in blocking us. Oliver mentioned guilt. It's as relevant as the small number of speakers, the gap between generations and the actual ability to transmit the language to the youth, etc. I think it's not just naive cheerleading. More than that, it is the stubbornness, resilience, resourcefulness and endless creativity needed to push along with resource-poor languages, spoken in often impoverished and isolated regions. All the handicaps and barriers are real; only that they shouldn't be sufficient or allowed to have the last word. A saying in Songhay goes like this: A dog finds a dry and bare bone on its way and does what a dog is inclined to do: stop and pay attention to the bone. The bone almost takes pity on the dog and tells it to keep going if it has any business ahead; because it (the bone) has been around so long, it is called the "hardest of the hard". To which the dog answers back that it has just cancelled all its appointments. The one that takes all the time against the one that takes all the time... to get there. The maxim of "šendaa-šendaa nda goybenaa." Plowing ahead doggedly is certainly a vital strategy. Really, when I look back, things we did six or seven years ago can only be amusing today. But it was necessary to go through that fog too and move beyond it, with help at times, or without any. For newcomers, it is essential to understand and accept that while it is wonderful to have a clear agenda, most of the time it is just impossible to know exactly what to do first, unless it suits one enough to follow others. Except that many are doing something totally new in their language(s). They have to map that road for themselves and others. They have to create the "community", build the "network", do and undo things continually.
I like the saying about the dog! That's exactly what I was trying to say. In the beginning it's hard - that's a given. You're trying to do something new, and everything really is stacked against you. You have to do SOMETHING, but you don't know what that something is - so just start doing things! You can't go wrong with something when the alternative is doing nothing. I remember when I started trying to learn Cherokee and I thought that there were NO resources - and at the time, there were none (or at least very few)! It has only been about 8 years between then and now, and we've come so incredibly far. It is possible! It's just a big investment up front.
Another way of increasing the online visibility of an under-represented language would be to have a wikipedia in that language. There is now a simplified way of starting wikipedia test versions. If you know the ISO 639-3 code of your language, there probably is a placeholder on wikimedia.org for your language already. For example, there isn't a Rangi language wikipedia yet but there is a page for starting one. Replace the last three letters in the URL (=lag in my example) with the 3-letter code for your language, and you should get the start page for your language. Open the help manual in another tab and follow the instructions.
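The URL substitution described here is trivial to automate if you are setting up test pages for several languages. A sketch, assuming the incubator address follows the pattern given in the post (the exact URL scheme is an assumption based on that pattern):

```python
# Sketch: building Wikimedia incubator test-wiki URLs from ISO 639-3
# codes, following the "replace the last three letters" pattern above.
def incubator_url(iso_code):
    """Return the (assumed) incubator start page URL for a language code."""
    return f"https://incubator.wikimedia.org/wiki/Wp/{iso_code}"

print(incubator_url("lag"))  # Rangi
print(incubator_url("son"))  # Songhay
```

From there, the help manual linked on the incubator walks through creating the actual test pages.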
Now, I realise that not every language community will have mother tongue speakers who are up to going through that process. So you may want to have an IT guy at hand to help train a couple of mother tongue speakers. From what I have heard, it is most encouraging to speakers of under-represented languages if they can focus on localised content first, e.g. writing on local customs or famous people and locations.
I wonder if you could simplify the process for your speakers by having them type their article into a word document so you could input it later? I'm not sure if that would be helpful, but it would probably save the speakers the hassle of trying to do the coding & starting up a page. I guess the other thing you could do if the speaker can't or doesn't want to type would be to have them tell a story about someone/something and have someone transcribe it later, clean it up, & post it as a wikipedia article (given, of course, that the speaker knew you were going to do that & had given permission). The other thing I've thought of regarding wikipedia is to involve an immersion school/language nest if you have one. You could assign kids to do a report on something (owls, for example) and go find information about it in a more widely-spoken language. They use what information they have compiled to create a consolidated article in your language, then put it on wikipedia. They can cite their sources at the bottom of the wiki page, and then for the next class that comes along, you could have them consult the first class's wiki page to learn about owls, then check their sources & see if there's anything they'd like to add/change.
Hello all, it is great reading your comments with regards to this important question.
In the case of many Amazonian people who speak endangered languages, there are many factors leading to their languages not being represented online: lack of access to electricity, personal computers, internet, and computer literacy related to social media. Those living closer to urban centres usually have more access to internet, and have their own emails and Facebook accounts, but tend to interact in the dominant language, Spanish. I have met some Awajun cultural activists in northern Peru who are using the internet very frequently to share news about indigenous activism, and sometimes write in their language. In contrast, in other remote parts of the Amazon, where there is less access to internet, indigenous people use social media much less regularly, and only when they travel to cities.
Also, not having access to computer scripts for their alphabet can make it tricky to write online. Many of the computers in internet cafes in the Amazon are not equipped with the proper fonts and symbols to be able to write in the indigenous languages, or the users do not know how to access the special symbols on those computers.
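One common, invisible source of this kind of special-character trouble is Unicode normalization: the same accented letter can be stored either as one precomposed codepoint or as a base letter plus a combining mark, and software that does not normalize treats the two as different strings. A small illustration:

```python
# Why "the same" character can behave differently across computers:
# two valid Unicode encodings of the letter ñ.
import unicodedata

precomposed = "\u00F1"   # ñ as a single precomposed codepoint
decomposed = "n\u0303"   # 'n' followed by U+0303 COMBINING TILDE

print(precomposed == decomposed)                                # False: different codepoints
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True after normalization
```

Fonts and keyboards that produce one form while search boxes or dictionaries expect the other are a frequent cause of "the character doesn't work" reports, and normalizing input to NFC is the usual fix.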
In the case of the Yanesha people of the south-central Peruvian Amazon, they have recently adapted their alphabet so that it is much easier to write on computers and send emails in the Yanesha language. Before, writing on computers was slow and arduous, because one had to use special key commands to produce the symbols that were needed, and they would often not transfer properly from computer to computer. With the new alphabet, implemented in 2011, writing online has become much easier for the Yanesha people.
all the best,
Our group translated the Google interface into Cebuano a few years ago, and, not knowing the email address of Google, we sent hard copies to their management by registered express mail. We were not given the courtesy of a reply. While Spain has a Catalan Google and India has Google in a number of local languages, in the Philippines the only Google other than English is in Tagalog (also called by some Filipino). That puts all the 170 non-Tagalog languages at a disadvantage.