Apache OpenOffice (AOO) Bugzilla – Issue 76153
Locale/charset file for Lingala (ln_CD)
Last modified: 2013-08-07 15:01:20 UTC
Locale for ln_CD (Lingala, Democratic Republic of the Congo), and thus for the ln language. Also attached is a charset collation that sorts ɛ after e and ɔ after o. An alternative collation additionally treats digraphs that are morphologically letters of their own as single units.
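The letter ordering described above can be illustrated with a minimal Python sketch. It is not the attached collation data: the `ALPHABET` string, the `lingala_key` helper and the sample strings are assumptions made for illustration, and tone marks are ignored.

```python
# Hypothetical alphabet: Latin a-z with ɛ placed after e and ɔ after o.
ALPHABET = "abcdeɛfghijklmnoɔpqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def lingala_key(word: str) -> list[int]:
    """Sort key: alphabet rank per letter; unknown characters sort last."""
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in word.lower()]


# Sample strings chosen only to exercise the ordering.
words = ["moto", "mɔkɔ", "mokolo", "ɛlɛngɛ", "esengo", "fungola"]

by_code_point = sorted(words)               # ɛ and ɔ end up after z
by_alphabet = sorted(words, key=lingala_key)  # ɛ after e, ɔ after o
print(by_code_point)
print(by_alphabet)
```

With plain code point order, any string starting with ɛ lands after every ASCII-initial string, which is the behaviour the collation is meant to avoid.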
Created attachment 44252 [details] Locale file
Created attachment 44253 [details] collation for lingala (just letters)
Created attachment 44254 [details] collation for lingala (morphological)
Grabbing issue.
Hi moyogo,

Thank you for your contribution. Please note that to integrate contributed code or data we need a signed and filled-out Joint Copyright Assignment form (JCA), see http://contributing.openoffice.org/programming.html#jca

To be able to look up your name in the list of approved assignments, I'd appreciate it if you stated your full name here in this issue.

Regarding the locale data file attached:

1. The ThousandSeparator (aka group separator) is defined to be empty. It should probably be a '.' dot or ' ' non-breaking space instead. Accordingly, the defined number format codes currently don't make use of a group separator. Note that when changing separators, all format codes using them have to be adapted.

2. The ListSeparator is defined as ' ;' including a leading space; this is probably a typo.

3. The currency format codes use the negative form with parentheses
<FormatCode>[CURRENCY]###0,00;[RED]([CURRENCY]###0,00)</FormatCode>
This is usually only the case for countries that are "influenced" by the USA. Intended? Maybe that should be something more like
<FormatCode>[CURRENCY] #.##0,00;[RED]-[CURRENCY] #.##0,00</FormatCode>
or wherever the minus sign goes in your locale.

4. The CurrencySymbol is defined identical to the CurrencyID 'CDF'. No problem, but isn't there a distinct currency symbol used?

5. The IndexKey defines only 'A-Z', but the language also uses other characters. This has the effect that in a Writer text document's index table the entries are listed in the order A-Z, followed by other characters in Unicode order. Intended? If another order should take place, the definition needs to include the characters, for example A-O ɔ P-Z if ɔ should go between O and P.

Regarding the collation data attached: Is that meant to offer _two different_ collation algorithms at the UI in the Sort dialog? Or should the morphological letters with digraphs (ln_morph.txt) be used?

Thanks
Eike
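To make the IndexKey remark concrete, here is a small Python sketch that expands a range specification such as "A-O ɔ P-Z" into the ordered list of index letters. The space-separated syntax is taken from the example in the comment; the `expand_index_key` helper is hypothetical and does not claim to mirror how the locale data parser actually reads the element.

```python
def expand_index_key(spec: str) -> list[str]:
    """Expand tokens like 'A-O' into single letters; keep other tokens as-is."""
    keys: list[str] = []
    for token in spec.split():
        if len(token) == 3 and token[1] == "-":
            first, last = ord(token[0]), ord(token[2])
            keys.extend(chr(code) for code in range(first, last + 1))
        else:
            keys.append(token)
    return keys


only_latin = expand_index_key("A-Z")
with_open_o = expand_index_key("A-O ɔ P-Z")
print(with_open_o)
```

With 'A-Z' alone, ɔ has no index key of its own, so entries starting with it fall into the trailing group of other characters. Listing it explicitly gives it a slot between O and P.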
Created attachment 44482 [details] fixed ln_CD.xml
Hi Eike,

I'll send the JCA form asap.

Regarding the locale:

1. The ThousandSeparator is ' ' non-breaking space. Thanks for noticing the empty one.

2. The ListSeparator is ' ; '.

3. I set the negative currency form to
<FormatCode>[CURRENCY] # ##0,00;[RED]-[CURRENCY] # ##0,00</FormatCode>

4. The CurrencySymbol is now "F", although "Fc" is often encountered.

5. The IndexKey is now A-E Ɛ F-O Ɔ P-Z

Regarding the collation:

The alphabetical order is the most common one I have encountered. The morphological order (ln_morph) is recommended by some linguists, so it should be available. But I think the alphabetical order (ln_charset) should be the default, unless there’s an official order that is set by decree or such, which hasn’t happened.

Thank you
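As a rough illustration of what these settings produce, the following Python sketch formats amounts with a non-breaking space as group separator, a comma as decimal separator, and a leading minus sign for negatives. The `format_amount` helper, the half-up rounding, and the plain space between symbol and digits are assumptions, and this is not how the office suite evaluates format codes.

```python
from decimal import Decimal, ROUND_HALF_UP

NBSP = "\u00a0"  # non-breaking space used as the group separator


def format_amount(value, symbol: str = "F") -> str:
    """Render value roughly like '[CURRENCY] # ##0,00' with '-' for negatives."""
    amount = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    sign = "-" if amount < 0 else ""
    # Format with ',' groups and '.' decimals first, then swap in locale separators.
    integer, fraction = f"{abs(amount):,.2f}".split(".")
    return f"{sign}{symbol} {integer.replace(',', NBSP)},{fraction}"


print(format_amount(1234567.5))
print(format_amount(-1234.5))
```

The symbol is a parameter so the same sketch works with other spellings of the currency symbol.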
Hi moyogo,

> I'll send the JCA form asap.

Good. Btw, what is your full name, so I can look it up in the list of approved assignments?

> 1. The ThousandSeparator is ' ' non-breaking space. Thanks for noticing the
> empty one.

The format codes have to be adapted to use it. I'll do that.

> 2. The ListSeparator is ' ; '.

The separator should be one character only; I'll remove the surrounding blanks.

> 3. I set the negative currency form to
> <FormatCode>[CURRENCY] # ##0,00;[RED]-[CURRENCY] # ##0,00</FormatCode>

The codes not having [RED] negatives probably should be adapted as well; I'll do that.

> 4. The CurrencySymbol is now "F", although "Fc" is often encountered.

Which means that the LC_FORMAT replaceTo attribute should also use 'F'; will do. Btw, I assigned the MS-LangID 0x0639 to ln-CD, so it now reads replaceTo="[$F-639]".

> 5. The IndexKey is now A-E Ɛ F-O Ɔ P-Z

Fine.

> Regarding the collation :
>
> the alphabetical order is the most common one I have encountered.
> The morphological order (ln_morph) is recommended by some linguists so it should
> be available.
> But I think the alphabetical order (ln_charset) should be the default,
> unless there’s an official order that is set by decree or such, which hasn’t
> happened.

Since we don't have a "Morphological" collation algorithm yet, not even in the user interface, would that be a proper name? The alphabetical order usually is called "Alphanumeric". Note that most languages don't use a "Character Set" order, but have alphanumeric instead. The morphological order also somewhat resembles that of the hu_HU locale, where a "charset" collation is used. As I'm absolutely not familiar with Lingala, could the alphabetical order be called "Alphanumeric" (and the collation data file be named ln_alphanumeric.txt) and the morphological order be called "Character Set" (and the file be named ln_charset.txt) instead? That way we wouldn't need an additional algorithm name and UI entry. The IndexKey element then should follow whatever we decide here, and we may as well need two elements.

I noticed the percent format codes have a blank between the digits and the % character. This is usually not the case, and the percent character immediately follows the number, like in 0% . Intended?

Btw, the Locale element had the attribute allowUpdateFromCLDR="yes", which should only be set if normative locale data is available in the CLDR and the locale data may be updated semi-automatically. As we haven't done a comparison yet, I set that to "no".

Eike
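The replaceTo value mentioned above combines the currency symbol with the language ID written in hexadecimal without leading zeros. A tiny Python sketch of that composition follows. The `replace_to` helper is hypothetical, and the upper-case hex rendering is an assumption that cannot be checked against a purely numeric ID such as 0x0639.

```python
def replace_to(symbol: str, lang_id: int) -> str:
    """Build a '[$SYMBOL-LANGID]' string with the ID in hex, no leading zeros."""
    return f"[${symbol}-{lang_id:X}]"


LN_CD_LANGID = 0x0639  # MS-LangID assigned to ln-CD in the comment above

print(replace_to("F", LN_CD_LANGID))
```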
Created attachment 44548 [details] corrected
Hi, my name is still not on the list and I haven’t got a reply yet. In any case, my name is Denis (Moyogo) Jacquerye. Thanks for the corrections. About the spaces around ';' (semicolon) and the space preceding '%' (percent), it should be the same as for French. As for the alphanumeric and the morphological orders, the alphanumeric one should be the default. It is what is most often used in published dictionaries. The morphological order has only been discussed among linguists. It should be optional for now, if possible.
Scheduled for OOo2.3
The decree of former president Kabila creating the Congolese franc in 1997: http://www.bcc.cd/monai2a.htm (Central Bank of Congo). Officially it's "FC" for franc and "c" for centimes. But this is French. For Lingala there is probably no decree, because it is only a national language.
Created attachment 44863 [details] ln locale with FC for currency symbol
Created attachment 44864 [details] morphological charset with rare digraphs/trigraphs
I modified ln_CD.xml to use the currency symbol FC as ruedin pointed out. The ln_morph.txt charset now also contains traditional digraphs/trigraph (gb, kp, ts, ngb) and borrowed ones (mf, mv, sh). I also added the digraph 'ny'.
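To show how a collation that treats multigraphs as single letters differs from a letter-by-letter one, here is a minimal Python sketch. It uses only the digraphs and the trigraph named in this comment. The position of each multigraph (directly after the single letter it starts with) is a hypothetical choice, not the order in ln_morph.txt, and all names below are illustrative.

```python
BASE = "abcdeɛfghijklmnoɔpqrstuvwxyz"
MULTIGRAPHS = ["gb", "kp", "ts", "ngb", "mf", "mv", "sh", "ny"]

# Hypothetical ordering: each multigraph follows the single letter it starts with.
UNITS: list[str] = []
for letter in BASE:
    UNITS.append(letter)
    UNITS.extend(sorted(m for m in MULTIGRAPHS if m[0] == letter))
UNIT_RANK = {unit: i for i, unit in enumerate(UNITS)}
LONGEST_FIRST = sorted(MULTIGRAPHS, key=len, reverse=True)


def split_units(word: str) -> list[str]:
    """Split a word into collation units, matching the longest multigraph first."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        for m in LONGEST_FIRST:
            if word.startswith(m, i):
                units.append(m)
                i += len(m)
                break
        else:
            units.append(word[i])
            i += 1
    return units


def morph_key(word: str) -> list[int]:
    """Sort key: rank of each unit; unknown characters sort last."""
    return [UNIT_RANK.get(u, len(UNITS) + ord(u[0])) for u in split_units(word)]


samples = ["kpa", "kua", "ka"]  # sample strings, not dictionary entries
print(sorted(samples))                 # letter by letter
print(sorted(samples, key=morph_key))  # 'kp' treated as one letter after 'k'
```

Matching the longest multigraph first matters for 'ngb', which would otherwise be split into 'n', 'gb'.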
btw, I'm on the JCA list.
I currently don't have time before OOo2.3 to do all necessary steps for the new morphological sort order. So what I'll do is add charset collation and locale data, and shift morphological things to a new issue for OOo2.4.
In CWS locales23:

i18npool/source/localedata/data/Attic/ln_CD.xml 1.1.2.1
i18npool/source/collator/data/Attic/ln_charset.txt 1.1.2.1
i18npool/inc/i18npool/lang.h 1.7.22.7
i18npool/source/collator/data/collator_data.map 1.4.82.2
i18npool/source/isolang/isolang.cxx 1.10.22.7
i18npool/source/localedata/localedata.cxx 1.47.10.10
i18npool/source/localedata/data/localedata_others.map 1.13.10.7
i18npool/source/localedata/data/makefile.mk 1.39.2.9
svx/source/dialog/langtab.src 1.72.296.9

Note that I fixed some currency formats in ln_CD.xml regarding positions of blanks, and the replaceTo needed 'FC' instead of 'F' to align with the currency symbol.
Reassigning to QA for verification.
Found fixed on CWS locales23 using Windows and Linux builds.
Found integrated on master OOG680m1 using Linux, Solaris and Windows builds.