Issue 76153 - Locale/charset file for Lingala (ln_CD)
Summary: Locale/charset file for Lingala (ln_CD)
Status: CLOSED FIXED
Alias: None
Product: Internationalization
Classification: Code
Component: i18npool
Version: OOo 2.2
Hardware: All All
Importance: P3 Trivial with 2 votes
Target Milestone: ---
Assignee: frank
QA Contact: issues@l10n
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-06 16:05 UTC by moyogo
Modified: 2013-08-07 15:01 UTC
2 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Locale file (13.52 KB, text/plain)
2007-04-06 16:06 UTC, moyogo
collation for lingala (just letters) (68 bytes, text/plain)
2007-04-06 16:08 UTC, moyogo
collation for lingala (morphological) (393 bytes, text/plain)
2007-04-06 16:08 UTC, moyogo
fixed ln_CD.xml (13.56 KB, text/plain)
2007-04-17 13:25 UTC, moyogo
corrected (13.57 KB, text/plain)
2007-04-19 20:26 UTC, ooo
ln locale with FC for currency symbol (13.58 KB, text/xml)
2007-05-04 13:07 UTC, moyogo
morphological charset with rare digraphs/trigraphs (677 bytes, text/plain)
2007-05-04 13:19 UTC, moyogo

Description moyogo 2007-04-06 16:05:32 UTC
Locale for ln_CD (Lingala, Democratic Republic of the Congo), thus ln language.
Also the charset to sort ɛ after e and ɔ after o.
An alternative collation offers digraphs that are morphologically letters of
their own.
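A minimal sketch of the alphabetical ordering requested here, placing ɛ after e and ɔ after o. It is not the attached collation data nor how i18npool works internally; the alphabet string, the function names, and the handling of other characters (tone marks are not treated specially) are assumptions made for illustration.

```python
# Assumed letter order: basic Latin letters with ɛ after e and ɔ after o.
ALPHABET = "abcdeɛfghijklmnoɔpqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def lingala_key(word):
    """Build a sort key from per-letter ranks.

    Characters outside ALPHABET sort after all known letters,
    ordered among themselves by code point.
    """
    return [(RANK.get(ch, len(ALPHABET)), ord(ch)) for ch in word.lower()]


def sort_words(words):
    """Return the words sorted with ɛ after e and ɔ after o."""
    return sorted(words, key=lingala_key)


if __name__ == "__main__":
    # Arbitrary letter strings, not real vocabulary.
    print(sort_words(["pa", "ɔa", "oz", "fa", "ɛa", "ez"]))
```

A plain code point sort would instead push ɛ and ɔ after z, which is the behaviour the collation data is meant to avoid.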
Comment 1 moyogo 2007-04-06 16:06:50 UTC
Created attachment 44252 [details]
Locale file
Comment 2 moyogo 2007-04-06 16:08:16 UTC
Created attachment 44253 [details]
collation for lingala (just letters)
Comment 3 moyogo 2007-04-06 16:08:47 UTC
Created attachment 44254 [details]
collation for lingala (morphological)
Comment 4 ooo 2007-04-12 10:53:51 UTC
Grabbing issue.
Comment 5 ooo 2007-04-12 15:15:11 UTC
Hi moyogo,

Thank you for your contribution. Please note that to integrate code or
data contributed we need a signed Joint Copyright Assignment form (JCA)
filled-out, see
http://contributing.openoffice.org/programming.html#jca
To be able to look up your name in the list of approved assignments I'd
appreciate it if you stated your full name here in this issue.

Regarding the locale data file attached:

1. The ThousandSeparator (aka group separator) is defined to be empty.
   It should probably be a '.' dot or ' ' non-breaking space instead.
   Accordingly, the defined number format codes currently don't make use
   of a group separator. Note that when changing separators all format
   codes using them have to be adapted.

2. The ListSeparator is defined as ' ;' including a leading space, this
   is probably a typo.

3. The currency format codes use the negative form with parentheses
   <FormatCode>[CURRENCY]###0,00;[RED]([CURRENCY]###0,00)</FormatCode>
   This is usually only the case for countries that are "influenced" by
   the USA. Intended? Maybe that should be more something like
   <FormatCode>[CURRENCY] #.##0,00;[RED]-[CURRENCY] #.##0,00</FormatCode>
   or wherever the minus sign goes in your locale.

4. The CurrencySymbol is defined identical to the CurrencyID 'CDF'. No
   problem, but isn't there a distinct currency symbol used?

5. The IndexKey defines only 'A-Z', but the language also uses other
   characters. This has the effect that in a Writer text document's
   index table the entries are listed in the order A-Z then followed by
   other characters in Unicode order. Intended?
   If another order should take place the definition needs to include
   the characters, for example
   A-O ɔ P-Z
   if ɔ should go between O and P.


Regarding the collation data attached:

Is that meant to offer _two different_ collation algorithms at the UI in
the Sort dialog? Or should the morphological letters with digraphs
ln_morph.txt be used?

Thanks
  Eike
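The effect of the IndexKey definition described in item 5 can be illustrated with a small sketch. This is only a model of the idea (expand ranges such as A-O into letters, then group entries under the first matching letter); the function names and the None bucket for unmatched entries are hypothetical and do not reflect the actual i18npool implementation.

```python
def expand_index_key(spec):
    """Expand a spec such as 'A-O Ɔ P-Z' into an ordered list of index letters."""
    letters = []
    for token in spec.split():
        if len(token) == 3 and token[1] == "-":
            # A range like 'A-O': every code point from start to end inclusive.
            start, end = ord(token[0]), ord(token[2])
            letters.extend(chr(cp) for cp in range(start, end + 1))
        else:
            # A single letter; stored in upper case to match entry lookup.
            letters.append(token.upper())
    return letters


def group_by_index_key(entries, spec):
    """Bucket entries under their index letter, in index key order.

    Entries whose first character is not an index letter go under None,
    which comes last. Entries inside a bucket keep their input order.
    """
    letters = expand_index_key(spec)
    known = set(letters)
    buckets = {letter: [] for letter in letters}
    buckets[None] = []
    for entry in entries:
        first = entry[:1].upper()
        buckets[first if first in known else None].append(entry)
    return {key: value for key, value in buckets.items() if value}


if __name__ == "__main__":
    # Arbitrary letter strings, not real vocabulary.
    print(group_by_index_key(["pa", "ɔa", "oa", "1a"], "A-O ɔ P-Z"))
```

With only 'A-Z' defined, an entry starting with ɔ would land in the unmatched bucket at the end instead of between O and P.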
Comment 6 moyogo 2007-04-17 13:25:47 UTC
Created attachment 44482 [details]
fixed ln_CD.xml
Comment 7 moyogo 2007-04-17 13:31:22 UTC
Hi Eike,

I'll send the JCA form asap.

Regarding the locale :
1. The ThousandSeparator is ' ' non-breaking space. Thanks for noticing the
empty one.

2. The ListSeparator is  ' ; '.

3. I set the negative currency form to
   <FormatCode>[CURRENCY] # ##0,00;[RED]-[CURRENCY] # ##0,00</FormatCode>

4. The CurrencySymbol is now "F", although "Fc" is often encountered.

5. The IndexKey is now A-E Ɛ F-O Ɔ P-Z

Regarding the collation :

the alphabetical order is the most common one I have encountered. 
The morphological order (ln_morph) is recommended by some linguists so it should
be available.
But I think the alphabetical order (ln_charset) should be the default, 
unless there’s an official order that is set by decree or such, which hasn’t
happened.

Thank you
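As a rough illustration of what the separators and the negative currency form above produce, here is a sketch that renders amounts with a non-breaking space as group separator and a comma as decimal separator. It is not the OOo number formatter; the function name and parameters are assumptions, and the [RED] colouring of negatives is not modelled.

```python
NBSP = "\u00a0"  # non-breaking space, used as the group (thousands) separator


def format_currency(amount, symbol="F", group_sep=NBSP, decimal_sep=","):
    """Render amount roughly like '[CURRENCY] # ##0,00;-[CURRENCY] # ##0,00'."""
    sign = "-" if amount < 0 else ""
    # Format with ',' groups and '.' decimals first, then swap in the
    # locale's separators.
    whole, frac = f"{abs(amount):,.2f}".split(".")
    whole = whole.replace(",", group_sep)
    return f"{sign}{symbol} {whole}{decimal_sep}{frac}"


if __name__ == "__main__":
    print(format_currency(1234567.5))
    print(format_currency(-1234.5))
```

The currency symbol is a parameter because it is still under discussion in this issue.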
Comment 8 ooo 2007-04-19 20:25:07 UTC
Hi moyogo,

> I'll send the JCA form asap.

Good. Btw, what is your full name, so I can look it up in the list of
approved assignments?

> 1. The ThousandSeparator is ' ' non-breaking space. Thanks for noticing the
> empty one.

The format codes have to be adapted to use it. I'll do that.

> 2. The ListSeparator is  ' ; '.

The separator should be one character only, I'll remove the surrounding
blanks.

> 3. I set the negative currency form to
>    <FormatCode>[CURRENCY] # ##0,00;[RED]-[CURRENCY] # ##0,00</FormatCode>

The codes that do not have [RED] negatives probably should be adapted as
well; I'll do that.

> 4. The CurrencySymbol is now "F", although "Fc" is often encountered.

Which means that also the LC_FORMAT replaceTo attribute should use 'F',
will do. Btw, I assigned the MS-LangID 0x0639 to ln-CD, so it reads now
replaceTo="[$F-639]".

> 5. The IndexKey is now A-E Ɛ F-O Ɔ P-Z

Fine.


> Regarding the collation :
> 
> the alphabetical order is the most common one I have encountered. 
> The morphological order (ln_morph) is recommended by some linguists so it should
> be available.
> But I think the alphabetical order (ln_charset) should be the default, 
> unless there’s an official order that is set by decree or such, which hasn’t
> happened.

Since we don't have a "Morphological" collation algorithm yet, not even
in the user interface, would that be a proper name? The alphabetical
order usually is called "Alphanumeric". Note that most languages don't
use a "Character Set" order, but have alphanumeric instead.

The morphological order also resembles somewhat that of the hu_HU locale
where a "charset" collation is used. As I'm absolutely not familiar with
Lingala, could the alphabetical order be called "Alphanumeric" (and the
collation data file be named ln_alphanumeric.txt) and the morphological
order be called "Character Set" (and the file be named ln_charset.txt)
instead? That way we wouldn't need an additional algorithm name and UI
entry.

The IndexKey element then should follow whatever we decide here and we
may as well need two elements.

I noticed the percent format codes have a blank between digits and the
% character. This is usually not the case and the percent character
immediately follows the number, like in 0% . Intended?

Btw, the Locale element had the attribute allowUpdateFromCLDR="yes",
which should only be set if normative locale data is available in the
CLDR and the locale data may be updated semi-automatically. As we didn't
do a comparison yet I defined that to "no".

  Eike
Comment 9 ooo 2007-04-19 20:26:32 UTC
Created attachment 44548 [details]
corrected
Comment 10 moyogo 2007-04-23 22:31:54 UTC
Hi, my name is still not on the list and I haven’t got a reply yet. In any case,
my name is Denis (Moyogo) Jacquerye.

Thanks for the corrections.

About the spaces around ';' (semicolon) and the space preceding '%' (percent), it
should be the same as for French.

For the alphanumeric and the morphological list, the alphanumeric should be the
default system. It is what is most often used in published dictionaries. The
morphological list has only been discussed among linguists. It should be
optional for now, if possible.
Comment 11 ooo 2007-04-24 12:56:10 UTC
Scheduled for OOo2.3
Comment 12 ruedin 2007-04-26 01:16:47 UTC
The decree by former president Kabila creating the Congolese franc in 1997:
http://www.bcc.cd/monai2a.htm (Central Bank of Congo). Officially it's "FC" for
franc and "c" for centimes. But this is for French. For Lingala there is probably
no decree, because it is only a national language.
Comment 13 moyogo 2007-05-04 13:07:07 UTC
Created attachment 44863 [details]
ln locale with FC for currency symbol
Comment 14 moyogo 2007-05-04 13:19:22 UTC
Created attachment 44864 [details]
morphological charset with rare digraphs/trigraphs
Comment 15 moyogo 2007-05-04 13:22:44 UTC
I modified ln_CD.xml to use the currency symbol FC as ruedin pointed out.
The ln_morph.txt charset now also contains traditional digraphs/trigraph (gb,
kp, ts, ngb) and borrowed ones (mf, mv, sh). I also added the digraph 'ny'.
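To show how a morphological collation can treat these digraphs and the trigraph as single letters, here is a sketch that splits a word into collation units using longest match first. Only the multi-letter units named in this comment are included; the attached ln_morph.txt may list more, and their relative sort order is not defined here. The names are hypothetical.

```python
# Multi-letter units named in the comment above; the attached file may list more.
MULTIGRAPHS = ("ngb", "gb", "kp", "ts", "mf", "mv", "sh", "ny")


def split_collation_units(word, multigraphs=MULTIGRAPHS):
    """Split word into collation units, preferring the longest multigraph."""
    # Try longer units first so that 'ngb' wins over a plain 'n'.
    ordered = sorted(multigraphs, key=len, reverse=True)
    lowered = word.lower()
    units = []
    i = 0
    while i < len(lowered):
        for unit in ordered:
            if lowered.startswith(unit, i):
                units.append(unit)
                i += len(unit)
                break
        else:
            # No multigraph starts here: the single character is its own unit.
            units.append(lowered[i])
            i += 1
    return units


if __name__ == "__main__":
    # Arbitrary letter strings chosen to exercise the matching.
    print(split_collation_units("ngba"))
    print(split_collation_units("nya"))
```

A sort key for the morphological order would then rank each unit rather than each character, once the order of the units is fixed.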
Comment 16 moyogo 2007-05-04 13:23:12 UTC
btw, I'm on the JCA list.
Comment 17 ooo 2007-06-20 15:08:51 UTC
I currently don't have time before OOo2.3 to do all necessary steps for the new
morphological sort order. So what I'll do is add charset collation and locale
data, and shift morphological things to a new issue for OOo2.4.
Comment 18 ooo 2007-06-20 18:10:20 UTC
In CWS locales23:

i18npool/source/localedata/data/Attic/ln_CD.xml  1.1.2.1
i18npool/source/collator/data/Attic/ln_charset.txt  1.1.2.1
i18npool/inc/i18npool/lang.h  1.7.22.7
i18npool/source/collator/data/collator_data.map  1.4.82.2
i18npool/source/isolang/isolang.cxx  1.10.22.7
i18npool/source/localedata/localedata.cxx  1.47.10.10
i18npool/source/localedata/data/localedata_others.map  1.13.10.7
i18npool/source/localedata/data/makefile.mk  1.39.2.9
svx/source/dialog/langtab.src  1.72.296.9

Note that I fixed some currency formats in ln_CD.xml regarding positions of
blanks, and the replaceTo needed 'FC' instead of 'F' to align with the
currency symbol.
Comment 19 ooo 2007-06-25 18:42:55 UTC
Reassigning to QA for verification.
Comment 20 frank 2007-07-03 12:51:16 UTC
Found fixed in CWS locales23 using Windows and Linux builds.
Comment 21 frank 2007-08-23 15:30:18 UTC
found integrated on master OOG680m1 using Linux, Solaris and Windows build