Issue 70181

Summary: xslt-parser must support xml:lang attributes other than ISO 639-1 / ISO 3166-1 (ISO 639 code for Northern Sotho is wrong in UI localization)
Product: Internationalization Reporter: Andrea Pescetti <pescetti>
Component: code Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: andre.schnabel, andreas, dwayne, issues, joerg.barfurth, khirano, maho.nakata, murrayc, ooo, stephan.bergmann.secondary
Version: OOo 2.0.4   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 74420    

Description Andrea Pescetti 2006-10-07 16:45:53 UTC
The OOo builds for the language "Pedi; Sepedi; Northern Sotho" contain the
nonexistent two-letter (à la ISO 639-1) code "ns" instead of the official
three-letter (ISO 639-2) code "nso".

The wrong code appears, for instance, in
OOD680_m5/solenv/inc/postset.mk:completelangiso=af ar be-BY bg br bn bn-BD bn-IN
bs ca cs cy da de el en-GB en-US en-ZA eo es et eu fa fi fr ga gl gu-IN he hi-IN
hr hu it ja km kn-IN ko ku lo lt lv mk ms ne nb nl nn nr ns pa-IN pl pt pt-BR ru
rw sk sl sh-YU sr-CS ss st sv sw sw-TZ sx ta-IN th tn tr ts ve vi xh zh-CN zh-TW zu
and creates problems in coordinating the QA activities for localized builds, as
our community tool (qatrack.org) relies on ISO codes to extract the language
from filenames.

This problem was mentioned in passing in issue 51064 as a "temporary hack" for
version 1.1 but is still there in 2.0.4-RC3.

Refer to http://www.loc.gov/standards/iso639-2/php/code_list.php for a list of
valid codes.
Comment 1 ivo.hinkelmann 2006-10-11 15:40:57 UTC
Did you test your builds with that language merged as "nso"? If I remember
right, there was an issue with xml:lang (xml readme / officecfg), which only
supported two-letter codes in old XML parsers ... maybe it is solved already, I will check.
Comment 2 Dwayne Bailey 2006-10-11 21:43:53 UTC
I think the correct person to give input to this is er.

If I recall, although we moved to ISO codes from the numeric codes, not
everywhere could handle 3 letter codes. Looking at the list I don't see any 3
letter languages.

I'm not sure if this problem is still true.
Comment 3 ooo 2006-10-12 12:51:40 UTC
Well, actually I can't say much about any changes in the XML parser used for
configuration files and whether it's capable now to understand ISO 639-2 alpha-3
codes (let alone valid language tags complying with RFC 3066 or RFC 4646), I'm
Cc'ing Joerg for this.

Anyway, AFAIK the language codes used in postset.mk don't affect the
configuration or other .xml files using xml:lang, but are related to UI resource
files instead, or am I wrong on this? Ivo? If this is to be changed, also the
codes in all localize.sdf files would have to be changed, IMHO.

  Eike
Comment 4 joerg.barfurth 2006-10-12 13:40:02 UTC
While I am not really responsible for this 
Comment 5 joerg.barfurth 2006-10-12 14:08:08 UTC
[Sorry for the premature submission of the previous comment. Retrying.]

While I'm not really responsible for this any more, I try to track what happens
in the configuration area. I am not aware of a change of parser for
configuration processing. 

I noticed that a new xslt-processor/parser was introduced into the build
environment, but I don't even know if the parser introduced there is able to
handle ISO639-2. Neither do I know whether xml_apis.jar and xerces.jar from
$SOLARBINDIR (that is what the configuration build uses) were affected by that
change (i.e. the introduction of Xalan).

In any case it has to be ascertained that not only the vanilla build
environment, but also all variations possible through 'configure' use a parser
that accepts ISO639-2. Otherwise the build may be broken even for people that
don't build for ISO639-2 languages. (There are a few settings that are
localized, but not translated through localize.sdf)
 
@erack: the xcu files *are* a form of UI resource file. AFAIK they use the same
iso language codes as resource files. And yes: to change the language codes used
there, all the localize.sdf files need to be changed (or the extraction tools
hacked to transform affected codes on the fly).
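
A minimal sketch of what transforming affected codes on the fly could look like, in Python. The function names and the mapping table are hypothetical and are not part of the actual extraction tools; only the "ns" -> "nso" pair is taken from this issue. Any region suffix is left untouched.

```python
# Hypothetical mapping of legacy codes to their ISO 639-2 replacements.
# Only "ns" -> "nso" comes from this issue; extend as needed.
LEGACY_TO_ISO = {"ns": "nso"}


def normalize_lang(code, mapping=LEGACY_TO_ISO):
    """Replace a legacy language subtag, keeping any "-REGION" suffix as-is."""
    lang, sep, rest = code.partition("-")
    return mapping.get(lang, lang) + sep + rest


def normalize_lang_list(value):
    """Normalize a space-separated list such as the completelangiso value."""
    return " ".join(normalize_lang(code) for code in value.split())


if __name__ == "__main__":
    print(normalize_lang_list("nn nr ns pa-IN pt-BR"))
```

Such a translation step would only hide the legacy code from downstream consumers; the localize.sdf files themselves would still carry "ns".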

To the original submitter: the 'temporary hack' is not related to a particular
product revision, and can't really be fixed by product development. It is
contingent on general availability AND integration into the build environment of
XML processing tools that accept ISO639-2 language codes. It has to stay (in all
product source code lines) until the build environment(s) has evolved to this
point. Then it could be fixed in any product branch.

 
Comment 6 ooo 2006-10-26 15:41:32 UTC
Note that it is necessary to support not only ISO 639-2 alpha-3 and ISO 3166-1
alpha-2 codes, but also language tags according to RFC 4646
http://tools.ietf.org/html/rfc4646

language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse]

At least language-script-region-variant must be supported.
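
To illustrate the language-script-region-variant subset, here is a rough Python sketch of a tag splitter. It is a simplification under stated assumptions, not a full RFC 4646 parser: it ignores extlang, extensions, private use and grandfathered tags, and it does not check subtags against the IANA registry. The name parse_langtag is hypothetical.

```python
import re

# Simplified subset of the RFC 4646 grammar (assumption: no extlang,
# extension, privateuse or grandfathered tags are handled here).
LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"                 # ISO 639 alpha-2 or alpha-3
    r"(?:-(?P<script>[A-Za-z]{4}))?"               # ISO 15924 script
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"      # ISO 3166-1 alpha-2 or UN M.49
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


def parse_langtag(tag):
    """Split a tag into its subtags; return None if it does not fit the subset."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return None
    return {
        "language": match.group("language"),
        "script": match.group("script"),
        "region": match.group("region"),
        "variants": [v for v in match.group("variants").split("-") if v],
    }


if __name__ == "__main__":
    for tag in ("nso", "sr-Latn-CS", "de-CH-1996", "toolong"):
        print(tag, parse_langtag(tag))
```

A parser restricted to two-letter codes would already reject the first of these tags, which is the failure this issue is about.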
Comment 7 ooo 2006-10-26 15:47:25 UTC
Raising priority to P3, as more and more localizations suffer from this. Indeed
a P2 might be justified if the new parser still doesn't handle an entirely valid
ISO 639-2 alpha-3 code and breaks the build.
Comment 8 murrayc 2011-03-16 12:22:17 UTC
Could someone save me some time by giving me a hint at roughly what code would need to be changed to fix this? I might attempt it.
Comment 9 ivo.hinkelmann 2011-03-16 13:19:11 UTC
We are already using some three-letter ISO codes like:

brx, dbo, mai, mni ...

I saw the usage in officecfg. I think this issue is fixed then, someone updated the libxml2 parser
Comment 10 ooo 2011-03-16 14:03:08 UTC
@ihi (In reply to comment #9)
While the original three-letter-code problem might have been solved in the meantime, the problem will reappear once BCP 47 language tags are implemented (issue 109846).
Comment 11 Marcus 2017-05-20 11:31:08 UTC
Reset assignee to the default "issues@openoffice.apache.org".