Issue 103399

Summary: Refine Multilingual Support with ICU
Product: Internationalization Reporter: jiayanmin
Component: i18npool Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: issues, khirano
Version: OOo 3.1   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: ENHANCEMENT Latest Confirmation in: ---
Developer Difficulty: ---

Description jiayanmin 2009-07-08 06:19:17 UTC
OpenOffice.org is a leading open-source office software suite which is
available in many languages and works on all common computers. Some of its
locale-related features, such as character collation and text boundary
analysis, are implemented by means of ICU (International Components for
Unicode), but others are not, including locale data formatting and the string
classes. ICU is a mature and widely portable set of C/C++ and Java libraries
providing consistent Unicode and globalization support for software
applications on all platforms. To give our users a language experience with
the same results across different platforms, it would make sense, and improve
usability, to re-implement locale data formatting and the string classes with
ICU APIs. It would then no longer be necessary to maintain the language-,
country-, and culture-specific data that has so far been defined in XML
format. Meanwhile, OpenOffice.org would potentially gain new features and new
languages with each ICU upgrade.
Comment 1 ooo 2009-07-08 15:15:23 UTC
So far OOo has always supported some locales that were not present in ICU. This may
change over time as ICU pulls in more locales from CLDR. Anyway, OOo in its
locale data and the number formatter that uses it has more fine grained control
over number formats than ICU offers. We also have data that ICU doesn't offer at
all, so we will never completely abandon our own locale data. I also doubt we
want to replace our string classes with ICU's (we should get rid of ByteString
and UniString though and use OString and OUString instead), also because the
stable API uses OUString.

However, merging existing ICU data and OOo data during runtime where applicable
can be a long term goal, but not in the foreseeable future.
Comment 2 jiayanmin 2009-07-13 09:21:00 UTC
The newest ICU 4.2 includes data from CLDR 1.7 and supports 468 locales. So I
think most locales that OOo supports would be covered by ICU. OOo could provide
a unified way to support new locales by means of ICU. So far, adding a new
locale to OOo seems a little confusing and complex. :) Anyway, locales that
are not covered by ICU could be supported by patching ICU. So merging ICU data
and OOo data is really meaningful for refining the i18n framework of OOo.
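To illustrate the coverage question, here is a minimal sketch of how a locale lookup falls back from a specific ID to more general ones. It shows why an OOo locale missing from ICU would quietly pick up a parent locale's data unless ICU is patched. It assumes plain truncation at underscores ending in "root"; real ICU/CLDR lookup also consults explicit parent-locale tables, and the names `fallback_chain`, `first_available` and the `available` set are hypothetical.

```python
# Simplified ICU-style truncation fallback for locale IDs (assumption:
# underscore-separated IDs; explicit CLDR parent-locale tables are ignored).


def fallback_chain(locale_id: str) -> list[str]:
    """Return the lookup order for locale_id, most specific first, ending in "root"."""
    parts = locale_id.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


def first_available(locale_id: str, available: set[str]) -> str:
    """Return the most specific locale in the chain for which data exists."""
    for candidate in fallback_chain(locale_id):
        if candidate in available:
            return candidate
    return "root"  # last resort if even "root" is missing from `available`
```

For example, `first_available("de_CH", {"de", "root"})` returns `"de"`, so Swiss German would be served generic German data until a `de_CH` entry is added.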
Comment 3 jiayanmin 2009-08-12 08:50:21 UTC
I'm from the G11N component of the Symphony development team. We are working
on re-implementing locale data formatting with the ICU API for OpenOffice.org.
I really hope the OOo community can benefit from our work.
Comment 4 ooo 2009-08-12 20:40:23 UTC
Hi yanminjia,
That's great news! Could you please publicly share your development, e.g. by
working on a CWS (childworkspace), so we can discuss changes and code?
Comment 5 jiayanmin 2009-08-14 06:30:31 UTC
Hi Eike,
I have just submitted my public key for setting up Subversion access. Please refer to
issue 104218. :)
Yanmin Jia
Comment 6 jiayanmin 2009-08-21 06:35:32 UTC
Hi Eike,
Actually, the OOo locale data in XML files offers various patterns, for
example DD/MM/YY for dates or HH:MM:SS for times, which can easily be
retrieved through OOo APIs. These patterns could be passed as parameters to
ICU to output formatted text in accordance with a given locale. I think this
solution would not lose OOo's fine-grained format or pattern control. Our work
begins with replacing date/time formatting with ICU. :)
I will create a CWS as soon as possible.
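Passing OOo patterns to ICU would need a translation step: ICU date patterns use different letters (lowercase `y`, `d`, `s`, and `m` for minutes versus `M` for months), while in OOo format codes `MM` means month or minute depending on its neighbours. Below is a minimal sketch, assuming only the Y/M/D/H/S codes and plain separators; quoted text, AM/PM, day-of-week names, calendars and native numbering are deliberately rejected. `ooo_to_icu` is a hypothetical helper, not an OOo or ICU API.

```python
import itertools

# Allowed run lengths per code letter in this (hypothetical) subset.
_ALLOWED = {"Y": {2, 4}, "M": {1, 2, 3, 4}, "D": {1, 2}, "H": {1, 2}, "S": {1, 2}}
# ICU pattern letters: y=year, d=day of month, H=hour 0-23, s=second.
_ICU = {"Y": "y", "D": "d", "H": "H", "S": "s"}


def ooo_to_icu(code: str) -> str:
    """Translate a simple OOo date/time format code to an ICU date pattern."""
    # Split into runs of one repeated character: "HH:MM" -> ["HH", ":", "MM"].
    runs = ["".join(g) for _, g in itertools.groupby(code.upper())]
    letter_idx = [i for i, r in enumerate(runs) if r[0].isalpha()]
    out = []
    for i, run in enumerate(runs):
        ch, n = run[0], len(run)
        if not ch.isalpha():
            out.append(run)  # '/', ':', '-', ' ' are literals in ICU patterns too
            continue
        if ch not in _ALLOWED or n not in _ALLOWED[ch]:
            raise ValueError(f"unsupported code: {run}")
        k = letter_idx.index(i)
        prev = runs[letter_idx[k - 1]][0] if k > 0 else ""
        nxt = runs[letter_idx[k + 1]][0] if k + 1 < len(letter_idx) else ""
        if ch == "M" and (prev == "H" or nxt == "S"):
            # OOo/Excel rule: M right after an hour or right before seconds is minutes.
            if n > 2:
                raise ValueError(f"unsupported minute code: {run}")
            out.append("m" * n)
        elif ch == "M":
            out.append("M" * n)  # month: M, MM, MMM (abbreviated), MMMM (full)
        else:
            out.append(_ICU[ch] * n)
    return "".join(out)
```

With this sketch, `ooo_to_icu("DD/MM/YY")` gives `"dd/MM/yy"` and `ooo_to_icu("HH:MM:SS")` gives `"HH:mm:ss"`; anything outside the subset raises `ValueError` instead of being passed through silently.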
Comment 7 ooo 2009-08-21 14:31:12 UTC
Please note that patterns may contain calendar and native numbering information,
also mixed representations within one pattern are possible, eras, and some
specific handling that arose from the need to mimic Excel display formatting
behavior when using such patterns. I strongly doubt ICU handles all that. For
reference you may take a look at the number formatter implementation in
svtools/source/numbers/, specifically zformat.cxx 
SvNumberformat::ImpGetDateOutput(), SvNumberformat::ImpGetTimeOutput() and
SvNumberformat::ImpGetDateTimeOutput(). I also don't see why we should
replace the current formatting with ICU; instead, we should obtain the necessary
data from ICU where possible.
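The bracketed modifiers that can appear in OOo format codes are one concrete reason a pattern cannot simply be handed to ICU. The sketch below only finds and labels them; the category names and the `classify_modifiers` helper are assumptions for illustration, based on OOo modifiers such as `[NatNum1]` (native numbering), `[~gengou]` (calendar), `[$-411]` (currency/locale) and elapsed-time `[HH]`. It does not validate the modifiers.

```python
import re

# Matches each "[...]" modifier in a format code and captures its body.
_BRACKET = re.compile(r"\[([^\]]*)\]")


def classify_modifiers(code: str) -> list[tuple[str, str]]:
    """Return (modifier, category) pairs for each bracketed modifier in a code."""
    found = []
    for match in _BRACKET.finditer(code):
        body = match.group(1)
        upper = body.upper()
        if upper.startswith("NATNUM"):
            kind = "native numbering"
        elif body.startswith("~"):
            kind = "calendar"
        elif body.startswith("$"):
            kind = "currency/locale"
        elif upper and len(set(upper)) == 1 and upper[0] in "HMS":
            kind = "elapsed time"  # e.g. [HH] counts hours beyond 24
        else:
            kind = "other"  # e.g. colors such as [RED] or conditions like [>0]
        found.append((body, kind))
    return found
```

A pattern for which this returns a non-empty list would need OOo's own handling (or extra ICU configuration) rather than a plain pattern hand-off.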

We then could add some features such as possessive month names, possibly call
ICU where the OOo i18n framework doesn't have a solution yet, and replace the
duplicated i18npool calendar implementations with the ICU calendar.
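Possessive (genitive) month names matter in languages such as Polish, where "5 stycznia" (5 January) uses a different form than the stand-alone "styczeń". CLDR/ICU model this as format versus stand-alone month names (ICU pattern letters `MMMM` versus `LLLL`). A minimal sketch of the selection logic, with a hypothetical two-entry Polish table; the table and function names are illustrative only.

```python
# Hypothetical table: month number -> (stand-alone/nominative, format/genitive).
PL_MONTHS = {1: ("styczeń", "stycznia"), 2: ("luty", "lutego")}


def month_name(month: int, with_day: bool, table: dict = PL_MONTHS) -> str:
    """Pick the genitive form inside a full date, the nominative otherwise."""
    standalone, genitive = table[month]
    return genitive if with_day else standalone


def format_day_month(day: int, month: int) -> str:
    # A day number precedes the month, so the possessive form is required.
    return f"{day} {month_name(month, with_day=True)}"
```

Here `format_day_month(5, 1)` yields `"5 stycznia"`, while a month heading would use `month_name(1, with_day=False)`, i.e. `"styczeń"`.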

However, I think obtaining locale data such as day and month names and their
abbreviations from ICU and merging it into the OOo locale data layer where not defined in
OOo locale data should be one of the first steps, so we can get rid of
duplicated data definitions in OOo.
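The merge step described above could look roughly like this: OOo definitions take precedence, and ICU only fills keys that OOo leaves undefined. A minimal sketch, assuming locale data is represented as nested dictionaries; `merge_locale_data` and the sample values are hypothetical and not real OOo or ICU output.

```python
def merge_locale_data(ooo: dict, icu: dict) -> dict:
    """Recursively merge two locale-data trees; OOo values take precedence."""
    merged = {}
    for key in icu.keys() | ooo.keys():
        ooo_val, icu_val = ooo.get(key), icu.get(key)
        if isinstance(ooo_val, dict) and isinstance(icu_val, dict):
            merged[key] = merge_locale_data(ooo_val, icu_val)  # descend into sections
        elif key in ooo:
            merged[key] = ooo_val  # OOo definition wins over ICU data
        else:
            merged[key] = icu_val  # only ICU has it: fill the gap
    return merged


# Illustrative values only, not real OOo or ICU data.
ooo_en = {"months": {"abbreviated": ["Jan", "Feb"]}}
icu_en = {
    "months": {"abbreviated": ["Jan.", "Feb."], "wide": ["January", "February"]},
    "days": {"wide": ["Sunday", "Monday"]},
}
merged = merge_locale_data(ooo_en, icu_en)
```

In this example `merged["months"]["abbreviated"]` keeps OOo's `["Jan", "Feb"]`, while the full month names and the day names come from ICU. Letting OOo win keeps existing formatting stable for documents while removing the need to duplicate data that ICU already provides.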

In this context you may be interested in two older surveys we did, comparing OOo
and CLDR locale data. The data probably is outdated, but there are some comments
why we preferred OOo-specific locale data over CLDR locale data in some cases.
from 2005
from 2006
Comment 8 Marcus 2017-05-20 11:13:54 UTC
Reset assignee to the default "".