Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||Refine Multilingual Support with ICU|
|Component:||i18npool||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Issue Type:||ENHANCEMENT||Latest Confirmation in:||---|
Description jiayanmin 2009-07-08 06:19:17 UTC
OpenOffice.org is a leading open-source office software suite which is available in many languages and works on all common computers. Some of its locale-related features, such as character collation and text boundary analysis, are implemented by means of ICU (International Components for Unicode), but others are not, including locale data formatting and the string classes. ICU is a mature and widely portable set of C/C++ and Java libraries providing consistent Unicode and globalization support for software applications on all platforms. To give our users the same language experience, with the same results, across different platforms, it would make sense to re-implement locale data formatting and the string classes with ICU APIs. It would then no longer be necessary to maintain the language-, country-, and culture-specific data that has so far been defined in XML format. In addition, OpenOffice.org could gain support for new features and new languages with each ICU upgrade.
Comment 1 ooo 2009-07-08 15:15:23 UTC
So far OOo has always supported some locales that were not present in ICU. This may change over time as ICU pulls in more locales from CLDR. Anyway, OOo, in its locale data and the number formatter that uses it, has more fine-grained control over number formats than ICU offers. We also have data that ICU doesn't offer at all, so we will never completely abandon our own locale data. I also doubt we want to replace our string classes with ICU's (we should get rid of ByteString and UniString though and use OString and OUString instead), not least because the stable API uses OUString. However, merging existing ICU data and OOo data at runtime where applicable can be a long-term goal, but not in the foreseeable future.
Comment 2 jiayanmin 2009-07-13 09:21:00 UTC
The newest ICU 4.2 includes data from CLDR 1.7 and supports 468 locales, so I think most locales that OOo supports would be covered by ICU. OOo could provide a unified way to support new locales by means of ICU. So far, adding a new locale to OOo seems a little confusing and complex. :) In any case, locales that are not covered by ICU could be supported by patching ICU. So merging ICU data and OOo data would be really meaningful for refining the i18n framework of OOo.
Comment 3 jiayanmin 2009-08-12 08:50:21 UTC
I'm from the G11N component of the Symphony development team. We are working on re-implementing locale data formatting with the ICU API for OpenOffice.org. I really hope the OOo community can benefit from our work.
Comment 4 ooo 2009-08-12 20:40:23 UTC
Hi yanminjia, That's great news! Could you please publicly share your development, e.g. by working on a CWS (child workspace), so we can discuss changes and code? Thanks Eike
Comment 5 jiayanmin 2009-08-14 06:30:31 UTC
Hi Eike, I just submitted my public key for setting up Subversion access. Please refer to issue 104218. :) Thanks. Yanmin Jia
Comment 6 jiayanmin 2009-08-21 06:35:32 UTC
Hi Eike, Actually, the OOo locale data in the XML files offers various patterns, for example DD/MM/YY for date text or HH:MM:SS for time text, which can easily be obtained through OOo APIs. These patterns could be passed as parameters to ICU to output formatted text in accordance with a given locale. I think this solution would not lose the more fine-grained format or pattern control of OOo. Our work now begins with replacing date/time formatting with ICU. :) I will create a CWS as soon as possible. Thanks. Yanmin
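Editorial sketch: to make the idea of handing OOo patterns to ICU concrete, the following Python snippet translates the two example patterns into the pattern letters used by ICU date formats (d for day, M for month, y for year, H for 24-hour clock, m for minute, s for second). The function name `ooo_to_icu_pattern` is hypothetical and not part of OOo or ICU. The rule that an M code means minutes when it follows hours or precedes seconds is an assumption about how the OOo format codes are read. Only the codes D, M, Y, H and S are handled; anything else is rejected rather than guessed.

```python
from itertools import groupby

# Codes that map one-to-one onto ICU pattern letters (the run length is kept).
DIRECT = {"D": "d", "Y": "y", "H": "H", "S": "s"}


def _tokens(pattern):
    """Split a pattern into runs of one letter (case-insensitive) or one separator."""
    for key, run in groupby(pattern, key=lambda c: c.upper() if c.isalpha() else c):
        yield key, "".join(run)


def ooo_to_icu_pattern(ooo_pattern):
    """Hypothetical helper: translate a simple OOo-style pattern to ICU letters."""
    tokens = list(_tokens(ooo_pattern))
    # Letter codes only; separators become None so neighbours are easy to find.
    letters = [key if key.isalpha() else None for key, _ in tokens]
    out = []
    for i, (key, text) in enumerate(tokens):
        if letters[i] is None:
            out.append(text)  # separators such as "/" or ":" pass through
        elif key in DIRECT:
            out.append(DIRECT[key] * len(text))
        elif key == "M":
            # Assumption: M is minutes after hours or before seconds, else month.
            prev_code = next((k for k in reversed(letters[:i]) if k), None)
            next_code = next((k for k in letters[i + 1:] if k), None)
            is_minute = prev_code == "H" or next_code == "S"
            out.append(("m" if is_minute else "M") * len(text))
        else:
            raise ValueError(f"unsupported format code: {text!r}")
    return "".join(out)


print(ooo_to_icu_pattern("DD/MM/YY"))
print(ooo_to_icu_pattern("HH:MM:SS"))
```

The translated string is what would be passed to an ICU date formatter together with the locale. Eras, calendar modifiers and native numbering are deliberately outside the scope of this sketch.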
Comment 7 ooo 2009-08-21 14:31:12 UTC
Please note that patterns may contain calendar and native numbering information; mixed representations within one pattern are also possible, as are eras and some specific handling that arose from the need to mimic Excel display formatting behavior when using such patterns. I strongly doubt ICU handles all that. For reference you may take a look at the number formatter implementation in svtools/source/numbers/, specifically zformat.cxx SvNumberformat::ImpGetDateOutput(), SvNumberformat::ImpGetTimeOutput() and SvNumberformat::ImpGetDateTimeOutput(). I also don't see why we should replace the current formatting with ICU; instead, we should obtain the necessary data from ICU where possible. We could then add some features such as possessive month names, possibly call ICU where the OOo i18n framework doesn't have a solution yet, and replace the duplicated i18npool calendar implementations with the ICU calendar. However, I think obtaining locale data such as day and month names and their abbreviations from ICU, and merging it into the OOo locale data layer where it is not defined in OOo locale data, should be one of the first steps, so we can get rid of duplicated data definitions in OOo. In this context you may be interested in two older surveys we did, comparing OOo and CLDR locale data. The data is probably outdated, but there are some comments on why we kept OOo-specific locale data in preference to CLDR locale data in some cases: http://l10n.openoffice.org/nonav/i18n_framework/cldr/LocaleDataAudit_OOo_CLDR.html (from 2005) and http://l10n.openoffice.org/nonav/i18n_framework/cldr/LocaleDataAudit_OOo202.html (from 2006).
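Editorial sketch: the merge step proposed above, where ICU data is used only for entries that OOo locale data does not define, can be illustrated with plain dictionaries. The function `merge_locale_data` and the keys `day_abbreviations`, `month_abbreviations` and `date_separator` are hypothetical and do not correspond to the actual OOo locale data layer or to ICU resource names. The sample values are illustrative only.

```python
def merge_locale_data(ooo_data, icu_data):
    """Return locale data in which OOo entries win and ICU only fills gaps.

    An OOo entry that is missing, None, or empty counts as "not defined".
    """
    merged = dict(icu_data)  # start from ICU so every ICU key is available
    for key, value in ooo_data.items():
        if value:  # a defined OOo entry always overrides the ICU entry
            merged[key] = value
    return merged


# Illustrative data: OOo defines a separator and day abbreviations,
# but leaves month abbreviations undefined.
ooo_en = {
    "date_separator": "/",
    "day_abbreviations": ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    "month_abbreviations": [],
}
icu_en = {
    "date_separator": "-",
    "month_abbreviations": ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
}

merged_en = merge_locale_data(ooo_en, icu_en)
print(merged_en["date_separator"])          # the OOo value is kept
print(merged_en["month_abbreviations"][0])  # the gap is filled from ICU
```

Giving OOo data precedence keeps the existing OOo-specific choices intact, while entries defined only in ICU no longer need a duplicate definition in OOo.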