Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
Summary: Add TIS620 encoding missing from sal/textenc/tencinfo.c
Component: code
Assignee: Stephan Bergmann <stephan.bergmann.secondary>
Status: CLOSED FIXED
QA Contact: issues@l10n <issues>
Priority: P3
CC: arthit, hin.stone, issues, jjc
Issue Type: PATCH
Latest Confirmation in: ---
Issue Depends on:
Description samphan 2005-02-28 10:49:47 UTC
The TIS620 encoding is missing from the tables in sal/textenc/tencinfo.c. Among other things, this prevents a Thai spell-checking dictionary file encoded in TIS-620 from working in OOo. The attached patch adds support for ISO8859-11, TIS620, TIS620.2529 and TIS620.2533. See http://linux.thai.net/~thep/th-xwindow/#Charsets for information on Thai encodings.
Comment 1 samphan 2005-02-28 10:51:23 UTC
Created attachment 23106: Patch to add TIS620 to sal/textenc/tencinfo.c
Comment 2 arthit 2005-02-28 19:37:32 UTC
Confirmed, with patch.
Comment 3 Martin Hollmichel 2005-04-01 15:41:04 UTC
Comment 4 Stephan Bergmann 2005-04-04 08:38:17 UTC
Comment 5 samphan 2005-04-08 07:58:52 UTC
Can you get this into OOo 2.0?
- The patch is provided.
- The patch has been tested and is used in OfficeTLE, a well-known localized version of OOo 1.1.x.
- Without it, users and localization packs cannot add a Thai dictionary.
The inability to use a Thai dictionary is crucial because it disables an important feature (spell checking) for Thai.
Comment 6 Stephan Bergmann 2005-04-08 08:48:56 UTC
Comment 7 Stephan Bergmann 2005-04-28 09:03:22 UTC
sb->samphan: Looking at your patch, I'm not sure how exactly rtl_getTextEncodingFromUnixCharset should behave. With your patch applied as is, it would map:
(1) "TIS620" -> DONTKNOW
(2) "TIS620-2529" -> TIS_620
(3) "TIS620-2533" -> TIS_620
(4) "TIS620-1234" -> TIS_620
(5) "TIS620.2529" -> DONTKNOW
(6) "TIS620.2529-foobar" -> TIS_620
I wonder whether (4) and (6) are as expected. Can you specify exactly which input shall be accepted as TIS_620? (<ftp://ftp.x.org/pub/DOCS/registry> does not mention TIS620, so I assume the values in question are nonstandard but in general use in Thailand.)
Comment 8 samphan 2005-04-29 02:06:41 UTC
Sorry, I may not understand the code thoroughly. X does support TIS-620. See /usr/X11R6/lib/X11/fonts/encoding/tis620-2.enc or http://cvs.freedesktop.org/xorg/xc/fonts/encodings/iso8859-11.enc?rev=18.104.22.168&view=markup:
  STARTENCODING iso8859-11
  ALIAS tis620-0
  ALIAS tis620.2529-1
  ALIAS tis620.2533-1
  ALIAS tis620.2533-0
----
glibc also supports TIS-620. See /usr/share/i18n/charmaps/TIS-620.gz:
  % alias TIS620
  % alias TIS620-0
  % alias TIS620.2529-1
  % alias TIS620.2533-0
  % alias ISO-IR-166
----
Can you modify the patch to accept these values?
Comment 9 Stephan Bergmann 2005-04-29 09:18:01 UTC
I adapted the patch so that now exactly TIS620-0, TIS620.2529-1, TIS620.2533-0 and TIS620.2533-1 (ignoring letter case) are accepted. Also accepting the glibc variants that do not have exactly one hyphen would be trickier; if they turn out to be needed in practice, we will have to reopen this issue. Tests are in sal/qa/rtl/textenc/rtl_tencinfo.cxx.
Comment 10 arthit 2005-04-30 22:07:23 UTC
I don't know about X.org's registry, but TIS-620 is an official industrial standard in Thailand. It is also registered with IANA: http://www.iana.org/assignments/character-sets. Solaris (and its CDE) has supported TIS-620 since version 7: http://docs.sun.com/app/docs/doc/806-1360/6jalch36t?a=view
Comment 11 arthit 2005-04-30 22:12:40 UTC
Maybe we should register TIS-620 (along with other standard encodings that are currently missing) with email@example.com.
Comment 12 Stephan Bergmann 2005-05-02 08:37:32 UTC
sb->arthit: Some clarification: the patch in this issue was only about rtl_getTextEncodingFromUnixCharset, which "obviously" (the documentation unfortunately is little more than a bad joke) is about the final two segments of those long X11 font names with lots of hyphens in them (e.g., "...-iso8859-1", "...-tis620.2533-0"). Thus, IANA charset names are irrelevant here (however, OOo does know the MIME charset name "TIS-620"; see rtl_getTextEncodingFromMimeCharset), and all the X11 font names listed at <http://docs.sun.com/app/docs/doc/806-1360/6jalch36t?a=view> indeed end in "tis620.2533-0", which is now understood by rtl_getTextEncodingFromUnixCharset.
Comment 13 arthit 2005-05-02 08:52:04 UTC
arthit->sb: Sorry, now I get the point. Thank you :)
Comment 14 Stephan Bergmann 2005-06-02 12:27:56 UTC
Comment 15 Stephan Bergmann 2005-07-11 09:33:15 UTC