Issue 43666

Summary: Add TIS620 encoding missing from sal/textenc/tencinfo.c
Product: Internationalization Reporter: samphan
Component: code Assignee: Stephan Bergmann <stephan.bergmann.secondary>
Status: CLOSED FIXED QA Contact: issues@l10n <issues>
Severity: Trivial    
Priority: P3 CC: arthit, hin.stone, issues, jjc
Version: 680m79 Keywords: oooqa
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: PATCH Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 41707    
Attachments:
Description Flags
Patch to add TIS620 to sal/textenc/tencinfo.c none

Description samphan 2005-02-28 10:49:47 UTC
The encoding TIS620 is missing from the tables in sal/textenc/tencinfo.c.
Among other things, this prevents Thai spell-checking dictionary files encoded in
TIS-620 from working in OOo. The attached patch adds support for ISO8859-11, TIS620,
TIS620.2529 and TIS620.2533.

See http://linux.thai.net/~thep/th-xwindow/#Charsets for info on Thai encodings.
Comment 1 samphan 2005-02-28 10:51:23 UTC
Created attachment 23106 [details]
Patch to add TIS620 to sal/textenc/tencinfo.c
Comment 2 arthit 2005-02-28 19:37:32 UTC
Confirmed with the patch.
Comment 3 Martin Hollmichel 2005-04-01 15:41:04 UTC
change owner.
Comment 4 Stephan Bergmann 2005-04-04 08:38:17 UTC
accepted
Comment 5 samphan 2005-04-08 07:58:52 UTC
Can you get it into OOo 2.0?
- The patch is provided.
- The patch is tested and used in OfficeTLE, a well-known local version of OOo
1.1.x.
- Without it, users or localization packs can't add a Thai dictionary.

The inability to use a Thai dictionary is critical because it disables an important
feature (spell checking) for Thai.
Comment 6 Stephan Bergmann 2005-04-08 08:48:56 UTC
No problem.
Comment 7 Stephan Bergmann 2005-04-28 09:03:22 UTC
sb->samphan:  Looking at your patch, I'm not sure how exactly
rtl_getTextEncodingFromUnixCharset should behave.  Applying your patch directly,
it would map

  (1)  "TIS620" -> DONTKNOW
  (2)  "TIS620-2529" -> TIS_620
  (3)  "TIS620-2533" -> TIS_620
  (4)  "TIS620-1234" -> TIS_620
  (5)  "TIS620.2529" -> DONTKNOW
  (6)  "TIS620.2529-foobar" -> TIS_620

I wonder whether (4) and (6) are as expected.  Can you specify exactly which
input shall be accepted as TIS_620 (<ftp://ftp.x.org/pub/DOCS/registry> does not
mention TIS620, so I assume the values in question are nonstandard but in
general use in Thailand)?
Comment 8 samphan 2005-04-29 02:06:41 UTC
Sorry. I may not understand the code thoroughly.
X does support tis-620. See /usr/X11R6/lib/X11/fonts/encoding/tis620-2.enc
or
http://cvs.freedesktop.org/xorg/xc/fonts/encodings/iso8859-11.enc?rev=1.1.1.1&view=markup
STARTENCODING iso8859-11
ALIAS tis620-0
ALIAS tis620.2529-1
ALIAS tis620.2533-1
ALIAS tis620.2533-0
----

glibc also supports tis-620. See
/usr/share/i18n/charmaps/TIS-620.gz:
% alias TIS620
% alias TIS620-0
% alias TIS620.2529-1
% alias TIS620.2533-0
% alias ISO-IR-166
----
Can you modify the patch to accept these values?
Comment 9 Stephan Bergmann 2005-04-29 09:18:01 UTC
I adapted the patch so that now exactly

  TIS620-0
  TIS620.2529-1
  TIS620.2533-0
  TIS620.2533-1

(ignoring letter case) are accepted.  Additionally accepting the glibc variants
that do not have exactly one hyphen would be more tricky; if they turn out to be
needed in practice, we have to reopen this issue.

Tests in sal/qa/rtl/textenc/rtl_tencinfo.cxx.
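
For illustration only, the acceptance rule described above (exactly four names, ignoring letter case) could be sketched as follows. This is not the code in sal/textenc/tencinfo.c; the function names equals_ignore_ascii_case and is_accepted_tis620_charset are hypothetical and chosen just for this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from tencinfo.c): compares two strings while
   ignoring ASCII letter case. Returns 1 if they are equal, 0 otherwise. */
int equals_ignore_ascii_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}

/* Hypothetical check: returns 1 only for the four names listed in this
   comment, so "TIS620", "TIS620-1234" and "TIS620.2529-foobar" are rejected. */
int is_accepted_tis620_charset(const char *charset)
{
    static const char *const accepted[] = {
        "TIS620-0", "TIS620.2529-1", "TIS620.2533-0", "TIS620.2533-1"
    };
    size_t i;
    for (i = 0; i < sizeof accepted / sizeof accepted[0]; ++i) {
        if (equals_ignore_ascii_case(charset, accepted[i]))
            return 1;
    }
    return 0;
}
```

An exact list like this avoids the prefix-style matches (4) and (6) questioned in comment 7, at the cost of having to extend the list if further variants turn out to be needed.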
Comment 10 arthit 2005-04-30 22:07:23 UTC
I have no idea about the X.org registry.
But TIS-620 is an official industrial standard in Thailand.
It is also registered with IANA:
http://www.iana.org/assignments/character-sets

Solaris (and its CDE) has had TIS-620 since version 7.
http://docs.sun.com/app/docs/doc/806-1360/6jalch36t?a=view
Comment 11 arthit 2005-04-30 22:12:40 UTC
Maybe we have to register TIS-620 (along with other standard encodings
that are currently not there) with xregistry@x.org
Comment 12 Stephan Bergmann 2005-05-02 08:37:32 UTC
sb->arthit:  Some clarification:

The patch in this issue was only about rtl_getTextEncodingFromUnixCharset, which
"obviously" (the documentation unfortunately is little more than a bad joke) is
about the final two segments of those long X11 font names with lots of hyphens in
them (e.g., "...-iso8859-1", "...-tis620.2533-0").

Thus, IANA charset names are irrelevant here (however, OOo does know the MIME
character name "TIS-620", see rtl_getTextEncodingFromMimeCharset); and all the
X11 font names listed at
<http://docs.sun.com/app/docs/doc/806-1360/6jalch36t?a=view> indeed end in
"tis620.2533-0", which is now understood by rtl_getTextEncodingFromUnixCharset.
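
To make "the final two segments" concrete, here is a minimal sketch of how such a suffix could be cut out of a long X11 font name. It is an assumption-laden illustration, not part of sal: the function name xlfd_charset_suffix is hypothetical, and the sketch only looks at hyphens without validating the rest of the font name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of sal): returns a pointer to the final two
   hyphen-separated segments of an X11 font name, e.g. "iso8859-1" or
   "tis620.2533-0". Returns NULL if the name contains fewer than two hyphens. */
const char *xlfd_charset_suffix(const char *font_name)
{
    /* The last hyphen separates the two final segments. */
    const char *last = strrchr(font_name, '-');
    const char *p;
    if (last == NULL)
        return NULL;
    /* Walk backwards to the hyphen that precedes the second-to-last segment. */
    for (p = last; p != font_name; ) {
        --p;
        if (*p == '-')
            return p + 1;
    }
    return NULL;
}
```

The returned suffix is the kind of string that would then be handed to rtl_getTextEncodingFromUnixCharset, wrapped in whatever string type that function actually expects.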
Comment 13 arthit 2005-05-02 08:52:04 UTC
arthit->sb:
Sorry. Now I get the point. Thank you :)
Comment 14 Stephan Bergmann 2005-06-02 12:27:56 UTC
verified
Comment 15 Stephan Bergmann 2005-07-11 09:33:15 UTC
close