Issue 54739 - Need Thai support in IndexEntrySupplier
Summary: Need Thai support in IndexEntrySupplier
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: OOO 2.0 Beta2
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: stefan.baltzer
QA Contact: issues@l10n
Depends on:
Blocks: 41707
Reported: 2005-09-18 04:47 UTC by jjc
Modified: 2013-08-07 15:01 UTC (History)
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---

Alphabetic index sorted incorrectly (7.13 KB, application/vnd.sun.xml.writer)
2005-09-18 04:53 UTC, jjc

Description jjc 2005-09-18 04:47:07 UTC
For Thai, when extracting the initial letter of an index entry, it is not
sufficient to take the first letter: leading vowels must be skipped. For
example, given an index entry of เจมส์, the initial letter should be จ not เ. 
This ensures that all entries with the same initial letter will be adjacent in
the sort order.
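
The rule described above can be sketched as follows. This is an illustration in Python only, not the IndexEntrySupplier code; the function name `thai_initial` and the hard-coded set of leading vowels (U+0E40 to U+0E44) are assumptions made for the example.

```python
# Thai preposed (leading) vowels: เ แ โ ใ ไ  (U+0E40 .. U+0E44)
LEADING_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def thai_initial(entry):
    """Return the first character of `entry` that is not a leading vowel."""
    for ch in entry:
        if ch not in LEADING_VOWELS:
            return ch
    # Empty entry, or an entry made up only of leading vowels.
    return entry[:1]


if __name__ == "__main__":
    for word in ["เจมส์", "จอย", "เสรี", "สมชัย"]:
        print(word, "=>", thai_initial(word))
```

With this rule, เจมส์ and จอย share the initial จ, so they end up under the same index heading.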
Comment 1 jjc 2005-09-18 04:53:12 UTC
Created attachment 29633
Alphabetic index sorted incorrectly
Comment 2 falko.tesch 2005-10-20 16:55:23 UTC
Comment 3 falko.tesch 2005-10-20 20:04:21 UTC
FT: Please take over. Thx a lot.
Comment 4 falko.tesch 2005-10-20 20:07:57 UTC
FT: Please take over. Thx a lot.
Comment 5 karl.hong 2006-03-21 23:05:18 UTC
I extended the IndexKey string format: you can now specify a list of initial
characters to skip, in square brackets:

<IndexKey unoid="alphanumeric" default="true" phonetic="false">ก-ฮ[ฯ]</IndexKey>

Comment 6 karl.hong 2006-03-22 00:02:49 UTC
I have changed the IndexKey field in th_TH.xml to:

<IndexKey unoid="alphanumeric" default="true" phonetic="false">ก-ฮ[เ-ไ]</IndexKey>

which lists the 5 leading vowels as characters to skip.
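
A rough sketch of how such an IndexKey value could be interpreted, written in Python for illustration. It assumes only the simplified form shown above (a key list followed by an optional skip list in square brackets); the real locale data format may allow more than this, and the names `expand`, `parse_index_key` and `index_key` are hypothetical, not taken from the i18n code.

```python
import re


def expand(spec):
    """Expand a spec such as 'A-Z' or 'เ-ไ' into a set of characters."""
    chars = set()
    i = 0
    while i < len(spec):
        # 'x-y' denotes the inclusive code point range from x to y.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            chars.update(chr(cp) for cp in range(lo, hi + 1))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars


def parse_index_key(value):
    """Split 'keys[skip]' into (key characters, skip characters)."""
    match = re.fullmatch(r"([^\[\]]*)(?:\[([^\[\]]*)\])?", value)
    if match is None:
        raise ValueError("unsupported IndexKey value: %r" % value)
    return expand(match.group(1)), expand(match.group(2) or "")


def index_key(entry, skip):
    """Return the first character of `entry` that is not in `skip`."""
    for ch in entry:
        if ch not in skip:
            return ch
    return ""


if __name__ == "__main__":
    keys, skip = parse_index_key("ก-ฮ[เ-ไ]")
    print(sorted(skip))               # the five leading vowels
    print(index_key("เจมส์", skip))   # จ
```

Keeping the skip list in the locale data means the lookup code itself needs no Thai-specific knowledge.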
Comment 7 samphan 2006-03-22 03:35:45 UTC
No. This is not the way to handle that.
I think James's description of the problem may not be accurate. To build a
correct index of Thai words you need to use Thai collation order, not just
ignore the initial vowel. The algorithm is a bit more complex than that
(swapping of the initial vowel, multi-level weights), but it is all specified in
the UCA and implemented in ICU. You can just call ICU, e.g. Calc sort Thai text
Comment 8 karl.hong 2006-03-22 07:35:07 UTC
We do use the ICU collator to sort index entries. The IndexKey field in the
locale data is used to generate an index key from an index entry. For example,

index entry ==> index key

About ==> A
Cat ==> C
clear ==> C

The collator does not generate index keys; it only sorts index entries. You have
to tell me how to generate the index key from an index entry.

For the attached example, the index section currently looks like this:

Alphabetical Index
จอย	1
สมชัย	1
เจมส์	1
เสรี	1

After applying the fix, it becomes:

Alphabetical Index
จอย	1
เจมส์	1
สมชัย	1
เสรี	1

Does it make sense? 
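
The two listings above can be reproduced with a small Python sketch. It assumes the index is ordered by index key first and by entry within each key, and it uses plain code point comparison as a stand-in for the ICU collator, which is a simplification that is only checked against these four entries. `build_index` and the two key functions are hypothetical names.

```python
# Thai leading vowels เ แ โ ใ ไ (U+0E40 .. U+0E44)
LEADING_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def first_char_key(entry):
    """Old behaviour: the index key is simply the first character."""
    return entry[:1]


def skipping_key(entry):
    """New behaviour: skip leading vowels before taking the key."""
    for ch in entry:
        if ch not in LEADING_VOWELS:
            return ch
    return entry[:1]


def build_index(entries, key_func):
    """Order entries by (index key, entry); code point order stands in for ICU."""
    return sorted(entries, key=lambda e: (key_func(e), e))


if __name__ == "__main__":
    entries = ["สมชัย", "เสรี", "จอย", "เจมส์"]
    print(build_index(entries, first_char_key))
    print(build_index(entries, skipping_key))
```

With `first_char_key`, both เจมส์ and เสรี fall under the key เ and sort after every consonant key; with `skipping_key` they join the จ and ส groups.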
Comment 9 jjc 2006-03-22 09:13:44 UTC
I don't understand why you need to explicitly list the skipping characters. 
Can't you just search for the first occurrence of ก-ฮ?
Comment 10 samphan 2006-03-22 09:30:49 UTC
khong <- I understand now. Yes, your fix should work correctly.
Comment 11 karl.hong 2006-03-22 17:58:47 UTC
james, if I wrote a Thai-specific implementation, I could search for the first
occurrence of a character listed in the IndexKey field. But I would like to
implement a language-neutral algorithm, and your suggestion would not work for
other languages. In English, for example, with IndexKey listed as A-Z, I could
not skip a lower-case 'a'.
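
To illustrate the difference between the two approaches, here is a hedged Python sketch. `first_in_keys` models the "search for the first occurrence of a listed key" suggestion and `skip_then_first` models the skip-list approach; both names are hypothetical, and the upper-casing step is an assumption based on the earlier "clear ==> C" example rather than something this issue specifies.

```python
def char_range(first, last):
    """Inclusive set of characters from `first` to `last`."""
    return {chr(cp) for cp in range(ord(first), ord(last) + 1)}


def first_in_keys(entry, keys):
    """Return the first character of `entry` that appears in `keys`."""
    for ch in entry:
        if ch in keys:
            return ch
    return ""


def skip_then_first(entry, skip):
    """Skip characters in `skip`, then upper-case the first remaining one."""
    for ch in entry:
        if ch not in skip:
            return ch.upper()  # assumption: keys are case-folded to upper case
    return ""


if __name__ == "__main__":
    thai_keys = char_range("ก", "ฮ")
    thai_skip = char_range("เ", "ไ")
    latin_keys = char_range("A", "Z")

    # Both approaches agree for Thai.
    print(first_in_keys("เจมส์", thai_keys), skip_then_first("เจมส์", thai_skip))

    # For English the search wrongly jumps over the lower-case letters.
    print(first_in_keys("about OpenOffice", latin_keys))  # O
    print(skip_then_first("about OpenOffice", set()))     # A
```

The search-based variant treats every character outside the key list as skippable, which is why it breaks for entries that start with a lower-case letter.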
Comment 12 karl.hong 2006-03-25 02:04:06 UTC
fixed in cws i18n25
Comment 13 karl.hong 2006-03-25 02:05:45 UTC
Ready for QA.

re-open issue and reassign to
Comment 14 karl.hong 2006-03-25 02:05:51 UTC
reassign to
Comment 15 karl.hong 2006-03-25 02:05:59 UTC
reset resolution to FIXED
Comment 16 stefan.baltzer 2006-04-05 09:07:33 UTC
SBA: Verified in CWS i18n25
Comment 17 stefan.baltzer 2006-06-12 16:09:04 UTC
SBA: OK in 680m5. Closed.