Issue 50172 - combining characters in indic and keyboard traversal
Summary: combining characters in indic and keyboard traversal
Status: CLOSED FIXED
Alias: None
Product: gsl
Classification: Code
Component: code
Version: 680m104
Hardware: All Linux, all
Importance: P3 Trivial
Target Milestone: OOo 2.0.1
Assignee: eric.savary
QA Contact: issues@gsl
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-06-01 13:00 UTC by caolanm
Modified: 2005-10-20 13:00 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
example tamil document (5.52 KB, application/vnd.oasis.opendocument.text)
2005-06-01 13:01 UTC, caolanm
sample tamil font (69.61 KB, application/octet-stream)
2005-06-01 13:02 UTC, caolanm
a demo in 1.1.4 format (5.38 KB, application/vnd.sun.xml.calc)
2005-06-01 13:30 UTC, caolanm
a simple standalone testcase for icu (3.95 KB, text/plain)
2005-06-03 11:16 UTC, caolanm
build script (249 bytes, text/plain)
2005-06-03 11:17 UTC, caolanm
patch (1.33 KB, patch)
2005-06-03 13:49 UTC, caolanm

Description caolanm 2005-06-01 13:00:11 UTC
As https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=157815 describes, and
the following example shows, keyboard traversal of combining characters in Indic scripts is problematic.
Comment 1 caolanm 2005-06-01 13:01:22 UTC
Created attachment 26788 [details]
example tamil document
Comment 2 caolanm 2005-06-01 13:02:01 UTC
Created attachment 26789 [details]
sample tamil font
Comment 3 caolanm 2005-06-01 13:30:39 UTC
Created attachment 26791 [details]
a demo in 1.1.4 format
Comment 4 caolanm 2005-06-01 13:31:58 UTC
cmc->fme: know anything about this sort of combining character and keyboard
traversal? It *seems* to work in 1.1.4: on opening the .sxc, three presses of
"->" take us from the left to the right of the full sentence, while in 1.9.106 it takes six.
Comment 5 caolanm 2005-06-01 19:29:03 UTC
yeah, works in the stock 1.1.4 from openoffice.org and not in a stock 1.9.106.
Comment 6 caolanm 2005-06-03 11:16:48 UTC
Created attachment 26856 [details]
a simple standalone testcase for icu
Comment 7 caolanm 2005-06-03 11:17:23 UTC
Created attachment 26857 [details]
build script
Comment 8 caolanm 2005-06-03 11:19:46 UTC
1.9.106 output (icu 2.6) is...

 Character Boundaries...
----- forward: -----------
 0 1 |AACD|
 1 3 |AACD|
 3 4 |AACD|
 4 5 |AACD|
 5 6 |AACD|
 6 7 |AACD|

while
1.1.4 (icu 2.2) output is...

 Character Boundaries...
----- forward: -----------
 0 1 |AACD|
 1 3 |AACD|
 3 4 |AACD|
 4 6 |AACD|
 6 7 |AACD|

i.e. 2.6 calls it 6 logical characters, while 2.2 calls it 5.
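As a quick cross-check of the two counts, the sketch below turns the boundary offsets transcribed from the two outputs above into (start, end) spans and finds the offset where the ICU versions disagree. The offsets are copied from this comment; the helper name `spans` is illustrative only and is not part of ICU or the attached testcase.

```python
def spans(boundaries):
    """Turn a sorted list of break offsets into (start, end) pairs."""
    return list(zip(boundaries, boundaries[1:]))

# Boundary offsets transcribed from the icu 2.6 and icu 2.2 outputs above.
icu26 = [0, 1, 3, 4, 5, 6, 7]
icu22 = [0, 1, 3, 4, 6, 7]

for name, b in (("icu 2.6", icu26), ("icu 2.2", icu22)):
    print(name, len(spans(b)), "logical characters:", spans(b))

# Offsets where 2.6 lets the cursor stop but 2.2 does not.
extra = sorted(set(icu26) - set(icu22))
print("extra break(s) in icu 2.6 at offset:", extra)  # [5]
```

The single extra stop at offset 5 accounts for the 6-versus-5 difference: 2.2 keeps offsets 4 to 6 together as one character, while 2.6 splits them.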
Comment 9 caolanm 2005-06-03 11:32:28 UTC
http://www.jtcsv.com/cgibin/icu-bugs?findid=1587 might be relevant.
Comment 10 caolanm 2005-06-03 13:49:32 UTC
Created attachment 26860 [details]
patch
Comment 11 caolanm 2005-06-03 13:51:03 UTC
Well, that patch reverts the behaviour to 1.1.X, but the current
http://www.unicode.org/reports/tr29/ says that the Grapheme_Cluster_Break
property values are defined in
http://www.unicode.org/Public/UNIDATA/auxiliary/GraphemeBreakProperty.txt, and
that list does not include the Tamil vowel signs, whereas the older
http://www.unicode.org/Public/3.2-Update/DerivedCoreProperties-3.2.0.txt did. So
*apparently* ICU is following the spec. Unless "Boundaries may be further
tailored for requirements of different languages, such as the addition of “ch”
for Slovak, or Indic, Thai or Tibetan character clusters." implies that it can
be extended to give the patch behaviour.

Dunno really.
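The distinction above can be illustrated with a deliberately simplified boundary rule; this is not ICU's actual rule set. A cluster is never split before a combining mark, and a flag decides whether spacing combining marks (general category Mc, which covers most Tamil vowel signs, e.g. U+0BBF TAMIL VOWEL SIGN I) also extend the preceding character. Setting the flag roughly mimics the older list that included the Tamil vowel signs; clearing it mimics the newer list that omits them. The helper `char_boundaries` is hypothetical, and it reports Python code point indices rather than ICU's UTF-16 offsets (the two coincide for BMP-only Tamil text).

```python
import unicodedata

def char_boundaries(text, extend_spacing_marks):
    """Return break offsets (code point indices) under a simplified rule.

    Non-spacing (Mn) and enclosing (Me) marks always attach to the
    preceding character. Spacing marks (Mc) attach only when
    extend_spacing_marks is True.
    """
    extending = {"Mn", "Me"}
    if extend_spacing_marks:
        extending.add("Mc")
    breaks = [0]
    for i in range(1, len(text)):
        # Break before every code point that does not extend a cluster.
        if unicodedata.category(text[i]) not in extending:
            breaks.append(i)
    if text:
        breaks.append(len(text))
    return breaks

ki = "\u0B95\u0BBF"  # TAMIL LETTER KA + TAMIL VOWEL SIGN I (category Mc)
print(char_boundaries(ki, extend_spacing_marks=True))   # [0, 2]
print(char_boundaries(ki, extend_spacing_marks=False))  # [0, 1, 2]
```

With the flag cleared, the cursor gets an extra stop between the consonant and its vowel sign, which is the same kind of extra stop seen in the icu 2.6 output.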
Comment 12 karl.hong 2005-08-27 00:39:18 UTC
I have create a local charactor breakiterator rule in i18npool for Tamil and
applied the patch to the rule.
Comment 13 karl.hong 2005-08-31 21:30:03 UTC
Ready for QA.

re-open issue and reassign to oc@openoffice.org
Comment 14 karl.hong 2005-08-31 21:30:08 UTC
reassign to oc@openoffice.org
Comment 15 karl.hong 2005-08-31 21:30:13 UTC
reset resolution to FIXED
Comment 16 karl.hong 2005-09-13 02:29:14 UTC
Ready for QA.

re-open issue and reassign to oc@openoffice.org
Comment 17 karl.hong 2005-09-13 02:29:38 UTC
reassign to oc@openoffice.org
Comment 18 karl.hong 2005-09-13 02:29:57 UTC
reset resolution to FIXED
Comment 19 oc 2005-09-26 15:58:18 UTC
Hi Eric, please take over

re-open issue and reassign to es@openoffice.org
Comment 20 oc 2005-09-26 15:58:26 UTC
reassign to es@openoffice.org
Comment 21 oc 2005-09-26 15:58:37 UTC
reset resolution to FIXED
Comment 22 eric.savary 2005-10-14 12:47:13 UTC
Verified in CWS i18n20
Comment 23 eric.savary 2005-10-20 13:00:22 UTC
Ok in src680m135