Issue 111152 - i18npool: Indic graphic clusters/aksaras with virama
Summary: i18npool: Indic graphic clusters/aksaras with virama
Alias: None
Product: Internationalization
Classification: Code
Component: i18npool
Version: DEV300m77
Hardware: All, OS: All
Importance: P3 Trivial
Target Milestone: ---
Assignee: stefan.baltzer
QA Contact: issues@l10n
Depends on:
Reported: 2010-04-26 11:40 UTC by caolanm
Modified: 2017-05-20 11:42 UTC
CC List: 2 users

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---

example .odt (14.78 KB, application/vnd.oasis.opendocument.text), 2010-04-26 11:40 UTC, caolanm
my take on this (4.63 KB, patch), 2010-04-26 11:41 UTC, caolanm

Description caolanm 2010-04-26 11:40:07 UTC
Attached is a sample .odt with various Bengali sequences that use combining
characters. My Indic team here tells me that, when cursoring through the Bengali
text, they want each of these sequences to be treated as a single grapheme
cluster by our character break iterator rules.
Comment 1 caolanm 2010-04-26 11:40:57 UTC
Created attachment 69111
example .odt
Comment 2 caolanm 2010-04-26 11:41:23 UTC
Created attachment 69112
my take on this
Comment 3 caolanm 2010-04-26 11:43:45 UTC has...

"Grapheme clusters can be tailored to meet further requirements. Such tailoring
is permitted, but the possible rules are outside of the scope of this document.
One example of such a tailoring would be for the aksaras, or orthographic
syllables, used in many Indic scripts."

"Aksaras may also include one or more additional prefixed consonants, typically
with a virama (halant) character between each consonant in the sequence. Such
consonant cluster aksaras are not incorporated into the default rules"

My attempt here takes "consonant (virama consonant?)*" as a single cluster,
which appears to be meeting with approval so far anyway.
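As a rough illustration of the "consonant (virama consonant?)*" rule above, here is a minimal Python sketch that groups Bengali text into clusters this way, with other combining marks (nukta, vowel signs, etc.) attached to the preceding base as in the default rules. It is not the rule text committed to char_in.txt; the helper names (is_bengali_consonant, skip_marks, clusters) are hypothetical, and the consonant set (assigned letters in U+0995..U+09B9 plus U+09CE, U+09DC, U+09DD, U+09DF) is an assumption for illustration only.

```python
import unicodedata

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta)


def is_bengali_consonant(ch: str) -> bool:
    # Assumption: consonants are the assigned letters in U+0995..U+09B9
    # plus KHANDA TA, RRA, RHA and YYA. Unassigned gaps have category "Cn".
    cp = ord(ch)
    in_range = 0x0995 <= cp <= 0x09B9 or cp in (0x09CE, 0x09DC, 0x09DD, 0x09DF)
    return in_range and unicodedata.category(ch) == "Lo"


def is_mark(ch: str) -> bool:
    # Combining marks (nukta, vowel signs, virama, ...) extend the previous base.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def skip_marks(text: str, i: int, stop_at_virama: bool) -> int:
    # Advance past combining marks; optionally stop at a virama so the
    # caller can decide whether it links to a following consonant.
    while i < len(text) and is_mark(text[i]):
        if stop_at_virama and text[i] == VIRAMA:
            break
        i += 1
    return i


def clusters(text: str) -> list[str]:
    """Split text into clusters, keeping consonant (virama consonant?)* together."""
    result = []
    i = 0
    while i < len(text):
        start = i
        if is_bengali_consonant(text[i]):
            i = skip_marks(text, i + 1, stop_at_virama=True)
            # Each virama glues the next consonant (if any) into this cluster.
            while i < len(text) and text[i] == VIRAMA:
                i += 1
                if i < len(text) and is_bengali_consonant(text[i]):
                    i += 1
                i = skip_marks(text, i, stop_at_virama=True)
        else:
            i += 1
        # Any remaining combining marks belong to this cluster.
        i = skip_marks(text, i, stop_at_virama=False)
        result.append(text[start:i])
    return result
```

For example, clusters("\u0995\u09CD\u09B7\u09BF") (KA, virama, SSA, vowel sign I) yields a single cluster, whereas rules that, like the default rules quoted above, do not incorporate consonant clusters would allow a break after the virama.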
Comment 4 caolanm 2010-04-26 11:46:01 UTC
sample "Lohit Bengali" font at
Comment 5 ooo 2010-04-26 17:57:58 UTC
Well, if the Indic team says so...

@hdu: you might be interested in this.
Comment 6 erack 2010-06-11 00:51:55 UTC
In cws locales33a:

changeset f85d8583c2a6
M i18npool/source/breakiterator/data/char_in.txt

You can observe the progress and possible integration date of CWS locales33a at
Comment 7 ooo 2010-06-11 09:51:48 UTC
Reassigning to QA for verification.
Comment 8 stefan.baltzer 2010-06-14 16:54:32 UTC
Verified in CWS locales33a. 
The Bengali characters from the bugdoc are treated as one character when travelling through the text with the cursor.