Apache OpenOffice (AOO) Bugzilla – Issue 111152
i18npool: Indic graphic clusters/aksaras with virama
Last modified: 2017-05-20 11:42:15 UTC
Attached is a sample .odt containing various combining character sequences in Bengali. My Indic team here tells me that, when cursoring through the Bengali text, they really want each of these sequences to be treated as one single grapheme cluster by our character break iterator rules.
Created attachment 69111: example .odt
Created attachment 69112: my take on this
Unicode Standard Annex #29 (http://www.unicode.org/reports/tr29/) says: "Grapheme clusters can be tailored to meet further requirements. Such tailoring is permitted, but the possible rules are outside of the scope of this document. One example of such a tailoring would be for the aksaras, or orthographic syllables, used in many Indic scripts." It also notes: "Aksaras may also include one or more additional prefixed consonants, typically with a virama (halant) character between each consonant in the sequence. Such consonant cluster aksaras are not incorporated into the default rules". My attempt here treats "consonant (virama consonant?)*" as a single cluster, and so far that appears to be meeting with approval.
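To illustrate what the "consonant (virama consonant?)*" tailoring does, here is a minimal Python sketch for Bengali only. It is not the ICU rule syntax actually used in char_in.txt; the function names (`aksara_clusters`, `is_consonant`, `is_mark`) are hypothetical. It assumes that combining marks (Unicode categories Mn/Mc/Me, which covers vowel signs, nukta, candrabindu and the virama itself) always attach to the preceding cluster, as the default rules already do, and it ignores ZWJ/ZWNJ handling.

```python
import unicodedata

# U+09CD BENGALI SIGN VIRAMA (hasanta)
VIRAMA = "\u09CD"

# Bengali consonant letters (unassigned gaps in the block are skipped):
# KA..NA, PA..RA, LA, SHA..HA, KHANDA TA, RRA, RHA, YYA.
_CONSONANT_RANGES = [
    (0x0995, 0x09A8),
    (0x09AA, 0x09B0),
    (0x09B2, 0x09B2),
    (0x09B6, 0x09B9),
    (0x09CE, 0x09CE),
    (0x09DC, 0x09DD),
    (0x09DF, 0x09DF),
]


def is_consonant(ch: str) -> bool:
    """True if ch is one of the Bengali consonant letters listed above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _CONSONANT_RANGES)


def is_mark(ch: str) -> bool:
    """True for combining marks, which never start a new cluster."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def aksara_clusters(text: str) -> list[str]:
    """Split text so that consonant (virama consonant?)* stays together."""
    clusters = []
    i, n = 0, len(text)
    while i < n:
        # Only a cluster that begins with a consonant may absorb
        # further consonants after a virama.
        starts_with_consonant = is_consonant(text[i])
        j = i + 1
        while j < n:
            ch = text[j]
            if is_mark(ch):
                # Vowel signs, nukta, virama etc. extend the current cluster.
                j += 1
            elif starts_with_consonant and text[j - 1] == VIRAMA and is_consonant(ch):
                # Consonant directly after a virama: continue the conjunct.
                j += 1
            else:
                break
        clusters.append(text[i:j])
        i = j
    return clusters
```

For example, `aksara_clusters("\u09B8\u09CD\u09A4\u09CD\u09B0")` (স্ত্র, sa + virama + ta + virama + ra) yields a single cluster, whereas the default rules would break after each virama. A trailing virama with no following consonant simply stays attached to its consonant, which matches the optional consonant in the pattern.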
A sample "Lohit Bengali" font is available at http://www.openoffice.org/nonav/issues/showattachment.cgi/69113/Lohit-Bengali.ttf
Well, if the Indic team says so... @hdu: you might be interested in this.
In cws locales33a:
changeset f85d8583c2a6 http://hg.services.openoffice.org/cws/locales33a/changeset/f85d8583c2a6
M i18npool/source/breakiterator/data/char_in.txt

You can observe the progress and possible integration date of CWS locales33a at http://tools.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Flocales33a
Reassigning to QA for verification.
Verified in CWS locales33a. The Bengali characters from the bugdoc are treated as one character when moving the cursor through the text.