Apache OpenOffice (AOO) Bugzilla – Issue 115343
graphite: substitution rule rejected when "ZWSP" inserted between syllables
Last modified: 2017-05-20 11:33:48 UTC
I have created a Balinese script font, and this is one of its rules that is rejected ONLY by OpenOffice.org. It is not rejected by either WorldPad (SIL text editor) or Pango-Graphite:
    cConsAll gAdegAdeg ZeroWidthSpace cConsAll > @1 @2 @4 _;
For example: U+1B2D (consonant RA) + U+1B44 (virama) + U+200B (ZWSP) + U+1B2C (consonant YA). It should be rendered as in Picture1.jpeg, but OpenOffice.org renders it as in Picture2.jpeg (picture files and font attached).
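For reference, the example sequence can be written out code point by code point. This is only a small Python sketch for inspecting the input text; the names `SEQUENCE` and `describe` are illustrative and do not come from the font source or from OpenOffice.org.

```python
import unicodedata

# The sequence from the report: RA + adeg-adeg (virama) + ZWSP + YA.
SEQUENCE = "\u1B2D\u1B44\u200B\u1B2C"

def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

for code_point, name in describe(SEQUENCE):
    print(code_point, name)
```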
Created attachment 72808 [details] "rya" rendered with Abiword (Pango)
Created attachment 72809 [details] "rya" rendered with OpenOffice.org ("ya" is not conjoined with "ra")
Created attachment 72810 [details] Balinese Script Unicode Font
Confirmed for WriterEngine but not for EditEngine.
@od: Writer seems to treat the ZWSP as a formatting mark, which in itself is a good idea. But when this results in Writer splitting up the text into different portions, so that the layout engines don't get to see the real text with its context, then there is trouble. In this example the text (U+1B2D U+1B44 U+200B U+1B2C) gets split into two distinct portions, (U+1B2D U+1B44) and (U+1B2C). This shouldn't be done. The text belongs together. Highlighting the formatting mark is a good idea, but this can and should be done even when the text remains undisturbed.
@kstribley: IIRC the other layout engines also consider some of the string context before and after the actual requested substring to lay out their glyphs. Doing this also for graphite would help to solve this problem. Even with graphite becoming more context aware, WriterEngine should be fixed not to split text just because of marks that are important for formatting.
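The effect of the portion split can be illustrated with a small Python sketch. It is a simulation under stated assumptions, not Writer's code: `split_into_portions` and `rule_matches` are hypothetical helpers, the control character is given a portion of its own, and the consonant range U+1B13..U+1B33 is an assumed stand-in for the font's cConsAll class.

```python
ZWSP = "\u200B"
WJ = "\u2060"
ADEG_ADEG = "\u1B44"
CONTROL_CHARS = {ZWSP, WJ}  # assumption: these get a portion of their own

def split_into_portions(text):
    """Split text so that each control character becomes its own portion."""
    portions, current = [], ""
    for ch in text:
        if ch in CONTROL_CHARS:
            if current:
                portions.append(current)
                current = ""
            portions.append(ch)
        else:
            current += ch
    if current:
        portions.append(current)
    return portions

def is_consonant(ch):
    # Assumption: Balinese consonants occupy U+1B13..U+1B33.
    return "\u1B13" <= ch <= "\u1B33"

def rule_matches(run):
    """True if run contains consonant + adeg-adeg + ZWSP + consonant."""
    for i in range(len(run) - 3):
        a, b, c, d = run[i:i + 4]
        if is_consonant(a) and b == ADEG_ADEG and c == ZWSP and is_consonant(d):
            return True
    return False

text = "\u1B2D\u1B44\u200B\u1B2C"
portions = split_into_portions(text)
print(rule_matches(text))                      # rule sees the whole text
print(any(rule_matches(p) for p in portions))  # rule sees one portion at a time
```

When the shaper is handed one portion at a time, no single portion contains all four slots of the rule, so the rule cannot fire.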
Created attachment 72818 [details] bugdoc WriterEngine vs. EditEngine
Created attachment 72819 [details] snapshot WriterEngine vs. EditEngine
Graphite does already consider the context after that requested in GraphiteLayout::CreateSegment() - see the EXTRA_CONTEXT_LENGTH which is used when CTL is enabled as it is in this case. Unfortunately, I think this example requires the context beforehand, which isn't currently passed in. I am rather surprised to see ZWSP in the middle of a cluster like this since ZWSP would normally indicate that you can insert a line break there. Is it really correct to allow a line break between the Virama and the U+1B2C Ya? If you do get a line break there, then does the modified U+1B2C glyph need to be rendered at the start of the new line taking into account the virama at the end of the previous line? If so, I think we are going to run into more problems.
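The difference between trailing-only context and context on both sides can be sketched as follows. This Python snippet is only an illustration: `context_window` is a hypothetical helper, and `EXTRA` is a made-up value, since the real EXTRA_CONTEXT_LENGTH is not quoted in this issue.

```python
def context_window(text_len, start, end, extra_before=0, extra_after=0):
    """Return the (lo, hi) range handed to the shaper for portion [start, end)."""
    lo = max(0, start - extra_before)
    hi = min(text_len, end + extra_after)
    return lo, hi

text = "\u1B2D\u1B44\u200B\u1B2C"
start, end = 3, 4  # the portion that contains only YA
EXTRA = 10         # hypothetical value, not the real EXTRA_CONTEXT_LENGTH

after_only = context_window(len(text), start, end, extra_after=EXTRA)
both_sides = context_window(len(text), start, end,
                            extra_before=EXTRA, extra_after=EXTRA)

print(after_only)  # trailing context only: RA and the virama stay invisible
print(both_sides)  # leading context included: the whole cluster is visible
```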
The rule I mentioned before is just the first pass of the conjunct process. The second rule, which turns "ya" into its conjunct form, runs in the second pass:
    cConsAll gAdegAdeg cConsAll > @1 cConsConj$3 _ ;
Balinese script has no real "space", so I decided to use ZWSP to separate words. If a word ends with a dead consonant on its last syllable, the first syllable of the next word is conjoined to that syllable. "ya" should be conjoined to "ra", whether or not it falls in the last column of the line.
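The net effect of the two passes can be simulated in Python. This is a rough sketch and not GDL semantics: `pass_one` and `pass_two` are hypothetical helpers, the consonant range U+1B13..U+1B33 is an assumed stand-in for cConsAll, the conjunct glyph is represented by appending ".conj" to the character, and matching simply resumes after the consumed input.

```python
ADEG_ADEG = "\u1B44"
ZWSP = "\u200B"

def is_consonant(glyph):
    # Assumption: cConsAll covers U+1B13..U+1B33.
    return len(glyph) == 1 and "\u1B13" <= glyph <= "\u1B33"

def pass_one(glyphs):
    """cons adeg-adeg ZWSP cons -> cons adeg-adeg cons (the ZWSP is dropped)."""
    out, i = [], 0
    while i < len(glyphs):
        window = glyphs[i:i + 4]
        if (len(window) == 4 and is_consonant(window[0])
                and window[1] == ADEG_ADEG and window[2] == ZWSP
                and is_consonant(window[3])):
            out += [window[0], window[1], window[3]]
            i += 4
        else:
            out.append(glyphs[i])
            i += 1
    return out

def pass_two(glyphs):
    """cons adeg-adeg cons -> cons + conjunct form of the second consonant."""
    out, i = [], 0
    while i < len(glyphs):
        window = glyphs[i:i + 3]
        if (len(window) == 3 and is_consonant(window[0])
                and window[1] == ADEG_ADEG and is_consonant(window[2])):
            out += [window[0], window[2] + ".conj"]
            i += 3
        else:
            out.append(glyphs[i])
            i += 1
    return out

glyphs = list("\u1B2D\u1B44\u200B\u1B2C")
print(pass_two(pass_one(glyphs)))  # both passes: RA followed by conjunct YA
print(pass_two(glyphs))            # second pass alone: the ZWSP blocks the match
```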
OK, I'm just surprised that the ZWSP here represents a word break. I really wonder whether these ZWSP are only present in that position because of an erroneous conversion from a non-Unicode font. In scripts that I'm familiar with using a virama (mainly Myanmar) you would have an invisible syllable break where you have placed a ZWSP, but you would never have a word break or line break, and so you would not insert a ZWSP in that position. If you do use a ZWSP like this, then you have to be sure that you want line breaks in that position in the middle of the conjunct. If you do get a line break, then the consonant Ya will change its shape, which seems unlikely to be desirable, i.e. you will get: [end of some line] U+1B2D U+1B44 U+200B U+1B2C [start of next line, rendered as normal, not as conjunct]
I am starting to look at the next generation of Graphite, "NG", and as part of this I'm experimenting with caching graphite results on a word by word basis to improve rendering speed. I'm not aware of any scripts which require context across a space boundary, though some do require start/end of line contextualisation. If Balinese does require contextualisation between words (as opposed to between syllables) then it will not be able to take advantage of such optimisations, and I will need to add special code to detect cross-space contextualisation rules.
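One way to picture the detection of cross-space contextualisation rules is to check whether a separator class sits strictly inside a rule's input pattern. The Python sketch below is only an illustration and not how Graphite represents rules: patterns are plain lists of class names, "ZeroWidthSpace" is taken from the rule quoted in this issue, and "Space" is a hypothetical class name.

```python
# "ZeroWidthSpace" appears in the reported rule; "Space" is a hypothetical name.
SEPARATOR_CLASSES = {"ZeroWidthSpace", "Space"}

def crosses_word_boundary(pattern):
    """True if a separator class sits strictly inside the input pattern."""
    return any(name in SEPARATOR_CLASSES for name in pattern[1:-1])

def can_cache_per_word(rules):
    """Per-word caching is only safe if no rule spans a separator."""
    return not any(crosses_word_boundary(pattern) for pattern in rules)

rules = [
    ["cConsAll", "gAdegAdeg", "ZeroWidthSpace", "cConsAll"],  # first pass
    ["cConsAll", "gAdegAdeg", "cConsAll"],                    # second pass
]

print(can_cache_per_word(rules))      # the first-pass rule spans a separator
print(can_cache_per_word(rules[1:]))  # the second-pass rule alone does not
```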
Yes of course, it does :D But could you give me some advice for my fonts? Maybe I can replace the ZWSP with another kind of word separator :)
Sorry to keep questioning you, but I just want to make sure I understand the purpose of using ZWSP properly. What did the "yes of course" refer to? a) ZWSP=word break + possible line break or b) ZWSP=word break but no line break
If a) is the situation, then ZWSP is a reasonable solution, but we need to fix OOo. If b), then you might want to consider using a character like U+2060 WJ or U+200D ZWJ instead. WJ is probably better. I'm not sure offhand whether Writer considers WJ and ZWJ to be formatting marks. If case b) is the situation, why do you want to mark up words anyway?
The word separator is used for the transliteration/romanisation process. It should be option a): a possible line break, but not in the conjunction case.
So I would suggest that in the conjunction case you drop the ZWSP completely and improve your transliteration process so that it can handle that case without a ZWSP. If it really needs a marker, then perhaps use WJ in the conjunction case and ZWSP between normal syllables. If I'm reading the source correctly, then it looks like OOo only uses SwControlCharPortion for 2 characters, ZWSP and WJ, so if you choose WJ then it will probably still have the same problem for rendering, though WJ is not painted with the subscript / as ZWSP is.
    sw/source/core/text/itrform2.cxx SwTxtFormatter::NewPortion
    sw/source/core/text/porrst.cxx SwControlCharPortion::Paint
I hope to integrate GraphiteNG in OOo 3.4, so I'll try to make sure context is considered in front of the string as well as after it when I do that. However, relying on context outside the requested range is somewhat risky. It can cause characters to be rendered twice: once as part of the portion that OOo puts the character in, and once as part of a ligature or attachment to a portion that excluded the character but had the character available as context.
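The suggested separator choice (WJ in the conjunction case, ZWSP between normal syllables) could be applied when the text is assembled. The Python sketch below is an illustration only: `join_words` is a hypothetical helper, and it assumes that "conjunction case" means the preceding word ends in adeg-adeg (U+1B44).

```python
ZWSP = "\u200B"       # zero width space: permits a line break
WJ = "\u2060"         # word joiner: forbids a line break
ADEG_ADEG = "\u1B44"  # Balinese virama

def join_words(words):
    """Join words with WJ after a dead consonant, ZWSP otherwise."""
    out = ""
    for i, word in enumerate(words):
        if i > 0:
            # Assumption: a word ending in adeg-adeg conjoins with the next one.
            out += WJ if words[i - 1].endswith(ADEG_ADEG) else ZWSP
        out += word
    return out

conjunct_case = join_words(["\u1B2D\u1B44", "\u1B2C"])
normal_case = join_words(["\u1B2D", "\u1B2C"])
print([f"U+{ord(ch):04X}" for ch in conjunct_case])
print([f"U+{ord(ch):04X}" for ch in normal_case])
```

The font rule would then need to match WJ instead of ZWSP in the conjunction case, and, as noted above, Writer may still split the text at WJ.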
So the milestone for this bug is OpenOffice.org 3.4? Maybe I can use a real SPACE with zero width as a temporary workaround :D
Reset assignee to the default "issues@openoffice.apache.org".