Issue 115343 - graphite: substitution rule rejected when "ZWSP" inserted between syllables
Summary: graphite: substitution rule rejected when "ZWSP" inserted between syllables
Status: CONFIRMED
Alias: None
Product: gsl
Classification: Code
Component: code
Version: OOO330m13
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-02 00:41 UTC by initrunlevel0
Modified: 2017-05-20 11:33 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
"rya" rendered with Abiword (Pango) (3.85 KB, image/jpeg)
2010-11-02 00:42 UTC, initrunlevel0
"rya" rendered with OpenOffice.org ("ya" is not conjoined with "ra") (4.66 KB, image/jpeg)
2010-11-02 00:43 UTC, initrunlevel0
Balinese Script Unicode Font (56.45 KB, application/octet-stream)
2010-11-02 00:44 UTC, initrunlevel0
bugdoc WriterEngine vs. EditEngine (11.33 KB, application/vnd.oasis.opendocument.text)
2010-11-02 11:06 UTC, hdu@apache.org
snapshot WriterEngine vs. EditEngine (4.35 KB, image/png)
2010-11-02 11:07 UTC, hdu@apache.org

Description initrunlevel0 2010-11-02 00:41:39 UTC
I have created a Balinese script font. The following rule is rejected ONLY by
OpenOffice.org; it is accepted by both WorldPad (the SIL text editor) and
Pango-Graphite:

cConsAll gAdegAdeg ZeroWidthSpace cConsAll > @1 @2 @4 _;

For example :

U+1B2D (Consonant RA) + U+1B44 (Virama) + U+200B (ZWSP) + U+1B2C (Consonant YA)

It should be rendered as in Picture1.jpeg, but OpenOffice.org renders it as in
Picture2.jpeg (picture files and font attached).
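
For readers who want to reproduce the input, the sequence above can be built and inspected with a short script. This is only a sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of the font or of OpenOffice.org.

```python
import unicodedata

# Code points quoted in the report: RA, virama (adeg adeg), ZWSP, YA.
SEQUENCE = "\u1B2D\u1B44\u200B\u1B2C"


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
        )
        for ch in text
    ]


if __name__ == "__main__":
    for row in describe(SEQUENCE):
        print(*row)
```

Pasting `SEQUENCE` into a Writer document that uses the attached font should be enough to compare the rendering against the two screenshots.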
Comment 1 initrunlevel0 2010-11-02 00:42:37 UTC
Created attachment 72808 [details]
"rya" rendered with Abiword (Pango)
Comment 2 initrunlevel0 2010-11-02 00:43:35 UTC
Created attachment 72809 [details]
"rya" rendered with OpenOffice.org ("ya" is not conjoined with "ra")
Comment 3 initrunlevel0 2010-11-02 00:44:42 UTC
Created attachment 72810 [details]
Balinese Script Unicode Font
Comment 4 hdu@apache.org 2010-11-02 11:02:01 UTC
Confirmed for WriterEngine but not for EditEngine.
@od: Writer seems to treat the ZWSP as a formatting mark, which in itself is a good idea. But when this 
results in Writer splitting up the text into different portions so the layout engines don't get to see the 
real text with its context then there is trouble. In this example the text
  (U+1B2D  U+1B44 U+200B U+1B2C)
gets split into two distinct portions
  (U+1B2D  U+1B44)  and (U+1B2C)
This shouldn't be done. The text belongs together. Highlighting the formatting mark is a good idea but 
this can and should be done even when the text remains undisturbed.

@kstribley: IIRC the other layout engines also consider some of the string context before and after the 
actual requested substring to layout their glyphs. Doing this also for graphite would help to solve this 
problem.

Even with graphite becoming more context aware, WriterEngine should be fixed not to split text just 
because of marks that are important for formatting.
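
The effect described above can be illustrated with a toy model. This is only a sketch: `first_pass` mimics the reporter's first rule for a two-letter stand-in of the `cConsAll` class, and `split_at_control_chars` is a hypothetical simplification of how Writer separates the ZWSP into its own portion. Neither is Graphite or Writer code.

```python
RA, ADEG, ZWSP, YA = "\u1B2D", "\u1B44", "\u200B", "\u1B2C"
CONSONANTS = {RA, YA}  # stand-in for cConsAll; only the two letters used here


def first_pass(text):
    """Toy version of: cConsAll gAdegAdeg ZeroWidthSpace cConsAll > @1 @2 @4 _"""
    out, i = [], 0
    while i < len(text):
        w = text[i:i + 4]
        if (len(w) == 4 and w[0] in CONSONANTS and w[1] == ADEG
                and w[2] == ZWSP and w[3] in CONSONANTS):
            out.append(w[0] + w[1] + w[3])  # drop the ZWSP, keep the rest
            i += 4
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def split_at_control_chars(text):
    """Hypothetical model: ZWSP ends a portion and is not passed to layout."""
    portions, current = [], ""
    for ch in text:
        if ch == ZWSP:
            if current:
                portions.append(current)
            current = ""
        else:
            current += ch
    if current:
        portions.append(current)
    return portions


text = RA + ADEG + ZWSP + YA
whole = first_pass(text)                                       # rule matches
split = [first_pass(p) for p in split_at_control_chars(text)]  # rule never matches
```

On the undisturbed text the rule fires once and removes the ZWSP, so a later pass can form the conjunct. On the two portions it has nothing to match, and each portion is laid out as if the other did not exist.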
Comment 5 hdu@apache.org 2010-11-02 11:06:54 UTC
Created attachment 72818 [details]
bugdoc WriterEngine vs. EditEngine
Comment 6 hdu@apache.org 2010-11-02 11:07:26 UTC
Created attachment 72819 [details]
snapshot WriterEngine vs. EditEngine
Comment 7 devel 2010-11-02 16:03:27 UTC
Graphite does already consider the context after that requested in
GraphiteLayout::CreateSegment() - see the EXTRA_CONTEXT_LENGTH which is used
when CTL is enabled as it is in this case. Unfortunately, I think this example
requires the context beforehand, which isn't currently passed in.
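
A rough sketch of that asymmetry, under stated assumptions: `segment_window` is a hypothetical helper, not the actual `GraphiteLayout::CreateSegment()` logic, and the value given to `EXTRA_CONTEXT_LENGTH` here is a placeholder because the real value is not stated in this issue.

```python
EXTRA_CONTEXT_LENGTH = 10  # placeholder; the real constant's value is not given here

RA, ADEG, ZWSP, YA = "\u1B2D", "\u1B44", "\u200B", "\u1B2C"


def segment_window(text, start, end, context_before=0,
                   context_after=EXTRA_CONTEXT_LENGTH):
    """Return the slice a layout engine would see for the portion [start, end)."""
    lo = max(0, start - context_before)
    hi = min(len(text), end + context_after)
    return text[lo:hi]


text = RA + ADEG + ZWSP + YA

# Portion holding only YA (index 3), with trailing context only:
after_only = segment_window(text, 3, 4)
# Same portion if leading context were passed in as well:
with_before = segment_window(text, 3, 4, context_before=3)
```

With trailing context only, the YA portion sees just YA, so no rule that depends on the preceding RA and virama can apply to it. With leading context, the full four-character sequence would be visible.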

I am rather surprised to see ZWSP in the middle of a cluster like this since
ZWSP would normally indicate that you can insert a line break there. Is it
really correct to allow a line break between the Virama and the U+1B2C Ya? If
you do get a line break there, then does the modified U+1B2C glyph need to be
rendered at the start of the new line taking into account the virama at the end
of the previous line? If so, I think we are going to run into more problems.
Comment 8 initrunlevel0 2010-11-03 02:04:45 UTC
The rule I mentioned before is only the first pass of the conjunct process.
The second rule, which turns the "ya" into its conjunct form (in the second
pass), is:

cConsAll gAdegAdeg cConsAll > @1 cConsConj$3 _ ;

Balinese script has no real "space", so I decided to use ZWSP to separate
words. If a word ends in a dead consonant on its last syllable, the first
syllable of the next word is conjoined to that previous syllable.

"ya" should be conjoined to "ra", even when it falls in the last column of the line.
Comment 9 devel 2010-11-03 07:05:48 UTC
OK, I'm just surprised that the ZWSP here represents a word break. I really
wonder whether these ZWSP are only present in that position because of an
erroneous conversion from a non-Unicode font. In scripts that I'm familiar with
using a virama (mainly Myanmar) you would have an invisible syllable break where
you have placed a ZWSP but you would never have a word break or line break and
so you would not insert a ZWSP in that position. 

If you do use a ZWSP like this, then you have to be sure that you want line
breaks in that position in the middle of the conjunct. If you do get a line
break, then the Consonant Ya will change its shape which seems unlikely to be
desirable. i.e. you will get:

[end of some line] U+1B2D U+1B44 U+200B
U+1B2C [start of next line, rendered as normal, not as conjunct]

I am starting to look at the next generation of Graphite "NG" and as part of
this I'm experimenting with caching graphite results on a word by word basis to
improve rendering speed. I'm not aware of any scripts which require context
across a space boundary, though some do require start/end of line
contextualisation. If Balinese does require contextualisation between words (as
opposed to between syllables) then it will not be able to take advantage of such
optimisations and I will need to add special code to detect cross-space
contextualisation rules.
Comment 10 initrunlevel0 2010-11-03 13:20:47 UTC
Yes, of course it does :D
But could you give me some advice for my fonts? Maybe I can replace the ZWSP with
another kind of word separator :)
Comment 11 devel 2010-11-03 14:46:37 UTC
Sorry to keep questioning you, but I just want to make sure I understand the
purpose of using ZWSP properly. What did the "yes of course" refer to?
   a) ZWSP=word break + possible line break
or b) ZWSP=word break but no line break
Comment 12 initrunlevel0 2010-11-03 15:14:36 UTC
Yes, of course it does :D
But could you give me some advice for my fonts? Maybe I can replace the ZWSP with
another kind of word separator :)
Comment 13 devel 2010-11-03 17:10:41 UTC
if a) is the situation the ZWSP is a reasonable solution, but we need to fix OOo
if b) then you might want to consider using a character like U+2060 WJ or U+200D
ZWJ instead. WJ is probably better.
I'm not sure off hand if writer considers WJ, ZWJ as formatting marks.

If case b) is the situation, why do you want to mark up words anyway?
Comment 14 initrunlevel0 2010-11-04 03:21:22 UTC
The word separator is used for the transliteration/romanisation process. It should be
option a), a possible line break, but not in the conjunction case.
Comment 15 devel 2010-11-04 07:17:50 UTC
So I would suggest that in the conjunction case you drop the ZWSP completely and
improve your transliteration process so that it can handle that case without a
ZWSP. If it really needs a marker then perhaps use WJ in the conjunction case
and ZWSP between normal syllables.
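
As a sketch of that second suggestion, a preprocessing step could swap ZWSP for WJ only where it sits between a virama and a consonant. The function name is made up for illustration, and the consonant range U+1B13..U+1B33 is an assumption about which letters matter for this font. The font's first-pass rule would also have to match WJ instead of ZWSP for this to have any effect.

```python
import re

ADEG_ADEG = "\u1B44"  # Balinese virama
ZWSP = "\u200B"
WJ = "\u2060"
# Assumption: the basic Balinese consonants occupy U+1B13..U+1B33.
CONSONANT = "[\u1B13-\u1B33]"

# A ZWSP that directly follows adeg adeg and directly precedes a consonant.
_CONJUNCT_ZWSP = re.compile(f"(?<={ADEG_ADEG}){ZWSP}(?={CONSONANT})")


def mark_conjunct_boundaries(text):
    """Replace ZWSP with WJ in the conjunct case; leave other ZWSPs alone."""
    return _CONJUNCT_ZWSP.sub(WJ, text)


converted = mark_conjunct_boundaries("\u1B2D\u1B44\u200B\u1B2C")
untouched = mark_conjunct_boundaries("\u1B2D\u200B\u1B2C")
```

Here `converted` carries U+2060 in place of the ZWSP, while `untouched` keeps its ZWSP because no virama precedes it.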

If I'm reading the source correctly, then it looks like OOo only uses
SwControlCharPortion for 2 characters, those are ZWSP and WJ, so if you chose WJ
then it will probably still have the same problem for rendering, though WJ is
not painted with the subscript / as ZWSP is. 
sw/source/core/text/itrform2.cxx SwTxtFormatter::NewPortion
sw/source/core/text/porrst.cxx SwControlCharPortion::Paint

I hope to integrate GraphiteNG in OOo 3.4, so I'll try and make sure context is
considered in front of the string as well as after it when I do that. However,
relying on context outside the requested range is somewhat risky. It can cause
characters to be rendered twice: once as part of the portion that OOo puts the
character in; and, once as part of a ligature or attachment to a portion that
excluded the character, but had the character available as context.
Comment 16 initrunlevel0 2010-11-04 09:02:47 UTC
So the milestone for this bug is OpenOffice.org 3.4?
Maybe I can use a real SPACE with zero width as a temporary workaround :D
Comment 17 Marcus 2017-05-20 11:33:48 UTC
Reset assignee to the default "issues@openoffice.apache.org".