Issue 74188

Summary: inconsistency on positioning the cursor after deleting components of combined indic characters
Product: Writer Reporter: caolanm
Component: codeAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: frank.meies, issues, ooo
Version: OOo 2.2   
Target Milestone: ---   
Hardware: All   
OS: Linux, all   
Issue Type: PATCH Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
test document
none
pic of results
none
patch to do this
none
bugdoc, one step left starting at end of text
none
update patch to apply
none
update patch to apply against DEV300_m61
none
sample font
none
new sample document
none
difference between positioning
none
sample patch
none
update to remove difference on windows
none
sigh, just the relevant bit this time, not the extra debugging spew
none
don't use this either :-) none

Description caolanm 2007-02-05 10:10:54 UTC
Take the attached document, you may need "Lohit Bengali" installed. Position the
text cursor at the end of the text, press left to move the cursor one character
to the left, press backspace. The cursor gets positioned inside the glyph. 

But if you repeat this procedure on the drawing engine equivalent inside the
text box on the same page, the cursor will position to the left of the sequence,
not inside the glyph. Also if in the main writer text after pressing backspace
you simply click with the mouse on the gray area outside the page area then the
cursor will re-position into the same location as where the drawing area cursor
ends up automatically.

cmc->fme: I took a little look at the DelLeft code but got very quickly
confused. I'm guessing there is a simple little "force recalc/reposition" piece
of code somewhere which clicking the mouse triggers which DelLeft on it's own
doesn't trigger. The EditEngine del code is easier to understand and I can see
the fairly simple functioning "where will I draw the cursor" code.
Comment 1 caolanm 2007-02-05 10:11:23 UTC
Created attachment 42744 [details]
test document
Comment 2 frank.meies 2007-02-05 10:58:39 UTC
fme->cmc: Cannot reproduce on windows. The text string consists of 5 characters,
the 1 and 2 build a character cluster. Where is the cursor located after the
left move and backspace? Can you attach a screenshot?
Comment 3 caolanm 2007-02-05 12:20:20 UTC
Created attachment 42746 [details]
pic of results
Comment 4 caolanm 2007-02-05 13:44:31 UTC
Created attachment 42753 [details]
patch to do this
Comment 5 caolanm 2007-02-05 13:45:40 UTC
This seems to do what I think I want to do, back to you for consideration. I'll
leave patch unset as I'm not convinced it's the right thing to do :-)
Comment 6 frank.meies 2007-02-05 15:08:12 UTC
FME->CMC: Installing the Lohit font, I see that there is no difference between
edit engine and Writer. Comparing the results to Word, I see that the initial
traveling left already results in a wrong position. The string consists of *two*
glyphs. One move left should take us to the position indicated on the left side
of your screen shot. In fact, the cursor is located inside the right glyph, see
attachment. I think the root cause for this issue is some missing functionality
in i18n, because the call to the break iterator in SwCntntNode::GoPrevious()
returns 4. I'll assign this one to Karl.

FME->Karl: Could you please have a look?
Comment 7 frank.meies 2007-02-05 15:09:29 UTC
Created attachment 42757 [details]
bugdoc, one step left starting at end of text
Comment 8 karl.hong 2007-02-06 00:01:38 UTC
BreakIterator provides char/cell navigator, PreviousCharacter and NextCharacter.
I copy 5 characters (glyphs) in the document and put them in StarBasic program
to test breakiterator,

Sub Main
bk=createUnoService("com.sun.star.i18n.BreakIterator")
dim l as new com.sun.star.lang.Locale
l.Language="bn"
s="ক্টরি"
m=com.sun.star.i18n.CharacterIteratorMode.SKIPCELL

npos=0
Do
pos=npos
print pos
npos=bk.NextCharacters(s, pos, l, m, 1, n)
Loop Until nPos = pos

Do
pos=nPos
print pos
npos=bk.PreviousCharacters(s, pos, l, m, 1, n)
Loop Until nPos = pos

End Sub

I got result as 0, 2, 3, 4, 5 and 5, 4, 3, 2, 0. 

We don't have language specific breakiterator for Bengali, it falls back to ICU,
which seems provides limited support for the language. It can take care first
cell, composed by 2 glyphs, but not second one composed by 3 glyphs. 

We need language expert to provide patch for the breakiterator.
Comment 9 caolanm 2008-01-07 08:02:28 UTC
Created attachment 50695 [details]
update patch to apply
Comment 10 ooo 2008-06-10 16:57:50 UTC
What is the status of this issue? Do we need a change in the breakiterator and
expert help? Or does the patch attached suffice, in which case it would be a
Writer issue?
Comment 11 karl.hong 2008-06-17 17:54:55 UTC
khong->fme, The patch is for writer module, could you take a look. 

My opinion is the fix should be in breakiterator rule, but I am not sure if the
patch makes sense to you to fix it in writer. If you think we should not touch
writer's code, re-assign the issue to submittor for the patch of breakiterator rule.
Comment 12 frank.meies 2008-06-18 06:01:33 UTC
fme->khong: I agree with you, this should not be fixed in Writer code but in the
respective break iterator rule.
Comment 13 caolanm 2008-06-18 08:54:45 UTC
I'm not sure of the logic of reassigning this to me. Though I appreciate the
confidence shown in my abilities to figure out how to write a custom break
iterator whatever that might be :-)
Comment 14 frank.meies 2008-06-18 09:03:03 UTC
Yes, I guess Karl should better take care of this ;-)
Comment 15 caolanm 2009-10-11 12:42:06 UTC
Created attachment 65295 [details]
update patch to apply against DEV300_m61
Comment 16 ooo 2009-10-12 12:33:07 UTC
@tl: could you convert the Writer's code patch to a breakiterator rule?
Comment 17 thomas.lange 2009-10-12 12:52:55 UTC
tl->er: don't know. Need to take a closer look at the breakiterator. Last time I
checked the ICU functionality I did not recognize any combined character
specifics. But of course I was looking for something different at that time.

tl: changing type to patch.
Comment 18 thomas.lange 2010-04-08 13:09:32 UTC
.
Comment 19 thomas.lange 2010-04-23 10:03:41 UTC
tl->cmc,er: Meanwhile this bug does not occur anymore. Probably this problem is
now handled by ICU. Can you please check and confirm? If so we can just close
this issue.
Comment 20 thomas.lange 2010-04-23 10:04:54 UTC
Forgot to mention: I checked it in CWS tl80 that is with DEV300_m74.
Comment 21 caolanm 2010-04-26 11:45:41 UTC
Created attachment 69113 [details]
sample font
Comment 22 caolanm 2010-04-26 11:46:26 UTC
Created attachment 69114 [details]
new sample document
Comment 23 caolanm 2010-04-26 11:49:15 UTC
Created attachment 69115 [details]
difference between positioning
Comment 24 caolanm 2010-04-26 11:49:48 UTC
Created attachment 69116 [details]
sample patch
Comment 25 caolanm 2010-04-26 11:52:47 UTC
I originally wrote this some time ago and didn't have the full story, but I've a
better grasp of it now, and my old patch is totally bogus.

So, first is attached a sample font for reference and second is attached a
sample document for reference, and third is a sample picture for reference, and
fourth is a new sample illustrative patch.

There are two different issues going one here and they've gotten mixed up.

The first one is the issue about where does the cluster begin or end, i.e. the
break iterator rules and khongs comments. I'd like to spin that off as a
separate issue. So see issue 111152 for that. (Note I've learned how to write
breakiterator rules in the meantime :-)). Lets assume for demonstration purposes
that the patch at issue 111152 is applied for the rest of this comment, but the
remaining problem is independent of the specific break rules, and doesn't depend
on 111152.

The sample document has the string... 

0995 09CD 099F 09B0 09BF

Now (with the break iterator rules of 111152 in place) the break iterator gives
0, 3, 5 as the character break positions.

In both the writer and draw engine, cursoring forwards and backwards through the
sample text follows these rules correctly.

i.e.
first chunk is "0995 09CD 099F"
second chunk is "09B0 09BF"

Now in both the writer text and in the edit-engine text window, go to the end of
the text, press left arrow, cursor is now like so...
0995 09CD 099F [LOGICAL CURSOR] 09B0 09BF
press backspace (we currently use a model of "delete" removes the entire graphic
cluster, while "backspace" chips away one character at a time).
That leaves...

0995 09CD [LOGICAL CURSOR] 09B0 09BF

What *writer* does to determine where to draw the visual cursor is to take all
the text before the logical cursor i.e. "0995 09CD" and call GetDXArray on that
sequence, and draws the cursor after where 09CD would appear in that string.

What the "editengine" does is to take the full new string of "0995 09CD 09B0
09BF" and call GetDXArray on it, and then find the position of 09CD in those
results and draws the cursor after that. 

"0995 09CD 09B0 09BF" is a single cluster (by issue 111152 rules). In either
case the Lohit Bengali font attached assigns this sequence to one glyph. So
there is a big difference between the position of the "09CD" glyph in "0995
09CD" vs where it appears in "0995 09CD 09B0 09BF". The output is that in 
writer the cursor is drawn basically in the middle of the glyph, and in the
edit-engine it is drawn at the start of the glyph as seen in the screenshot.

So that's the problem I'd like to address in this issue, the inconsistently
between how the editengine and writer calculates the visual cursor when the
logical cursor is between two characters that form a single glyph. I think we
should prefer the writer mode here of calculating it based on taking the string
before the logical cursor in isolation of the text following the cursor. The
sample crude patch attached here illustrates what I'm trying to get at.
Comment 26 thomas.lange 2010-04-26 12:22:35 UTC
Ok, this one turns out to be a Linux-only, or at the very least NOT a Windows,
problem. (Setting platform to Linux, did not check state with Solaris right now)

Since with Windows everything is fine the solution should not be to look for a
breakiterator fix to this issue. Thus I'm going to use the patch.

tl->cmc: Thanks!

Comment 27 caolanm 2010-04-26 15:09:46 UTC
Created attachment 69128 [details]
update to remove difference on windows
Comment 28 caolanm 2010-04-26 15:15:16 UTC
So the windows underlying implementation of GetDXArray is smarter than the
built-in one used on other platforms and takes context into account. So to get
the same results in editeng as sw does we need to do something like pass in the
isolated previous text like sw ends up doing.
Comment 29 caolanm 2010-04-26 15:17:14 UTC
Created attachment 69129 [details]
sigh, just the relevant bit this time, not the extra debugging spew
Comment 30 thomas.lange 2010-04-27 08:28:19 UTC
Patch applied in CWS tl80.
Comment 31 thomas.lange 2010-05-14 06:24:57 UTC
tl->sba: please verify. Thanks!
Comment 32 stefan.baltzer 2010-06-11 14:33:07 UTC
Verified in CWS tl80.
Comment 33 caolanm 2010-06-17 20:48:50 UTC
closing, integrated m83
Comment 34 thomas.lange 2010-07-01 13:52:17 UTC
Reopened, fix failed. See issue 112788 for the problems with the patch.
Comment 35 thomas.lange 2010-07-01 13:53:12 UTC
Setting target to 3.x.
Comment 36 caolanm 2010-07-01 14:08:17 UTC
Created attachment 70338 [details]
don't use this either :-)
Comment 37 devel 2010-09-06 11:22:46 UTC
I came across i112788 while looking into i114222. I wonder if nTextPortionStart
couldn't be used instead of computing nStartCalc. Also, I noticed that a
different font was being used after the 

SvxFont aTmpFont( pParaPortion->GetNode()->GetCharAttribs().GetDefFont() );

from the one that was originally used for the portion in
ImpEditEngine::CreateLines, though I haven't tried your latest patch.
Comment 38 Rob Weir 2013-03-11 15:03:27 UTC
I'm adding this comment to all open issues with Issue Type == PATCH.  We have 220 such issues, many of them quite old.  I apologize for that.  

We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0.

If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know.

On the other hand, if the patch is no longer relevant, please let us know that as well.

If you have any general questions or want to discuss this further, please send a note to our dev mailing list:  dev@openoffice.apache.org

Thanks!

-Rob