Apache OpenOffice (AOO) Bugzilla – Issue 74188
inconsistency on positioning the cursor after deleting components of combined indic characters
Last modified: 2013-08-07 14:44:35 UTC
Take the attached document, you may need "Lohit Bengali" installed. Position the text cursor at the end of the text, press left to move the cursor one character to the left, press backspace. The cursor gets positioned inside the glyph. But if you repeat this procedure on the drawing engine equivalent inside the text box on the same page, the cursor will position to the left of the sequence, not inside the glyph. Also if in the main writer text after pressing backspace you simply click with the mouse on the gray area outside the page area then the cursor will re-position into the same location as where the drawing area cursor ends up automatically. cmc->fme: I took a little look at the DelLeft code but got very quickly confused. I'm guessing there is a simple little "force recalc/reposition" piece of code somewhere which clicking the mouse triggers which DelLeft on it's own doesn't trigger. The EditEngine del code is easier to understand and I can see the fairly simple functioning "where will I draw the cursor" code.
Created attachment 42744 [details] test document
fme->cmc: Cannot reproduce on windows. The text string consists of 5 characters, the 1 and 2 build a character cluster. Where is the cursor located after the left move and backspace? Can you attach a screenshot?
Created attachment 42746 [details] pic of results
Created attachment 42753 [details] patch to do this
This seems to do what I think I want to do, back to you for consideration. I'll leave patch unset as I'm not convinced it's the right thing to do :-)
FME->CMC: Installing the Lohit font, I see that there is no difference between edit engine and Writer. Comparing the results to Word, I see that the initial traveling left already results in a wrong position. The string consists of *two* glyphs. One move left should take us to the position indicated on the left side of your screen shot. In fact, the cursor is located inside the right glyph, see attachment. I think the root cause for this issue is some missing functionality in i18n, because the call to the break iterator in SwCntntNode::GoPrevious() returns 4. I'll assign this one to Karl. FME->Karl: Could you please have a look?
Created attachment 42757 [details] bugdoc, one step left starting at end of text
BreakIterator provides char/cell navigator, PreviousCharacter and NextCharacter. I copy 5 characters (glyphs) in the document and put them in StarBasic program to test breakiterator, Sub Main bk=createUnoService("com.sun.star.i18n.BreakIterator") dim l as new com.sun.star.lang.Locale l.Language="bn" s="ক্টরি" m=com.sun.star.i18n.CharacterIteratorMode.SKIPCELL npos=0 Do pos=npos print pos npos=bk.NextCharacters(s, pos, l, m, 1, n) Loop Until nPos = pos Do pos=nPos print pos npos=bk.PreviousCharacters(s, pos, l, m, 1, n) Loop Until nPos = pos End Sub I got result as 0, 2, 3, 4, 5 and 5, 4, 3, 2, 0. We don't have language specific breakiterator for Bengali, it falls back to ICU, which seems provides limited support for the language. It can take care first cell, composed by 2 glyphs, but not second one composed by 3 glyphs. We need language expert to provide patch for the breakiterator.
Created attachment 50695 [details] update patch to apply
What is the status of this issue? Do we need a change in the breakiterator and expert help? Or does the patch attached suffice, in which case it would be a Writer issue?
khong->fme, The patch is for writer module, could you take a look. My opinion is the fix should be in breakiterator rule, but I am not sure if the patch makes sense to you to fix it in writer. If you think we should not touch writer's code, re-assign the issue to submittor for the patch of breakiterator rule.
fme->khong: I agree with you, this should not be fixed in Writer code but in the respective break iterator rule.
I'm not sure of the logic of reassigning this to me. Though I appreciate the confidence shown in my abilities to figure out how to write a custom break iterator whatever that might be :-)
Yes, I guess Karl should better take care of this ;-)
Created attachment 65295 [details] update patch to apply against DEV300_m61
@tl: could you convert the Writer's code patch to a breakiterator rule?
tl->er: don't know. Need to take a closer look at the breakiterator. Last time I checked the ICU functionality I did not recognize any combined character specifics. But of course I was looking for something different at that time. tl: changing type to patch.
.
tl->cmc,er: Meanwhile this bug does not occur anymore. Probably this problem is now handled by ICU. Can you please check and confirm? If so we can just close this issue.
Forgot to mention: I checked it in CWS tl80 that is with DEV300_m74.
Created attachment 69113 [details] sample font
Created attachment 69114 [details] new sample document
Created attachment 69115 [details] difference between positioning
Created attachment 69116 [details] sample patch
I originally wrote this some time ago and didn't have the full story, but I've a better grasp of it now, and my old patch is totally bogus. So, first is attached a sample font for reference and second is attached a sample document for reference, and third is a sample picture for reference, and fourth is a new sample illustrative patch. There are two different issues going one here and they've gotten mixed up. The first one is the issue about where does the cluster begin or end, i.e. the break iterator rules and khongs comments. I'd like to spin that off as a separate issue. So see issue 111152 for that. (Note I've learned how to write breakiterator rules in the meantime :-)). Lets assume for demonstration purposes that the patch at issue 111152 is applied for the rest of this comment, but the remaining problem is independent of the specific break rules, and doesn't depend on 111152. The sample document has the string... 0995 09CD 099F 09B0 09BF Now (with the break iterator rules of 111152 in place) the break iterator gives 0, 3, 5 as the character break positions. In both the writer and draw engine, cursoring forwards and backwards through the sample text follows these rules correctly. i.e. first chunk is "0995 09CD 099F" second chunk is "09B0 09BF" Now in both the writer text and in the edit-engine text window, go to the end of the text, press left arrow, cursor is now like so... 0995 09CD 099F [LOGICAL CURSOR] 09B0 09BF press backspace (we currently use a model of "delete" removes the entire graphic cluster, while "backspace" chips away one character at a time). That leaves... 0995 09CD [LOGICAL CURSOR] 09B0 09BF What *writer* does to determine where to draw the visual cursor is to take all the text before the logical cursor i.e. "0995 09CD" and call GetDXArray on that sequence, and draws the cursor after where 09CD would appear in that string. What the "editengine" does is to take the full new string of "0995 09CD 09B0 09BF" and call GetDXArray on it, and then find the position of 09CD in those results and draws the cursor after that. "0995 09CD 09B0 09BF" is a single cluster (by issue 111152 rules). In either case the Lohit Bengali font attached assigns this sequence to one glyph. So there is a big difference between the position of the "09CD" glyph in "0995 09CD" vs where it appears in "0995 09CD 09B0 09BF". The output is that in writer the cursor is drawn basically in the middle of the glyph, and in the edit-engine it is drawn at the start of the glyph as seen in the screenshot. So that's the problem I'd like to address in this issue, the inconsistently between how the editengine and writer calculates the visual cursor when the logical cursor is between two characters that form a single glyph. I think we should prefer the writer mode here of calculating it based on taking the string before the logical cursor in isolation of the text following the cursor. The sample crude patch attached here illustrates what I'm trying to get at.
Ok, this one turns out to be a Linux-only, or at the very least NOT a Windows, problem. (Setting platform to Linux, did not check state with Solaris right now) Since with Windows everything is fine the solution should not be to look for a breakiterator fix to this issue. Thus I'm going to use the patch. tl->cmc: Thanks!
Created attachment 69128 [details] update to remove difference on windows
So the windows underlying implementation of GetDXArray is smarter than the built-in one used on other platforms and takes context into account. So to get the same results in editeng as sw does we need to do something like pass in the isolated previous text like sw ends up doing.
Created attachment 69129 [details] sigh, just the relevant bit this time, not the extra debugging spew
Patch applied in CWS tl80.
tl->sba: please verify. Thanks!
Verified in CWS tl80.
closing, integrated m83
Reopened, fix failed. See issue 112788 for the problems with the patch.
Setting target to 3.x.
Created attachment 70338 [details] don't use this either :-)
I came across i112788 while looking into i114222. I wonder if nTextPortionStart couldn't be used instead of computing nStartCalc. Also, I noticed that a different font was being used after the SvxFont aTmpFont( pParaPortion->GetNode()->GetCharAttribs().GetDefFont() ); from the one that was originally used for the portion in ImpEditEngine::CreateLines, though I haven't tried your latest patch.
I'm adding this comment to all open issues with Issue Type == PATCH. We have 220 such issues, many of them quite old. I apologize for that. We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0. If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know. On the other hand, if the patch is no longer relevant, please let us know that as well. If you have any general questions or want to discuss this further, please send a note to our dev mailing list: dev@openoffice.apache.org Thanks! -Rob