Issue 105571

Summary: Surrogate Pair Character selection handling can lose characters while formatting
Product: Writer
Reporter: jmaguro
Component: editing
Assignee: stefan.baltzer
Status: CLOSED FIXED
QA Contact: issues@sw <issues>
Severity: Trivial
Priority: P2
CC: hdu, issues, maho.nakata, nesshof, shinji.enoki, stefan.baltzer
Version: OOo 3.1.1
Target Milestone: ---
Hardware: PC
OS: All
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:
Issue Blocks: 41792, 78162, 102943, 99999
Attachments:
Description                Flags
SurrogatePair.odt          none
SurrogatePair_2.png        none
SurrogatePair_3.png        none
SurrogatePair_4.png        none
SurrogatePair_after.odt    none

Description jmaguro 2009-10-03 22:40:01 UTC
1. Open "SurrogatePair.odt".
2. Select the center character.
3. Press Ctrl + B (Bold).
   (screenshot attached as "SurrogatePair_2.png")
4. The character is rendered incorrectly.
   (screenshot attached as "SurrogatePair_3.png")
5. Save and close the document.
6. Reopen the document.
7. The center character is lost.
   (screenshot attached as "SurrogatePair_4.png")
Comment 1 jmaguro 2009-10-03 22:40:57 UTC
Created attachment 65108 [details]
SurrogatePair.odt
Comment 2 jmaguro 2009-10-03 22:41:36 UTC
Created attachment 65109 [details]
SurrogatePair_2.png
Comment 3 jmaguro 2009-10-03 22:42:00 UTC
Created attachment 65110 [details]
SurrogatePair_3.png
Comment 4 jmaguro 2009-10-03 22:42:23 UTC
Created attachment 65111 [details]
SurrogatePair_4.png
Comment 5 jmaguro 2009-10-03 22:47:55 UTC
Created attachment 65112 [details]
SurrogatePair_after.odt
Comment 6 hdu@apache.org 2009-10-13 16:11:45 UTC
Indeed, Writer really loses the second character. It is no longer in the content.xml. This might be related 
to issue 78162, where Writer doesn't treat surrogate pairs as a unit.
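
A minimal sketch of why a surrogate pair has to be treated as a unit. It assumes the text is stored as UTF-16 code units and uses a hypothetical three-character string whose center character is U+20000; the attached document's actual content is not reproduced here, and the helper names are illustrative only, not Writer code. If a formatting run boundary falls between the two halves of the pair, neither resulting piece is well-formed text.

```python
def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_well_formed(units):
    """True if every high surrogate is directly followed by a low surrogate
    and no low surrogate stands alone."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate: needs a partner
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:  # low surrogate without a preceding high one
            return False
        else:
            i += 1
    return True


# Hypothetical sample: "A", U+20000 (outside the base plane), "B".
units = to_utf16_units("A\U00020000B")
print([hex(u) for u in units])      # ['0x41', '0xd840', '0xdc00', '0x42']

# A selection covering only index 1 splits the pair into two broken runs.
print(is_well_formed(units[1:2]))   # False: lone high surrogate
print(is_well_formed(units[2:]))    # False: starts with a lone low surrogate
print(is_well_formed(units[1:3]))   # True: the whole pair
```

A run that contains only half of a pair cannot be written out as valid XML text, which is one plausible way a character could end up missing from content.xml.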
Comment 7 Martin Hollmichel 2009-10-13 16:27:38 UTC
Data loss issue?
Comment 8 Oliver-Rainer Wittmann 2009-10-14 07:50:38 UTC
A first investigation with the English OOo version, and without the correct
font installed, reveals that the loss of a character that is part of a
surrogate pair and has been formatted as Bold has occurred since at least
OOo 2.0.1.

OD->MH: Yes, this is a data loss issue in my opinion.
Comment 9 Martin Hollmichel 2009-10-14 08:51:45 UTC
Set target to 3.2 because of data loss.
Comment 10 Oliver-Rainer Wittmann 2009-10-14 10:54:45 UTC
Deeper investigation reveals the following:
- If the selection is made via "cursor-traveling":
(a) open the attached document - cursor is at the beginning of the document
(b) move cursor via key "Right" in front of the center character.
(c) hold key "Shift" and select center character via key "Right"
--> center character selected.
(d) click "Bold" button in the toolbar or hit keys Ctrl + B
--> Everything is fine, even after save-and-load cycle

- If the selection is made via double-click with mouse:
(a) open the attached document - cursor is at the beginning of the document
(b) move mouse pointer over center character
(c) perform double-click with mouse
--> center character selected.
(d) click "Bold" button in the toolbar or hit keys Ctrl + B
--> Everything is fine, even after save-and-load cycle

- If the selection is made via "mouse-movement"
(a) open the attached document - cursor is at the beginning of the document
(b) move mouse pointer in area between first character and center character
(c) click mouse button and hold it
(d) move mouse pointer in area between center character and third character
--> center character selected.
(e) click "Bold" button in the toolbar or hit keys Ctrl + B
--> Described defect occurs.

Thus, a workaround until this issue is fixed:
To select a surrogate pair character for formatting, use the cursor keys or
a double-click with the mouse.
Comment 11 stefan.baltzer 2009-10-14 13:34:28 UTC
Adjusted summary to reflect the findings. Put myself on CC.
Comment 12 Oliver-Rainer Wittmann 2009-10-14 13:55:29 UTC
fixed in cws oooimprovement5 - changed file:
/sw/source/core/txtnode/fntcache.cxx, rev. 276898
Comment 13 hdu@apache.org 2009-10-14 14:11:51 UTC
> fntcache.cxx, rev. 276898

Looking at the diff, it seems that the proper break iterator is now used in that case; previously it was
only triggered for CTL scripts, and now it also handles CJK scripts. This is good but not good enough, since
surrogate pairs can occur regardless of script type (e.g. "Gothic" is considered a Roman script, but it has
code points beyond the base plane, U+10330..U+1034A). I suggest getting rid of the script-type test
altogether and always using the proper break iterator.
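
A minimal sketch of a script-independent boundary check, assuming the text is held as UTF-16 code units. This is not the code in fntcache.cxx, and the function names are hypothetical. It only looks at the surrogate ranges, so it behaves the same for CJK, CTL, and Roman scripts such as Gothic; a real break iterator would also handle combining sequences, which this sketch ignores.

```python
def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def splits_pair(units, index):
    """True if index lies between the two halves of a surrogate pair."""
    return (0 < index < len(units)
            and is_high_surrogate(units[index - 1])
            and is_low_surrogate(units[index]))


def snap_selection(units, start, end):
    """Widen (start, end) so that neither boundary splits a surrogate pair."""
    if splits_pair(units, start):
        start -= 1  # pull the start back onto the high surrogate
    if splits_pair(units, end):
        end += 1    # push the end past the low surrogate
    return start, end


# "a", GOTHIC LETTER AHSA (U+10330 = D800 DF30 in UTF-16), "b"
gothic = [0x0061, 0xD800, 0xDF30, 0x0062]

print(snap_selection(gothic, 1, 2))  # (1, 3): end moved past the low surrogate
print(snap_selection(gothic, 2, 3))  # (1, 3): start moved back to the high surrogate
print(snap_selection(gothic, 0, 1))  # (0, 1): already on character boundaries
```

Because the check never consults the script type, the same selection logic covers the Gothic range mentioned above without a separate code path.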
Comment 14 Oliver-Rainer Wittmann 2009-10-14 14:31:06 UTC
HDU, you are right, but because this fix is a show-stopper fix I
decided the following:
- Provide a fix for this issue and ensure that its effects are as small as
possible. Thus, the fix stays as it is.
- Submit a new issue for the next release to generalize this fix for all script
types.
Comment 15 Oliver-Rainer Wittmann 2009-10-15 07:31:48 UTC
OD->SBA: Checked in internal installation set of cws oooimprovement5 - please
verify.
Comment 16 stefan.baltzer 2009-10-16 13:07:19 UTC
Verified in CWS oooimprovement5.
Comment 17 stefan.baltzer 2009-11-02 09:12:27 UTC
OK in OOO320_m3. Closed.