Issue 113584

Summary: Crash formatting as "Capitalize Every Word" text with ligatures or similar characters
Product: Writer
Reporter: eric.savary
Component: formatting
Assignee: stefan.baltzer
Status: CLOSED FIXED
QA Contact: issues@sw <issues>
Severity: Trivial
Priority: P2
CC: issues, jurfinke, stefan.baltzer
Version: OOO330m1
Keywords: crash, regression
Target Milestone: ---
Hardware: All
OS: All
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:
Issue Blocks: 111112
Attachments (description, flags):
- crash_ligature.odt (flags: none)
- adding original bugdoc from issue 113558 (flags: none)
- additional bug doc to show broken UPPERCASE and undo (flags: none)

Description eric.savary 2010-08-02 13:31:45 UTC
- Open attached document
- Select All
- Format - Change Case - Capitalize Every Word
-> Crash

This is due to the "ffi" (U+FB03) and "ffl" (U+FB04) ligatures in the text.
Regression in 3.3 due to implementation of issue 1601
Comment 1 eric.savary 2010-08-02 13:32:26 UTC
Created attachment 70919 [details]
crash_ligature.odt
Comment 2 jurf 2010-08-05 00:42:14 UTC
No crash in OOO330m2, but test string (in odt attachment) disappears completely.

Bug appears to be caused by miscounting character length(s) of ligatures.

e.g.:

The flickering projector

- select "flickering"
- apply Capitalize Every Word

Output is "The  projector"
(two spaces, no flickering; the selection now covers " pro")

The miscounting is corrupting text buffers (or whatever the OOo equivalent is),
as shown when I apply "Undo" after the above:

Output is "The flickeringr"

Have fun...
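
The length miscount described above can be illustrated outside of OOo. This is only a sketch in Python, not the Writer code: the variable names are made up, and the title-cased word is built by hand on the assumption that the "fl" ligature (U+FB02) resolves to the two letters "Fl".

```python
# Illustrative only: plain Python strings, not OOo text buffers.
text = "The \ufb02ickering projector"   # \ufb02 is the "fl" ligature
start, end = 4, 13                      # selection covering the 9-character word

selected = text[start:end]
converted = "Fl" + selected[1:]         # ligature resolved into two letters

new_text = text[:start] + converted + text[end:]
new_end = start + len(converted)        # selection end must move with the text

print(len(selected), len(converted))    # the word grew by one character
print(repr(new_text[start:end]))        # stale end index cuts the word short
print(repr(new_text[start:new_end]))    # recomputed end index covers the word
```

Any code that keeps using the old end index after the conversion is off by one character for each resolved ligature.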
Comment 3 thomas.lange 2010-08-10 13:33:23 UTC
I can't reproduce the problem with my DEV300_m86 and the sample document. Thus
I'm just going to fix the title case implementation that was introduced with the
patch from issue 1601. It seems a major rework of 'title case' (aka capitalize
every word) and 'sentence case' is required, and the use of the breakiterator
can not be avoided in order to fix this.
Comment 4 jurf 2010-08-10 19:16:23 UTC
Problem still there on my DEV300_m86 (test case as I posted on Aug 4 - copy and
paste my example - the 'fl' is a ligature). Text disappears or is mangled by
Capitalize Every Word. It's not a clean install but on top of OOO330m2.

OS-related? (I'm running XP sp2)

A long shot, but perhaps related to Graphite? (though the bug shows up with any
font, Graphite or not).

???
Comment 5 jurf 2010-08-10 19:19:15 UTC
To elaborate on my example:

The flickering projector

On my system, selecting entire sentence, then Capitalize Every Word, results in
entire sentence disappearing.
Comment 6 thomas.lange 2010-08-10 20:45:03 UTC
tl->jurf: No, the problem is due to
a) some in-between function iterating in steps of language portions
   (which is actually fine for lowercase, uppercase, ...)
b) a failure to correctly initialize an array of offsets (which interestingly
   had no bad effect at all on single-language selections)
and
c) not taking the changed text size into account when ligatures got involved

The first two result in an odd choice of capitalized characters and selection,
and the last usually in part or all of the text disappearing.

Right now the new implementation already works fine but for two problems:
1) a selection including more than one paragraph is not yet covered
2) undoing a change when ligatures got involved (and are now properly resolved 
into two characters) usually results in garbled text where many of the spaces 
go missing. Seen thanks to your hard-core test in the document from the 
original issue.
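
To make points b) and c) above more concrete, here is a rough Python sketch of an offsets array that stays valid when a character expands during conversion. It is not the OOo implementation; the function name `upper_with_offsets` is hypothetical, and the sketch assumes a per-character uppercase mapping is good enough for illustration.

```python
def upper_with_offsets(text):
    """Uppercase text; offsets[i] is the index in text that produced result[i]."""
    result = []
    offsets = []
    for index, ch in enumerate(text):
        mapped = ch.upper()                    # may be longer than one character
        result.append(mapped)
        offsets.extend([index] * len(mapped))  # one entry per output character
    return "".join(result), offsets


converted, offsets = upper_with_offsets("o\ufb03ce")  # \ufb03 is the "ffi" ligature
print(converted, offsets)
```

The offsets list has one entry per output character, so its length follows the converted text rather than the original selection.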
Comment 7 thomas.lange 2010-08-10 20:51:51 UTC
BTW: when I wrote 'I can't reproduce the problem' above I was somewhat sloppy. 
What I meant was I could not reproduce the mentioned crash. The 'capitalize 
every word' problems are of course reproducible. 
Comment 8 thomas.lange 2010-08-11 06:11:48 UTC
Created attachment 71030 [details]
adding original bugdoc from issue 113558
Comment 9 thomas.lange 2010-08-11 06:13:57 UTC
TL->QA: For a list of all ligatures see
http://www.unicode.org/charts/PDF/UFB00.pdf. They range from 0xFB00-0xFB06 and
0xFB13-0xFB17.
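
For testing, the two ranges above can be listed with a few lines of Python. This is just a convenience sketch; `LIGATURE_RANGES` and `is_ligature` are names invented here, and the uppercase forms printed are whatever Python's Unicode tables provide.

```python
import unicodedata

# The two ranges quoted above, as inclusive code point pairs.
LIGATURE_RANGES = ((0xFB00, 0xFB06), (0xFB13, 0xFB17))


def is_ligature(ch):
    """True if ch falls into one of the two ligature ranges."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in LIGATURE_RANGES)


ligatures = [chr(c) for lo, hi in LIGATURE_RANGES for c in range(lo, hi + 1)]
for ch in ligatures:
    # Print code point, Unicode name and the uppercase form.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {ch.upper()!r}")
```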
Comment 10 jurf 2010-08-11 13:09:47 UTC
The list of ligs reminds me: a problem with Sentence case and so presumably also
with Capitalize Every Word, albeit currently masked by the bug described in this
issue, is that all the letters in a ligature at the start of a sentence for
Sentence case (so presumably at the start of any word for Capitalize Every Word)
are converted to all caps, not just the first letter. eg:

ﬁnd -> FInd
ﬂuke -> FLuke
ﬅop -> STop
ﬆop -> STop

(same with ff, ffi and ffl, but those sequences don’t appear at the start of any
words in English).
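
The difference between the two behaviours can be shown in Python. This is a sketch under assumptions, not a proposal for the actual fix: both function names are made up, and NFKD normalization is used here only as a convenient way to resolve a ligature into plain letters.

```python
import unicodedata


def naive_capitalize(word):
    # Uppercases the whole first character, so a leading ligature becomes all caps.
    return word[:1].upper() + word[1:]


def capitalize_resolving_ligatures(word):
    # NFKD replaces compatibility characters such as U+FB01 with plain letters.
    plain = unicodedata.normalize("NFKD", word)
    return plain[:1].upper() + plain[1:]


for word in ("\ufb01nd", "\ufb02uke"):  # words starting with the fi and fl ligatures
    print(naive_capitalize(word), capitalize_resolving_ligatures(word))
```

Note that NFKD also changes other characters (for instance it splits accented letters into a base letter plus a combining mark), so a real fix would need something narrower.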
Comment 11 thomas.lange 2010-08-11 13:29:19 UTC
Adding new bugdoc to demonstrate that UPPERCASE transliteration has also been
broken for ages without anyone noticing: while UPPERCASE correctly resolved the
ligatures, the implementation failed to take the modified text length into
account. :-(
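
As an illustration of why undo needs the modified length, here is a small Python sketch. It is an assumption about one way to record a change, not how Writer's undo works; `CaseChange`, `apply_upper` and `undo` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CaseChange:
    start: int  # where the replaced span begins
    old: str    # text before the change
    new: str    # text after the change (may be longer)


def apply_upper(text, start, end):
    """Uppercase text[start:end] and return the new text plus an undo record."""
    old = text[start:end]
    change = CaseChange(start, old, old.upper())
    return text[:start] + change.new + text[end:], change


def undo(text, change):
    # Remove len(new) characters, not len(old), before restoring the original.
    end = change.start + len(change.new)
    return text[:change.start] + change.old + text[end:]


original = "The \ufb02ickering projector"  # \ufb02 is the "fl" ligature
changed, record = apply_upper(original, 4, 13)
print(changed)
print(undo(changed, record) == original)
```

Undoing with the old length instead would leave one stray character behind for every ligature that was resolved.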
Comment 12 thomas.lange 2010-08-11 13:30:12 UTC
Created attachment 71037 [details]
additional bug doc to show broken UPPERCASE and undo
Comment 13 jurf 2010-08-12 22:45:20 UTC
Those ligatures are really getting us in a bind!

I suggest changing the summary to "Change Case mangles output due to miscounted
ligature length", or something like it - the *crash* originally reported by es
appears to have vanished in OOO330m2.
Comment 14 thomas.lange 2010-08-13 07:34:09 UTC
tl->jurf: it was an occasional crash. It would be easy if everything either
worked fine or crashed right away. Usually it does not work that way. ^_-
Comment 15 thomas.lange 2010-08-13 07:56:19 UTC
Just for the books: it is not only ligatures, at least the uppercase scenario
with the alternating languages in the bug doc above does also apply if you use
the German ß, which is written as SS in uppercase. There is no problem with title
case here because that character does not occur at the start of a word.
Also just in case it was missed: the upper case bug scenario also uses a larger
number of changed language settings which is not too likely to occur in RL.

Thus a correct description would be: problems if changing case of characters
changes the string length. Thus I now added 'or similar characters' to the
description.

Note: there is no such problem with lowercase conversion since, according to
HDU, there are no uppercase characters in any language that do not have a
matching lowercase character of same size.
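
A quick way to find such "similar characters" in a test string is to compare each character with its uppercase form. The helper below is a hypothetical Python sketch that relies on Python's own Unicode case mapping, which may differ in detail from what OOo uses.

```python
def length_changing_chars(text):
    """Return (index, char, uppercase) for chars whose uppercase form differs in length."""
    return [
        (i, ch, ch.upper())
        for i, ch in enumerate(text)
        if len(ch.upper()) != len(ch)
    ]


print(length_changing_chars("Stra\u00dfe"))  # \u00df is the German sharp s
```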
Comment 16 thomas.lange 2010-08-13 13:20:37 UTC
tl->jurf: what OS do you use?
Comment 17 jurf 2010-08-14 02:31:53 UTC
-> tl

OS is XP sp2 (Portuguese), heavily tweaked and streamlined (eg just 12 services
run on start up).

Uniscribe (usp10.dll) version as used by OOo is 1.626.6000.16386.
Comment 18 thomas.lange 2010-08-17 12:51:10 UTC
.
Comment 19 thomas.lange 2010-08-18 07:51:45 UTC
Fixed in CWS sw33bf08
Comment 20 thomas.lange 2010-08-19 10:00:55 UTC
.
Comment 21 stefan.baltzer 2010-08-23 14:48:57 UTC
Verified in CWS sw33bf08.