Apache OpenOffice (AOO) Bugzilla – Issue 113584
Crash formatting as "Capitalize Every Word" text with ligatures or similar characters
Last modified: 2017-05-20 11:42:19 UTC
- Open attached document
- Select All
- Format - Change Case - Capitalize Every Word
-> Crash

This is due to the "ffi" (U+FB03) and "ffl" (U+FB04) ligatures in the text. Regression in 3.3 due to the implementation of issue 1601.
Created attachment 70919 [details] crash_ligature.odt
No crash in OOO330m2, but the test string (in the odt attachment) disappears completely. The bug appears to be caused by miscounting the character length(s) of ligatures. E.g.:

The flickering projector

- select "flickering"
- apply Capitalize Every Word

Output is "The  projector" (two spaces, no flickering; the selection now covers " pro").

The miscounting is corrupting text buffers, or whatever the OOo equivalent is, as shown if you apply "Undo" after the above:

Output is "The flickeringr"

Have fun...
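A minimal sketch of the miscount described above, in Python rather than the OOo code base. The function name and the use of uppercase conversion (instead of the actual Capitalize Every Word code path) are assumptions for illustration only. It shows that the selection end has to follow the converted length, because the U+FB02 ligature becomes two characters.

```python
def change_case_in_selection(text, start, end, convert=str.upper):
    """Convert text[start:end]; return (new_text, new_start, new_end).

    Hypothetical helper, not an OOo API.
    """
    converted = convert(text[start:end])
    new_text = text[:start] + converted + text[end:]
    # The new selection end is derived from the converted length,
    # not from the original one, since ligatures may expand.
    return new_text, start, start + len(converted)


sentence = "The \ufb02ickering projector"  # U+FB02 is the "fl" ligature
start = sentence.index("\ufb02")
end = sentence.index(" projector")
new_text, new_start, new_end = change_case_in_selection(sentence, start, end)
```

Reusing the original `end` here would leave the selection one character short, which is the kind of stale offset the comment above describes.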
I can't reproduce the problem with my DEV300_m86 and the sample document. Thus I'm just going to fix the title case implementation that was introduced with the patch from issue 1601. It seems a major rework of 'title case' (aka capitalize every word) and 'sentence case' is required, and the use of the breakiterator can not be avoided in order to fix this.
Problem still there on my DEV300_m86 (test case as I posted on Aug 4 - copy and paste my example - the 'fl' is a ligature). Text disappears or is mangled by Capitalize Every Word. It's not a clean install but on top of OOO330m2. OS-related? (I'm running XP sp2) A long shot, but perhaps related to Graphite? (though the bug shows up with any font, Graphite or not). ???
To elaborate on my example:

The flickering projector

On my system, selecting the entire sentence, then Capitalize Every Word, results in the entire sentence disappearing.
tl->jurf: No, the problem is due to
a) some in-between function iterating in steps of language portions (which is actually fine for lowercase, uppercase, ...),
b) a failure to correctly initialize an array of offsets (which interestingly had no bad effect at all on single-language selections), and
c) not taking the changing text size into account when ligatures are involved.

The first two result in an odd choice of capitalized characters and selection, the last usually in part or all of the text disappearing.

Right now the new implementation already works fine except for two problems:
1) a selection including more than one paragraph is not yet covered;
2) undoing a change when ligatures were involved (and are now properly resolved into two characters) usually results in garbled text where many of the spaces go missing. Seen thanks to your hard-core test in the document from the original issue.
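To illustrate points a) and c), here is a rough Python sketch (not the actual OOo implementation) of converting a text portion by portion while carrying the accumulated length change forward. The function name and the (start, end) portion representation are assumptions made for this example.

```python
def convert_portions(text, portions, convert=str.upper):
    """Apply `convert` to each (start, end) portion of `text`.

    `portions` are sorted, non-overlapping offsets into the ORIGINAL text.
    Returns the new text and the same portions expressed as offsets
    into the NEW text. Hypothetical helper for illustration.
    """
    pieces = []
    new_portions = []
    cursor = 0  # how far the original text has been consumed
    shift = 0   # accumulated length change from earlier portions
    for start, end in portions:
        pieces.append(text[cursor:start])  # unconverted gap before portion
        converted = convert(text[start:end])
        pieces.append(converted)
        new_start = start + shift
        new_portions.append((new_start, new_start + len(converted)))
        shift += len(converted) - (end - start)
        cursor = end
    pieces.append(text[cursor:])  # unconverted tail
    return "".join(pieces), new_portions


# "office" with the U+FB03 "ffi" ligature, then "strasse" with U+00DF.
text = "o\ufb03ce stra\u00dfe"
new_text, new_portions = convert_portions(text, [(0, 4), (5, 11)])
```

Without the running `shift`, the second portion would be addressed at its old offsets even though the first portion grew by two characters.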
BTW: when I wrote 'I can't reproduce the problem' above I was somewhat sloppy. What I meant was I could not reproduce the mentioned crash. The 'capitalize every word' problems are of course reproducible.
Created attachment 71030 [details] adding original bugdoc from issue 113558
TL->QA: For a list of all ligatures see http://www.unicode.org/charts/PDF/UFB00.pdf. They range from 0xFB00-0xFB06 and 0xFB13-0xFB17.
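For testing purposes, the two ranges above can be turned into a small check. This is only a sketch in Python; the constant and function names are made up for this example.

```python
# Code point ranges quoted above (Alphabetic Presentation Forms block).
LIGATURE_RANGES = ((0xFB00, 0xFB06), (0xFB13, 0xFB17))


def is_ligature(ch):
    """Return True if `ch` falls in one of the ranges listed above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LIGATURE_RANGES)


def ligatures_in(text):
    """Return (index, character) pairs for every ligature in `text`."""
    return [(i, ch) for i, ch in enumerate(text) if is_ligature(ch)]
```

Running `ligatures_in` over a test document's text gives the positions where a case change may alter the string length.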
The list of ligs reminds me: a problem with Sentence case, and so presumably also with Capitalize Every Word, albeit currently masked by the bug described in this issue, is that all the letters in a ligature at the start of a sentence for Sentence case (so presumably at the start of any word for Capitalize Every Word) are converted to all caps, not just the first letter. eg:

ﬁnd -> FInd
ﬂuke -> FLuke
ﬅop -> STop
ﬆop -> STop

(same with ﬀ, ﬃ and ﬄ, but those sequences don’t appear at the start of any words in English).
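One possible way to get the expected result (only the first letter capitalized) is to decompose a leading ligature before capitalizing. The Python sketch below assumes the U+FB00-U+FB06 and U+FB13-U+FB17 ranges and uses NFKD compatibility decomposition; it is not how OOo does it, and the function name is hypothetical.

```python
import unicodedata

# Ranges assumed for this sketch: Latin and Armenian ligatures.
LIGATURE_RANGES = ((0xFB00, 0xFB06), (0xFB13, 0xFB17))


def title_case_word(word):
    """Capitalize only the first letter, resolving a leading ligature."""
    if not word:
        return word
    first = word[0]
    if any(lo <= ord(first) <= hi for lo, hi in LIGATURE_RANGES):
        # NFKD splits the ligature into its component letters,
        # e.g. U+FB01 becomes "f" followed by "i".
        first = unicodedata.normalize("NFKD", first)
    # Uppercase the first component only; keep the rest unchanged.
    return first[:1].upper() + first[1:] + word[1:]
```

Note that the result is longer than the input whenever a ligature is resolved, so the caller still has to adjust offsets.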
Adding a new bugdoc to demonstrate that UPPERCASE transliteration has also been broken for ages without anyone noticing: while UPPERCASE correctly resolved the ligatures, the implementation failed to take the modified text length into account. :-(
Created attachment 71037 [details] additional bug doc to show broken UPPERCASE and undo
Those ligatures are really getting us in a bind! I suggest changing the summary to "Change Case mangles output due to miscounted ligature length", or something like it - the *crash* originally reported by es appears to have vanished in OOO330m2.
tl->jurf: it was an occasional crash. It would be easy if either everything was fine or it crashed right away. Usually it does not work that way. ^_-
Just for the record: it is not only ligatures. The uppercase scenario with the alternating languages in the bug doc above also applies if you use the German ß, which is written as SS in uppercase. There is no problem with title case here because that character does not occur at the start of a word. Also, just in case it was missed: the uppercase bug scenario also uses a larger number of changed language settings, which is not too likely to occur in real life. Thus a correct description would be: problems if changing the case of characters changes the string length. Thus I have now added 'or similar characters' to the description. Note: there is no such problem with lowercase conversion since, according to HDU, there are no uppercase characters in any language that do not have a matching lowercase character of the same size.
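A quick way to spot such "similar characters" in a test string is to look for characters whose converted form is not exactly one character long. The sketch below relies on Python's built-in case mapping, not the OOo transliteration, and the function name is made up for this example.

```python
def length_changing_chars(text, convert=str.upper):
    """List (index, char, converted) where conversion changes the length.

    Converts character by character, so context-sensitive mappings
    are not covered by this sketch.
    """
    return [
        (i, ch, convert(ch))
        for i, ch in enumerate(text)
        if len(convert(ch)) != 1
    ]


hits = length_changing_chars("Stra\u00dfe")  # "Strasse" written with U+00DF
```

Any non-empty result marks a string where offsets and selections need adjusting after the case change.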
tl->jurf: what OS do you use?
-> tl OS is XP sp2 (Portuguese), heavily tweaked and streamlined (eg just 12 services run on start up). Uniscribe (usp10.dll) version as used by OOo is 1.626.6000.16386.
Fixed in CWS sw33bf08
Verified in CWS sw33bf08.