Issue 104221 - Incorrect spacing as some characters are wrongly treated as diacritics
Summary: Incorrect spacing as some characters are wrongly treated as diacritics
Alias: None
Product: Writer
Classification: Application
Component: viewing
Version: OOo 3.1
Hardware: All Unix, all
Importance: P3 Trivial
Target Milestone: ---
Assignee: stefan.baltzer
QA Contact: issues@sw
Keywords: oooqa, regression
Duplicates: 105021
Depends on: 99367
Blocks: 98125 99999
Reported: 2009-08-14 11:00 UTC by anieden
Modified: 2013-08-07 14:44 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---

A document with U+05BE (7.22 KB, application/vnd.oasis.opendocument.text)
2009-08-14 11:01 UTC, anieden
How it gets displayed in Writer (569 bytes, image/png)
2009-08-14 11:03 UTC, anieden
How it should be displayed (298 bytes, image/png)
2009-08-14 11:04 UTC, anieden

Description anieden 2009-08-14 11:00:43 UTC
When using a Hebrew maqaf (U+05BE, the Hebrew dash, basically), it runs into one
of the adjacent characters, although it should be treated as a character of its
own. This is on Ubuntu 9.04. I have tried with three fonts: Ezra SIL, Cardo, and
Bitstream Cyberbit. All show the described behaviour.

I am attaching a document showing the problem, an image showing how it is
displayed on my machine, and an image of how it should look (created by
letting Firefox display the text correctly).

Note that working with that character sometimes seems to crash OpenOffice.org,
but I only have anecdotal evidence for this.
Comment 1 anieden 2009-08-14 11:01:39 UTC
Created attachment 64156
A document with U+05BE
Comment 2 anieden 2009-08-14 11:03:53 UTC
Created attachment 64158
How it gets displayed in Writer
Comment 3 anieden 2009-08-14 11:04:30 UTC
Created attachment 64159
How it should be displayed
Comment 4 eric.savary 2009-08-14 11:35:39 UTC
@anieden: what key sequence should I press on an English keyboard with HE
software layout to get the sample you describe?
Comment 5 anieden 2009-08-14 13:29:39 UTC
I don’t use a Hebrew keyboard layout, so I don’t know how you would type the
example. I usually work through the Insert/Special characters menu.
Comment 6 deyoungaza 2009-09-10 11:27:33 UTC
Shift-{hyphen} will type a maqaf when using the Israel Lyx keyboard layout.

This is a regression in OOo version 3.0.
OOo Version 2.4 (packaged with Ubuntu 8.04) worked as it should.
Comment 7 hennerdrewes 2009-09-14 09:10:32 UTC
@hdu: Can you have a look at this? I guess this has been introduced with the
kashidafix changes.
Comment 8 2009-09-14 11:21:09 UTC
Fonts like David CLM, Frank Ruehl CLM, etc. still work.
The changed behaviour may have to do with changes in upstream ICU:
ICU 3.6 was used for OOo30x;
the ICU 3.6 -> 4.0 upgrade in CWS i18n42 went into DEV300_m35 for OOo310.
Comment 9 2009-09-14 11:35:12 UTC
Since View > Web Layout looks good, the ICU change is no longer a suspect.
Comment 10 2009-09-14 13:06:50 UTC
Found it: U+05BE was treated as a diacritic.
The fix is easy, so this regression deserves a fix for OOo32x.
Comment 11 hennerdrewes 2009-09-14 13:19:21 UTC
@hdu: just wanted to point you to that.

If I am not mistaken, there may be more false diacritics in the list in
sallayout.cxx, e.g. 0x05C3.

@yoramg: Could you have a look at IsDiacritic() in sallayout.cxx?
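The actual table in sallayout.cxx is not quoted in this issue, so the following is only a minimal sketch under that assumption: a hypothetical C++ range-table lookup in the spirit of IsDiacritic(). The names `CharRange`, `kHebrewMarks` and `isHebrewDiacritic` are illustrative, not OOo's. It lists only Hebrew code points whose Unicode general category is Mn (non-spacing mark), which keeps punctuation such as U+05BE (maqaf) and U+05C3 (sof pasuq) out of the list.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical inclusive code-point range.
struct CharRange {
    char32_t first;
    char32_t last;
};

// Illustrative Hebrew combining-mark ranges (Unicode general category Mn).
// Punctuation such as U+05BE MAQAF, U+05C0 PASEQ, U+05C3 SOF PASUQ and
// U+05C6 NUN HAFUKHA is deliberately left out.
constexpr CharRange kHebrewMarks[] = {
    {0x0591, 0x05BD},  // cantillation marks and vowel points
    {0x05BF, 0x05BF},  // RAFE
    {0x05C1, 0x05C2},  // SHIN DOT, SIN DOT
    {0x05C4, 0x05C5},  // UPPER DOT, LOWER DOT
    {0x05C7, 0x05C7},  // QAMATS QATAN
};

// Hypothetical stand-in for an IsDiacritic()-style lookup:
// true if c falls inside any of the listed ranges.
bool isHebrewDiacritic(char32_t c) {
    return std::any_of(std::begin(kHebrewMarks), std::end(kHebrewMarks),
                       [c](const CharRange& r) { return r.first <= c && c <= r.last; });
}
```

With ranges split like this, a single overly wide range (for example one running from U+0591 through U+05C7) can no longer sweep the maqaf or sof pasuq into the diacritic set.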
Comment 12 2009-09-14 14:06:09 UTC
Indeed, there was also a typo for U+05C4.

This issue shows that maintaining the character properties ourselves is not a good long-term solution;
see follow-up issue 105058.
Comment 13 2009-09-14 14:33:03 UTC
Fixed both the Hebrew and some Arabic diacritics in CWS ooo32gsl01.
Comment 14 2009-09-14 14:35:48 UTC
Adjusting summary to root cause
Comment 15 2009-09-14 14:37:03 UTC
*** Issue 105021 has been marked as a duplicate of this issue. ***
Comment 16 yoramg 2009-09-14 14:50:15 UTC
When correcting the tables in IsDiacritic(), I suggest adding 0xFB1E to the list.
U+FB1E, the Judeo-Spanish varika, is a diacritic sign used in Judeo-Spanish
texts instead of the Hebrew rafe (U+05BF).
I agree that mark properties should come from the font, not from OOo itself:
diacritics should be recognized by their zero width.
Comment 17 2009-09-14 15:06:53 UTC
Reminder: the handcrafted list was only used to decide which glyphs with non-zero width should be treated
as diacritics, so false positives were the real problem, whereas false negatives usually didn't
matter because most fonts already have zero-width glyphs for diacritics.
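The asymmetry between false positives and false negatives can be illustrated with a simplified layout model. This is a hypothetical sketch, not the sallayout.cxx code: it assumes that a glyph laid out as a diacritic does not advance the pen, and that a glyph is treated as a diacritic if its advance is zero or if the handcrafted list flags it. The names `GlyphInfo`, `treatAsDiacritic` and `runAdvance` are illustrative.

```cpp
#include <vector>

// Hypothetical glyph record: advance width in font units, plus whether the
// character appears in a handcrafted diacritic list.
struct GlyphInfo {
    int advance;
    bool inDiacriticList;
};

// Simplified model (assumption): zero-width glyphs are diacritics anyway;
// the list only matters for glyphs with a non-zero advance.
bool treatAsDiacritic(const GlyphInfo& g) {
    return g.advance == 0 || g.inDiacriticList;
}

// Total pen advance for a run. Glyphs treated as diacritics do not move
// the pen, so they are drawn on top of the preceding glyph.
int runAdvance(const std::vector<GlyphInfo>& run) {
    int pen = 0;
    for (const GlyphInfo& g : run) {
        if (!treatAsDiacritic(g)) {
            pen += g.advance;
        }
    }
    return pen;
}
```

In this model, wrongly flagging a maqaf glyph with a 400-unit advance between two 600-unit letters shrinks the run from 1600 to 1200 units, so the following letter is drawn over the maqaf. A zero-width mark missing from the list contributes nothing either way, which is why false negatives rarely show up.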
Comment 18 2009-09-14 15:13:08 UTC
Added U+FB1E to the list in CWS ooo32gsl01.
Comment 19 2009-10-16 10:14:30 UTC
@sba: please verify in CWS ooo32gsl01
Comment 20 stefan.baltzer 2009-10-21 17:49:56 UTC
Verified in CWS ooo32gsl01.
Comment 21 stefan.baltzer 2009-11-03 14:56:22 UTC
OK in OOO320_m3. Closed.