Issue 88376 - X11: Wrong character mapping for some fonts
Summary: X11: Wrong character mapping for some fonts
Status: CLOSED FIXED
Alias: None
Product: gsl
Classification: Code
Component: code
Version: OOo 2.4.1
Hardware: PC Unix, all
Importance: P3 Trivial
Target Milestone: OOo 3.0
Assignee: wolframgarten
QA Contact: issues@gsl
URL:
Keywords: regression
Duplicates: 80190 81020 82150 83370 83884 84335 86114 86309 86781 87161 89157 89982 90564 93437
Depends on: 72129
Blocks: 89974 88888 89157 92843
Reported: 2008-04-17 09:43 UTC by ntsiebel
Modified: 2009-04-28 13:29 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
testcase as described in the bug report. (22.80 KB, application/vnd.oasis.opendocument.presentation)
2008-04-17 09:44 UTC, ntsiebel
To make things clear: screenshots. No. 1: Problem on page 3, see "Fufl" (should be "Fuß") etc. (197.79 KB, image/png)
2008-04-17 20:19 UTC, ntsiebel
To make things clear: screenshots. No. 2: Changed page 3: Italics for some (not all) special characters (marked and clicked [I] above), they are correct now! (205.40 KB, image/png)
2008-04-17 20:23 UTC, ntsiebel
To make things clear: screenshots. No. 3: Opened file, removed slides 1+2, saved, closed OOo completely, reopened file, result: No problem on page 3! Nothing edited on this slide! (169.78 KB, image/png)
2008-04-17 20:25 UTC, ntsiebel
quite minimal bugdoc (11.30 KB, application/vnd.oasis.opendocument.text)
2008-08-22 09:35 UTC, hdu@apache.org
a thought (1006 bytes, patch)
2008-08-22 10:33 UTC, caolanm

Description ntsiebel 2008-04-17 09:43:23 UTC
A few days ago I opened an existing document in Presentation that I had not
edited for a year.  Back then, the document looked fine in OOo, and its PDF
export from last year supports this.
Now, though, some special characters (e.g., ö and ä) are displayed
incorrectly; they are replaced by other characters, e.g. a circumflex accent (^)
and a per mille sign (0/00).

The characters render OK when you make them italic!

Also, the character rendering seems to be dependent on a type of context not
visible to the user.  This can be seen in the attached file.

Please find attached a testcase that demonstrates this.  The special characters
on page 3 render incorrectly.  Editing in this region is also sometimes
impossible.  If you make the special (non-7-bit) characters italic, they change
to their correct form.

When you remove slide 1 from the presentation (without editing anything else),
then save and re-load the file, the exact same characters on the (now) page 2 do
render correctly!

I have confirmed this in releases 2.4.0.3.5-1.1 and 2.3.1.2-3.1, both Linux, and
on both i586 and x86-64.
Comment 1 ntsiebel 2008-04-17 09:44:24 UTC
Created attachment 52957 [details]
testcase as described in the bug report.
Comment 2 wolframgarten 2008-04-17 14:37:33 UTC
The document looks ok here when exported. Did you use the same pdf viewer in
both cases and could you try another? Please attach one of the bad pdf files.
Thanks!
Comment 3 ntsiebel 2008-04-17 20:19:54 UTC
Created attachment 52976 [details]
To make things clear: screenshots. No. 1: Problem on page 3, see "Fufl" (should be "Fuß") etc.
Comment 4 ntsiebel 2008-04-17 20:23:45 UTC
Created attachment 52977 [details]
To make things clear: screenshots. No. 2: Changed page 3: Italics for some (not all) special characters (marked and clicked [I] above), they are correct now!
Comment 5 ntsiebel 2008-04-17 20:25:58 UTC
Created attachment 52978 [details]
To make things clear: screenshots. No. 3: Opened file, removed slides 1+2, saved, closed OOo completely, reopened file, result: No problem on page 3!  Nothing edited on this slide!
Comment 6 ntsiebel 2008-04-17 20:42:25 UTC
Sorry to be unclear.  The PDF file was only quoted to show that the exact same
file rendered OK previously.  It is not necessary to export to PDF to see the
bug.  Please see the 3 new attachments -- screenshots this time.  It may be
something to do with rendering, maybe also character coding, or something else ???

The fact that removing the first 2 slides (then save, close OOo, re-open saved
file) changes the rendering/coding on the last slide (which remains unchanged by
the user) may suggest that some invisible character/attribute/mode is
present/initiated somewhere in the first two slides?

Hope this helps to track the problem.

Once again, I used Linux, i586 and x86_64, OpenSuSE 10.3, OOo from OpenSuSE's
repo, today's release 2.4.0.3.5-2.1 (same problem on 2.4.0.3.5-1.1, 2.3.1.2-3.1).

Please let me know if I can give you more info.
Comment 7 ntsiebel 2008-04-20 13:13:48 UTC
The problem does not exist on OpenOffice 2.3.1 DE for Windows.  I do not know
whether the reason is in the different OS/compilation or with the fact that a
(probably) different version of the Andale Sans font is present on that system.
 For the moment the bug should be tracked under Linux, where I use the standard
Andale Sans font /usr/share/fonts/truetype/ans_____.ttf from the
agfa-fonts-2003.03.19-92 package (MD5 sum: e2518c39b4eecd3eb72dc81c956172c5
/usr/share/fonts/truetype/ans_____.ttf).

However, this CANNOT simply be a problem in the font file, as (a) such a problem
would have been detected and fixed a very long time ago, (b) the characters are
rendered OK with the same attributes (non-bold, non-italic, same point size)
when slides 1+2 are removed (see screenshots), and (c) the characters in
question look OK in kfontview.

Hope this helps.
Comment 8 ntsiebel 2008-04-24 12:18:41 UTC
Two colleagues of mine have the same problem on their computers, which rules out
personal settings as a source of the problem.  Their machines are x86 (32-bit)
platforms on OpenSUSE 10.3.

Anybody else see the same thing when opening the attached file?  Maybe we can
track down on which systems it looks OK and on which it fails.
Comment 9 wolframgarten 2008-04-25 08:38:43 UTC
And you are all using the OOo from OpenSuSE's repo? In this case I would
recommend trying the original version from OOo. Thanks.
Comment 10 ntsiebel 2008-04-25 09:09:22 UTC
Thanks for the hint.  They did all use OpenSuSE's version.

I have therefore downloaded and installed (as a user)
OOo_2.4.0_LinuxIntel_install_wJRE_en-US.tar.gz from a mirror and ran it.

The problem is still there on slide 3, with characters similar to fl, 0/00, ^
and "," where ß, ä, ü and ö should be.  The degree sign is displayed as an
integral sign.  All of the characters on this page look OK when made italic
(select, press [I] button above).
Comment 11 wolframgarten 2008-04-25 09:44:37 UTC
Ah, now I can reproduce this. Reassigned.
Comment 12 wolframgarten 2008-04-25 09:55:06 UTC
Changed target and owner.
Comment 13 ntsiebel 2008-08-07 09:44:03 UTC
I have checked again and can confirm that the issue is still present in build
2.4.1.6.

Comment 14 hdu@apache.org 2008-08-07 10:09:09 UTC
.
Comment 15 hdu@apache.org 2008-08-13 13:06:57 UTC
I'm sure I've already seen issues with the same root cause in the tracker but I can't find them. Anyway, this 
seems to be a case of ImplFontData.meFamily aliasing => fixed in CWS vcl93
Comment 16 hdu@apache.org 2008-08-13 16:10:59 UTC
@wg: please check in CWS vcl93
Comment 17 hdu@apache.org 2008-08-13 16:11:43 UTC
forgot to reassign
Comment 18 hdu@apache.org 2008-08-22 09:10:31 UTC
Now I have found the real root cause:
When fontconfig's FcFreeTypeCharIndex() is called, it tries some charmaps on the FT_Face and does not restore
the original one. In this case the charmap was changed from FT_ENCODING_UNICODE to FT_ENCODING_APPLE_ROMAN, so
a U+00E4 (a with diaeresis) became an APPLE_ROMAN_0xE4 (per mille sign), etc.

Disabling the patch from issue 72129 fixes the bad regression.
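
The mismapping itself can be reproduced outside of OOo with a few lines of Python. This is only an illustrative sketch: it assumes that Python's built-in mac_roman codec is a fair stand-in for the font's FT_ENCODING_APPLE_ROMAN charmap, and the helper name misread_as_apple_roman is made up for this example.

```python
import unicodedata


def misread_as_apple_roman(ch: str) -> str:
    """Treat a Latin-1 range code point as if it were a Mac Roman byte.

    Models a lookup that hands U+00E4 to a charmap that now expects
    Apple Roman character codes (assumption: Python's mac_roman codec
    matches that charmap).
    """
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("only code points up to U+00FF fit in one byte")
    return bytes([code]).decode("mac_roman")


# a/o/u with diaeresis and sharp s, written as escapes to stay ASCII-only
for ch in "\u00e4\u00f6\u00fc\u00df":
    wrong = misread_as_apple_roman(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(wrong):04X} {unicodedata.name(wrong)}")
```

Under that assumption the four characters should come out as a per mille sign, a circumflex, a cedilla and an fl ligature, which is consistent with the wrong glyphs reported in comment 10.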
Comment 19 hdu@apache.org 2008-08-22 09:35:20 UTC
Created attachment 55933 [details]
quite minimal bugdoc
Comment 20 hdu@apache.org 2008-08-22 09:45:12 UTC
Only certain document fonts were affected by the problem, because FcFreeTypeCharIndex() was only
called when glyph fallback got involved. ASCII chars did not hit the problem because the problematic call
usually just set another Latin encoding.
Comment 21 caolanm 2008-08-22 10:33:13 UTC
Created attachment 55940 [details]
a thought
Comment 22 caolanm 2008-08-22 10:35:03 UTC
It would have been really good if the FcFreeTypeCharIndex api had any mention
that it did that :-(

The alternative patch there might also work (?) but I can understand once bitten
twice shy, so maybe something like that for a future version
Comment 23 wolframgarten 2008-08-22 10:51:03 UTC
Verified in CWS.
Comment 24 hdu@apache.org 2008-08-22 11:12:07 UTC
@cmc: yes, your patch solves the problem too. I sent a similar patch to the fontconfig list to fix the 
unexpected side effect in the library itself.

But the CWS with the current fix is already closed for development and is really urgently needed for the
OOo3RC. If the CWS does not come back and is integrated as it is, a follow-up task for the
issue-72129-like problem is due.

self reminder: Since we'll probably link against unfixed versions of libfontconfig for a while it would be 
even better to have psprint's FcFreeTypeCharIndex wrapper fixed, but that would add a new dependency 
to that. So that remains a TODO until we merge psprint into vcl.
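
The wrapper idea can be sketched as a small Python model. Everything here is hypothetical (FakeFace, leaky_char_index and safe_char_index are illustration names, not OOo, psprint or fontconfig code). In C the same idea would presumably amount to remembering face->charmap before the call and passing it back to FreeType's FT_Set_Charmap() afterwards.

```python
class FakeFace:
    """Hypothetical stand-in for an FT_Face: one glyph table per charmap
    plus the name of the currently selected charmap."""

    def __init__(self, charmaps, current):
        self.charmaps = charmaps      # e.g. {"unicode": {0x41: 36}, ...}
        self.current = current


def leaky_char_index(face, code):
    """Model of a lookup with the side effect discussed in this issue:
    it tries the charmaps in turn and leaves the face on the last one."""
    for name, table in face.charmaps.items():
        face.current = name           # selection is never undone
        glyph = table.get(code, 0)
        if glyph:
            return glyph
    return 0                          # 0 means "no glyph", as in FreeType


def safe_char_index(face, code):
    """Wrapper that restores the caller's charmap whatever the lookup did."""
    saved = face.current
    try:
        return leaky_char_index(face, code)
    finally:
        face.current = saved


face = FakeFace({"unicode": {0x41: 36}, "apple_roman": {0x41: 36}}, "unicode")
safe_char_index(face, 0x4E2D)         # miss in every charmap
print(face.current)                   # selection is still the caller's
leaky_char_index(face, 0x4E2D)        # same miss without the wrapper
print(face.current)                   # selection has silently changed
```

The try/finally is a design choice for the sketch: the restore happens on every exit path, so the caller never has to know which charmaps the lookup touched.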
Comment 25 hdu@apache.org 2008-08-23 07:27:25 UTC
To help identify duplicates of this issue (e.g. issue 92843 is a candidate) I'd like to point out
the subtle details that explain why CJK, Indic and non-ASCII Latin text were impacted
differently by this same root cause:

Fonts nowadays usually have a Unicode mapping; many still have an old Mac OS compatibility
mapping such as apple_roman, and some important CJK fonts still contain non-Unicode legacy
charmaps. FcFreeTypeCharIndex() changed the FT_Face's charmap by iterating through the
ones available in the font until either one of them hit or all available charmaps missed. The
result of that unexpected side effect was that
- non-ASCII Latin misses resulted in the FT_Face being changed to apple_roman encoding
- CJK fonts often got the regular Unicode mapping back, but sometimes a legacy CJK mapping
hit first
- Indic, Thai, Hebrew, etc. almost certainly switched back to the Unicode mapping at the first
glyph hit

Now if the scenario resulted in the FT_Face being changed back to a Unicode mapping, everything
was fine again. If it resulted in e.g. apple_roman, then the problem seen in this issue happened.
In the case of just one legacy CJK mapping being available there usually was no problem either,
unless the mapping from Unicode to the legacy mapping differed between OOo and FC. In the
case of both Unicode maps and legacy maps being available in the font, the scenario of many
glyph misses caused occasional switches between these charmaps, especially since these
big mappings often have slightly different coverage.

The root cause is easy to understand, but many of the resulting bug scenarios are so complex as
to be mind-bending. Though the mapping trouble outlined above is complex enough, it is further
complicated by an LRU-like caching of the usually expensive mapping results.
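
The order dependence described above can be modelled with a toy lookup. This is a simplified sketch, not fontconfig's actual code: the DECODERS order, the rule of starting from the currently selected charmap, and the example font coverage are all assumptions made for illustration.

```python
DECODERS = ["unicode", "apple_roman"]  # assumed order, for illustration only


def char_index(font, state, code):
    """Try charmaps starting from the current one; stop at the first hit.

    `font` maps a charmap name to the set of code points it covers;
    `state` is a one-item dict holding the currently selected charmap.
    """
    start = DECODERS.index(state["charmap"])
    for offset in range(len(DECODERS)):
        name = DECODERS[(start + offset) % len(DECODERS)]
        if name not in font:
            continue                  # the font has no such charmap
        state["charmap"] = name       # side effect: the selection persists
        if code in font[name]:
            return True
    return False


def trace(font, codes):
    """Return the selected charmap after each lookup in `codes`."""
    state = {"charmap": "unicode"}
    history = []
    for code in codes:
        char_index(font, state, code)
        history.append(state["charmap"])
    return history


# Hypothetical fallback font: covers 'A' and Thai KO KAI, but not U+00E4.
font = {"unicode": {0x41, 0x0E01}, "apple_roman": {0x41}}
print(trace(font, [0x41, 0xE4, 0x41, 0x0E01]))
```

In this model a single miss on U+00E4 leaves the face on apple_roman, an ASCII lookup keeps it there, and only a character that apple_roman cannot serve brings it back to the Unicode charmap.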
Comment 26 hdu@apache.org 2008-08-25 08:01:49 UTC
Correction to the above: in the official fontconfig library not all charmaps of a font are tried, e.g. the
legacy non-Unicode CJK encodings are ignored. Maybe Asian distributions have patched up their
libfontconfig to enable them, though. This would allow them to use important fonts that only have
legacy encodings. Can anyone confirm this?

If no unicode charmaps are available OOo uses legacy CJK-encodings for the same reason. The side effect 
that the FT_Face's charmap got silently switched is still causing bad problems, but unless the library is 
patched up the mind bending scenarios of the previous comment are much less likely to occur in real life.
Comment 27 caolanm 2008-08-25 08:49:40 UTC
FWIW, I see no custom patches at all in fedora fontconfig (2.1.4) for F10/F9
except a single custom fontconfig configuration rule to set some asian fonts to
embeddedbitmap=false (http://cvs.fedora.redhat.com/viewvc/devel/fontconfig/)
Comment 28 hdu@apache.org 2008-08-25 10:42:06 UTC
FYI, I found that issue so interesting that I blogged about it: 
http://blogs.sun.com/GullFOSS/entry/what_could_possibly_go_wrong
Comment 29 hdu@apache.org 2008-09-03 16:14:03 UTC
*** Issue 87161 has been marked as a duplicate of this issue. ***
Comment 30 hdu@apache.org 2008-09-03 16:14:35 UTC
*** Issue 83370 has been marked as a duplicate of this issue. ***
Comment 31 hdu@apache.org 2008-09-08 14:26:11 UTC
*** Issue 93437 has been marked as a duplicate of this issue. ***
Comment 32 hdu@apache.org 2008-09-08 15:04:10 UTC
*** Issue 83884 has been marked as a duplicate of this issue. ***
Comment 33 hdu@apache.org 2008-09-08 15:05:09 UTC
*** Issue 90564 has been marked as a duplicate of this issue. ***
Comment 34 hdu@apache.org 2008-09-09 09:03:43 UTC
*** Issue 86309 has been marked as a duplicate of this issue. ***
Comment 35 hdu@apache.org 2008-09-09 09:04:51 UTC
*** Issue 89982 has been marked as a duplicate of this issue. ***
Comment 36 hdu@apache.org 2008-09-09 09:06:35 UTC
*** Issue 82150 has been marked as a duplicate of this issue. ***
Comment 37 hdu@apache.org 2008-09-09 15:02:43 UTC
*** Issue 84335 has been marked as a duplicate of this issue. ***
Comment 38 hdu@apache.org 2008-09-09 15:12:35 UTC
*** Issue 86114 has been marked as a duplicate of this issue. ***
Comment 39 hdu@apache.org 2008-09-19 11:15:09 UTC
*** Issue 89157 has been marked as a duplicate of this issue. ***
Comment 40 hdu@apache.org 2008-09-19 11:41:57 UTC
*** Issue 80190 has been marked as a duplicate of this issue. ***
Comment 41 hdu@apache.org 2008-09-24 08:10:02 UTC
Also fixed in CWS chart33 for target 2.4.2
@wg: please verify in CWS chart33
Comment 42 stefan.baltzer 2008-10-06 08:46:40 UTC
SBA: I put ES and myself on c/c.
Comment 43 wolframgarten 2008-11-12 12:00:01 UTC
Tested in Final. Closed.
Comment 44 hdu@apache.org 2008-12-19 09:41:06 UTC
*** Issue 86781 has been marked as a duplicate of this issue. ***
Comment 45 IngridvdM 2009-04-28 13:29:15 UTC
*** Issue 81020 has been marked as a duplicate of this issue. ***