Issue 78749

Summary: some Latin text needs CTL processing
Product: gsl Reporter: hdu <hdu>
Component: code Assignee: hdu <hdu>
Status: ACCEPTED --- QA Contact:
Severity: trivial    
Priority: P3 CC: benjamin.dev, fonts-bugs, frank.meies, harshula, issues, moyogo, mreimer, munzirtaha, nicolas.mailhot, ruedin, simos.bugzilla
Version: OOo 2.2.1   
Target Milestone: OOo 3.x   
Hardware: All   
OS: All   
Issue Type: DEFECT Latest Confirmation on: ---
Developer Difficulty: ---
Issue Depends on: 115003    
Issue Blocks: 79658, 84305    
Attachments:
Description                                                Flags
text requiring ccmp, mark and mkmk                         none
output of text with Doulos SIL in Gedit with Pango and     none
output of sample text with DejaVu Sans in Gedit            none
Wrong output of sample text with Doulos SIL in OO Write    none
Wrong output of sample text with DejaVu Sans in OO Write   none
output of sample text in OOo with Doulos SIL on Windows    none
output of sample text in OOo with SegoeUI on Windows       none

Description hdu@apache.org 2007-06-21 12:47:55 UTC
Issue 16032 (support OpenType features) is way too broad, so this issue has been created to split off one 
specific aspect of it: some Latin text needs to be treated as complex text.

@fme: what problems do you see if the script detector treated some text that is currently detected as 
simple as complex instead (e.g. "i_ogonek+combining_accent")? Wouldn't this cause quite some trouble in the 
Latin/CJK/CTL font selection... do you see a way out of this?
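
One way to picture the detection question above is a minimal sketch in Python. It is not the OOo script detector: the function name `needs_complex_layout` and the heuristic itself are assumptions made for illustration. The idea is that a run only needs CTL treatment if combining marks remain after NFC composition, because those marks have no precomposed glyph and must be positioned by the font.

```python
import unicodedata


def needs_complex_layout(text):
    """Hypothetical heuristic: True if combining marks survive NFC.

    Marks that NFC cannot fold into a precomposed character have to be
    placed by font features such as 'mark', 'mkmk' or 'ccmp'.
    """
    composed = unicodedata.normalize("NFC", text)
    # General categories Mn, Mc and Me are the combining mark categories.
    return any(unicodedata.category(ch).startswith("M") for ch in composed)


# e + combining acute composes to U+00E9, so simple layout would do.
print(needs_complex_layout("e\u0301"))       # False
# i-ogonek + combining acute has no precomposed form in Unicode.
print(needs_complex_layout("\u012f\u0301"))  # True
```

A check like this would leave ordinary precomposable Latin text on the simple path, which may limit the font selection trouble mentioned above, though that is untested here.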

@moyogo/ruedin: our quality assurance team prefers concrete examples (docs,screenshots,...) of what 
does not work now, so they can assess the priority of the problem and check the success of a fix...
Comment 1 hdu@apache.org 2007-06-21 12:51:56 UTC
.
Comment 2 moyogo 2007-06-21 13:14:09 UTC
Created attachment 46139 [details]
text requiring ccmp, mark and mkmk
Comment 3 moyogo 2007-06-21 13:15:32 UTC
Created attachment 46140 [details]
output of text with Doulos SIL in Gedit with Pango and
Comment 4 moyogo 2007-06-21 13:18:41 UTC
Created attachment 46141 [details]
output of sample text with DejaVu Sans in Gedit
Comment 5 moyogo 2007-06-21 13:19:41 UTC
Created attachment 46142 [details]
Wrong output of sample text with Doulos SIL in OO Write
Comment 6 moyogo 2007-06-21 13:20:32 UTC
Created attachment 46143 [details]
Wrong output of sample text with DejaVu Sans in OO Write
Comment 7 moyogo 2007-06-21 13:38:26 UTC
http://www.openoffice.org/nonav/issues/showattachment.cgi/46139/text.utf8
- the first line just needs 'mark' to position the mark below correctly -
- the second line needs 'mark' for marks above, and 'ccmp' to decompose i and j
when followed by combining mark above
- the third line shows ligatures, not obligatory but the user should be able to
enable them
- the fourth line needs 'mark' for marks, 'ccmp' for i, and 'mkmk' to stack the
marks. You’ll notice there’s something wrong with the first g-with-a-hook: the
marks are not in Unicode canonical order. To handle this properly, the shaper
should normalize base+marks before shaping.
- the fifth, mark stacking below
- the sixth line should have ligature ties at different heights, but DejaVu and
Doulos SIL don’t do that. Junicode handles this with the 'kern' feature.

DejaVu fonts are installed by default on many Linux systems. The latest versions
(MS Office, Vista) of Tahoma, Arial, and Times New Roman have similar features
for simple diacritic placement.
http://dejavu.sourceforge.net

Doulos SIL is a font for linguistics or languages using stacked diacritics
http://scripts.sil.org/DoulosSILfont

Junicode is a medievalist font with good support for stacked diacritics.
http://junicode.sourceforge.net/
Comment 8 hdu@apache.org 2007-06-21 15:08:14 UTC
Thank you for the nice samples!

Especially the problem with the latest versions of the common DejaVu fonts is important for selecting the 
appropriate priority. To get an even better overview: which languages are most impacted by the problem? 
Do you happen to know if these languages are already represented in the OOo NLC
(http://native-lang.openoffice.org/)?
Comment 9 moyogo 2007-06-21 18:53:15 UTC
btw: this should be named "some Standard scripts text needs CTL processing"

Any language that doesn't benefit from legacy encodings with precomposed
characters in Unicode is affected by this bug. This includes many African
languages and other minority languages; Malagasy is an example in the NLC. The
language-specific 'locl' feature affects languages like Serbian and Macedonian.

In theory any language using marks could be affected, if combining sequences are
used instead of the legacy precomposed forms. For example, in French "École"
(precomposed É) and "École" (E followed by a combining acute accent) will not be
rendered the same way in OO when they should be.

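To make the "École" example concrete, here is a small Python sketch showing that the two spellings are different code point sequences that are canonically equivalent. It only illustrates the Unicode side and says nothing about how OOo handles either form internally. The helper `code_points` is a name introduced here.

```python
import unicodedata


def code_points(s):
    """Render a string as a list of U+XXXX code points."""
    return " ".join("U+%04X" % ord(ch) for ch in s)


precomposed = unicodedata.normalize("NFC", "\u00c9cole")
decomposed = unicodedata.normalize("NFD", "\u00c9cole")

print(code_points(precomposed))  # U+00C9 U+0063 U+006F U+006C U+0065
print(code_points(decomposed))   # U+0045 U+0301 U+0063 U+006F U+006C U+0065

# Different as raw strings, identical once normalized to the same form.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Because both spellings mean the same text, a renderer is expected to draw them alike, which is the point made above.
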
Comment 10 iorsh 2007-06-21 21:06:28 UTC
This issue is not restricted to Latin texts. Hebrew text needs 'ccmp' and 'mark'
to display diacritical marks properly. Perhaps Arabic needs these too, and
possibly some more, as it is more complicated than Hebrew.

A free Hebrew font with these features can be found at
http://culmus.sourceforge.net/devel/FrankOT.tar.gz

This issue is not like 'making text prettier'. Without these features diacritics
are absolutely unusable.
Comment 11 hdu@apache.org 2007-06-22 08:07:52 UTC
> Hebrew text needs 'ccmp' and 'mark' to display diacritical marks properly.

@iorsh: are you aware of any Hebrew script that is improperly handled because these tables don't work?
If yes: please write a separate issue (because Hebrew already gets CTL processing, whereas this issue is 
about text that didn't get CTL-processing though it might need it). Then please assign the new Hebrew 
issue to me and cc mreimer and ayaninger...
Comment 12 moyogo 2007-06-22 14:11:56 UTC
Created attachment 46182 [details]
output of sample text in OOo with Doulos SIL on Windows
Comment 13 moyogo 2007-06-22 14:16:07 UTC
Created attachment 46183 [details]
output of sample text in OOo with SegoeUI on Windows
Comment 14 moyogo 2007-06-22 14:21:34 UTC
On Windows, with a recent Uniscribe, these features are enabled by default and are handled properly.
As you can see in the attachment 46182 [details] and attachment 46183 [details], Doulos SIL and Segoe UI have those 
features properly used in OpenOffice in Windows. The first g-with-a-hook even has the marks at the 
right places since Uniscribe reorders the characters before shaping.

This means that a document will render correctly when made on Windows, but not when made on Linux.
Comment 15 iorsh 2007-06-23 14:05:49 UTC
I'm sorry - I was wrong with my earlier comment. TrueType fonts with Hebrew
'mark'/'ccmp' features work fine.
Comment 16 dbachmann 2007-06-28 13:24:09 UTC
Resolution of this issue is crucial for making oo.o usable for writing
linguistic literature. MS Word doesn't support this either, so it isn't all that
important for "market share" at present, but this is a severe problem for anyone
working with orthographies not addressed by "precomposed" legacy encodings
adopted into Unicode.

Comment 17 djart 2007-06-28 13:52:01 UTC
Excuse me, I cannot understand your logic sometimes. Since when do you follow
exactly what MS Word is doing? It's OK to keep an eye on it for important
features that are still missing from OO and for ideas, but sometimes I
struggle to tell whether you are INDEED trying to produce a better product
than M$ Office or not.
Comment 18 simos.bugzilla 2007-06-28 14:03:51 UTC
@dbachmann: Do you think it would be good to get extra developers on this? Is
there some description of what files need to be touched and what should be added?
Comment 19 dbachmann 2007-07-16 14:16:00 UTC
@simos, I am afraid I am not familiar enough with oo.o internals to know "which
files should be touched". This appears to be a platform-dependent issue. As
moyogo notes (June 22), the required features apparently work fine for Windows
with a recent Uniscribe, but not on X11 (Linux / OS X), although X11 seems well
capable of supporting OTF, as shown by gedit or yudit etc. Consequently, this
issue should probably be assigned to an X11 wizard.

for the typographers' view see also http://www.typophile.com/node/17517
http://typophile.com/node/28539
Comment 20 hdu@apache.org 2007-08-07 15:43:24 UTC
Setting a target.
Comment 21 hdu@apache.org 2008-07-25 15:26:24 UTC
target
Comment 22 grakic 2008-12-16 15:48:29 UTC
"locl" is required for Serbian. There is an example of this on pango.org:
http://www.pango.org/ScriptGallery?action=AttachFile&do=get&target=OpenTypeLanguage.png
Comment 23 lohmaier 2010-03-18 17:38:40 UTC
*** Issue 96123 has been marked as a duplicate of this issue. ***
Comment 24 eric.savary 2010-05-06 00:55:33 UTC
*** Issue 110477 has been marked as a duplicate of this issue. ***
Comment 25 eric.savary 2010-05-06 00:56:23 UTC
*** Issue 111378 has been marked as a duplicate of this issue. ***
Comment 26 yaoziyuan 2010-05-06 02:23:39 UTC
IMO, this issue is related to OpenOffice's text rendering engine (Graphite).
Firefox and GTK+ apps use another text rendering engine, Pango, which is not
affected, and KDE/Qt apps use Qt, which is OK too.
Comment 27 stevanwhite 2010-08-05 13:33:42 UTC
yaoziyuan, you are right.  This has to do with the OO rendering engine.

It makes OO look very bad.  Since OO is so important to the free software
community, it makes the whole community look bad.

You mentioned the rendering engines used by KDE4 and Gnome.  But also Windows
Vista+ and the Mac OS have good support for these features.

The effect is, any graphical web browser, and much simpler word processors such
as KWord, and even the lowliest text processor on these systems renders mark
placement fairly well (never mind MS Word).  But not OpenOffice.

OpenOffice Writer is primarily about text display of high quality.  Without
that, all the fancy bells and whistles are garbage.  Anything else you
can do with it could be done better with some other tool.

Crucial features that still seem to be absent in OO:
   'ccmp' ligature composition (as opposed to the ccmp decomposition table)
   'mark'
   'mkmk'
Certainly there are others.

Scripts that would use these features (and may be illegible without them)
include: Hebrew, Arabic, Thai, Vietnamese, but there are many more.

I don't know if OO supports 'abvm' and 'blwm', but these are used a lot with
Indic scripts, as well as Tibetan.

But mark positioning is used for fine placement in general for languages that
use marks.  And that includes the *majority* of European languages.  It makes
the difference between text being very ugly, and looking great.

See for example:
http://partners.adobe.com/public/developer/opentype/index_table_formats2.html

This issue makes the product look stupid and useless for a large fraction of
potential users in the world, yet since this report was opened in 2007, it has
been marked priority P3 "Of interest, but not planned or expected in this
release" (and many related reports preceded it.)

Colleagues, let's get our priorities straight.
Comment 28 harshula 2011-01-07 02:59:46 UTC
On GNU/Linux, OpenOffice uses ICU for text layout. If there's a problem with
OpenType features, look at the ICU code.

cya,
#