Issue 4638 - ligature support
Summary: ligature support
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 1.1 Beta2
Hardware: All All
Importance: P3 Trivial with 38 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Duplicates: 62291 93584
Depends on:
Blocks:
 
Reported: 2002-05-08 12:28 UTC by maccy
Modified: 2017-05-20 10:44 UTC
CC List: 8 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
example generated by OpenOffice.org 3.2 with automatic ligature handling (47.49 KB, application/pdf)
2010-06-04 13:46 UTC, nemeth.lacko
Font features of Linux Libertine G and Linux Biolinum G (107.76 KB, application/pdf)
2010-09-27 15:20 UTC, nemeth.lacko
Test file with ligatures. (11.83 KB, application/octet-stream)
2015-05-17 05:52 UTC, JC Ahangama

Description maccy 2002-05-08 12:28:28 UTC
It would be nice to be able to enable ligatures (for fi, fl, ff, ffl, ffi). You
can have a look at InDesign, which has the same feature.
I would additionally like to have a plugin which can switch off ligatures for
certain words, respecting a kind of blacklist of words which must not
have ligatures - in German there are lots of such exceptions. Writing a plugin
that does the same as my "rmligs" script for LaTeX files will, however, be a
later step, after ligature support is there.
Comment 1 stefan.baltzer 2002-05-13 17:31:47 UTC
Reassigned to Christian.
Comment 2 christian.jansen 2003-03-24 07:50:21 UTC
Reassigned to Bettina.
Comment 3 mwdiers 2003-08-27 05:10:11 UTC
There are more ligatures than the "f" combinations which should be
supported if ligatures are to be supported. "Th" is another common
ligature. 

Adobe InDesign 2.0 has implemented full ligature support, in which
ligatures are placed during typing. Backspacing over ligatures
correctly replaces the ligature with the remaining characters (except
in the case of "ffi" and "ffl", in which case, backspacing over it
first replaces the ligature with the "ff" ligature), so that, to the
user, ligatures are transparently implemented. I believe this would be
the preferred method, rather than implementing some sort of scripted
solution.

InDesign also specifies two strata of ligatures. The common ones,
"fi", "fl", "ff", "ffi", "ffl", and "Th", are in the first stratum. "ct"
and other fairly uncommon ligatures are in the second stratum. The two
strata can be enabled independently of one another. This allows one to
use the ligatures which are in common use, without being forced to use
the ones which are fairly old and out of use.
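
As an illustration of the behavior described above, here is a minimal Python sketch. It assumes the ligatures are applied by re-shaping the underlying characters after every edit; the tables, the glyph-name convention ("f_f_i") and the function names are hypothetical and are not taken from InDesign or OpenOffice.org.

```python
# Hypothetical ligature tables for the two strata described in the comment.
COMMON = ("ffi", "ffl", "ff", "fi", "fl", "Th")
RARE = ("ct",)

def shape(text, common=True, rare=False):
    """Turn a character string into a list of glyph names, longest match first."""
    table = []
    if common:
        table += COMMON
    if rare:
        table += RARE
    table.sort(key=len, reverse=True)  # try "ffi" before "ff" and "fi"
    glyphs = []
    i = 0
    while i < len(text):
        for seq in table:
            if text.startswith(seq, i):
                glyphs.append("_".join(seq))  # e.g. "ffi" becomes glyph "f_f_i"
                i += len(seq)
                break
        else:
            glyphs.append(text[i])  # no ligature starts here: plain glyph
            i += 1
    return glyphs

def backspace(text):
    """Delete one underlying character; the caller re-shapes afterwards."""
    return text[:-1]

print(shape("offi"))             # ['o', 'f_f_i']
print(shape(backspace("offi")))  # ['o', 'f_f']
```

Because the stored text stays as plain letters, backspacing over "ffi" leaves "ff", which is shaped as the "ff" ligature again. Each stratum is switched by its own flag.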

Note: The Adobe Pro series of OpenType fonts contain complete ligature
support internally. I have quite successfully used these fonts in
OpenOffice (Linux) by converting them to TrueType. Obviously, this
will not be necessary in the future, when OpenType support is implemented.
Comment 4 maccy 2003-08-31 02:27:47 UTC
Of course, it should be done the way you described it. However,
there have to be exception dictionaries for words where certain
ligatures are typographically incorrect. In German there are many
of them, usually at the word boundaries of compound words.
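
A minimal sketch of such an exception dictionary, in Python. It assumes ligatures are suppressed by inserting U+200C (ZERO WIDTH NON-JOINER) at the compound boundary; the dictionary contents and the function name are hypothetical examples, not the format used by the "rmligs" script.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: asks the renderer not to join neighbors

# Hypothetical exception list: word -> index of the compound boundary
# that a ligature must not cross (Auf|lage, Schilf|insel).
EXCEPTIONS = {"Auflage": 3, "Schilfinsel": 6}

def break_ligatures(text):
    """Insert ZWNJ at the listed boundary of every word found in EXCEPTIONS."""
    def fix(match):
        word = match.group(0)
        pos = EXCEPTIONS.get(word)
        return word if pos is None else word[:pos] + ZWNJ + word[pos:]
    return re.sub(r"\w+", fix, text)

print(repr(break_ligatures("Die Auflage")))  # 'Die Auf\u200clage'
```

Words that are not in the list, such as "Kaffee", pass through unchanged and keep their ligatures.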
Comment 5 msundman 2004-04-09 02:47:16 UTC
It's been 2 years since this issue was filed and it still has the "NEW" status. 
It's extremely annoying to have to find&replace all ligatures manually, and then 
OOo doesn't even understand the ligatures when doing a spellcheck, which renders 
the spellchecker pretty useless.
Comment 6 rblackeagle 2004-04-09 04:41:03 UTC
Agreed.  If I had any votes for word processor issues, I'd vote for it.  My main
irritation is how spellchecker treats ligatures.  I have given up on using them
in OOo, however, as it simply does not handle them correctly.
Comment 7 tangent 2005-09-19 19:58:38 UTC
I can confirm that using ligatures confuses the spell checker.  For instance, if
you replace the "fi" in "configure" with the fi ligature, it's flagged as a
spelling mistake.  I can add the "new word" to the dictionary, but the spell
checker should be smart enough to break the ligature apart so this isn't necessary.
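
For illustration, a minimal Python sketch of breaking a ligature apart before a dictionary lookup. It assumes the ligature is stored as a Unicode Alphabetic Presentation Forms character (such as U+FB01) and uses NFKC normalization from the standard library; the function names and the one-word dictionary are hypothetical. NFKC also folds other compatibility characters, so a real spell checker might want a narrower mapping.

```python
import unicodedata

def expand_ligatures(word):
    """Replace compatibility characters such as U+FB01 with their plain letters."""
    return unicodedata.normalize("NFKC", word)

def is_known(word, dictionary):
    """Look the word up after expanding any ligature characters."""
    return expand_ligatures(word) in dictionary

print(expand_ligatures("con\ufb01gure"))         # configure
print(is_known("con\ufb01gure", {"configure"}))  # True
```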

While it would be nice to have InDesign-level ligature handling, I don't expect
that from a word processor.  If just the spell checking issue and Issue 54825*
were fixed, I'd be satisfied.

* Summary of Issue 54825: if you say Insert | Special Character -- such as to
insert a ligature -- then change the style for the paragraph to use a different
font, the special character remains in the old font.  It affects more than just
ligatures, so I filed it as a separate bug.
Comment 8 lohmaier 2006-02-22 22:05:28 UTC
*** Issue 62291 has been marked as a duplicate of this issue. ***
Comment 9 hagar_de_lest 2007-02-09 09:25:36 UTC
The statistics have to be improved also :
Try a word with a ligature like 'enfin' (French) 'fi' = F001.
-> OOo counts 4 characters (correct, even if 'fi' is in fact 2 letters) and 2
words (incorrect, there is only one word).
-> MS Word counts 4 characters and 1 word.
Comment 10 msundman 2007-02-09 12:00:24 UTC
> Try a word with a ligature like 'enfin' (French) 'fi' = F001.
> -> OOo counts 4 characters (correct, even if 'fi' is in fact 2 letters)

Huh? Even if the two letters "fi" are replaced by a ligature on output they are 
still two letters, not one. The statistics should count letters, not glyphs. 
What on earth would be the point of counting glyphs instead?
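
A minimal Python sketch of statistics that count letters rather than glyphs. It assumes ligature characters are expanded to their component letters before counting. The Private Use Area mapping is an assumption: U+F001 for "fi" is the value quoted in this thread, and U+F002 for "fl" is assumed by analogy. The function names are hypothetical.

```python
import unicodedata

# Assumed legacy Private Use Area code points for ligatures.
PUA_LIGATURES = {"\uf001": "fi", "\uf002": "fl"}

def expand(text):
    """Expand PUA ligatures via the table, then standard ones via NFKC."""
    text = "".join(PUA_LIGATURES.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFKC", text)

def stats(text):
    """Count words and letters on the expanded text."""
    plain = expand(text)
    return {
        "words": len(plain.split()),
        "letters": sum(ch.isalpha() for ch in plain),
    }

print(stats("en\ufb01n"))  # {'words': 1, 'letters': 5}
```

With this approach a ligature never splits a word and always counts as its component letters.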
Comment 11 pesala 2007-02-12 09:29:33 UTC
Word and character count is correct for fi and fl ligatures mapped to the Private 
Use Area, but not for ff fi fl ffi and ffl ligatures mapped to the Alphabetic 
Presentation forms. 

This should be updated as Unicode encoding is the new standard. 

"[fi]ve [fl]owers" (ligatures in the Private Use Area) is counted correctly, but not "ﬁve ﬂowers" (ligatures in the Alphabetic Presentation Forms).
Comment 12 pesala 2007-02-12 09:29:53 UTC
Word and character count is correct for fi and fl ligatures mapped to the Private 
Use Area, but not for ff fi fl ffi and ffl ligatures mapped to the Alphabetic 
Presentation forms. 

This should be updated as Unicode encoding is the new standard. 
Comment 13 bettina.haberer 2007-09-26 16:01:13 UTC
Hi Mathias, I have changed the current owner to your owner. Please take the
ownership of these enhancements.
Comment 14 joaopaulo1511 2009-10-20 05:51:23 UTC
I have been using Microsoft Office 2010 Technical Preview, and Word now supports
the advanced OpenType properties (ligatures, contextual alternatives and so on).
It would be nice to have them on OpenOffice.org too.
Comment 15 Olaf Felka 2009-12-01 06:51:14 UTC
*** Issue 93584 has been marked as a duplicate of this issue. ***
Comment 16 nemeth.lacko 2010-06-04 13:37:21 UTC
Some of the Graphite SIL fonts (eg. Charis SIL) support ligature replacement
with OpenOffice.org 3.2. I have made a Graphite font version from the Linux
Libertine font with small caps, old style numbers etc. and two-level ligature
support, see http://numbertext.org/linux/.

The Graphite font has optional features to change the digits to the number
names. These features can be switched on by the OpenOffice.org typography
toolbar (http://extensions.services.openoffice.org/en/project/typo), or eg.
adding manually the feature ids to the font name in character formatting:

Magyar Linux Libertine G:204=0 (no ligatures)
Magyar Linux Libertine G:204=1 (default: Qu and f ligatures)
Magyar Linux Libertine G:204=2 (also the old st and ct ligatures)
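
To make the notation above concrete, here is a small Python sketch that splits such a font name into the family name and its feature settings. The function name is hypothetical, and the "&" separator for more than one feature is an assumption; the comment only shows a single "id=value" pair after the colon.

```python
def parse_font_spec(spec, separator="&"):
    """Split 'Family:id=value' into (family, {id: value}).

    The separator between several features is an assumption.
    """
    name, _, tail = spec.partition(":")
    features = {}
    if tail:
        for item in tail.split(separator):
            key, _, value = item.partition("=")
            features[key] = int(value)
    return name, features

print(parse_font_spec("Magyar Linux Libertine G:204=2"))
# ('Magyar Linux Libertine G', {'204': 2})
```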

There are some hyphenation problems in Graphite integration of OpenOffice.org
(Issue 111272) yet, but Graphite has excellent possibilities in typography and
solving some interesting issues, like missing numbering types (Issue 92730).
Comment 17 nemeth.lacko 2010-06-04 13:41:12 UTC
> The Graphite font has optional features to change the digits to the number
> names

to change the character sequences to ligatures and much more (Graphite is a
programmable font format with its GDL language)
Comment 18 nemeth.lacko 2010-06-04 13:46:23 UTC
Created attachment 69801 [details]
example generated by OpenOffice.org 3.2 with automatic ligature handling
Comment 19 nemeth.lacko 2010-09-27 15:18:35 UTC
The new Linux Libertine G supports all OpenType ligatures of the original Linux
Libertine (Th, fk, tt etc.) with the more comfortable 4-letter feature ids.

http://www.numbertext.org/linux
Comment 20 nemeth.lacko 2010-09-27 15:20:29 UTC
Created attachment 71863 [details]
Font features of Linux Libertine G and Linux Biolinum G
Comment 21 davidasf 2011-01-30 00:21:39 UTC
joaopaulo1511 said the following:
"I have been using Microsoft Office 2010 Technical Preview, and Word now 
supports the advanced OpenType properties (ligatures, contextual alternatives 
and so on). It would be nice to have them on OpenOffice.org too."

I second that wholeheartedly. Type foundries should start applying OpenType 
standard (2003) to their fonts.

MS Office 2010 does ligatures correctly. Notepad did it since 2006, Adobe since 
2004. I made a font with 400 (FOUR HUNDRED) plus ligatures and they form 
perfectly inside the browsers, Firefox, Safari, Lunascape and Arora plus Adobe 
InDesign and PhotoShop:
http://www.lovatasinhala.com/
The font:
http://www.lovatasinhala.com/avazyabadu/samagana.ttf

The input is raw Latin letters within ISO-8859-1. All the ligatures are in the 
PUA of the font. I think OOw programmers should consult Mozilla on how they do 
this in Firefox and Thunderbird's mail editor. (IE makes the ligatures halfway 
and gives up and picks up again seemingly randomly. It is slow, and the letters 
are scratchy looking if you have Clear Type turned on, aaiyaiyaa! OOw is 
similar. Install my font and copy a para from the web site into OO-Write to see 
what I mean).

I use the following CSS3 rule in the web pages:
text-rendering: geometricPrecision / optimizeLegibility (both work)
I have no clue why it works except that the Lookup Tables in the font actually 
offer the ligatures. There is no problem in searching component letters 
embedded inside a ligature. (My base letters are, of course, awfully different 
from their common Latin shapes, sorry).

I have also extended JavaScript sort() to sort this text according to Sinhala 
(extended Sanskrit) collation order and it works perfectly in spite of ligatures.

There is one Unicode character that is supposed to be inserted between two 
letters when you do not want them to join: U+200C, called ZWNJ. I use that inside 
my web pages.

Thanks.
Comment 22 CodeLurker 2013-08-08 05:39:43 UTC
When I create a new document, set the font to "Linux Libertine G" and try to type the sentence, "So I finally have ligatures ...", when I get to the fi, Writer crashes.  This happens with or without the Typography Toolbar installed.  I'm running Win7 on Intel.
Comment 23 Tor-Ivar Krogsæter 2015-04-23 15:44:16 UTC
This now is a 13-year-old issue. With OpenType finally getting the hold it should have, and even Microsoft Office supporting ligatures, it is getting ridiculous that OpenOffice seems to get relegated to a secon… tertiary platform for office programs. Even though not many people have voted for this, I do believe this is because of the lack of knowledge, rather than a disinterest. I do hope this issue gets resolved soon. (I was going to say ‘sooner, rather than later’, but considering this has been an issue for so long, I would say we are way into the ‘later’ by now.)

I am for the first time, since I started using the OOo package (whilst it was in version 1.0.2), seriously considering changing to Microsoft Office, all because of this.
Comment 24 JC Ahangama 2015-05-17 05:43:06 UTC
The Open Type (OT) standard is now more than 10 years old, and its successor, Open Font, is several years old too. If any font has ligatures, they should be created in the Private Use Area (PUA) and addressed using lookup tables. These lookup tables can be categorized under the OT features 'liga', 'rlig' or 'dlig'. The features in turn come under a script. For example, English and German are Latin scripts (latn for OT). rlig and dlig are for special ligatures used in high-end programs. (MS Word has selection criteria for them, but read on.)

OT says that standard ligatures should be displayed by default. <<= NOTE

Now let's create one font with the English F ligatures and another with German Fraktur ligatures, both under the 'liga' feature. The ligatures are then standard ligatures. They are still Latin script fonts. Depending on the font, we can show this text you are reading either in English or classic German -- perfectly inside Linux but, unfortunately, not inside Windows. Office 2007, 2010 and 2013 all have the capability to show ligatures but a Windows update prevents it.

Standard ligatures of a font will show inside Windows Notepad, Linux Geany, Abiword, all Adobe programs, Gnumeric, Excel and any popular browser. 

****MS Word, OO and LO Write will all fail inside Windows.****

 OO and LO do not have that problem inside Linux!

If you logically analyze this problem, you see that MS Word, OO and LO tell the OS to render the font. If they just took the glyphs that the font handed them, they would not have this problem. That is what Geany, a simple TEXT editor, does, and Excel does too, since it does not care about decorating text in any special way.

I think as a start, OO and LO should just assume that the font conforms to OT and if it has regular ligatures that it will hand them and they can be used for line justification. A ligature is still more than one character, hence the cursor will step into it.

I created a font that is 99% ligatures. It shows romanized Singhala as complex Singhala script.

Here is the font.
http://smartfonts.net/ttf/aruna.ttf
If you install that font and view the following ODT file inside Windows and Linux environments you'll see what I mean. (No problems with Abiword)
http://smartfonts.net/ttf/aruna.ttf.

A web site of romanized Singhala but looking like complex Singhala with that orthographic smartfont above:
http://lovatasinhala.com/
Comment 25 JC Ahangama 2015-05-17 05:52:58 UTC
Created attachment 84744 [details]
Test file with ligatures.

This is the file I did attach to the earlier message
JC
Comment 26 Tor-Ivar Krogsæter 2015-09-18 13:06:50 UTC
This issue seems to have gotten renewed interest. For the sake of providing all relevant information, I would like to point out that also the fj-ligature is a standard ligature in Norwegian and similar languages.
Comment 27 JC Ahangama 2015-09-18 17:38:01 UTC
Where is 'fj' ligature?

I think the 'renewed' interest in ligatures is because techies newly entering the field are catching up with Open Type (now Open Font) technology. Some are understandably confused.

That ligatures are separate things outside letters was not an issue during letterpress days. In those days, the printer (as I was then) just composed the text including ligature types if the author required them and the reader saw them. (A type is a stamp of a letter. Anybody seen a typewriter?) I remember seeing those 'odd' ligatures like 'fj' in type cases. That was in the 1960s.

The Open Type standard is an implementation of the Unicode Standard. At least for the Latin script, more correctly called Simple Script, you get numeric codes only for basic, alphabetic letters. The Latin Script in computer jargon means the set of numeric codes that were assigned on a first-come, first-served basis to Europeans, starting with English. For instance, Icelanders requested and got the Old English letters like þ, ð, æ encoded into the single-byte code set. The term character was borrowed from Computer Science to mean basic letters, graphical shapes, spaces, some diacritics and so on that have their own publicly known numeric codes.

Then people started requesting codes for ligatures. The 'f' ligatures of English were issued the first few slots in the Private Use Area. It was only a stopgap measure until font makers understood how to make character substitution tables that point to glyphs of ligatures. Ligatures do not have publicly declared numeric codes. Instead, they have private codes declared within fonts.

Ligatures are NOT characters. They are shapes representing joint letters or shapes of those transformed into their own unique shapes that the user of a particular script recognizes.

Where does the 'fj' ligature go? It goes into a font made specifically to show the script of the language that has that ligature (Norwegian?). Just like the good old days of letterpress, fonts are for the author's purpose.
Comment 28 JC Ahangama 2015-09-18 17:53:36 UTC
I forgot to say:
The types of ligature (Standard, Contextual, Discretionary) are defined in each font, not by Unicode. Programs like word processors are supposed to show the Standard ligatures BY DEFAULT, and it happens without any effort by the program.  All browsers, Windows Notepad, all basic editors in Linux follow this rule.

Apache has arbitrarily decided not to do it for Write.
Comment 29 JC Ahangama 2015-09-18 18:03:20 UTC
CORRECTION:
Pardon the forgetfulness of the old man. Browsers need declaration of the type of ligatures to be shown. But why do 'plain text' editors show standard ligatures? Should be an outrage, nein?
Comment 30 JC Ahangama 2015-09-20 22:27:26 UTC
There are three objections to my suggestion that Write should follow the OT Standard when treating Standard ligatures. 

The first objection is that the request is too trivial to bother about. The next two objections are specifically directed against my solution for Indic scripts. My solution eliminates the complexities of Unicode Indic, leveling the languages with Western European languages and tremendously helping to reduce illiteracy in India. My solution does not violate any technical or linguistic standard but upholds them.

I have demonstrated that Indic can be romanized without loss or distortion. This fact is proven by simply displaying romanized Singhala in the native script by using an orthographic smart font. You test by phonetically typing the text. 

The objections are as below.

Objection 1 -- Trivial issue

It is indeed trivial for those who see no difference between implementation of Standard ligatures and no ligatures at all. This is the case with users of the Latin script except some who think there are language level ligatures. What they do not realize is that Write actually interferes with the display of ligatures prematurely or unnecessarily. There is no requirement in the Standard to prevent display of Standard ligatures provided by a font. They should show by default. In the case of Desktop Publishing software, you provide options to the author for selecting non-standard ligature types.


Objection 2 -- The language of the text cannot be discerned from codes

Although it sounds technical, no engineer would say it because they know immediately that there is no definitive way to identify languages just from the underlying code set because many languages share the same script, notoriously, Latin.

The way to identify the language in the case of web pages is simple. The language tag will tell. (e.g. lang="en-US"). In other cases, you compare frequencies of character codes to known code frequency charts of languages and increase the confidence level by identifying frequently occurring words. (Google won't confuse Singhala and Indonesian / Icelandic if they do this).
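
The frequency comparison mentioned above could be sketched as follows in Python. This is a toy illustration only: the two sample sentences stand in for real frequency charts, cosine similarity is one assumed way to compare profiles, and all names are hypothetical. A usable detector would need much larger reference texts and the word-level check the comment mentions.

```python
import math
from collections import Counter

# Toy reference texts standing in for real per-language frequency charts.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps",
    "de": "der schnelle braune fuchs springt über den faulen hund und schläft",
}

def profile(text):
    """Relative frequency of each letter in the text."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def cosine(p, q):
    """Cosine similarity between two frequency profiles."""
    dot = sum(p[ch] * q.get(ch, 0.0) for ch in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0

PROFILES = {lang: profile(text) for lang, text in SAMPLES.items()}

def guess_language(text):
    """Return the language whose profile is closest to the text's profile."""
    p = profile(text)
    return max(PROFILES, key=lambda lang: cosine(p, PROFILES[lang]))
```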


Objection 3 -- Rendering large number of ligatures is math intensive and could cause the computer to crash

Again, this is not a statement by an engineer. A font responds to a keystroke by supplying the drawing coordinates and glyph drawing instructions. If the glyph returned is a base character, add the Advance Width of the already painted glyph to its origin of coordinates to get the origin for the new glyph. If the new glyph is a ligature, the existing origin is the origin of the new glyph as well. You might wonder if a font full of ligatures would paint the text faster everything else being equal. (Mixed Singhala text with my font is already 98% ligatures and shows no difference in performance. Mozilla in 2008 reported that the difference is 2% without saying which way).
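
The positioning rule described in this paragraph can be sketched in Python as follows. The advance widths and the one-entry ligature table are made-up illustration values, not metrics from any real font, and the function name is hypothetical.

```python
# Hypothetical advance widths in font units; real values come from the font.
ADVANCE = {"f": 300, "i": 250, "n": 500, "fi": 520}
# Hypothetical ligature table: (previous glyph, typed character) -> ligature.
LIGATURES = {("f", "i"): "fi"}

def type_key(placed, char):
    """Update the list of (glyph, origin_x) pairs for one keystroke."""
    if not placed:
        placed.append((char, 0))
        return placed
    last_glyph, last_x = placed[-1]
    lig = LIGATURES.get((last_glyph, char))
    if lig is not None:
        # The ligature replaces the painted glyph and reuses its origin.
        placed[-1] = (lig, last_x)
    else:
        # Base character: origin = previous origin + previous advance width.
        placed.append((char, last_x + ADVANCE[last_glyph]))
    return placed

placed = []
for ch in "fin":
    type_key(placed, ch)
print(placed)  # [('fi', 0), ('n', 520)]
```

In this model a ligature costs one table lookup per keystroke and saves one glyph placement.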

=====================================
My plea is simply to get rid of the code that handles OT features, which I suspect relies on the OS, and to reserve it for an advanced version of Write, maybe WriteAndMore, made for desktop publishing. Trusting Uniscribe in Windows adds a level of unknown which, in this case, is faulty.
Comment 31 Marcus 2017-05-20 10:44:51 UTC
Reset the assignee to the default "issues@openoffice.apache.org".