Apache OpenOffice (AOO) Bugzilla – Issue 4638
ligature support
Last modified: 2017-05-20 10:44:51 UTC
It would be nice to be able to enable ligatures (for fi, fl, ff, ffl, ffi). You can have a look at InDesign, which has the same feature. I would additionally like to have a plugin that can switch off ligatures for certain words, based on a kind of blacklist of words which must not have ligatures; in German there are lots of such exceptions. Writing the plugin, which would do the same as my "rmligs" script for LaTeX files, will however be a later step, after ligature support is there.
Reassigned to Christian.
Reassigned to Bettina.
There are more ligatures than the "f" combinations that should be supported if ligatures are to be supported. "Th" is another common ligature. Adobe InDesign 2.0 has implemented full ligature support, in which ligatures are placed during typing. Backspacing over a ligature correctly replaces it with the remaining characters (except for "ffi" and "ffl", where backspacing first replaces the ligature with the "ff" ligature), so that, to the user, ligatures are implemented transparently. I believe this would be the preferred method, rather than implementing some sort of scripted solution. InDesign also specifies two strata of ligatures. The common ones, "fi", "fl", "ff", "ffi", "ffl", and "Th", are in the first stratum. "ct" and other fairly uncommon ligatures are in the second stratum. The two strata can be enabled independently of one another. This allows one to use the ligatures which are in common use without being forced to use the ones which are fairly old and out of use.
Note: The Adobe Pro series of OpenType fonts contain complete ligature support internally. I have quite successfully used these fonts in OpenOffice (Linux) by converting them to TrueType. Obviously, this will not be necessary in the future, when OpenType support is implemented.
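The typing and backspacing behaviour described above can be sketched roughly as follows. This is only a toy model, not how InDesign or OpenOffice actually works: the document keeps plain characters and the ligatures are chosen again each time the text is rendered. The `COMMON` and `RARE` tables and the function names are illustrative assumptions, and Unicode presentation forms stand in for the ligature glyphs a real font would supply.

```python
# Illustrative ligature tables; a real font defines its own glyph substitutions.
# "Th" and "ct" have no Unicode code point, so they are left out of this toy model.
COMMON = {"ffi": "\ufb03", "ffl": "\ufb04", "ff": "\ufb00", "fi": "\ufb01", "fl": "\ufb02"}
RARE = {"st": "\ufb06"}


def render(text, common=True, rare=False):
    """Greedy longest-match ligature substitution over the stored characters."""
    table = {}
    if common:
        table.update(COMMON)
    if rare:
        table.update(RARE)
    keys = sorted(table, key=len, reverse=True)  # try "ffi" before "ff" or "fi"
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no ligature starts here, keep the plain letter
            i += 1
    return "".join(out)


def backspace(text):
    """Delete one stored character; the ligature is re-chosen on the next render."""
    return text[:-1]
```

With this model, `render("offi")` shows the "ffi" ligature, and `render(backspace("offi"))` shows the "ff" ligature, which matches the behaviour described for InDesign. The two strata are simply two tables that can be switched on independently.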
Of course, it should be done the way you described. However, there have to be exception dictionaries for words in which certain ligatures are typographically incorrect. In German there are many of them, usually at the word boundaries inside compound words.
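To illustrate the exception-dictionary idea, here is a minimal sketch. The dictionary format (word mapped to the index where the second part of the compound starts) and the names `EXCEPTIONS` and `allowed_ligatures` are my own assumptions for the example; the "rmligs" script and any real implementation may work quite differently.

```python
LIGATURES = ("ffi", "ffl", "ff", "fi", "fl")  # longest candidates first

# Hypothetical exception dictionary: word -> index where the second
# part of the compound starts.
EXCEPTIONS = {
    "Auflage": 3,    # Auf|lage
    "Kaufleute": 4,  # Kauf|leute
}


def allowed_ligatures(word, exceptions=EXCEPTIONS):
    """Return (start, letters) for each ligature that does not span a compound boundary."""
    boundary = exceptions.get(word)
    found = []
    i = 0
    while i < len(word):
        for lig in LIGATURES:
            if word.startswith(lig, i):
                end = i + len(lig)
                crosses = boundary is not None and i < boundary < end
                if not crosses:
                    found.append((i, lig))
                    i = end
                    break
        else:
            i += 1  # nothing usable starts at this position
    return found
```

For "Auflage" the "fl" pair straddles the boundary and is rejected, while a word that is not in the dictionary, such as "fliegen", keeps its ligature.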
It's been 2 years since this issue was filed and it still has the "NEW" status. It's extremely annoying to have to find&replace all ligatures manually, and then OOo doesn't even understand the ligatures when doing a spellcheck, which renders the spellchecker pretty useless.
Agreed. If I had any votes for word processor issues, I'd vote for it. My main irritation is how spellchecker treats ligatures. I have given up on using them in OOo, however, as it simply does not handle them correctly.
I can confirm that using ligatures confuses the spell checker. For instance, if you replace the "fi" in "configure" with the fi ligature, it's flagged as a spelling mistake. I can add the "new word" to the dictionary, but the spell checker should be smart enough to break the ligature apart so this isn't necessary. While it would be nice to have InDesign-level ligature handling, I don't expect that from a word processor. If just the spell checking issue and Issue 54825* were fixed, I'd be satisfied. * Summary of Issue 54825: if you say Insert | Special Character -- such as to insert a ligature -- then change the style for the paragraph to use a different font, the special character remains in the old font. It affects more than just ligatures, so I filed it as a separate bug.
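As a rough illustration of "breaking the ligature apart" before the dictionary lookup: Unicode compatibility normalization (NFKC) maps the presentation-form ligatures back to their component letters. The tiny `DICTIONARY` and the function names are assumptions made for the example, not anything from the OOo spell checker.

```python
import unicodedata

DICTIONARY = {"configure"}  # stand-in for a real spelling dictionary


def spellcheck_form(word):
    """Decompose compatibility characters such as U+FB01 (fi) into plain letters."""
    return unicodedata.normalize("NFKC", word)


def is_known(word):
    return spellcheck_form(word).lower() in DICTIONARY
```

Here `is_known("con\ufb01gure")` succeeds even though the raw string is not in the dictionary. NFKC also folds other compatibility characters, so a real checker might prefer a narrower mapping.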
*** Issue 62291 has been marked as a duplicate of this issue. ***
The statistics have to be improved as well. Try a word with a ligature like 'enfin' (French), 'fi' = F001.
-> OOo counts 4 characters (correct, even if 'fi' is in fact 2 letters) and 2 words (incorrect, there is only one word).
-> MS Word counts 4 characters and 1 word.
> Try a word with a ligature like 'enfin' (French) 'fi' = F001.
> -> OOo counts 4 characters (correct, even if 'fi' is in fact 2 letters)
Huh? Even if the two letters "fi" are replaced by a ligature on output they are still two letters, not one. The statistics should count letters, not glyphs. What on earth would be the point of counting glyphs instead?
Word and character count is correct for fi and fl ligatures mapped to the Private Use Area, but not for ff, fi, fl, ffi and ffl ligatures mapped to the Alphabetic Presentation Forms. This should be updated, as Unicode encoding is the new standard. "five flowers" typed with the Private Use Area ligatures is counted correctly, but not "five flowers" typed with the Alphabetic Presentation Forms ligatures.
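A small sketch of counting letters rather than glyphs, covering both encodings discussed above. The U+F001 = fi assignment is taken from the earlier comment; U+F002 = fl is an assumption, since Private Use Area code points are font-specific. The function names are made up for the example.

```python
import re
import unicodedata

# Font-specific Private Use Area assignments. U+F001 = fi is quoted in this
# thread; U+F002 = fl is an assumption made for this example.
PUA_LIGATURES = {"\uf001": "fi", "\uf002": "fl"}


def expand_ligatures(text):
    """Replace ligature code points with the letters they stand for."""
    text = unicodedata.normalize("NFKC", text)  # handles U+FB00..U+FB04
    return "".join(PUA_LIGATURES.get(ch, ch) for ch in text)


def count(text):
    """Return (words, letters) after expanding ligatures."""
    words = re.findall(r"\w+", expand_ligatures(text))
    return len(words), sum(len(w) for w in words)
```

Without the expansion step, a naive `\w+` split treats the Private Use Area character as a non-letter and reports two words for 'enfin', which is the kind of miscount reported above.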
Hi Mathias, I have changed the current owner to you. Please take ownership of these enhancements.
I have been using Microsoft Office 2010 Technical Preview, and Word now supports the advanced OpenType properties (ligatures, contextual alternatives and so on). It would be nice to have them on OpenOffice.org too.
*** Issue 93584 has been marked as a duplicate of this issue. ***
Some of the Graphite SIL fonts (e.g. Charis SIL) support ligature replacement with OpenOffice.org 3.2. I have made a Graphite font version of the Linux Libertine font with small caps, old style numbers etc. and two-level ligature support, see http://numbertext.org/linux/. The Graphite font has optional features to change the digits to the number names. These features can be switched on with the OpenOffice.org typography toolbar (http://extensions.services.openoffice.org/en/project/typo), or e.g. by manually adding the feature ids to the font name in character formatting:
Magyar Linux Libertine G:204=0 (no ligatures)
Magyar Linux Libertine G:204=1 (default: Qu and f ligatures)
Magyar Linux Libertine G:204=2 (also the old st and ct ligatures)
There are still some hyphenation problems in the Graphite integration of OpenOffice.org (Issue 111272), but Graphite has excellent possibilities in typography and in solving some interesting issues, like missing numbering types (Issue 92730).
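For anyone generating such font names from a macro or script, the naming scheme above could be built roughly like this. The helper name `with_features` is made up, and joining several settings with "&" is an assumption on my part; only the single-feature form "name:id=value" appears in this comment.

```python
def with_features(font_name, features):
    """Append Graphite feature settings to a font name, e.g. "Font:204=1".

    `features` maps a feature id to its value. Joining several settings
    with "&" is an assumption; the thread only shows a single setting.
    """
    if not features:
        return font_name
    settings = "&".join(f"{fid}={value}" for fid, value in features.items())
    return f"{font_name}:{settings}"
```

For example, `with_features("Magyar Linux Libertine G", {204: 2})` gives the font name that also enables the old st and ct ligatures.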
> The Graphite font has optional features to change the digits to the number names
to change the character sequences to ligatures and much more (Graphite is a programmable font format with its GDL language)
Created attachment 69801 [details] example generated by OpenOffice.org 3.2 with automatic ligature handling
The new Linux Libertine G supports all OpenType ligatures of the original Linux Libertine (Th, fk, tt etc.) with the more comfortable 4-letter feature ids. http://www.numbertext.org/linux
Created attachment 71863 [details] Font features of Linux Libertine G and Linux Biolinum G
joaopaulo1511 said the following: "I have been using Microsoft Office 2010 Technical Preview, and Word now supports the advanced OpenType properties (ligatures, contextual alternatives and so on). It would be nice to have them on OpenOffice.org too."
I second that wholeheartedly. Type foundries should start applying the OpenType standard (2003) to their fonts. MS Office 2010 does ligatures correctly. Notepad has done it since 2006, Adobe since 2004.
I made a font with 400 (FOUR HUNDRED) plus ligatures and they form perfectly inside the browsers Firefox, Safari, Lunascape and Arora, plus Adobe InDesign and Photoshop: http://www.lovatasinhala.com/
The font: http://www.lovatasinhala.com/avazyabadu/samagana.ttf
The input is raw Latin letters within ISO-8859-1. All the ligatures are in the PUA of the font. I think OOw programmers should consult Mozilla on how they do this in Firefox and Thunderbird's mail editor. (IE makes the ligatures halfway, gives up, and picks up again seemingly randomly. It is slow, and the letters look scratchy if you have ClearType turned on, aaiyaiyaa! OOw is similar. Install my font and copy a paragraph from the web site into OO-Write to see what I mean.)
I use the following CSS3 rule in the web pages: text-rendering: geometricPrecision or optimizeLegibility (both work). I have no clue why it works except that the lookup tables in the font actually offer the ligatures. There is no problem in searching for component letters embedded inside a ligature. (My base letters are, of course, awfully different from their common Latin shapes, sorry.) I have also extended JavaScript sort() to sort this text according to Sinhala (extended Sanskrit) collation order and it works perfectly in spite of the ligatures.
There is one Unicode character that is supposed to be inserted between two letters when you do not want them to join: U+200C, called ZWNJ (zero-width non-joiner). I use that inside my web pages. Thanks.
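To make the ZWNJ remark concrete, here is a minimal sketch of inserting U+200C between two letters and of stripping it again before searching. The helper names are hypothetical, and whether a given renderer honours the character is up to the application and font.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def block_ligature(word, index):
    """Insert ZWNJ before word[index] so the letters on either side do not join."""
    return word[:index] + ZWNJ + word[index:]


def searchable(text):
    """Drop ZWNJ so that plain-text search still finds the word."""
    return text.replace(ZWNJ, "")
```

`block_ligature("Auflage", 3)` separates the "f" and "l" of the German compound Auf|lage, and `searchable()` recovers the original spelling.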
When I create a new document, set the font to "Linux Libertine G" and try to type the sentence, "So I finally have ligatures ...", when I get to the fi, Writer crashes. This happens with or without the Typography Toolbar installed. I'm running Win7 on Intel.
This is now a 13-year-old issue. With OpenType finally getting the hold it should have, and even Microsoft Office supporting ligatures, it is getting ridiculous that OpenOffice seems to get relegated to a secon… tertiary platform for office programs. Even though not many people have voted for this, I do believe this is because of a lack of knowledge rather than disinterest. I do hope this issue gets resolved soon. (I was going to say ‘sooner, rather than later’, but considering this has been an issue for so long, I would say we are way into the ‘later’ by now.) For the first time since I started using the OOo package (back when it was at version 1.0.2), I am seriously considering changing to Microsoft Office, all because of this.
The Open Type (OT) standard is now more than 10 years old, and its successor, Open Font, is several years old too. If a font has ligatures, they should be created in the Private Use Area (PUA) and addressed using lookup tables. These lookup tables can be categorized under the OT features 'liga', 'rlig' or 'dlig'. The features in turn come under a script. For example, English and German use the Latin script ('latn' in OT). 'rlig' and 'dlig' are for special ligatures used in high-end programs. (MS Word has selection criteria for them, but read on.) OT says that standard ligatures should be displayed by default. <<= NOTE
Now let's create one font with the English f ligatures and another with German Fraktur ligatures, both under the 'liga' feature. The ligatures are then standard ligatures. They are still Latin script fonts. Depending on the font, we can show this text you are reading either in English or classic German -- perfectly inside Linux but, unfortunately, not inside Windows. Office 2007, 2010 and 2013 all have the capability to show ligatures, but a Windows update prevents it.
Standard ligatures of a font will show inside Windows Notepad, Linux Geany, Abiword, all Adobe programs, Gnumeric, Excel and any popular browser. ****MS Word, OO and LO Write will all fail inside Windows.**** OO and LO do not have that problem inside Linux!
If you analyze this problem logically, you see that MS Word, OO and LO tell the OS to render the font. If they just took the glyphs that the font handed them, they would not have this problem. That is what Geany, a simple TEXT editor, does, and Excel does too, as it does not care about decorating text in any special way.
I think as a start, OO and LO should just assume that the font conforms to OT and that, if it has regular ligatures, it will hand them over and they can be used for line justification. A ligature is still more than one character, hence the cursor will step into it.
I created a font that is 99% ligatures. It shows romanized Singhala as complex Singhala script.
Here is the font. http://smartfonts.net/ttf/aruna.ttf If you install that font and view the following ODT file inside Windows and Linux environments you'll see what I mean. (No problems with Abiword) http://smartfonts.net/ttf/aruna.ttf. A web site of romanized Singhala but looking like complex Singhala with that orthographic smartfont above: http://lovatasinhala.com/
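The script/feature arrangement described above ('latn' containing 'liga', 'rlig', 'dlig', with standard ligatures shown by default) can be modelled very roughly as below. This is a sketch under stated assumptions, not a font parser: the dictionary layout and the name `active_features` are invented for illustration, and treating 'liga' and 'rlig' as on by default and 'dlig' as off reflects my reading of the OpenType feature registry.

```python
# Features assumed to be applied unless the user turns them off.
# Treating 'dlig' as opt-in is my reading of the OpenType feature registry.
DEFAULT_ON = {"liga", "rlig"}


def active_features(font_features, script, enabled=(), disabled=()):
    """Return the ligature features to apply for `script`.

    `font_features` maps a script tag (e.g. "latn") to the feature tags
    the font provides under that script.
    """
    available = set(font_features.get(script, ()))
    wanted = (DEFAULT_ON | set(enabled)) - set(disabled)
    return available & wanted
```

For a font described as `{"latn": ["liga", "dlig"]}`, calling `active_features(font, "latn")` selects only 'liga'; a desktop publishing program could pass `enabled=["dlig"]` to offer the discretionary ligatures as an option.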
Created attachment 84744 [details] Test file with ligatures. This is the file I did attach to the earlier message JC
This issue seems to have gotten renewed interest. For the sake of providing all relevant information, I would like to point out that also the fj-ligature is a standard ligature in Norwegian and similar languages.
Where is the 'fj' ligature? I think the 'renewed' interest in ligatures is because techies newly entering the field are catching up with Open Type (now Open Font) technology. Some are understandably confused. That ligatures are separate things outside letters was not an issue during letterpress days. In those days, the printer (as I was then) just composed the text, including ligature types if the author required them, and the reader saw them. (A type is a stamp of a letter. Anybody seen a typewriter?) I remember seeing those 'odd' ligatures like 'fj' in type cases. That was in the 1960s.
The Open Type standard is an implementation of the Unicode Standard. At least for the Latin script, more correctly called Simple Script, you get numeric codes only for basic, alphabetic letters. The Latin script in computer jargon means the set of numeric codes that were assigned on a first-come, first-served basis to Europeans, starting with English. For instance, Icelanders requested and got the Old English letters like þ, ð, æ encoded into the single-byte code set. The term character was borrowed from Computer Science to mean basic letters, graphical shapes, spaces, some diacritics and so on that have their own publicly known numeric codes.
Then people started requesting codes for ligatures. The 'f' ligatures of English were issued the first few slots in the Private Use Area. It was only a stopgap measure until font makers understood how to make character substitution tables that point to glyphs of ligatures. Ligatures do not have publicly declared numeric codes. Instead, they have private codes declared within fonts. Ligatures are NOT characters. They are shapes representing joined letters, or shapes of those transformed into their own unique shapes that the user of a particular script recognizes.
Where does the 'fj' ligature go? It goes into a font made specifically to show the script of the language that has that ligature (Norwegian?).
Just like the good old days of letterpress, fonts are for the author's purpose.
I forgot to say: the types of ligature (Standard, Contextual, Discretionary) are defined in each font, not by Unicode. Programs like word processors are supposed to show the Standard ligatures BY DEFAULT, and it happens without any effort by the program. All browsers, Windows Notepad, and all basic editors in Linux follow this rule. Apache has arbitrarily decided not to do it for Write.
CORRECTION: Pardon the forgetfulness of the old man. Browsers need declaration of the type of ligatures to be shown. But why do 'plain text' editors show standard ligatures? Should be an outrage, nein?
There are three objections to my suggestion that Write should follow the OT Standard when treating Standard ligatures. The first objection is that the request is too trivial to bother about. The next two objections are specifically directed against my solution for Indic scripts. My solution eliminates the complexities of Unicode Indic, leveling the languages with Western European languages and tremendously helping to reduce illiteracy in India. My solution does not violate any technical or linguistic standard but upholds them. I have demonstrated that Indic can be romanized without loss or distortion. This fact is proven by simply displaying romanized Singhala in the native script by using an orthographic smart font. You test by phonetically typing the text. The objections are as below.
Objection 1 -- Trivial issue
It is indeed trivial for those who see no difference between implementation of Standard ligatures and no ligatures at all. This is the case with users of the Latin script, except some who think there are language-level ligatures. What they do not realize is that Write actually interferes with the display of ligatures prematurely or unnecessarily. There is no requirement in the Standard to prevent display of Standard ligatures provided by a font. They should show by default. In the case of desktop publishing software, you provide options to the author for selecting non-standard ligature types.
Objection 2 -- The language of the text cannot be discerned from codes
Although it sounds technical, no engineer would say it, because they know immediately that there is no definitive way to identify languages just from the underlying code set: many languages share the same script, notoriously Latin. The way to identify the language in the case of web pages is simple. The language tag will tell (e.g. lang="en-US"). In other cases, you compare frequencies of character codes to known code frequency charts of languages and increase the confidence level by identifying frequently occurring words. (Google won't confuse Singhala and Indonesian / Icelandic if they do this.)
Objection 3 -- Rendering a large number of ligatures is math intensive and could cause the computer to crash
Again, this is not a statement by an engineer. A font responds to a keystroke by supplying the drawing coordinates and glyph drawing instructions. If the glyph returned is a base character, add the Advance Width of the already painted glyph to its origin of coordinates to get the origin for the new glyph. If the new glyph is a ligature, the existing origin is the origin of the new glyph as well. You might wonder whether a font full of ligatures would paint the text faster, everything else being equal. (Mixed Singhala text with my font is already 98% ligatures and shows no difference in performance. Mozilla in 2008 reported that the difference is 2%, without saying which way.)
=====================================
My plea is simply to get rid of the code that handles OT features, which I suspect relies on the OS, and to reserve it for an advanced version of Write, maybe WriteAndMore, made for desktop publishing. Trusting Uniscribe in Windows adds a level of unknown which, in this case, is faulty.
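The origin arithmetic described under Objection 3 can be sketched in a few lines. This is a simplified model that ignores kerning, line breaks and everything else a real layout engine does; the tuple layout, the function name `place` and the advance widths in the usage note are invented for illustration.

```python
def place(painted, glyph, advance, replaces_last=False):
    """Return a new glyph list after painting `glyph`.

    `painted` is a list of (glyph, origin_x, advance) tuples.
    A base glyph starts where the previous glyph's advance ends.
    A ligature that replaces the previous glyph reuses that glyph's origin.
    """
    if not painted:
        return [(glyph, 0, advance)]
    _last_glyph, last_origin, last_advance = painted[-1]
    if replaces_last:
        return painted[:-1] + [(glyph, last_origin, advance)]
    return painted + [(glyph, last_origin + last_advance, advance)]
```

Typing "f", then "i" (which turns the pair into an "fi" ligature), then "n" with made-up advance widths of 300, 550 and 500 units leaves the ligature at origin 0 and the "n" at origin 550. Each keystroke costs one addition at most, which is the point being made above.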
Reset the assignee to the default "issues@openoffice.apache.org".