Issue 79498

Summary: No support for ligatures in spellcheck
Product: General Reporter: aleksandersen <aleksandersen>
Component: spell checking Assignee: AOO issues mailing list <issues>
Status: ACCEPTED --- QA Contact:
Severity: Trivial    
Priority: P4 CC: issues, Mathias_Bauer, openofficeissuezilla
Version: 3.3.0 or older (OOo)   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: ENHANCEMENT Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on: 113785    
Issue Blocks:    

Description aleksandersen 2007-07-11 19:41:45 UTC
OpenOffice.org spell checking does not understand ligature characters.

For example, ‘ﬁnd’ and ‘find’ mean the same. However, OpenOffice.org flags the first
as invalid because it does not understand that ‘ﬁ’ means ‘f’ and ‘i’.

For information about ligature characters, see Wikipedia:
http://en.wikipedia.org/wiki/Typographical_ligature
Comment 1 milek_pl 2007-10-17 18:44:41 UTC
This is a speller enhancement, setting the component correctly, and adjusting
priority.

@nemeth -> there could be some hunspell setting in the affix file, as adding all
ligature words to the dictionary seems only a hack. As of now, I can see no such
setting. For en_US.aff, it could be:

EQUAL 2 #treat characters as equal
EQUAL ﬁ fi
EQUAL ﬂ fl

Comment 2 milek_pl 2007-10-17 18:45:50 UTC
.
Comment 3 nemeth.lacko 2008-12-07 03:24:04 UTC
Good news: Handling fi, fl etc. ligatures can be solved with the new ICONV
feature of Hunspell 1.2.8 in the near future. But we need Unicode en_US
dictionary and extended word breaking (ligatures are not word characters in OOo
yet) to handle ligatures as word characters.

Milek: Thanks. The syntax is quite similar to your suggestion:

# input conversion
ICONV 2
ICONV ﬁ fi
ICONV ﬂ fl

Moreover, optionally you can also add the following lines to the affix file to
get suggestions with ligatures:

# output conversion
OCONV 2
OCONV fi ﬁ
OCONV fl ﬂ
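
The effect of the two conversion tables can be sketched in Python. This is
only an illustration of the idea, not Hunspell's implementation: the names
convert, check_word and suggest_with_ligatures are hypothetical, and the
dictionary is assumed to be a plain set of words spelled without ligatures.

```python
# Illustrative stand-in for the ICONV/OCONV tables; not Hunspell code.
ICONV = {"\ufb01": "fi", "\ufb02": "fl"}  # input: ligature -> plain letters
OCONV = {"fi": "\ufb01", "fl": "\ufb02"}  # output: plain letters -> ligature


def convert(text, table):
    """Replace every key of `table` found in `text`, longest keys first."""
    for src in sorted(table, key=len, reverse=True):
        text = text.replace(src, table[src])
    return text


def check_word(word, dictionary):
    """Apply the input conversion, then look the word up in the word set."""
    return convert(word, ICONV) in dictionary


def suggest_with_ligatures(suggestions):
    """Apply the output conversion to plain-letter suggestions."""
    return [convert(s, OCONV) for s in suggestions]
```

With a word set containing only "find", check_word accepts the spelling with
the U+FB01 ligature as well as the plain one, because the lookup happens after
the input conversion.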

It would be nice to make an OOo typography extension:

- automatic ligature conversion by autocorrection (bug: autocorrection
decapitalizes capitalized words that do not start a sentence)
- automatic/manual ligature conversion by a macro (after file loading, before
file saving or printing)
- extended hyphenation dictionary with non-standard hyphenation patterns to
hyphenate the words with ligatures (later we need to correct the hyphenator to
calculate precise character counts with ligatures, too)
- spelling dictionary with default ICONV and OCONV

With this extension, OpenOffice.org might look like a semi-professional DTP
program.
Comment 4 decamps 2009-11-30 22:15:41 UTC
Is that not a duplicate of #4638 ?
Comment 5 nemeth.lacko 2010-03-10 08:51:41 UTC
I have solved the en/em dash problem of OpenOffice.org 3.2 with an improved
English dictionary extension setting the ICONV for f ligatures, too:

http://extensions.services.openoffice.org/en/project/dict-en-fixed

With these English dictionaries the spell checker recognizes words with
Unicode f ligatures. Moreover, the improved hyphenation patterns correctly
hyphenate words with f ligatures, too.

I plan to add an option to the English module of the Lightproof grammar checker
extension to suggest ligatures (semiautomatic ligature handling). This is not an
automatic OpenType solution, but it is a big help in using OpenOffice.org for
more advanced tasks.
Comment 6 maccy 2010-03-10 09:24:43 UTC
but what is the real benefit for OOo? I think nobody will enter ligatures
manually into a text processor. Maybe a handful of people will do that but
probably only for headlines not for every word in running text.

So doesn't it make more sense to make ligatures a display-only thing (#4638) in
OOo and internally keep the text literally letter-by-letter which would also not
affect spell checking?
Comment 7 nemeth.lacko 2010-03-10 10:45:34 UTC
Yes, this is for a handful of people. I am writing a book about OpenOffice.org
and DTP, and I have had positive experience with semiautomatic ligature
handling. F ligatures are not too frequent (for example, 1.3% of the English
words in Orwell's 1984 contain fi, fl, ff, ffi or ffl characters). Lightproof
underlines these words, and calling the local menu with Shift-F10 and choosing
the alternative word form with the Unicode f ligature is not annoying (at least
for Hungarian texts, maybe with fewer ligatures). But for languages with
minimal morphology (like English), fully automatic ligature replacement can be
made an Autocorrect extension.
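
As a rough sketch of how such a frequency figure and a naive automatic
replacement could be computed, assuming plain ASCII English text: the function
names ligature_word_share and apply_ligatures are hypothetical, and the
replacement ignores capital F as well as any language-specific rules about
where ligatures are allowed.

```python
import re

# Letter sequences with Unicode f-ligature forms. Longer sequences come first
# so that "ffi" is not matched as "ff" followed by "i".
LIGATURES = {
    "ffi": "\ufb03",
    "ffl": "\ufb04",
    "ff": "\ufb00",
    "fi": "\ufb01",
    "fl": "\ufb02",
}
PATTERN = re.compile("|".join(LIGATURES))  # dicts keep insertion order


def ligature_word_share(text):
    """Return the fraction of words containing at least one such sequence."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    hits = sum(1 for word in words if PATTERN.search(word))
    return hits / len(words)


def apply_ligatures(text):
    """Replace each matching letter sequence with its ligature character."""
    return PATTERN.sub(lambda match: LIGATURES[match.group(0)], text)
```

Running ligature_word_share over the full text of a book gives the kind of
percentage quoted above; apply_ligatures shows what a fully automatic
replacement would do to each word.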

OpenOffice.org already supports Unicode f-ligatures in searching and
capitalization, so recognizing them in the spell checker is a natural extension.
Finally, we need a temporary alternative to the upcoming Microsoft Office
2010 (with ligature handling). (I have no information about OpenOffice.org
development in this direction.)

By the way, the automatic OpenType solution of ligature handling also has
potential problems: some languages, for example German, don't use ligatures at
word part boundaries in compound words. Also, the HYPHENMIN values depend on
the usage of ligatures. The fi- can be at the end of a line in Hungarian, but
this hyphenation is deprecated with ligatures.

Related issues:

Issue 109543 (Update Hyphen hyphenation library (improved hyphenation) and en_US
hyphenation patterns)

Issue 71608 (Bad non-standard hyphenation of diaeresis and Unicode f ligatures)

Issue 56348 (Special letter characters in first letter position is not handled
by spell checking in Writer)

Comment 8 thomas.lange 2010-08-23 14:03:39 UTC
Adding reference to ligatures, there are Latin and Armenian ligatures in Unicode:
  http://www.unicode.org/charts/PDF/UFB00.pdf. 
They range from 0xFB00-0xFB06 and 0xFB13-0xFB17
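
A minimal Python sketch of expanding exactly these code points, assuming that
Unicode compatibility normalization (NFKC) from the standard unicodedata module
is an acceptable source for the letter sequences. The names ligature_table and
expand_ligatures are hypothetical.

```python
import unicodedata

LATIN = range(0xFB00, 0xFB07)     # U+FB00..U+FB06
ARMENIAN = range(0xFB13, 0xFB18)  # U+FB13..U+FB17


def ligature_table():
    """Map each ligature in the two ranges to its NFKC letter sequence."""
    return {
        chr(cp): unicodedata.normalize("NFKC", chr(cp))
        for cp in [*LATIN, *ARMENIAN]
    }


def expand_ligatures(text):
    """Replace ligature characters in `text` with their letter sequences."""
    table = ligature_table()
    return "".join(table.get(ch, ch) for ch in text)
```

Restricting the table to these two ranges leaves all other characters
untouched, which applying NFKC to the whole text would not guarantee.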
Comment 9 Marcus 2017-05-20 11:29:25 UTC
Reset assignee to the default "issues@openoffice.apache.org".