Issue 54980 - Add œ and æ ligatures in the FR_fr dictionary
Summary: Add œ and æ ligatures in the FR_fr dictionary
Status: RESOLVED FIXED
Alias: None
Product: General
Classification: Code
Component: spell checking
Version: 3.3.0 or older (OOo)
Hardware: All All
Importance: P3 Trivial with 15 votes
Target Milestone: ---
Assignee: nemeth.lacko
QA Contact: issues@lingucomponent
URL:
Keywords:
Duplicates: 67557
Depends on:
Blocks:
 
Reported: 2005-09-22 19:03 UTC by cbrunet
Modified: 2013-02-24 20:42 UTC
CC List: 6 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
List of oe and ae ligatures extracted from fr_FR dict. (8.24 KB, application/vnd.oasis.opendocument.text)
2005-10-27 14:29 UTC, cbrunet
Web statistics about French words with ligature (4.49 KB, text/plain)
2006-05-08 11:08 UTC, nemeth.lacko
Proposed patch for fr_FR dict, hyph and thes (517.48 KB, patch)
2006-05-10 15:34 UTC, cbrunet
New patch and autocorrection file. (630.22 KB, application/x-gzip)
2006-05-11 01:28 UTC, nemeth.lacko
Test data (30.48 KB, application/vnd.oasis.opendocument.text)
2006-05-11 01:35 UTC, nemeth.lacko

Description cbrunet 2005-09-22 19:03:07 UTC
In French typography, oe and ae are written as the ligatures œ and æ when the
two letters belong to the same syllable. For example, we should write 'cœur'
and 'œuvre', but 'coexister'. Words with the œ and æ ligatures should be added
to the dictionary.
Comment 1 nemeth.lacko 2005-10-20 00:35:30 UTC
MySpell and Hunspell do not yet support the ISO-8859-15 (extended Latin-1)
character encoding, which is the encoding of the French MySpell dictionary.

I will fix it in Hunspell.
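
To illustrate why the encoding matters, here is a minimal Python sketch comparing ISO-8859-1 and ISO-8859-15 for the four ligature characters. The helper name `encode_or_none` is an assumption made for this example; only the standard codecs are real.

```python
def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec lacks a character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# æ and Æ exist in both encodings; œ and Œ exist only in ISO-8859-15.
for ch in "œŒæÆ":
    for codec in ("iso8859-1", "iso8859-15"):
        print(ch, codec, encode_or_none(ch, codec))
```

A dictionary stored as plain Latin-1 therefore cannot contain 'cœur' at all, whereas æ is already available there.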
Comment 2 nemeth.lacko 2005-10-26 13:54:14 UTC
Hi,

I have added the ISO-8859-15 character encoding to Hunspell, but the French
dictionary contains only the forms "coeur" and "oeuvre", not the ligatured ones.

Perhaps this is not a problem, because OpenOffice.org is not a DTP system.

The answer depends on the official French _orthographical_ rules.

For example, the following issue is a typical orthographical problem for French:
http://qa.openoffice.org/issues/show_bug.cgi?id=29085

`The ligature œ is a mandatory contraction of oe in certain words'
(http://en.wikipedia.org/wiki/French_(language))

If œ usage is obligatory, we can make a "stricter" French dictionary with only
the œ forms. (We can also add default automatic œ conversions to the AutoCorrect
Replace table.)

Could you describe the French orthographical rules about ligatures
(and about uppercase letters at issue #29085)?

Thanks.

Laci

Comment 3 cbrunet 2005-10-27 14:29:11 UTC
Created attachment 30919
List of oe and ae ligatures extracted from fr_FR dict.
Comment 4 cbrunet 2005-10-27 14:45:59 UTC
I don't know whether those ligatures should be mandatory. Many French
speakers don't know they exist. In MS Word 2000 they are mandatory, and there
are autocorrections for the most common ones, but not for all of them, and the
dictionary contains many errors (some ligatures are missing).

What is certain is that if they are mandatory, we must autocorrect them. And we
cannot simply replace every oe with œ, because not all words take the ligature.
The rule is that if the o and the e are in the same syllable, there is a ligature.
But some words that come from other languages don't have the ligature.

In the attached document, I extracted all the words with oe and ae from the
fr_FR dict and corrected every word that takes the ligature. I put a = in front
of words without the ligature, a + in front of some words I added, and a ? in
front of words I could not find.
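
Because the ligature depends on the word and not on the letter pair, a correction has to be driven by a word list. The following Python sketch shows the idea under stated assumptions: `LIGATURE_FORMS` is a hypothetical four-word sample, not the attached list, and `add_ligatures` is a name invented for this example.

```python
import re

# Hypothetical sample; the attached list is far longer.
LIGATURE_FORMS = {
    "coeur": "cœur",
    "oeuvre": "œuvre",
    "soeur": "sœur",
    "oeil": "œil",
}


def add_ligatures(text):
    """Replace only whole words that appear in the lookup table."""
    def fix(match):
        word = match.group(0)
        # Words missing from the table (e.g. 'coexister') are returned unchanged.
        return LIGATURE_FORMS.get(word, word)

    return re.sub(r"\w+", fix, text)


print(add_ligatures("le coeur et une oeuvre peuvent coexister"))
```

The lookup is case sensitive on purpose, so capitalised forms are left alone in this sketch.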
Comment 5 cbrunet 2006-05-05 16:31:17 UTC
Now that Hunspell supports extended Latin-1, I think the dictionary files should
be updated:
- replace oe with œ and ae with æ in fr-FR.dic, where it applies.
- add œ Œ æ and Æ to the characters list in fr-FR.aff.
- add autocorrections to change oe and ae to œ and æ where needed (if the
list is too big, it should be done at least for the most common words).
- update hyphenator and thesaurus (do they support ISO-8859-15?)
Comment 6 nemeth.lacko 2006-05-08 11:06:38 UTC
> Now that Hunspell supports extended Latin-1, I think the dictionary files should
> be updated:
> - replace oe with œ and ae with æ in fr-FR.dic, where it applies.
> - add œ Œ æ and Æ to the characters list in fr-FR.aff.

Plus some REP rules so that the right suggestions are offered:
REP œ oe
REP oe œ
REP æ ae
REP ae æ
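
As an illustration of what these four rules achieve, here is a simplified Python model of replacement-based suggestions. It is a sketch only and not Hunspell's actual implementation; `rep_candidates`, `suggest` and the tiny `strict` word set are hypothetical names and data introduced for the example.

```python
REP_RULES = [("œ", "oe"), ("oe", "œ"), ("æ", "ae"), ("ae", "æ")]


def rep_candidates(word, rules=REP_RULES):
    """Yield every form obtained by applying one rule at one position."""
    for old, new in rules:
        start = word.find(old)
        while start != -1:
            yield word[:start] + new + word[start + len(old):]
            start = word.find(old, start + 1)


def suggest(word, dictionary):
    """Keep only the candidates that are dictionary words."""
    return sorted({c for c in rep_candidates(word) if c in dictionary})


# Hypothetical strict dictionary that lists only the correct forms.
strict = {"cœur", "œuvre", "coexister"}
print(suggest("coeur", strict))
print(suggest("cœxister", strict))
```

The rules work in both directions, so a wrongly ligatured word can be corrected back to its plain oe or ae form as well.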

> - add autocorrections to change oe and ae to œ and æ where needed (if the
> list is too big, it should be done at least for the most common words).

All words with all suffixed forms are OK (<1000 word forms).

> - update hyphenator and thesaurus (do they support ISO-8859-15?)

Yes, they support both ISO-8859-15 and UTF-8.

Regards,

Laci

PS. I have gathered some web statistics. It seems there are some dubious words:
angstroem, coelacanthe, coenure, Groenland, loess, noethérienne, oedémateuse,
oedieneme, oeillere, oeilleton*, oenanthe, oenothera-oenothere (perhaps the
scientific name of the flower overlaps with the French name, I don't know),
oersted, oerstite, oestre, oestrus, poecile, roentgen, soeurette, and
caecale, elaeis, taenia, uraeus. It would be good to check these words in the
official French orthographical dictionary (plus the proposed new words). I also
attach the web data.

Comment 7 nemeth.lacko 2006-05-08 11:08:44 UTC
Created attachment 36312
Web statistics about French words with ligature
Comment 8 cbrunet 2006-05-10 15:34:55 UTC
Created attachment 36375
Proposed patch for fr_FR dict, hyph and thes
Comment 9 cbrunet 2006-05-10 15:40:16 UTC
I checked every word in a French dictionary. Only Toeplitz and Waerden could
not be found in any dictionary, so I kept them without the ligature. I did not
find the verb boetter or its variations either, but I'm pretty sure they
don't take the ligature if they exist (it's a matter of pronunciation).

Please verify the AFX file. I hope I didn't forget any cases for PFX and SFX. I only
changed the encoding in the hyph file, since it already contains the œ ligature and
I didn't really understand its format.
Comment 10 nemeth.lacko 2006-05-11 01:25:38 UTC
Thank you! I have made a fixed patch (added missing affix conditions and generated
a new thesaurus index file with the th_gen_idx.pl MyThes utility) and extended the
French autocorrection file ([OOo]/share/autocorr/acor_fr-FR.dat).

Please check the following test file in OOo. Thanks, Laci
Comment 11 nemeth.lacko 2006-05-11 01:28:19 UTC
Created attachment 36385
New patch and autocorrection file.
Comment 12 nemeth.lacko 2006-05-11 01:35:54 UTC
Created attachment 36386
Test data
Comment 13 nemeth.lacko 2006-05-11 02:10:19 UTC
Targeted to 2.0.4.

Changed to DEFECT, and P2 priority:

"In French, "œ" is a true linguistic ligature, not just a typographic one (like
the fi or fl ligatures), reflecting etymology. It is most prominent in the words
cœur ("heart"), sœur ("sister") and œil ("eye")." (Source:
http://en.wikipedia.org/wiki/OE_ligature). Handling ligatures is a competitive
feature, too (MS Office does it).

"However, many documents are prepared either on older systems incapable of
handling this character, by users who do not know how to enter it in their word
processor or by users who can not be bothered using special entry methods to get
it when oe will do. Handwriting generally does not make the distinction between
"oe" and "œ"."

OpenOffice.org is capable of handling French ligatures, and users don't need a
special entry method, thanks to the autocorrection tool and the spell checker
suggestions of OOo.
Comment 14 cbrunet 2006-05-11 14:23:43 UTC
Thanks for your help, Laci. It seems to work well. The only problem I found is
that if I type a word in UPPERCASE, or with a capital first letter, it is
autocorrected to all lowercase. Maybe this is another issue?
Comment 15 nemeth.lacko 2006-05-15 12:06:17 UTC
Yes, it is another issue: unfortunately, autocorrection is currently case
independent. So for problematic words (i.e. typical or frequent sentence-initial
words) we need to modify the autocorrection lexicon. (By the way, uppercasing
with the Uppercase character format works well.)
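
For illustration, a case-preserving replacement could look like the Python sketch below. It is hypothetical and does not describe how OOo's autocorrection works; `LOWERCASE_FIXES` and `autocorrect` are names and data invented for the example.

```python
# Hypothetical lowercase lookup table.
LOWERCASE_FIXES = {"coeur": "cœur", "oeuvre": "œuvre", "soeur": "sœur"}


def autocorrect(word, table=LOWERCASE_FIXES):
    """Apply the lowercase fix, then restore the casing of the typed word."""
    fixed = table.get(word.lower())
    if fixed is None:
        return word  # no entry: leave the word untouched
    if len(word) > 1 and word.isupper():
        return fixed.upper()  # all capitals stay all capitals
    if word[0].isupper():
        return fixed[0].upper() + fixed[1:]  # keep the initial capital
    return fixed


for typed in ("coeur", "Oeuvre", "SOEUR", "coexister"):
    print(typed, "->", autocorrect(typed))
```

This relies on œ having the uppercase counterpart Œ, so a capitalised 'Oeuvre' becomes 'Œuvre' and not 'œuvre'.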

As far as I know, French ligature support is on by default in MS Office but can
be switched off, so we may need an optional permissive French dictionary with
both ae, oe and œ, æ for DicOOo. (Old documents contain bad ae, oe forms.)

(Thanks for your thanks! I think we need better French support for the OOo
conference in Lyon. :)
Comment 16 pavel 2006-07-14 17:43:25 UTC
What is the status of this issue?
Comment 17 maison.godard 2006-07-14 18:03:05 UTC
This issue must be tested carefully before integration; a call will be made to
the French team.
But ready-to-deploy packages would be useful for users who want to test.

- is the ooo1 version targeted?
- the dictionary files are the same between ooo1 and ooo2, but the
autocorrection files are not
- there may be a misunderstanding of spellcheckers between versions
- thesaurus difference is already taken into account, no problem 
Comment 18 sgautier.ooo 2006-07-25 06:21:07 UTC
*** Issue 67557 has been marked as a duplicate of this issue. ***
Comment 19 nemeth.lacko 2006-08-03 01:50:40 UTC
It seems the fr_FR dictionary hasn't been integrated into OOo's source yet, so
this patch is for French OOo. The improved autocorrection file works only with
the new dictionaries, so I suggest first using this patch in French OOo 2.0.4,
temporarily with default mixed (with and without ligatures) dictionaries. It
would be good to mark the mixed dictionaries as "deprecated" in DicOOo. After a
waiting period, the autocorrection file could be added to the main OOo source,
too, maybe with default strict dictionaries.

Please decide on the French OOo target, because I can help in developing mixed
dictionaries. Thanks, Laci
Comment 20 nemeth.lacko 2006-08-03 02:03:21 UTC
I'm sorry, there are French dictionaries in DicOOo and on the OOo site, so we
can change them.
Comment 21 nemeth.lacko 2006-08-03 02:42:50 UTC
I think, mixed dictionaries can be made with a simple diff between the original
and the patched versions of the dictionaries, if anybody wants to make them
before 6 August.
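
A minimal Python sketch of that idea, assuming each dictionary is available as a plain list of entries: everything the patch removed is added back next to the patched entries. `mixed_entries` and the sample lists are hypothetical, entries are treated as opaque strings (any affix flags would have to travel with them), and the entry count on the first line of a Hunspell .dic file would need updating separately.

```python
def mixed_entries(original, patched):
    """Return the patched entries plus every original entry the patch removed."""
    patched_set = set(patched)
    removed = [entry for entry in original if entry not in patched_set]
    return sorted(patched_set.union(removed))


# Hypothetical three-entry dictionaries before and after the ligature patch.
original = ["coeur", "coexister", "oeuvre"]
patched = ["coexister", "cœur", "œuvre"]
print(mixed_entries(original, patched))
```

The result accepts both spellings, which matches the "permissive" dictionary described earlier in the thread.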
Comment 22 pavel 2006-08-08 09:25:52 UTC
target 2.x.
Comment 23 hagar_de_lest 2006-12-14 08:40:53 UTC
For the information of those landing on this issue, there is a temporary method
for adding the ligatures, based on the files attached above. See this thread (in French):
http://www.forum-openoffice.org/forum/sutra15783.html#15783
Comment 24 auberon 2007-12-12 11:46:49 UTC
Words with ligatures have been added to the new French dictionary. The ligature
spelling is treated as mandatory.
REP rules for suggestions have been added too.

http://qa.openoffice.org/issues/show_bug.cgi?id=83224

I think you can close this issue.
Comment 25 maison.godard 2007-12-12 11:50:06 UTC
Solved in the new dictionary.