Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
Summary: | Update Hyphen hyphenation library (improved hyphenation) and English hyphenation patterns | | |
---|---|---|---|
Product: | lingucomponent | Reporter: | nemeth.lacko |
Component: | other | Assignee: | stefan.baltzer |
Status: | CLOSED FIXED | QA Contact: | issues@lingucomponent <issues> |
Severity: | Trivial | | |
Priority: | P3 | CC: | cmr, issues, rene, thomas.lange, timar74 |
Version: | OOo 3.2 | | |
Target Milestone: | 3.4.1 | | |
Hardware: | Unknown | | |
OS: | All | | |
URL: | https://sourceforge.net/projects/hunspell/files/Hyphen/2.7/ | | |
Issue Type: | DEFECT | Latest Confirmation in: | --- |
Developer Difficulty: | --- | | |
Description
nemeth.lacko
2010-02-23 16:19:06 UTC
nemethl: sorry, but it doesn't build; missed a file in the tarball?

[...]
make[1]: Entering directory `/tmp/hyphen-2.5'
perl ./substrings.pl hyphen.us3 hyphen.us4 UTF-8 2 3 >/dev/null
cat hyphen.us4 | /bin/sed -f ./ooopatch.sed >hyph_en_US.dic
/bin/sed: couldn't open file ./ooopatch.sed: No such file or directory
make[1]: *** [hyph_en_US.dic] Error 4
make[1]: Leaving directory `/tmp/hyphen-2.5'
make: *** [all-recursive] Error 1

Rene: thanks, I have updated the file.

I have found a newly introduced problem in the hyphenation of OpenOffice.org 3.2. I have fixed the ugliest cases, break points one character away from a dash (e.g. something—t=wo, ad=d-on), with the new release of the improved English dictionaries (http://extensions.services.openoffice.org/hu/project/dict-en-fixed), but I will make a new Hyphen release to solve the others.

Confirmed by the Slovenian NLP.

2010/7/13 Martin Srebotnjak <miles@filmsi.net>:
> Hello, Laszlo and Caolan,
>
> Slovenian users reported having problems with Slovenian hyphenation in OpenOffice.org. Mojca Miklavec, who worked on updates of LaTeX hyphenation, reported it already some time ago. Now we tested it and we are baffled.
>
> We first noticed problems with words that have syllables starting with our special characters like "č", "š" and "ž" (words like "zaživeti", "načeloma" and "rešitev"). OpenOffice.org does not offer hyphenation before those syllables; the correct hyphenation of some common words would be: "za-ži-ve-ti", "na-če-lo-ma", "re-ši-tev". But we also found words without č, š or ž that are not hyphenated properly, like "poleteti"; OpenOffice.org splits it like "pole-te-ti", while it should be "po-le-te-ti". Here is a nice online tool that can display current OpenOffice.org hyphenation for Slovenian: http://www.ushuaia.pl/hyphen/?ln=en
>
> The same patterns are used in LaTeX and reportedly work fine. We checked the file and noticed it was in ISO-1 and not in UTF, but that does not seem to be the problem, as I converted them to UTF and had the same problem. I even created a test dict pack with it (with UTF-8 hyphenation patterns) here: http://dl.dropbox.com/u/4316668/pack-sl.oxt
>
> We tested this on OOO330m0 and on 3.2.1 and on older versions, and the problems are the same. Obviously this has gone on from the start; just no one noticed it. I first contacted Thomas Lange, and after checking that the patterns do include the rules for the above-mentioned words and that the encoding itself might not be the problem, he mentioned that the hyphenation included in OpenOffice.org might not be equal to the LaTeX hyphenation. So I looked up who the owners of the Hunspell/Hyphen project are and found you. :)
>
> So, I have a plea for help: could you look into the Slovenian hyphenation rules and the Hyphen code, at least for these few words, and see what the problem might be? If it is something trivial we would try to make the 3.3 release, otherwise we need to plan the needed work for future versions.

The new NOHYPHEN feature of Hyphen 2.7 can fix the hyphenation problem of words with hyphen characters, and also the old one with apostrophes.

Created attachment 75148 [details]
Improved English hyphenation dictionaries
The attached English hyphenation dictionaries (an improved version of the last English hyphenation patterns of OOo) solve both of the hyphenation problems with hyphen and apostrophe characters:

1. Missing word boundary patterns, i.e. the TeX "1foo." pattern didn't match "1foo's" in OpenOffice.org (but this was not a problem for TeX).
2. Bad hyphenmin values.

(A little correction: the "1foo." pattern did match the word "barfoo's" in OOo, thanks to a difficult trick, but this was not true for words with hyphen characters, or for words with other apostrophe positions and combinations.)

tl->nemeth: Is there some action for me to take right now?

nemeth->tl: If I understand correctly, this is an important fix for some Indic languages with UTF-8 encoded hyphenation patterns. Moreover, the new English hyphenation patterns solve several hyphenation problems, too. I would be glad of your help.

Test cases (words with hyphen) for the improved Hyphen library and English dictionaries:

old: en=glish-speaker    new: eng=lish-speaker
old: non-metropolitan    new: non-met=ro=pol=i=tan
old: un=der-sh=er=iff    new: un=der-sher=iff
old: twen=ty-one         new: twenty-one

Created attachment 75168 [details]
A list with ~450 words with bad (<) and fixed (>) hyphenation.
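The word-boundary problem described above (a TeX-style pattern such as "1foo." failing when the word carries an apostrophe suffix) can be illustrated with a minimal sketch of Liang-style pattern matching. This is not the Hyphen library's code: the function names, the single toy pattern, and the default minimum fragment lengths are assumptions made for illustration, and the sketch has none of the special handling that the correction above says OOo applied to the simple trailing 's case.

```python
import re

def parse_pattern(pat):
    """Split a TeX-style pattern such as '1foo.' into its letters
    and the list of inter-letter levels (one slot before each letter, one after)."""
    letters = re.sub(r"\d", "", pat)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            levels[pos] = int(ch)
        else:
            pos += 1
    return letters, levels

def hyphenate(word, patterns, left_min=2, right_min=2):
    """Mark break points with '=' using Liang-style matching.
    The whole token, including any apostrophe, is treated as one word."""
    padded = "." + word.lower() + "."   # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)
    for pat in patterns:
        letters, levels = parse_pattern(pat)
        start = padded.find(letters)
        while start != -1:
            for i, lvl in enumerate(levels):
                scores[start + i] = max(scores[start + i], lvl)
            start = padded.find(letters, start + 1)
    out = []
    for i, ch in enumerate(word):
        # scores[i + 1] is the level of the slot just before word[i];
        # an odd level allows a break there.
        if left_min <= i <= len(word) - right_min and scores[i + 1] % 2 == 1:
            out.append("=")
        out.append(ch)
    return "".join(out)

toy_patterns = ["1foo."]                   # hypothetical one-pattern dictionary
print(hyphenate("barfoo", toy_patterns))   # the end-of-word pattern applies
print(hyphenate("barfoo's", toy_patterns)) # "foo." no longer sits at the word end
```

In this sketch the pattern fires for "barfoo" but not for "barfoo's", because the trailing "." only matches at the end of the token passed to the matcher.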
There is a new bug fix release of the library: http://sourceforge.net/projects/hunspell/files/Hyphen/2.7/hyphen-2.7.1.tar.gz/download

tl->nemeth: We use your extension from http://extensions.services.openoffice.org/en/project/dict-en-fixed with OOo. That one has a hyph_en_US.dic and a hyph_en_GB.dic; am I correct to assume that the word list diff should be applied to both of them? Also, I'm going to keep the extension identifier but will just modify the version entry to match the current date.

nemeth->tl: you are correct, these are the replacements of the latest hyphenation patterns. Thanks in advance for the extension modification, too.

Updated hyphen library in OOo to v2.7.1.
Files changed:
M hyphen\makefile.mk
M ooo.lst
A hyphen\hyphen-2.7.1.patch
R hyphen\hyphen-2.4.patch

Applying the nohyphenfix.txt patch file is still outstanding.

nemeth->tl: many thanks for it.

tl->nemeth: Since there seems to be nothing left to do for this I'm setting this to fixed. If there is something left to do for me with that word list patch let me know. And thanks for the update! ^_^

TL->SBA: Please verify. Thanks!

Verified in CWS tl84.

nemeth->tl,sba: many thanks again for the integration and the verification.
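For anyone re-checking the dash-related test cases quoted earlier in this issue, the following is a small sketch of a checker that flags break points sitting too close to a dash, as in "ad=d-on". It works on already-hyphenated strings that use "=" as the break marker, as this thread does. The function names, the set of characters treated as dashes, and the default minimum length of 2 are assumptions for illustration; this is not part of Hyphen or OOo.

```python
import re

# Hyphen-minus, en dash and em dash. Which characters count as a "dash"
# is an assumption made for this sketch.
DASH_RE = re.compile("[-\u2013\u2014]")

def fragments_beside_dash(marked):
    """Return the fragments that lie between a '=' break point and a dash."""
    segments = DASH_RE.split(marked)
    found = []
    for idx, segment in enumerate(segments):
        pieces = segment.split("=")
        if len(pieces) < 2:
            continue  # no break point inside this segment
        if idx > 0:
            found.append(pieces[0])    # dash on the left, break on the right
        if idx < len(segments) - 1:
            found.append(pieces[-1])   # break on the left, dash on the right
    return found

def too_close(marked, min_len=2):
    """Return the fragments beside a dash that are shorter than min_len."""
    return [f for f in fragments_beside_dash(marked) if len(f) < min_len]

for case in ["ad=d-on", "something\u2014t=wo", "twenty-one", "un=der-sher=iff"]:
    print(case, too_close(case))
```

Raising `min_len` makes the check stricter; for example, `too_close("twen=ty-one", min_len=3)` also reports the two-letter fragment "ty" from the old output listed in the test cases.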