Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | Bad non-standard hyphenation of diaeresis and Unicode f ligatures | ||
---|---|---|---|
Product: | Writer | Reporter: | nemeth.lacko |
Component: | programming | Assignee: | AOO issues mailing list <issues> |
Status: | CONFIRMED --- | QA Contact: | |
Severity: | Trivial | ||
Priority: | P3 | CC: | bart.knubben, fonts-bugs, issues, simonbr, thomas.lange |
Version: | OOo 2.0.4 | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Issue Type: | DEFECT | Latest Confirmation in: | --- |
Developer Difficulty: | --- | ||
Attachments: |
Description
nemeth.lacko
2006-11-16 09:09:57 UTC
Created attachment 40618 [details]
Test document
Created attachment 40619 [details]
screenshot
Created attachment 40620 [details]
hyphenation pattern for test document (ISO 8859-1, only for Dutch example)
Created attachment 40621 [details]
hyphenation patterns for test data (Unicode, Dutch and Greek)
Created attachment 40622 [details]
dictionary.lst (link Unicode hyphenation patterns to en_GB (language of the test document))
Reassigned to SBA.
Created attachment 40623 [details]
Better ISO-8859-1 hyphenation patterns: extended with patterns for omaatje and cafeetje.
SBA-TL: Please proceed.

When looking at it with SRC680 m200 I found the following:
- in SO the hyphenation position for reëel is re=ëel and the hyphenated word becomes re=eel. As of m202 the hyphenated word is re=ëel.
- in OOo the hyphenated word is also re=ëel

The above results were obtained directly from the hyphenator. (You may use the Basic script below to check.) Thus it is a problem of the specific implementations. For SO nothing can be done except report this to the vendor, and for OOo someone needs to patch the hyphenation patterns. Thus I'm reassigning this issue to lingucomponent.

    Sub Main
        ' Hyphenator service under test; the second service is left commented out
        xH = createUnoService("org.openoffice.lingu.LibHnjHyphenator")
        'xH = createUnoService("com.sun.star.lingu2.Proximity.Hyphenator")

        ' Dutch locale
        Dim nl_NL As New com.sun.star.lang.Locale
        nl_NL.Language = "nl"
        nl_NL.Country = "NL"

        xHW = xH.hyphenate( "reëel", nl_NL, 3, DimArray() )
        'xHW = xH.hyphenate( "Hundefutter", nl_NL, 3, DimArray() )

        ' Show the hyphenated word and the hyphen position
        msgtxt = " " + xHW.getHyphenatedWord() + " " + xHW.getHyphenPos()
        MsgBox msgtxt
    End Sub

TL: Many thanks for the test and the example.

Nemeth->TL: I have tried the script with the attached data and got "re=eel" (reeel 1) and oma=tje (omatje 2), so it seems to me that this is a bug in OpenOffice.org's implementation, not in the LibHnj non-standard hyphenation extension. Maybe hyphenpos=1 is wrongly forbidden by the 2-character limit. Please check my example, not the default Dutch hyphenation patterns. The LibHnj executable works well on my example. Thanks in advance, Laci

Created attachment 48684 [details]
screenshot (OOo messagebox with "reeel 1")
Created attachment 48685 [details]
screenshot (OOo messagebox with "omatje 2")
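For readers following along: the hyphenator reports two things, the hyphenated word (whose spelling may differ from the original) and the hyphen position. The following is a minimal Python sketch, not the OOo code, of how a caller could derive the changed span by comparing the two strings and then build the two line parts. The names `changed_span` and `break_word` are hypothetical, and the hyphen position is assumed to be the 0-based index of the last character before the hyphen in the hyphenated word, which is consistent with the "reeel 1" and "omatje 2" message boxes above.

```python
def changed_span(original, hyphenated):
    """Return (pos, length, replacement) such that replacing
    original[pos:pos + length] with replacement yields hyphenated."""
    max_common = min(len(original), len(hyphenated))

    # Longest common prefix.
    prefix = 0
    while prefix < max_common and original[prefix] == hyphenated[prefix]:
        prefix += 1

    # Longest common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < max_common - prefix
           and original[-1 - suffix] == hyphenated[-1 - suffix]):
        suffix += 1

    length = len(original) - prefix - suffix
    replacement = hyphenated[prefix:len(hyphenated) - suffix]
    return prefix, length, replacement


def break_word(hyphenated, hyphen_pos):
    """Split the hyphenated spelling into the part that stays on the
    line (with a hyphen appended) and the part that wraps."""
    head = hyphenated[:hyphen_pos + 1] + "-"
    tail = hyphenated[hyphen_pos + 1:]
    return head, tail


# "re\u00ebel" is reëel with a precomposed e-diaeresis.
print(changed_span("re\u00ebel", "reeel"))   # (2, 1, 'e')
print(break_word("reeel", 1))                # ('re-', 'eel')
print(changed_span("omaatje", "omatje"))     # (3, 1, '')
print(break_word("omatje", 2))               # ('oma-', 'tje')
```

In both Dutch examples the changed character sits at an index greater than the hyphen position, that is, to the right of the hyphen, so a line breaker has to be prepared to alter the text on the wrapped part as well.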
Testing with SRC680 m227:
- SO: reëel gets hyphenated in the document as ree-el, but the hyphenator says it should be re-eel
- OOo: reëel gets hyphenated as re-ëel and the hyphenator says the same.

I don't know which hyphenator is right or wrong (and if the SO hyphenator result is wrong it can't be fixed on our side; it needs to be reported to the vendor). But clearly, since the SO hyphenator says re-eel, an actual document should behave the same way. Thus we have a problem with the algorithm here. I don't see any problem with the OOo hyphenator unless someone says that its result should not be re-ëel because that is wrong. Does someone have input on the correct hyphenation of reëel? For the time being I will keep this issue, since there seems to be a problem with the code for evaluating alternative spellings (as already expected).

TL-Nemeth: I missed that the correct hyphenation for reëel was already listed as re-eel. Thus the OOo hyphenator or its dictionary file needs to be fixed. Since I will use this issue to fix the problem in the code for evaluating alternative spellings, please submit a new one for either or both of the above changes in OOo.

Nemeth->TL: Thanks for your check and comment. This bug report was only a theoretical problem with attached test data, because nobody was working on Dutch or Greek non-standard hyphenation patterns years ago, when I checked my alternative/non-standard hyphenator patch into OpenOffice.org. But now there is the result of the OpenTaal project, the extended Dutch hyphenation patterns, and OpenOffice.org (and StarOffice) can't correctly handle half of the Dutch non-standard hyphenation described by the hyphenation patterns.
I believe OpenTaal's activity and results (see http://www.linux.com/feature/116697 for example) and collaboration with OpenTaal are very important for the future of OpenOffice.org, because we would have officially certified spell checking and hyphenation in OpenOffice.org for at least one language. I have modified the language specifics summary according to your plan. Thanks in advance, Laci

When checking this I found that the problem is not the SvxGetAltSpelling function (which I suspected to be at fault). Instead it is in the actual implementation that evaluates that result and does the line breaking. That has two consequences:
a) If it is to be fixed, it needs to be fixed in each application separately. Thus specific issues for Calc and Draw/Impress are required.
b) I was told the area affected by the required change is quite tricky and troublesome to change.

Also it looks to me that the actual problem is not about the diaeresis at all, but about the position of the text to be changed. Comparing it to alternative spelling in the now outdated German pre-reform spelling, the problem is this:
- in German, Bäc-ker changed to Bäk-ker when hyphenated
- in Dutch, re-ëel should become re-eel

The difference is that in the German example the character to the left of the hyphenation position changes (which is sufficient for German), whereas in the Dutch example it is the one to the right. The code parts that take care of alternative spellings in Writer are rather old and were probably implemented for German at that time. No one needed text changes to the right and thus it was never implemented... :-(

If this gets fixed it should be done in a future-proof way.
That is:
- the text change need not be directly next to the hyphen
- it should allow for changes of more than one letter to the left
- it should allow for changes of more than one letter to the right
- it should allow for all of the above at the same time

Basically speaking, it should be able to handle all possible results that the function SvxGetAltSpelling may return. (And that one is flexible enough to allow for completely new words...) Please take over. Thanks!

Nemeth->TL: Many thanks for your help. I had also started to analyze this problem in Writer a few years ago, but I had to stop when I found that the problem was outside the linguistic modules. The most important component is Writer, so it would be fantastic to have at least a partial solution for the text processor. Thanks, Laci

This is a problem for the hyphenation of f ligatures, too: efficiency -> ef-ficiency. (Not even a simple fi -> f=i hyphenation works.) (By the way, the automatic OpenType solution for ligature handling also has potential problems: some languages, for example German, don't use ligatures at word-part boundaries in compound words. Also, the HYPHENMIN values depend on the usage of ligatures. In Hungarian, fi- can stand at the end of a line, but this hyphenation is deprecated with ligatures.)
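To illustrate the f-ligature case, here is a sketch of one possible approach, not something OOo or LibHnj is known to do: expand the Unicode f ligatures (U+FB00 to U+FB04) to plain letters before hyphenating, and keep a map from each expanded letter back to the original character so that a break falling inside a ligature can be detected. It is written in Python; `expand_f_ligatures` and `splits_ligature` are hypothetical names, and the language-specific ligature rules mentioned above are out of scope.

```python
# Unicode f ligatures from the Alphabetic Presentation Forms block.
F_LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def expand_f_ligatures(word):
    """Return (expanded, origin), where origin[i] is the index in word
    of the character that produced expanded[i]."""
    expanded = []
    origin = []
    for index, char in enumerate(word):
        letters = F_LIGATURES.get(char, char)
        expanded.append(letters)
        origin.extend([index] * len(letters))
    return "".join(expanded), origin


def splits_ligature(origin, hyphen_pos):
    """True if a hyphen placed after expanded index hyphen_pos would
    fall between two letters that came from the same original character."""
    return (hyphen_pos + 1 < len(origin)
            and origin[hyphen_pos] == origin[hyphen_pos + 1])


expanded, origin = expand_f_ligatures("e\ufb03ciency")
print(expanded)                     # efficiency
print(splits_ligature(origin, 1))   # True: ef-ficiency breaks the ffi ligature
print(splits_ligature(origin, 0))   # False: the break is before the ligature
```

When `splits_ligature` returns true, the ligature character itself has to be replaced at the break (for example by f on one line and fi on the next), which is again a text change on both sides of the hyphen.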