Issue 56348

Summary: Special letter characters in first letter position are not handled by spell checking in Writer
Product: Writer Reporter: nemeth.lacko
Component: code Assignee: AOO issues mailing list <issues>
Status: REOPENED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: issues, Mathias_Bauer, ooo, rbircher, thomas.lange, timar74
Version: OOo 3.2   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
Unicode ligatures is not recognized in first letter position by the word breaking algorithm for spell checking none

Description nemeth.lacko 2005-10-21 02:30:44 UTC
The new Hungarian breakiterator patch
(http://www.openoffice.org/issues/show_bug.cgi?id=56347) contains some Unicode
characters in first letter position in words (ALetter). The breakiterator doesn't
handle these Unicode characters. For example, suffixed forms of the euro sign are
not accepted by the breakiterator, and

€-val  (eg. "25 €-val" = "with 25 €" in Hungarian)

is broken into two parts: € and -val, despite its ALetter declaration.
Comment 1 ooo 2006-05-02 17:54:52 UTC
Karl,

Can we do something about this in the 2.0.4 code line?

  Eike
Comment 2 ooo 2006-05-03 10:58:44 UTC
I thought I reassigned this issue.. so again,

Karl,

Can we do something about this in the 2.0.4 code line?

  Eike
Comment 3 karl.hong 2006-07-26 19:28:52 UTC
I used the following StarBasic code to test the hu breakiterator. It works as
expected: the boundary is 0, 5.

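Sub Main

' Create the break iterator service and a Hungarian locale
breakiterator=createUnoService("com.sun.star.i18n.BreakIterator")
dim locale as new com.sun.star.lang.Locale
locale.Language="hu"
' DICTIONARY_WORD is the word type used by the spell checker
wordtype=com.sun.star.i18n.WordType.DICTIONARY_WORD
word="€-val"

' Ask for the boundary of the word at position 0
boundary=breakiterator.getWordBoundary(word, 0, locale, wordtype, true)

print word, boundary.startPos, boundary.endPos

End Sub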

There are two breakiterators. DICTIONARY_WORD is for the spell checker; for
cursor traveling, you should create edit_word_hu.txt.
Comment 4 stefan.baltzer 2006-07-27 11:23:51 UTC
SBA->Karl: As discussed via mail, this issue does not fit in the OOo 2.04 time
frame.
Set Target to OOo 2.x. (Means "next milestone", but there is no 2.05 target
available yet.)
Comment 5 nemeth.lacko 2006-07-27 11:31:56 UTC
nemeth->khong: many thanks for the instruction. I will make the edit_word_hu.txt
file for OOo 2.0.5.
Comment 6 karl.hong 2006-10-10 22:39:14 UTC
Fixed.
Comment 7 karl.hong 2006-10-13 17:09:48 UTC
ready for QA.
Comment 8 karl.hong 2006-10-20 18:23:16 UTC
To test this feature, enter a word with a dash, like "re-send", set the language
to Hungarian, and move the cursor over the word with the Ctrl+arrow keys. It
should treat "re-send" as one word. The previous version treated it as 3 words.
Comment 9 stefan.baltzer 2006-10-23 15:32:01 UTC
SBA: Thanks, Karl. I was looking "via spellchecker" and saw no difference. But
when I do the cursor-travelling as described, I can see the difference.
Verified in CWS i18n27.
Comment 10 stefan.baltzer 2006-11-03 13:57:48 UTC
SBA: Correcting target to OOo 2.1
Comment 11 stefan.baltzer 2006-11-03 13:59:09 UTC
SBA: Correcting target to OOo 2.1
Comment 12 stefan.baltzer 2006-11-27 11:53:49 UTC
SBA: OK in OOE680m5 Build 9093.
Closed.
Comment 13 nemeth.lacko 2010-02-15 13:36:07 UTC
This bug is not resolved by the breakiterator patterns. 

A more trivial example is the bad Unicode fi-ligature handling.

Please, check the following patterns in the default English language:

ﬁnite inﬁnite

Only the second form (inner U+FB01 ligature position) is recognized by the break
iterator for spell checking. This is a real bug for the new Hungarian spelling
dictionary with automatic ligature handling (input character conversion).
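
For readers who want to see which characters are involved, here is a minimal Python sketch. It is not the OpenOffice.org break iterator and not the dictionary's actual conversion mechanism, which this issue does not show. It only uses the standard `unicodedata` module to report that U+FB01 is a lowercase letter with a compatibility mapping to "fi", while the euro sign is a currency symbol. The helper names `describe` and `fold_ligatures` are hypothetical, and NFKC normalization is used here only as a stand-in for "input character conversion".

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def fold_ligatures(text):
    """Replace compatibility characters such as U+FB01 with plain letters.

    Assumption: NFKC stands in for the dictionary's input conversion.
    """
    return unicodedata.normalize("NFKC", text)


# The two test words from the comment: ligature first, then ligature inside.
samples = ["\ufb01nite", "in\ufb01nite"]
folded = [fold_ligatures(word) for word in samples]

if __name__ == "__main__":
    print(describe("\ufb01"))   # the fi ligature
    print(describe("\u20ac"))   # the euro sign
    for original, plain in zip(samples, folded):
        print(repr(original), "->", plain)
```

After folding, both words are plain ASCII, so a difference in how the two original forms are treated would come from the word boundary step rather than from the dictionary lookup.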




Comment 14 nemeth.lacko 2010-02-15 13:41:14 UTC
Created attachment 67833 [details]
Unicode ligatures is not recognized in first letter position by the word breaking algorithm for spell checking
Comment 15 nemeth.lacko 2010-02-15 13:55:15 UTC
This is a Writer-specific bug. Impress handles these Unicode characters
correctly in spell checking.
Comment 16 nemeth.lacko 2010-03-10 10:44:42 UTC
Summary: -> in Writer
Comment 17 thomas.lange 2010-09-01 16:57:16 UTC
tl->nemeth,sba: Multiple different problems are mentioned in this issue. :-(
As far as I can see, everything but the ligature problem mentioned in the posting
from 'Mon Feb 15' is already fixed. If that is true and the ligature problem is
the only item still missing, please either close this issue or set it as a
duplicate of issue 113785, which was used to fix the ligature problem only.
Comment 18 Raphael Bircher 2011-10-04 22:34:01 UTC
Assigned to the default contact.

rbircher > sba: feel free to reassign it to yourself.

@all: This issue may be solved. Can someone tell more about it?