Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||Quotes in Hebrew word breaking don't work during spellcheck|
|Component:||code||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P3||CC:||elisko, issues, Mathias_Bauer, yba|
|Version:||OOo 2.0 Beta|
|Issue Type:||PATCH||Latest Confirmation in:||---|
Description alan 2005-07-10 10:49:32 UTC
At present, OOo treats a quotation mark as the end of a word during spellcheck. This should be changed. Here is the background (provided by Jonathan Ben-Avraham):
-----------
The Hebrew writing system uses one double quotation mark as the penultimate character of a lexical item in order to indicate that the lexical item is either an acronym or a Hebrew number, rather than a normal word. The double quote mark is always preceded and followed by a Hebrew character, never a whitespace character, punctuation mark or single quote. The reader uses domain and contextual knowledge in order to distinguish between acronyms and numbers. When the character set allows distinct opening and closing double quote glyphs, Hebrew uses the closing (slanting from upper right to lower left) double quotation mark.

The Hebrew writing system uses one single quote mark after (visually to the left of) a Hebrew consonant as an accent mark to indicate that the consonant should be pronounced in an alternative way (usually to indicate a foreign pronunciation for a letter that does not exist in Hebrew), or to indicate a contraction. The single quote can come after any character of a word, including in word-final position (followed by whitespace or a punctuation mark). Words that use the single quote as either an accent mark or a contraction indicator are not listed in common Hebrew dictionaries.

In addition, Hebrew also uses double and single quotation marks in pairs to indicate quotations in the same way that Western languages do.

The above explanations unfortunately reflect the way the key mappings are set up in Israel today for historical reasons, but this is not the way things should be in an ideal world. A real Unicode purist would use \u05F4 (HEBREW_GERSHAYIM) instead of \u0022 (they look the same), and \u05F3 (HEBREW_GERESH) instead of \u0027. The break iterator code in OOo should be fixed to deal with *both* the common and the correct usages.
Hebrew words can be hyphenated between any two characters. There are no syllable-based hyphenation rules as in English. There is no Hebrew hyphen (yes, \u05BE HEBREW_MAQAF is not a hyphen).
---------
This is only an issue during spellchecking. When moving from word to word using Ctrl/Right or Ctrl/Left, quotes are *not* treated as the end of a word. This is correct. However, during spellchecking, the behavior is not correct. There is more on this subject at: http://l10n.openoffice.org/servlets/BrowseList?list=dev&by=thread&from=936644 and the patch submitted at: http://www.openoffice.org/issues/show_bug.cgi?id=51661
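The word rules described above can be sketched as a small tokenizer. This is only an illustration in Python, not the OOo break iterator: the names `WORD` and `hebrew_words` are made up for this example, only unpointed Hebrew letters (U+05D0 to U+05EA) are handled, and the "penultimate character" constraint on the double quote is relaxed to "between two Hebrew letters".

```python
import re

# Character classes (assumptions for this sketch, not taken from OOo code).
HEBREW = "\u05D0-\u05EA"        # Hebrew letters alef..tav, no points
DOUBLE = "\"\u05F4"             # ASCII double quote, HEBREW GERSHAYIM
SINGLE = "'\u05F3"              # ASCII apostrophe, HEBREW GERESH

# A word starts with a Hebrew letter. Every later Hebrew letter may be
# preceded by at most one quote-like character, so a double quote is always
# surrounded by letters. A single quote may also close the word.
WORD = re.compile(
    rf"[{HEBREW}](?:[{SINGLE}{DOUBLE}]?[{HEBREW}])*[{SINGLE}]?"
)


def hebrew_words(text):
    """Return the Hebrew words in text, keeping in-word quote marks."""
    return WORD.findall(text)


if __name__ == "__main__":
    acronym = "\u05E6\u05D4\"\u05DC"        # tsadi, he, quote, lamed
    accented = "\u05D2'\u05D5\u05E8\u05D2'"  # geresh used as accent, twice
    quoted = "\"\u05E9\u05DC\u05D5\u05DD\""  # ordinary quoted word
    print(hebrew_words(f"{acronym} {accented} {quoted}"))
```

A word-final single quote is ambiguous with a closing quotation mark; this sketch always keeps it with the word, which a real implementation would have to decide more carefully.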
Comment 1 michael.ruess 2005-07-11 14:08:43 UTC
Reassigned to SBA.
Comment 2 stefan.baltzer 2005-10-12 13:58:28 UTC
*** Issue 55809 has been marked as a duplicate of this issue. ***
Comment 3 stefan.baltzer 2005-10-12 14:08:20 UTC
SBA->FME: As discussed, yours. Note: The closed duplicate (issue 55809 "Script type change is always regarded as a word boundary") has an attachment with several quote characters in Hebrew words. It was a follow-up of issue 51661, which was fixed within the break iterator.
Comment 4 stefan.baltzer 2005-10-12 14:20:13 UTC
SBA: Summary adjusted.
Comment 5 alan 2005-12-01 20:18:03 UTC
Created attachment 31966 [details] Changes handling of RTL numstrings, and adjusts X coordinate for RTL in PaintBullet
Comment 6 alan 2005-12-01 20:20:22 UTC
Sorry, I attached the patch to the wrong issue (as the name of the patch indicates).
Comment 7 alan 2008-11-12 07:43:00 UTC
Created attachment 57916 [details] Changes script type of quote, geresh, gershayim to WEAK in Hebrew context
Comment 8 alan 2008-11-12 07:45:54 UTC
I've attached a patch which changes the script type of double-quote, apostrophe, geresh, gershayim to WEAK in Hebrew context, thus not breaking a Hebrew word at those characters. See the discussion at http://l10n.openoffice.org/servlets/BrowseList?list=dev&by=thread&from=2146651
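As a rough illustration of the idea (not the attached patch itself), the sketch below assigns a script type to each character and treats quote-like characters as WEAK when they follow a Hebrew letter, so that they inherit the script of the preceding run. The classifier `raw_type`, the function `script_types`, and the rule that a WEAK character takes the previous strong type are simplifying assumptions made for this example.

```python
LATIN, COMPLEX, WEAK = "LATIN", "COMPLEX", "WEAK"

# Double quote, apostrophe, geresh, gershayim.
QUOTE_LIKE = {'"', "'", "\u05F3", "\u05F4"}


def is_hebrew_letter(ch):
    return "\u05D0" <= ch <= "\u05EA"


def raw_type(ch):
    """Toy classifier: Hebrew block is COMPLEX, whitespace WEAK, rest LATIN."""
    if "\u0590" <= ch <= "\u05FF":
        return COMPLEX
    if ch.isspace():
        return WEAK
    return LATIN


def script_types(text, quotes_weak_in_hebrew=True):
    """Return one resolved script type per character of text."""
    resolved = []
    prev_strong = None
    for i, ch in enumerate(text):
        t = raw_type(ch)
        # The rule under discussion: a quote-like character directly after
        # a Hebrew letter becomes WEAK instead of starting a new script run.
        if (quotes_weak_in_hebrew and ch in QUOTE_LIKE
                and i > 0 and is_hebrew_letter(text[i - 1])):
            t = WEAK
        # Assumption: a WEAK character inherits the preceding strong type.
        if t == WEAK and prev_strong is not None:
            t = prev_strong
        elif t != WEAK:
            prev_strong = t
        resolved.append(t)
    return resolved


if __name__ == "__main__":
    text = "\u05D0\"\u05D0"   # alef, double quote, alef
    print(script_types(text, quotes_weak_in_hebrew=False))
    print(script_types(text, quotes_weak_in_hebrew=True))
```

Without the rule, the ASCII quote sits in its own LATIN run between two COMPLEX runs; with the rule, the three characters form a single COMPLEX run.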
Comment 9 frank.meies 2008-11-13 07:57:51 UTC
fme->khong: This only affects break iterator code. Please take over.
Comment 10 karl.hong 2008-12-15 22:43:41 UTC
Karl->fme: The break iterator seems to be doing the right thing. The following program shows that, for a Hebrew dictionary word, the break iterator considers the double quote to be part of the word:
Sub Main
    xBI = createUnoService("com.sun.star.i18n.BreakIterator")
    Dim aLocale as new com.sun.star.lang.Locale
    aLocale.Language = "he"
    nWordType = 2 ' WordType::DICTIONARY_WORD
    aTxt = CHR$(&H5d0) + CHR$(&H22) + CHR$(&H5d0)
    aBoundary = xBI.getWordBoundary( aTxt, 0, aLocale, nWordType, true )
    print aBoundary.StartPos, aBoundary.EndPos
End Sub
It prints (0,3). Something must be wrong in the way Writer sends the word to the spellchecker.
Comment 11 frank.meies 2009-01-05 14:08:18 UTC
@ayaniger: Well, for issue 16354 I already implemented some code that changes the script type obtained from i18n to COMPLEX in case the direction of the character run is RTL, see porlay.cxx. For a couple of tasks (e.g., spell checking, word count etc.) the SwScanner::NextWord method is used. This method contains some code that clips the words at script type boundaries. Now in my opinion the problem is that the SwScanner::NextWord function does not use the ScriptInfo data structure (which contains the 'changed' script type) but rather directly uses the break iterator to find the script boundaries. What do you think?
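The suspected effect can be shown with a toy model. The sketch below is not the SwScanner code; `clip_at_script_boundary` is a hypothetical helper, and the two script-type lists are hand-written stand-ins for the raw break iterator script types and the adjusted ScriptInfo data for the three-character test string from comment 10 (alef, double quote, alef).

```python
def clip_at_script_boundary(start, end, types):
    """Clip the word [start, end) at the first script type change.

    types holds one script type per character; the return value is the
    new (exclusive) end position of the word.
    """
    first = types[start]
    for i in range(start + 1, end):
        if types[i] != first:
            return i
    return end


if __name__ == "__main__":
    # Word boundary reported by the break iterator in comment 10.
    start, end = 0, 3

    # Assumed raw script types: the quote is not COMPLEX.
    raw_types = ["COMPLEX", "LATIN", "COMPLEX"]
    # Assumed adjusted script types: the quote follows the RTL run.
    adjusted_types = ["COMPLEX", "COMPLEX", "COMPLEX"]

    print(clip_at_script_boundary(start, end, raw_types))
    print(clip_at_script_boundary(start, end, adjusted_types))
```

With the raw types the word is cut after the first letter, which would send a one-letter fragment to the spellchecker; with the adjusted types the whole word (0,3) survives.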
Comment 12 alan 2009-01-06 11:44:26 UTC
@fme: Yes, that seems correct.
Comment 13 Mathias_Bauer 2009-05-07 15:45:42 UTC
Just for bookkeeping: is this patch still being worked on? Or shall we reject it and set the issue type to "DEFECT"?
Comment 14 alan 2009-05-07 16:48:10 UTC
If Frank is not working on this, I will try to work on it next week.
Comment 15 Mathias_Bauer 2009-05-25 16:23:30 UTC
Setting target 3.x for the time being
Comment 16 Rob Weir 2013-03-11 15:04:50 UTC
I'm adding this comment to all open issues with Issue Type == PATCH. We have 220 such issues, many of them quite old. I apologize for that. We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0. If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know. On the other hand, if the patch is no longer relevant, please let us know that as well. If you have any general questions or want to discuss this further, please send a note to our dev mailing list: firstname.lastname@example.org Thanks! -Rob