Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | Quote marks in 2.0 Hebrew wordbreaking | ||
---|---|---|---|
Product: | Internationalization | Reporter: | alan |
Component: | code | Assignee: | stefan.baltzer |
Status: | CLOSED FIXED | QA Contact: | issues@l10n <issues> |
Severity: | Trivial | ||
Priority: | P3 | CC: | issues, ooo, yba |
Version: | OOo 2.0 Beta | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Issue Type: | PATCH | Latest Confirmation in: | --- |
Developer Difficulty: | --- | ||
Attachments: |
Description
alan
2005-07-07 09:46:24 UTC
Created attachment 27755 [details]
Changes to existing breakiterator code
Created attachment 27756 [details]
New file rules for Hebrew wordbreaking
Grabbing issue.

Hi Karl,
As I won't find the time to dive into this in the next days/weeks, could you please have a look at this one? See also the mails on the dev@l10n list in the thread starting with the message mentioned above. If there is an easy solution, just add it to my CWS locales201.
Thanks
Eike

I assume that both edit and dictionary modes need to treat the double quote as part of the word. I created two files, edit_word_he.txt and dict_word_he.txt. I tested it on another language; if someone could upload a Hebrew file with double quotes for me to test, that would be great. Thanks in advance.

Created attachment 29837 [details]
Hebrew test file with quotes for Karl
Thanks, Alan. Ready for QA.

re-open issue and reassign to oc@openoffice.org
reassign to oc@openoffice.org
reset resolution to FIXED

Hi Stefan, please take over

re-open issue and reassign to sba@openoffice.org
reassign to sba@openoffice.org
reset resolution to FIXED

Karl, there still seems to be a problem when I try to spellcheck the sample document. I'm attaching a screenshot. Note that in the 6th and 7th lines, toward the left, there are two identical words, one above the other, that are marked as misspelled. Those words have double-quotes in the middle, but the red line stops at the double-quote. It should continue past the quote, to the end of the word. The same problem exists in the text in the top-right cell of the table, and also in the left cell of the table's second row.

Created attachment 29960 [details]
Sample doc - note misspelled words broken by a double-quote
SBA->ayaniger: When I compare the CWS build with an OOo installation WITHOUT this break iterator patch, I see no difference in the treatment of quotes. I will attach a document with a couple of "quotes" (single and double). Their Unicode IDs are 2018, 2019, 201B, 201C, 201E, 05F2, 05F4, 05D9, 05F3. Please comment:
(1) Which ones should be treated as "character" and which ones as "quote" (= word limiter)?
(2) Which ones are commonly used (= can be inserted directly) when typing Hebrew?
Consequently (difference = none), I must regard this issue as "not fixed". -> Back to NEW and reassigned to Karl.

re-open issue and reassign to khong@openoffice.org
reassign to khong@openoffice.org
reset resolution to FIXED

Reopened.

Created attachment 30187 [details]
9 different quote characters in Hebrew words
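For readers who want to see what the code points listed above actually are, here is a minimal Python sketch that looks them up in the Unicode Character Database. It is an illustration added for reference, not part of the original thread or of the OOo code base; the list name `CODE_POINTS` is an assumption of this sketch, and U+0022 is included only because it is the character the patch targets.

```python
import unicodedata

# Code points from SBA's test document, plus U+0022 (the ASCII double
# quote that the patch treats as a mid-letter). The list name is a
# hypothetical helper for this sketch only.
CODE_POINTS = [0x2018, 0x2019, 0x201B, 0x201C, 0x201E,
               0x05F2, 0x05F4, 0x05D9, 0x05F3, 0x0022]


def describe(cp):
    """Return (hex code, general category, character name) for a code point."""
    ch = chr(cp)
    return ("%04X" % cp, unicodedata.category(ch), unicodedata.name(ch))


if __name__ == "__main__":
    for cp in CODE_POINTS:
        code, category, name = describe(cp)
        print(code, category, name)
```

The general category (letter versus punctuation) is only a starting point: as the discussion shows, whether a character breaks a word is decided by the break iterator rules for the locale, not by the category alone.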
Karl->SBA, None of your quotes is what they want. They want the English, or ASCII, double quote (0022). You can see it as $MidLetter in the attachment "New file rules for Hebrew wordbreaking". I made both word type modes, dictionary and edit, take (0022) as a mid letter. In Alan's attached document, HebrewQuoteTest.odt, when you do word travel (Ctrl+Arrow key), you will see that (") is part of a word as a mid letter.

Karl->Alan, I don't have a Hebrew spellchecker, so I could not see what you see in your screenshot. To test word breaking in the spellchecker, which uses DICTIONARY_WORD mode, here is a StarBasic program; you can change it to a different language and get different word boundaries:

    Sub Main
        bi=createUnoService("com.sun.star.i18n.BreakIterator")
        dim lo as new com.sun.star.lang.Locale
        lo.Language="he"
        ' DICTIONARY_WORD is the word type used by the spellchecker
        ty=com.sun.star.i18n.WordType.DICTIONARY_WORD
        ' Latin word with an ASCII double quote (chr$(34)) in the middle
        st="aa"+chr$(34)+"b"
        bd=bi.getWordBoundary(st, 0, lo, ty, TRUE)
        print st, bd.startPos, bd.endPos
        ' Hebrew word with an ASCII double quote in the middle
        st="כע"+chr$(34)+"ז"
        bd=bi.getWordBoundary(st, 0, lo, ty, TRUE)
        print st, bd.startPos, bd.endPos
    End Sub

ayaniger->sba, Yes, Karl is correct, we are referring to the English ASCII double-quote. However, it would be proper to treat all the other characters you listed in the same way, as "characters", and not as word-breakers. Take a look at Jonathan Ben-Avraham's background explanation, which I quoted in my comments to Issue 51772.

ayaniger->Karl, Word travel using Ctrl-<Arrow> does jump over the quote marks. I ran your StarBasic program and saw the results, which also show that the quote marks do not break the word. Nevertheless, in spell checking the word is broken at the quote marks. I am attaching Hebrew dictionaries and dictionary.lst, which you can install in share/dict/ooo, so you can take a look.

Created attachment 30203 [details]
Hebrew dictionary files and dictionary.lst
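The mid-letter behaviour Karl describes (a double quote counts as part of a word only when it sits between letters) can be illustrated with a small Python sketch. This is not the ICU rule set from the attachment and not the OOo BreakIterator implementation; the names `MID_LETTER`, `is_word_char` and `word_boundary` are hypothetical, and the rule is simplified to "letter, or mid-letter flanked by letters on both sides".

```python
from typing import Tuple

# Assumption of this sketch: only U+0022 is treated as a mid-letter.
MID_LETTER = {'"'}


def is_word_char(text: str, i: int) -> bool:
    """True if text[i] is a letter, or a mid-letter flanked by letters."""
    ch = text[i]
    if ch.isalpha():
        return True
    return (
        ch in MID_LETTER
        and 0 < i < len(text) - 1
        and text[i - 1].isalpha()
        and text[i + 1].isalpha()
    )


def word_boundary(text: str, pos: int) -> Tuple[int, int]:
    """Return (start, end) of the word containing pos; end is exclusive."""
    if not (0 <= pos < len(text)) or not is_word_char(text, pos):
        return (pos, pos)
    start = pos
    # Extend left while the previous character still belongs to the word.
    while start > 0 and is_word_char(text, start - 1):
        start -= 1
    end = pos + 1
    # Extend right while the next character still belongs to the word.
    while end < len(text) and is_word_char(text, end):
        end += 1
    return (start, end)


if __name__ == "__main__":
    for sample in ['aa"b', 'כע"ז', 'aa" b']:
        print(repr(sample), word_boundary(sample, 0))
```

With the two strings from the StarBasic test, the sketch reports a single word spanning the quote, while a quote followed by a space ends the word before the quote. A spellchecker that splits on the quote regardless of these boundaries would still underline only part of the word, which matches the symptom reported in the thread.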
SBA: I am correcting the status to "Fixed". Thomas Lange is digging a little into Karl's code in order to find out why the Hebrew spellchecker does not accept the entire word (with ASCII 0022) while cursor travelling behaves like "this is one word". The outcome will probably lead to another issue that will not be fixed within this CWS.

SBA: Verified in CWS i18n20. Follow up is issue 51772.

SBA: OK in Master (and still OK in OOo 2.02). Closed