Issue 51661

Summary:

Quote marks in 2.0 Hebrew wordbreaking

Product:

Internationalization

Reporter:

alan

Component:

code

Assignee:

stefan.baltzer

Status:

CLOSED FIXED

QA Contact:

issues@l10n <issues>

Severity:

Trivial

Priority:

CC:

issues, ooo, yba

Version:

OOo 2.0 Beta

Target Milestone:

---

Hardware:

All

OS:

All

Issue Type:

PATCH

Latest Confirmation in:

---

Developer Difficulty:

---

Attachments:

Description	Flags
Changes to existing breakiterator code	none
New file rules for Hebrew wordbreaking	none
Hebrew test file withquotes for Karl	none
Sample doc - note misspelled words broken by a double-quote	none
9 different quote characters in Hebrew words	none
Hebrew dictionary files and dictionary.lst	none

Description alan 2005-07-07 09:46:24 UTC

Hebrew wordbreaking in 1.1 does not see a double quote mark as the end of a
word. This is correct behavior. In 2.0 beta m104, this is no longer the case,
and a quote mark *is* seen as a word-breaker.
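
To illustrate the difference, here is a minimal Python sketch (not
OpenOffice.org code). It assumes a Hebrew abbreviation typed with the ASCII
double quote before its last letter, and uses two hypothetical regular
expressions to stand in for the two behaviors described above.

```python
import re

# Hypothetical sample: three Hebrew letters with an ASCII double quote
# (U+0022) before the last one, as abbreviations are commonly typed.
word = '\u05e6\u05d4"\u05dc'

# Quote treated as a word-breaker (the behavior reported for 2.0 beta m104).
breaking = re.findall(r'\w+', word)

# Quote between two letters kept inside the word (the 1.1 behavior).
non_breaking = re.findall(r'\w+(?:"\w+)*', word)

print(len(breaking), breaking)          # the word is split into fragments
print(len(non_breaking), non_breaking)  # the word stays whole
```

With the first pattern the abbreviation is split at the quote; with the second
it comes back as a single word, which is what a spellchecker needs in order to
look it up.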

See the thread at:
http://l10n.openoffice.org/servlets/BrowseList?list=dev&by=thread&from=935705
for details.

I have tried to integrate my changes to 1.1 into m104, but unsuccessfully. I'm
posting the changes I made to m104, so that others can examine them, and find
out what's wrong or missing.

Comment 1 alan 2005-07-07 09:49:23 UTC

Created attachment 27755 [details]
Changes to existing breakiterator code

Comment 2 alan 2005-07-07 09:50:35 UTC

Created attachment 27756 [details]
New file rules for Hebrew wordbreaking

Comment 3 ooo 2005-07-07 12:16:47 UTC

Grabbing issue.

Comment 4 ooo 2005-09-05 18:09:36 UTC

Hi Karl,

As I won't find the time to dive into this in the next days/weeks, could you
please have a look at this one? See also the mails on the dev@l10n list in the
thread starting with the message mentioned above. If there is an easy solution,
just add it to my CWS locales201.

Thanks
  Eike

Comment 5 karl.hong 2005-09-23 01:03:08 UTC

I assume that both edit and dictionary modes need to treat the double quote as
part of the word. I created two files, edit_word_he.txt and dict_word_he.txt.

I tested it on another language. If someone could upload a Hebrew file with
double quotes for me to test, that would be great. Thanks in advance.

Comment 6 alan 2005-09-23 09:11:18 UTC

Created attachment 29837 [details]
Hebrew test file withquotes for Karl

Comment 7 karl.hong 2005-09-23 19:13:08 UTC

Thanks, Alan. 

Ready for QA.

re-open issue and reassign to oc@openoffice.org

Comment 8 karl.hong 2005-09-23 19:13:14 UTC

reassign to oc@openoffice.org

Comment 9 karl.hong 2005-09-23 19:13:25 UTC

reset resolution to FIXED

Comment 10 oc 2005-09-26 15:50:50 UTC

Hi Stefan, please take over

re-open issue and reassign to sba@openoffice.org

Comment 11 oc 2005-09-26 15:51:02 UTC

reassign to sba@openoffice.org

Comment 12 oc 2005-09-26 15:51:13 UTC

reassign to sba@openoffice.org

Comment 13 oc 2005-09-26 15:51:21 UTC

reset resolution to FIXED

Comment 14 alan 2005-09-28 12:53:25 UTC

Karl, there still seems to be a problem when I try to spellcheck the sample
document. I'm attaching a screenshot. Note that in the 6th and 7th lines, toward
the left, there are two identical words, one above the other, that are marked as
misspelled. Those words have double-quotes in the middle, but the red line stops
at the double-quote. It should continue past the quote, to the end of the word.
The same problem exists in the text in the top-right cell of the table. Also in
the left cell of the table's second row.

Comment 15 alan 2005-09-28 12:56:52 UTC

Created attachment 29960 [details]
Sample doc - note misspelled words broken by a double-quote

Comment 16 stefan.baltzer 2005-10-06 17:07:08 UTC

SBA->ayaninger: When I compare the CWS build and an OOo installation WITHOUT
this break iterator patch, I see no difference in treatment of quotes. 
I will attach a document with a couple of "quotes" (single and double). Their
Unicode IDs are 2018, 2019, 201B, 201C, 201E, 05F2, 05F4, 05D9, 05F3. 
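
As a quick cross-check of this list, the following Python sketch prints the
Unicode general category and name of each code point using the standard
`unicodedata` module. The ASCII double quote (U+0022) is added at the end for
comparison; that addition is an assumption of this sketch and not part of the
attached document.

```python
import unicodedata

# Code points from the list above, plus U+0022 (ASCII double quote) for comparison.
code_points = [0x2018, 0x2019, 0x201B, 0x201C, 0x201E,
               0x05F2, 0x05F4, 0x05D9, 0x05F3, 0x0022]

def describe(cp):
    """Return (hex code, general category, Unicode name) for a code point."""
    ch = chr(cp)
    return f"{cp:04X}", unicodedata.category(ch), unicodedata.name(ch)

table = [describe(cp) for cp in code_points]
for row in table:
    print(*row, sep="\t")
```

Categories starting with `L` are letters and those starting with `P` are
punctuation, so the output shows that the list mixes both: 05D9, for example,
is reported as a Hebrew letter and not as a quote character.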

Please comment
(1) which ones should be treated as "character" and which ones as "quote"
(=word limiter).
(2) Which ones are commonly used (= can be inserted directly) when typing Hebrew?

Consequently (difference=none), I must regard this issue as "not fixed".
-> Back to NEW and reassigned to Karl.

re-open issue and reassign to khong@openoffice.org

Comment 17 stefan.baltzer 2005-10-06 17:07:31 UTC

reassign to khong@openoffice.org

Comment 18 stefan.baltzer 2005-10-06 17:07:38 UTC

reset resolution to FIXED

Comment 19 stefan.baltzer 2005-10-06 17:09:33 UTC

Reopened.

Comment 20 stefan.baltzer 2005-10-06 17:14:46 UTC

Created attachment 30187 [details]
9 different quote characters in Hebrew words

Comment 21 karl.hong 2005-10-06 18:46:29 UTC

Karl->SBA, none of your quotes is the one they want. They want the English, or
ASCII, double quote (0022). You can see it as $MidLetter in the attachment "New
file rules for Hebrew wordbreaking".

I made both word-type modes, dictionary and edit, treat 0022 as a mid-letter.
In Alan's attached document, HebrewQuoteTest.odt, when you do word travel
(Ctrl+Arrow key), you will see that (") is part of a word as a mid-letter.

Karl->Alan, I don't have a Hebrew spellchecker, so I could not see what you see
in your screenshot. To test word breaking as the spellchecker does it, which
uses DICTIONARY_WORD mode, here is a StarBasic program. You can change the
language to get different word boundaries:

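Sub Main
    ' Create the break iterator service and a Hebrew locale.
    bi = createUnoService("com.sun.star.i18n.BreakIterator")
    dim lo as new com.sun.star.lang.Locale
    lo.Language = "he"
    ' DICTIONARY_WORD is the word type the spellchecker uses.
    ty = com.sun.star.i18n.WordType.DICTIONARY_WORD

    ' Latin letters with an ASCII double quote (chr$(34)) in the middle.
    st = "aa" + chr$(34) + "b"
    bd = bi.getWordBoundary(st, 0, lo, ty, TRUE)
    print st, bd.startPos, bd.endPos

    ' Hebrew letters with an ASCII double quote in the middle.
    st = "כע" + chr$(34) + "ז"
    bd = bi.getWordBoundary(st, 0, lo, ty, TRUE)
    print st, bd.startPos, bd.endPos
End Sub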
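
For readers without an OpenOffice.org installation, the mid-letter idea can be
modeled in a few lines of Python. This is only a sketch under stated
assumptions, not the BreakIterator implementation: `MID_LETTER`,
`is_word_char`, and `word_boundary` are hypothetical names, only U+0022 is
treated as a mid-letter, and the end position is exclusive in this model.

```python
# Assumption: only the ASCII double quote counts as a mid-letter character.
MID_LETTER = {'"'}

def is_word_char(text, i):
    """True if text[i] is alphanumeric, or a mid-letter char between two such chars."""
    ch = text[i]
    if ch.isalnum():
        return True
    return (ch in MID_LETTER
            and 0 < i < len(text) - 1
            and text[i - 1].isalnum()
            and text[i + 1].isalnum())

def word_boundary(text, pos):
    """Return (start, end) of the word containing pos; end is exclusive."""
    if not is_word_char(text, pos):
        return pos, pos
    start = pos
    while start > 0 and is_word_char(text, start - 1):
        start -= 1
    end = pos + 1
    while end < len(text) and is_word_char(text, end):
        end += 1
    return start, end

# Same two test strings as the StarBasic program above.
for sample in ['aa"b', '\u05db\u05e2"\u05d6']:
    print(sample, *word_boundary(sample, 0))
```

In this model a quote extends the word only when it sits between two letters,
so a quote followed by a space still ends the word.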

Comment 22 alan 2005-10-07 12:02:05 UTC

ayaniger->sba,
Yes, Karl is correct, we are referring to the English ASCII double-quote.
However, it would be proper to treat all the other characters you listed in the
same way, as "characters", and not as word-breakers. Take a look at Jonathan
Ben-Avraham's background explanation, which I quoted in my comments to Issue 51772.

ayaniger->Karl,
Word travel using Ctrl-<Arrow> does jump over the quote marks. I ran your
StarBasic program, and saw the results, which also show that the quote marks do
not break the word. Nevertheless, in spell checking the word is broken at the
quote marks. I am attaching Hebrew dictionaries and dictionary.lst, which you
can install in share/dict/ooo, so you can take a look.

Comment 23 alan 2005-10-07 12:05:12 UTC

Created attachment 30203 [details]
Hebrew dictionary files and dictionary.lst

Comment 24 stefan.baltzer 2005-10-11 11:45:32 UTC

SBA: I am correcting the status to "Fixed". Thomas Lange is digging a little
into Karl's code in order to find out why the Hebrew spellchecker is not
accepting the entire word (with ASCII 0022) while cursor travelling behaves like
"this is one word". The outcome will probably lead to another issue that will
not be fixed within this CWS.

Comment 25 stefan.baltzer 2005-10-12 15:28:42 UTC

SBA: Verified in CWS i18n20.
Follow up is issue 51772.

Comment 26 stefan.baltzer 2006-03-22 17:17:52 UTC

SBA: OK in Master (and still OK in OOo 2.0.2).
Closed