Issue 103402

Summary:

need to skip diacritics in Hebrew spellchecking

Product:

General

Reporter:

alan

Component:

spell checking

Assignee:

AOO issues mailing list <issues>

Status:

CONFIRMED ---

QA Contact:

Severity:

Trivial

Priority:

CC:

amiadb, elisko, issues, kaplanlior, nemeth.lacko, okhayat, yba

Version:

3.3.0 or older (OOo)

Target Milestone:

---

Hardware:

Unknown

OS:

All

Issue Type:

PATCH

Latest Confirmation in:

---

Developer Difficulty:

---

Attachments:

Description	Flags
proposed patch	none
revised - changed a < to <=	none

Description alan 2009-07-08 07:18:20 UTC

Hebrew is usually written without diacritics. However, sometimes the diacritics
are written as special marks located within, above, or below consonants. The
diacritics are represented internally as separate Unicode characters. Hebrew
dictionaries check for words without diacritics and will continue to do so for
the foreseeable future.

This patch filters the diacritics out of a word before spellchecking it.
(Using breakiterator is not appropriate, since we don't want word-breaking at
the diacritics.)

I don't know whether this functionality is needed for other languages as well,
perhaps Arabic or Persian, or maybe some LTR languages. The patch is written in
a generalized way, so that adding a language is fairly easy:

1) add another "case LANGUAGE_WHATEVER" to the "switch (nLanguage)" statement,
and create a string with the diacritics to be skipped 

2) add "|| nLanguage == LANGUAGE_WHATEVER" to the assignments of the boolean
variables
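
For readers without the attachment at hand, the approach described above can be sketched roughly as follows. This is not the attached patch. It is a minimal, self-contained illustration that assumes a hypothetical Language enum in place of nLanguage and the LANGUAGE_* constants, uses std::u16string instead of the office string classes, and leaves out the boolean variables mentioned in step 2. The set of Hebrew marks is taken from the Unicode Hebrew block and may differ from the set the patch uses.

```cpp
#include <cassert>
#include <string>

// Hypothetical language identifiers for this sketch only; the patch itself
// switches on nLanguage with LANGUAGE_* constants.
enum class Language { Hebrew, English };

// Step 1 from the description: one case per language, each building a
// string of the diacritics to be skipped. Empty means "filter nothing".
static std::u16string diacriticsToSkip(Language eLang)
{
    std::u16string aMarks;
    switch (eLang)
    {
        case Language::Hebrew:
            // Hebrew accents and points, U+0591..U+05BD.
            for (char16_t c = 0x0591; c <= 0x05BD; ++c)
                aMarks.push_back(c);
            // Remaining combining marks in the block. U+05BE (maqaf) and
            // the other punctuation characters are left out on purpose.
            aMarks += u"\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7";
            break;
        default:
            break;
    }
    return aMarks;
}

// Returns a copy of rWord without the diacritics for eLang. No word
// breaking happens here; characters are only dropped.
static std::u16string stripDiacritics(const std::u16string& rWord, Language eLang)
{
    const std::u16string aMarks = diacriticsToSkip(eLang);
    if (aMarks.empty())
        return rWord;

    std::u16string aResult;
    aResult.reserve(rWord.size());
    for (char16_t c : rWord)
        if (aMarks.find(c) == std::u16string::npos)
            aResult.push_back(c);
    return aResult;
}

// Usage: a pointed Hebrew word is reduced to its consonants before it
// would be handed to the dictionary lookup.
static void example()
{
    // shin, qamats, shin dot, lamed, vav, holam, final mem
    const std::u16string aPointed = u"\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD";
    const std::u16string aPlain = u"\u05E9\u05DC\u05D5\u05DD";
    assert(stripDiacritics(aPointed, Language::Hebrew) == aPlain);
    assert(stripDiacritics(aPointed, Language::English) == aPointed);
}
```

Under these assumptions, adding a language means adding one more case that fills aMarks. The word passed to the spellchecker is a filtered copy, so the original text in the document is not touched.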

Comment 1 alan 2009-07-08 07:19:35 UTC

Created attachment 63421 [details]
proposed patch

Comment 2 alan 2009-07-08 07:35:24 UTC

Created attachment 63425 [details]
revised - changed a < to <=

Comment 3 elisko 2009-07-08 10:34:58 UTC

As I understand it, Sanskrit-based languages such as Hindi also employ diacritics.

Comment 4 kaplanlior 2010-08-14 19:02:09 UTC

#99796 has a very similar problem; I think the two should be fixed together
(probably with the same code). Note that this is not the same problem, just a
similar one.

Comment 5 thomas.lange 2010-08-18 07:45:17 UTC

taking ownership as well.

tl->ayaniger: If you provide patches for the linguistic, please assign them
directly to me. If by bad luck I do not see them on the issues ML and nobody
else assigns them to me, they will just loiter around, probably until someone
makes a new comment and they appear on the ML once more.

tl->nemeth: wouldn't it be possible to take care of this in the spell check
dictionary or in hunspell itself? I'm asking because removing the diacritics in
the SpellCheckerDispatcher will have the following two side effects:

a) the replacement word will probably not provide diacritics either, which may
look somewhat odd if all the surrounding text is using them.

b) if there ever were another spell checker implementation for Hebrew that could
properly work with diacritics and provide them in replacements as well, then the
patch will effectively suppress that feature. 

Thus I'm a little hesitant until I'm told that this patch really is the
solution to take.

Comment 6 kaplanlior 2010-08-21 18:16:21 UTC

#51772 also has a very similar problem; I think the two should be fixed together
(probably with the same code). Note that this is not the same problem, just a
similar one.

Comment 7 Martin Hollmichel 2011-03-16 11:56:13 UTC

Set target to 3.x; not relevant for the 3.4 release.

Comment 8 Rob Weir 2013-03-11 15:01:35 UTC

I'm adding this comment to all open issues with Issue Type == PATCH. We have 220 such issues, many of them quite old. I apologize for that.

We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0.

If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know.

On the other hand, if the patch is no longer relevant, please let us know that as well.

If you have any general questions or want to discuss this further, please send a note to our dev mailing list: dev@openoffice.apache.org

Thanks!

-Rob

Comment 9 kaplanlior 2013-03-12 11:44:15 UTC

The patch is Hebrew-specific; I think it should be more general.