Issue 59960 - Add barbarisms correction to spellchecking
Summary: Add barbarisms correction to spellchecking
Status: UNCONFIRMED
Alias: None
Product: General
Classification: Code
Component: spell checking
Version: 3.3.0 or older (OOo)
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: needhelp
Depends on:
Blocks:
 
Reported: 2006-01-02 16:46 UTC by unjoan
Modified: 2014-02-24 17:52 UTC
2 users

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Diff for MySpell code (2.76 KB, application/x-gzip)
2006-01-02 16:47 UTC, unjoan

Description unjoan 2006-01-02 16:46:13 UTC
REQ: Add barbarisms correction
Well, sorry for the long text :-(

* Some info about AbiWord barbarism support. My
explanation of what a barbarism is comes from this
mail thread:
http://www.abisource.com/mailinglists/abiword-dev/02/Sep/0498.html


* What is a barbarism

Barbarisms are a problem that mainly concerns
minority languages, i.e. languages that compete,
in the same territory, with a more powerful one,
called the "roof language". Examples of minority
languages are Welsh, Catalan, Occitan, and others.

When two languages compete in the same territory,
interference arises, but it is not symmetric.
The roof language is only weakly affected, while the
minority one can be strongly affected, and can even
disappear (glottophagy). Barbarisms are one form of
this interference.

* Example:

In Catalan, "tamany" is taken from Spanish "tamaño"
and should be corrected to "mida" or "grandària", which mean
"size" in English. A spellchecker without barbarism
support doesn't suggest "mida" or "grandària" when
"tamany" is checked.

* Other use cases for the same feature:

Another possibility for this feature is what I call "custom
user suggestions".

Example: If a user mistypes the same word again and
again, but hunspell can't suggest the correct word, then
the user can add the mistyped word to the barbarisms data file
with the correct word as its suggestion.

Another example: Imagine a frequently used, very long text (such as
a company name, or any other text). The user can create a dummy
wrong word (such as myword01) in the barbarisms data file and add
the correct text as its suggestion (my real very long company
name).


* How to implement it (idea or approach)

Use MyThesaurus with a special thesaurus file
in which the entries are barbarisms and
their synonyms are the correct suggestions.

Add something like this to the suggest() function
(suggestmgr.cxx):

if ((nsug < maxSug) && (nsug > -1))
    nsug = barbarisms(wlst, word, nsug);

And code a barbarisms() function:
- barbarisms() must look up the word in the barbarism
'thesaurus' file.
- If the word is in the 'thesaurus' file, barbarisms()
must add the 'synonyms' of the word as suggestions to wlst.
- barbarisms() must update nsug properly.
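
A minimal sketch of what such a barbarisms() lookup could look like. This is
not the attached diff and not the real MySpell code. It assumes the barbarism
'thesaurus' has already been loaded into an in-memory table (the
BarbarismTable type is hypothetical), and it uses std::vector<std::string>
for wlst instead of the char** list used in suggestmgr.cxx. The table and
maxSug are passed as extra parameters only to keep the example
self-contained.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory form of the barbarism 'thesaurus':
// each barbarism maps to its list of correct replacements.
using BarbarismTable =
    std::unordered_map<std::string, std::vector<std::string>>;

// Appends the replacements for `word` to `wlst`, skipping duplicates and
// never growing past `maxSug` suggestions. Returns the updated count.
// Assumes nsug equals the number of entries already in wlst.
int barbarisms(const BarbarismTable& table,
               std::vector<std::string>& wlst,
               const std::string& word,
               int nsug,
               int maxSug)
{
    auto entry = table.find(word);
    if (entry == table.end())
        return nsug;  // not a known barbarism, nothing to add

    for (const std::string& suggestion : entry->second) {
        if (nsug >= maxSug)
            break;  // suggestion list is full
        if (std::find(wlst.begin(), wlst.end(), suggestion) != wlst.end())
            continue;  // already suggested by an earlier step
        wlst.push_back(suggestion);
        ++nsug;
    }
    return nsug;
}
```

With a table entry "tamany" -> "mida", "grandària", checking "tamany" would
add both words to the suggestion list. The same table could also hold the
"custom user suggestions" described above, e.g. myword01 -> the real company
name.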


* Known problems with this approach:

- It works at the word level, not the sentence level. We are
just hacking a spell checker, not writing a grammar
checker. So some barbarisms can't be corrected. This
can't be solved.

- Currently, words that can be inflected have to be
coded several times (plurals, verb conjugations, etc.).
This is reported as an enhancement of MyThesaurus in OOo
(issue 19563):
http://www.openoffice.org/issues/show_bug.cgi?id=19563
Comment 1 unjoan 2006-01-02 16:47:50 UTC
Created attachment 32842
Diff for MySpell code
Comment 2 unjoan 2006-01-02 16:50:15 UTC
The attached file (32842) is just working example code, not a patch. I'm just
a beginner programmer.
Comment 3 stefan.baltzer 2006-04-28 14:45:05 UTC
SBA: issue 64246 is another example of "one word - two spelling options":
acknowledgement <-> acknowledgment.

In Germany, too, we have the problem of "the new German spelling" vs. many old
documents PLUS a workforce of people who did NOT learn the new spelling.
Therefore there is a long list of words that are still "tolerated" in the old
spelling.

Please note that with an "exception dictionary", the user can easily edit
certain proposals for certain words. In my opinion this is more convenient
in daily use than enhancing or implementing a thesaurus-like thingie.