Issue 91198 - Language fingerprint for Luxembourgish (File attached)
Summary: Language fingerprint for Luxembourgish (File attached)
Status: VERIFIED FIXED
Alias: None
Product: lingucomponent
Classification: Code
Component: other
Version: OOo 3.0 Beta
Hardware: Unknown All
Importance: P3 Trivial
Target Milestone: OOo 3.2
Assignee: stefan.baltzer
QA Contact: issues@lingucomponent
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-29 20:31 UTC by michel_w
Modified: 2009-05-07 09:38 UTC

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Language fingerprint for Luxembourgish (1.42 KB, text/plain)
2008-06-29 20:32 UTC, michel_w
Luxembourgish text sample (from lb.wikipedia.org) (9.55 KB, application/vnd.oasis.opendocument.text)
2009-02-02 13:30 UTC, michel_w

Description michel_w 2008-06-29 20:31:10 UTC
Hi

I've built a fingerprint file for Luxembourgish (a minority language spoken in 
and around Luxembourg). The file is encoded in UTF-8. It would be nice if this 
could make it into the OOo3 final.

Regards,
Michel Weimerskirch
Comment 1 michel_w 2008-06-29 20:32:15 UTC
Created attachment 54814
Language fingerprint for Luxembourgish
Comment 2 thomas.lange 2008-06-30 09:04:49 UTC
TL: Sorry it is a little bit late for OOo 3.0 since code freeze is already dead
ahead and all CWS basically should go into QA by today (which is already quite
late).

Setting target to OOo 3.1 and taking ownership since I probably have to do some
code changes as well.

TL->michel_w: Would a guessed locale with the language part 'lb' only be
sufficient or are there variants to Luxembourgish and we need to set e.g. the
country part as well?
Comment 3 michel_w 2008-06-30 09:21:14 UTC
@TL: Well, too bad... I guess I should have paid attention to the release plan. 
But well, better late than never ;-)

As to your question: The language part "lb" is sufficient. There are no 
variants.

Regards,
Michel
Comment 4 thomas.lange 2008-09-09 13:36:09 UTC
.
Comment 5 thomas.lange 2009-01-20 10:53:12 UTC
As discussed with SBA, setting target to OOo 3.2 as time is already short and many
CWS are still in the queue.
Comment 6 michel_w 2009-01-23 10:28:13 UTC
That's really a pity. I have implemented a proofreader for Luxembourgish using
the new grammar checker API of OOo 3.0.1. It will be released during the next
weeks. A spell checking dictionary is already available.
I was hoping that language guessing for Luxembourgish could at least be part of
OOo 3.1, as it is an integral part of the user experience for proofreading
multilingual texts. Delaying that feature by a few more months is unfortunate.
Comment 7 thomas.lange 2009-01-23 10:41:52 UTC
Yes, it is. But AFAIK there are already more than 60 CWS in the queue for 3.1 and we
already need to cut short on those. Thus it was decided that only serious issues
should be fixed now. :-(
Comment 8 thomas.lange 2009-02-02 12:07:30 UTC
tl->michel_w: Since we missed OOo 3.1 I want to take care of this right away now.
Can you attach some short Luxembourgish text sample that can be used to verify
the result?
Comment 9 michel_w 2009-02-02 13:29:28 UTC
michel_w->tl: Ok, thanks.
Basically any text from the Luxembourgish Wikipedia will do: http://lb.wikipedia.org
I'm going to attach one that I've more-or-less randomly selected. (You said
"short", so I hope it's ok).
Comment 10 michel_w 2009-02-02 13:30:39 UTC
Created attachment 59831
Luxembourgish text sample (from lb.wikipedia.org)
Comment 11 thomas.lange 2009-02-03 13:28:26 UTC
Fixed in CWS tl66.

Files changed:
- scp2\source\ooo\file_ooo.scp
- scp2\source\ooo\module_hidden_ooo.scp
- libtextcat\data\new_fingerprints\fpdb.conf
- libtextcat\data\new_fingerprints\lm\luxembourgish.lm
Comment 12 thomas.lange 2009-02-03 13:48:23 UTC
tl->michel_w: At first I worried a bit that the fingerprint file might not work
since it is rather different from the other ones used in OOo (see e.g.
Sun\StarOffice 9\Basis\share\fingerprint\swedish.lm) since it does not have the
second column with the numbers. But still it seems to work fine without them.

Just curious: Where did you get it? Or with what tool did you create it?
I'm asking because in OOo there are still some broken fingerprints that can not
be used. And maybe, those can be recreated to do their work... 
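
Editor's sketch: the observation that a fingerprint works without the second column is consistent with a loader that takes only the first field of each line and uses the line order as the rank. The Python below illustrates that idea only; it is not the libtextcat code. The function name `load_fingerprint` and the sample values are hypothetical, and the sketch assumes that n-grams contain no whitespace (text_cat writes spaces as underscores).

```python
def load_fingerprint(lines):
    """Return the n-grams in rank order, ignoring an optional count column.

    Assumption: each non-empty line is either "<ngram>" or
    "<ngram><whitespace><count>", and the line order is the rank.
    """
    ngrams = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            ngrams.append(fields[0])
    return ngrams


# Hypothetical sample lines, with and without the count column.
with_counts = ["_\t20326", "e\t6617", "en_\t1200"]
without_counts = ["_", "e", "en_"]
same = load_fingerprint(with_counts) == load_fingerprint(without_counts)
```

Under these assumptions both variants load to the same ranked list, so dropping the counts does not change what a rank-based comparison sees.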

Comment 13 michel_w 2009-02-03 17:10:01 UTC
michel_w->tl: I created it myself using text_cat
(http://odur.let.rug.nl/~vannoord/TextCat/). Text_cat is based on this paper:
http://citeseer.ist.psu.edu/68861.html (Figure 3 explains the algorithm quite well).
The numbers in the second column represent the number of times each n-gram
appears in the original sample. They are not used by the algorithm and can thus
be safely removed AFAIK (which is what I did).
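
Editor's sketch: the ranking idea described above can be illustrated with a small Python version of n-gram profiling and the "out-of-place" distance from the referenced paper. This is a simplified sketch, not the text_cat or libtextcat implementation. The function names, the underscore padding, the tie-breaking rule and the defaults (`max_n=5`, `top_k=400`) are assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=400):
    """Return the top_k most frequent n-grams (n = 1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # assumption: mark word boundaries with "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Only the order survives: the counts themselves are dropped here.
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, lang_profiles):
    """Return the name of the profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda name: out_of_place(doc, lang_profiles[name]))
```

Because `out_of_place` compares positions only, a fingerprint that lists n-grams in rank order carries everything this comparison needs, which matches the remark that the counts can be removed.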
Comment 14 thomas.lange 2009-05-05 06:31:57 UTC
.
Comment 15 stefan.baltzer 2009-05-07 09:38:02 UTC
Verified in CWS tl66.