Apache OpenOffice (AOO) Bugzilla – Issue 91198
Language fingerprint for Luxembourgish (File attached)
Last modified: 2017-05-20 09:01:32 UTC
Hi, I've built a fingerprint file for Luxembourgish (a minority language spoken in and around Luxembourg). The file is encoded in UTF-8. It would be nice if this could make it into the OOo3 final. Regards, Michel Weimerskirch
Created attachment 54814 [details] Language fingerprint for Luxembourgish
TL: Sorry it is a little bit late for OOo 3.0 since code freeze is already dead ahead and all CWS basically should go into QA by today (which is already quite late). Setting target to OOo 3.1 and taking ownership since I probably have to do some code changes as well. TL->michel_w: Would a guessed locale with the language part 'lb' only be sufficient or are there variants to Luxembourgish and we need to set e.g. the country part as well?
@TL: Well, too bad... I guess I should have paid attention to the release plan. But well, better late than never ;-) As to your question: The language part "lb" is sufficient. There are no variants. Regards, Michel
As discussed with SBA, setting target to OOo 3.2 as time is already short and many CWS are still in the queue.
That's really a pity. I have implemented a proofreader for Luxembourgish using the new grammar checker API of OOo 3.0.1. It will be released during the next weeks. A spell checking dictionary is already available. I was hoping that language guessing for Luxembourgish could at least be part of OOo 3.1, as it is an integral part of the user experience for proofreading multilingual texts. Delaying that feature by a few more months is unfortunate.
Yes, it is. But AFAIK there are already more than 60 CWS in the queue for 3.1 and we already need to cut short on those. Thus it was decided that only serious issues should be fixed now. :-(
tl->michel_w: Since we missed OOo 3.1 I want to take care of this right away now. Can you attach a short Luxembourgish text sample that can be used to verify the result?
michel_w->tl: Ok, thanks. Basically any text from the Luxembourgish Wikipedia will do: http://lb.wikipedia.org I'm going to attach one that I've more-or-less randomly selected. (You said "short", so I hope it's ok).
Created attachment 59831 [details] Luxembourgish text sample (from lb.wikipedia.org)
Fixed in CWS tl66.
Files changed:
- scp2\source\ooo\file_ooo.scp
- scp2\source\ooo\module_hidden_ooo.scp
- libtextcat\data\new_fingerprints\fpdb.conf
- libtextcat\data\new_fingerprints\lm\luxembourgish.lm
tl->michel_w: At first I was a bit worried that the fingerprint file might not work, since it is rather different from the other ones used in OOo (see e.g. Sun\StarOffice 9\Basis\share\fingerprint\swedish.lm): it does not have the second column with the numbers. But it still seems to work fine without them. Just curious: Where did you get it? Or with what tool did you create it? I'm asking because in OOo there are still some broken fingerprints that cannot be used. And maybe those can be recreated to do their work...
michel_w->tl: I created it myself using text_cat (http://odur.let.rug.nl/~vannoord/TextCat/). Text_cat is based on this paper: http://citeseer.ist.psu.edu/68861.html (Figure 3 explains the algorithm quite well). The numbers in the second column represent the number of times each n-gram appears in the original sample. They are not used by the algorithm and can thus be safely removed AFAIK (which is what I did).
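For readers who want to see why the count column can be dropped, here is a minimal Python sketch of the rank-based n-gram comparison that the paper describes. It is not the text_cat or libtextcat code: the function names (ngram_profile, parse_lm, out_of_place), the tokenization on whitespace and digits, and the limits max_n=5 and top=400 are all assumptions made for illustration. The point it shows is that only the order of the n-grams in a fingerprint is used, so a file with or without counts yields the same profile.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top=400):
    """Return the n-grams of `text`, most frequent first.

    Assumption: tokens are runs of characters other than whitespace and
    digits, each padded with '_' on both sides. Ties are broken
    alphabetically so the result is deterministic.
    """
    counts = Counter()
    for token in re.split(r"[\s\d]+", text):
        if not token:
            continue
        padded = "_" + token + "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top]


def parse_lm(lines):
    """Read fingerprint lines, keeping only the first column.

    The rank of an n-gram is its line position, so an optional
    second column with counts is simply ignored.
    """
    return [line.split()[0] for line in lines if line.strip()]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between a document and a language profile.

    N-grams missing from the language profile get a fixed maximum
    penalty (here: the length of the language profile, an assumption).
    """
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )
```

To guess a language with this sketch, one would compute out_of_place for the document profile against every language fingerprint and pick the language with the smallest distance.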
Verified in CWS tl66.