Apache OpenOffice (AOO) Bugzilla – Issue 44793
No Thesaurus for British English (UK, GB)
Last modified: 2013-02-24 20:40:16 UTC
Currently, a glaring hole in OOo for the half of the English-speaking population that uses British English is that there is no thesaurus. None at all. Not even a straight US English port. That is, unless you're smart enough to hack it and copy the thesaurus files, or download something from http://www.8daysaweek.co.uk/downloads.htm#ukthesaurus. This is pretty poor and is the main thing that prevents me and others from using OOo. I use the thesaurus all the time, but I also need British English spelling. This gap needs to be fixed for OOo to be adopted in India, Britain and Australia. There are thesauruses for almost all languages except British English, which is really very odd.
It turns out that the thesaurus used in the 2.0 beta is in fact ambidextrous: it supports both British English and American English. The list for color, for example, also contains colour. So all that needs to be done to solve this problem completely is to add the line THES en GB th_en_US_new to C:\Program Files\OpenOffice.org1.9.79\share\dict\ooo\dictionary.lst, presuming that th_en_US_new is still the filename once 2.0 leaves the beta stage. This will solve the whole problem for half the English-speaking world and let us use synonyms as much as we like! Who can get this added in for 2.0?
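The proposed fix is a single extra THES line in dictionary.lst. As a minimal sketch (not OOo code; the function name add_thesaurus_entry is hypothetical), appending that line only when the locale has no thesaurus yet could look like this in Python, assuming the four space-delimited fields documented in the dictionary.lst header quoted in the test log further down this issue:

```python
def add_thesaurus_entry(lst_text: str, lang: str, country: str, root: str) -> str:
    """Return lst_text with 'THES <lang> <country> <root>' appended,
    unless a THES entry for that language/country already exists.
    Hypothetical helper, not part of OOo."""
    for line in lst_text.splitlines():
        fields = line.split()
        # Comment lines start with '#', so only real THES entries match here.
        if len(fields) == 4 and fields[0] == "THES" and fields[1:3] == [lang, country]:
            return lst_text  # locale already has a thesaurus: leave it unchanged
    if lst_text and not lst_text.endswith("\n"):
        lst_text += "\n"  # keep one entry per line
    return lst_text + f"THES {lang} {country} {root}\n"


if __name__ == "__main__":
    before = "DICT en GB en_GB\nHYPH en GB hyph_en_GB\n"
    print(add_thesaurus_entry(before, "en", "GB", "th_en_US_v2"), end="")
```

Running it prints the two original lines followed by THES en GB th_en_US_v2. Checking for an existing entry first keeps repeated runs from adding duplicate lines.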
Am I right that the line "THES en GB th_en_US_v2" just needs to be added to the file external/addons/dictionaries/en_US/dictionary.lst? Laci, do you have write access to that file?
That is all that needs to be added I believe. This really has to go in before the release next week!
Likewise for en_CA, and probably for all English incarnations.
Fixed in the CWS "britishthesau". CWS description: `Add en_US thesaurus to the British English language. The new en_US thesaurus is based on WordNet. ``WordNet is a lexical database for the English language.'' (http://wordnet.princeton.edu) Using this thesaurus for British English is a reasonable request.' Laurent: Could you add the en_US thesaurus to the other English regions in the DicOOo installer, similar to French and German? Thanks, Laci
stardiv has been updated; it will be available after propagating. Laurent
Testing: Now ENGB also uses the WordNet thesaurus.
Test log with comments:

# configure only for British (ENGB)
[root@lalilili OOo_2.0beta2] cd ../config_office
[root@lalilili OOo_2.0beta2] ./configure --with-jdk-home=/usr/java/j2sdk1.4.2_07/ --disable-mozilla -with-dict=ENGB
# build dictionaries project
[root@lalilili config_office]# cd ..
[root@lalilili config_office]# . LinuxIntelEnv.Set.sh
[root@lalilili config_office]# cd dictionaries
[root@lalilili dictionaries]# build
# dictionary.lst with thesaurus
[root@lalilili dictionaries]# cat unxlngi4.pro/bin/dictionary.lst
# List of All Dictionaries to be Loaded by OpenOffice
# ---------------------------------------------------
# Each Entry in the list have the following space delimited fields
#
# Field 1: Entry Type "DICT" - spellchecking dictionary
#                     "HYPH" - hyphenation dictionary
#                     "THES" - thesaurus files
#
# Field 2: Language code from Locale "en" or "de" or "pt" ...
#
# Field 3: Country Code from Locale "US" or "GB" or "PT"
#
# Field 4: Root name of file(s) "en_US" or "hyph_de" or "th_en_US
#          (do not add extensions to the name)
DICT en GB en_GB
HYPH en GB hyph_en_GB
THES en GB th_en_US_v2
# check packaged files
[root@lalilili dictionaries]# unzip -l unxlngi4.pro/bin/writingaids.zip
Archive:  unxlngi4.pro/bin/writingaids.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
   526269  12-07-05 22:11   en_GB.dic
    79272  12-07-05 22:11   hyph_en_GB.dic
    27375  12-07-05 22:11   en_GB.aff
     1438  12-07-05 22:11   README_en_GB.txt
      637  12-07-05 22:11   dictionary.lst
  3044762  12-07-05 22:11   th_en_US_v2.idx
 18597793  07-24-05 00:32   th_en_US_v2.dat
   100272  12-07-05 22:11   DicOOo.sxw
    90508  12-07-05 22:11   FontOOo.sxw
 --------                   -------
 22468326                   9 files

# configure only for American English (ENUS)
[root@lalilili dictionaries]#
[root@lalilili dictionaries]# cd ../config_office
[root@lalilili dictionaries]# ./configure --with-jdk-home=/usr/java/j2sdk1.4.2_07/ --disable-mozilla -with-dict=ENUS
[root@lalilili OOo_2.0beta2]# ..
[root@lalilili OOo_2.0beta2]# . LinuxIntelEnv.Set.sh
[root@lalilili OOo_2.0beta2]# cd dictionaries
[root@lalilili dictionaries]# rm -rf unxlngi4.pro/
[root@lalilili dictionaries]# build
[root@lalilili dictionaries]# cat unxlngi4.pro/bin/dictionary.lst
# List of All Dictionaries to be Loaded by OpenOffice
# ---------------------------------------------------
# Each Entry in the list have the following space delimited fields
#
# Field 1: Entry Type "DICT" - spellchecking dictionary
#                     "HYPH" - hyphenation dictionary
#                     "THES" - thesaurus files
#
# Field 2: Language code from Locale "en" or "de" or "pt" ...
#
# Field 3: Country Code from Locale "US" or "GB" or "PT"
#
# Field 4: Root name of file(s) "en_US" or "hyph_de" or "th_en_US
#          (do not add extensions to the name)
DICT en US en_US
HYPH en US hyph_en_US
THES en US th_en_US_v2
[root@lalilili dictionaries]# unzip -l unxlngi4.pro/bin/writingaids.zip
Archive:  unxlngi4.pro/bin/writingaids.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
   695728  12-07-05 21:47   en_US.dic
    36467  12-07-05 21:47   hyph_en_US.dic
     2731  12-07-05 21:47   en_US.aff
     1625  12-07-05 21:47   WordNet_license.txt
      637  12-07-05 21:47   dictionary.lst
  3044762  12-07-05 21:47   th_en_US_v2.idx
 18597793  07-24-05 00:32   th_en_US_v2.dat
   100272  12-07-05 21:47   DicOOo.sxw
    90508  12-07-05 21:47   FontOOo.sxw
 --------                   -------
 22570523                   9 files
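The comment header of the generated dictionary.lst fully describes the format: four space-delimited fields, with the root name given without extensions. A minimal Python sketch of a parser for that format follows. The extension table is inferred from the writingaids.zip listings above (.aff/.dic for DICT, .dic for HYPH, .dat/.idx for THES) and is an assumption, not something the header states; Entry and parse_dictionary_lst are hypothetical names.

```python
from typing import NamedTuple

# Assumed from the writingaids.zip listings: file extensions per entry type.
EXTENSIONS = {"DICT": (".aff", ".dic"), "HYPH": (".dic",), "THES": (".dat", ".idx")}


class Entry(NamedTuple):
    kind: str     # Field 1: "DICT", "HYPH" or "THES"
    lang: str     # Field 2: language code, e.g. "en"
    country: str  # Field 3: country code, e.g. "GB"
    root: str     # Field 4: root file name, without extension

    def files(self) -> list[str]:
        """File names this entry refers to (extensions are assumed)."""
        return [self.root + ext for ext in EXTENSIONS[self.kind]]


def parse_dictionary_lst(text: str) -> list[Entry]:
    """Parse dictionary.lst, skipping blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4 or fields[0] not in EXTENSIONS:
            raise ValueError(f"malformed dictionary.lst line: {raw!r}")
        entries.append(Entry(*fields))
    return entries


if __name__ == "__main__":
    sample = "# header\nDICT en GB en_GB\nHYPH en GB hyph_en_GB\nTHES en GB th_en_US_v2\n"
    thes = [e for e in parse_dictionary_lst(sample) if e.kind == "THES"]
    print(thes[0].files())  # ['th_en_US_v2.dat', 'th_en_US_v2.idx']
```

Rejecting malformed lines, rather than silently skipping them, is a design choice of this sketch; it makes a typo such as a missing root name visible immediately.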
Created attachment 32193 [details] Full test log
[try reassign]
nemeth->mh: Martin, I have made the QA, but please, check it. Thanks.
---- from langdev mailing list ----
Hi Laurent,

Quoting Laurent Godard <lgodard@indesko.com>:
> Hi all, Hi nemeth
>
> I just read the update of the dictionary.lst file
> Thanks for that
>
> I've a question though
> As en_GB thesaurus uses the same file as en_US, why not filling the .lst
> file with the two lines independently of the language ?

I have made only a little 3-line patch:

--- dictionaries/en_GB/dictionary.lst 2005-02-21 13:07:04.000000000 +0100
+++ dictionaries.britishthesau/en_GB/dictionary.lst 2005-12-03 21:58:27.000000000 +0100
@@ -1,2 +1,3 @@
 DICT en GB en_GB
 HYPH en GB hyph_en_GB
+THES en GB th_en_US_v2
--- dictionaries/en_US/makefile.mk 2005-09-08 19:57:34.000000000 +0200
+++ dictionaries.britishthesau/en_US/makefile.mk 2005-12-04 03:26:45.000000000 +0100
-.IF "$(DIC_ALL)$(DIC_ENUS)"!=""
+.IF "$(DIC_ALL)$(DIC_ENUS)"!="" || "$(DIC_ALL)$(DIC_ENGB)"!=""
 ALLTAR : $(MISC)$/th_en_US_v2.don
--- dictionaries/prj/build.lst 2005-11-11 12:12:19.000000000 +0100
+++ dictionaries.britishthesau/prj/build.lst 2005-12-07 21:33:40.000000000 +0100
@@ -4,7 +4,7 @@
 di dictionaries\cs_CZ nmake - all di_cs_CZ di_diclst NULL
 di dictionaries\da_DK nmake - all di_da_DK di_diclst NULL
 di dictionaries\de_DE nmake - all di_de_DE di_diclst NULL
-di dictionaries\en_GB nmake - all di_en_GB di_diclst NULL
+di dictionaries\en_GB nmake - all di_en_GB di_en_US di_diclst NULL
 di dictionaries\en_US nmake - all di_en_US di_diclst NULL
 di dictionaries\it_IT nmake - all di_it_IT di_diclst NULL
 di dictionaries\ru_RU nmake - all di_ru_RU di_diclst NULL

Separating the WordNet thesaurus is a good idea, but I didn't want to make big changes in the source.

>
> Do I miss something ?
> What if --with-dic is not set ? all is written ?
Yes, it is, with the right en_GB and en_US thesaurus declarations:

HYPH da DK hyph_da_DK
HYPH de DE hyph_de_DE
DICT en GB en_GB
HYPH en GB hyph_en_GB
THES en GB th_en_US_v2
DICT en US en_US
HYPH en US hyph_en_US
THES en US th_en_US_v2
DICT it IT it_IT
HYPH it IT hyph_it_IT
HYPH ru RU hyph_ru_RU

Best regards,
Laci
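The build.lst change is what makes the en_GB job wait for en_US, whose makefile now also builds th_en_US_v2 for ENGB. A small Python sketch of the resulting build order, assuming the usual build.lst layout in which the field after "all" is the job name and the remaining fields up to NULL are the jobs it depends on (parse_build_lst is a hypothetical helper, and only three lines from the patch are used):

```python
from graphlib import TopologicalSorter

# Three job lines taken from the patched prj/build.lst quoted above.
BUILD_LST = r"""
di dictionaries\de_DE nmake - all di_de_DE di_diclst NULL
di dictionaries\en_GB nmake - all di_en_GB di_en_US di_diclst NULL
di dictionaries\en_US nmake - all di_en_US di_diclst NULL
"""


def parse_build_lst(text: str) -> dict[str, set[str]]:
    """Map each job name to the jobs it depends on (assumed field layout)."""
    graph: dict[str, set[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected: prefix, directory, "nmake", "-", "all", job, deps..., "NULL"
        if len(fields) < 7 or fields[-1] != "NULL":
            continue  # blank or non-job line
        graph[fields[5]] = set(fields[6:-1])
    return graph


if __name__ == "__main__":
    # static_order() yields every job after all of its dependencies.
    order = list(TopologicalSorter(parse_build_lst(BUILD_LST)).static_order())
    print(order)
```

Running it prints an order in which di_en_US precedes di_en_GB; the relative position of independent jobs such as di_de_DE is not fixed.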
So does this only affect US and British English? What about Canadian English etc.? Shouldn't these be added, too?
nemeth->dnaber: There is no Canadian English dictionary in the source. I had tried to implement the ANY region in OOo's Thesaurus (for example, THES en ANY th_en_US_v2), but I didn't finish it. Using the ANY region could give a default thesaurus for a language, but in the first place, DicOOo and the OOo builds of the native language projects need to set the right thesaurus. There may be big differences between regions of the same language. For example, `The Spanish version of Windows used the word Hembra - meaning "woman" in Spain - for choosing gender. But in some Central American republics, notably Nicaragua, the word is an insult meaning "bitch". The program was changed.' (http://www.guardian.co.uk/uk_news/story/0,,1285890,00.html) Laci
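The unfinished ANY-region idea can be illustrated with a small lookup that prefers an exact language/country entry and falls back to a language-wide one. This is a hypothetical Python sketch: OOo did not implement THES en ANY, and find_thesaurus is not an OOo function.

```python
from typing import Iterable, Optional, Tuple

# (kind, lang, country, root), i.e. the four dictionary.lst fields
LstEntry = Tuple[str, str, str, str]


def find_thesaurus(entries: Iterable[LstEntry], lang: str, country: str) -> Optional[str]:
    """Hypothetical lookup: exact THES entry first, then a 'THES <lang> ANY' fallback."""
    thes = {(lng, ctry): root for kind, lng, ctry, root in entries if kind == "THES"}
    # An exact regional entry wins, so a region whose vocabulary differs
    # can still override the language-wide default.
    return thes.get((lang, country)) or thes.get((lang, "ANY"))


if __name__ == "__main__":
    lst = [("DICT", "en", "GB", "en_GB"), ("THES", "en", "ANY", "th_en_US_v2")]
    print(find_thesaurus(lst, "en", "CA"))  # th_en_US_v2
```

Letting the exact regional entry take priority is what would still allow a region with diverging vocabulary, as in the Hembra example, to use its own thesaurus.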
Sorry to reopen. I was looking at CWS britishthesau and found that the licence file is not included in the en_GB version. Something like this is needed in dictionaries/en_US/makefile.mk:

.IF "$(DIC_ALL)$(DIC_ENUS)$(DIC_ENGB)"!=""
DIC2BIN+= WordNet_license.txt
.ENDIF
nemeth->vq: Thanks for the patch, I've added it. For generating the right dictionary.lst I have made a target.pmk patch, too. (It's not the best solution. Separating WordNet would be better.) I have tested with the --with-dict=ENGB, --with-dict=ENUS, --with-dict=ENGB,ENUS, --with-dict=ITIT and --with-dict=ALL config parameters. (BTW, DIC_ALL is a sticky environment variable; we need an explicit unset DIC_ALL to rebuild OOo with only some of the dictionaries.)

Diff:

diff -u -r dictionaries.britishthesau/en_US/makefile.mk dictionaries/en_US/makefile.mk
--- dictionaries.britishthesau/en_US/makefile.mk 2005-12-04 03:26:45.000000000 +0100
+++ dictionaries/en_US/makefile.mk 2006-01-19 12:46:09.000000000 +0100
 DIC2BIN= \
     en_US.aff \
     en_US.dic \
-    WordNet_license.txt \
     hyph_en_US.dic
 .ENDIF
+# add WordNet license to American and British English
+.IF "$(DIC_ALL)$(DIC_ENUS)$(DIC_ENGB)"!=""
+
+DIC2BIN+= \
+    WordNet_license.txt
+
+.ENDIF
+
diff -u -r dictionaries.britishthesau/util/target.pmk dictionaries/util/target.pmk
--- dictionaries.britishthesau/util/target.pmk 2005-09-08 19:58:44.000000000 +0200
+++ dictionaries/util/target.pmk 2006-01-19 12:46:09.000000000 +0100
 $(BIN)$/dictionary_$(DICT_LOCALE).line : dictionary.lst
+.IF "$(DIC2BIN)"!="WordNet_license.txt"
     +$(TYPE) dictionary.lst > $(BIN)$/dictionary_$(DICT_LOCALE).line
-
+.ENDIF
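The interplay between the two makefile changes can be modelled in a few lines of Python. This is a simplified, hypothetical sketch, not dmake semantics. It assumes that configure sets DIC_ENGB/DIC_ENUS to some non-empty value (TRUE here) for the selected languages, that the first DIC2BIN block is guarded by DIC_ALL/DIC_ENUS (its .IF line is not shown in the diff), and it treats DIC2BIN as a space-joined list. It shows why an ENGB-only build leaves DIC2BIN equal to WordNet_license.txt, which the target.pmk guard uses to skip en_US's dictionary.lst line, and why a leftover DIC_ALL switches every condition on.

```python
def en_us_outputs(env: dict[str, str]) -> tuple[list[str], bool]:
    """Model which files en_US/makefile.mk ships and whether target.pmk
    copies en_US's dictionary.lst line (simplified, hypothetical model)."""

    def any_set(*names: str) -> bool:
        # dmake idiom: "$(A)$(B)"!="" holds if any of the variables is non-empty
        return "".join(env.get(n, "") for n in names) != ""

    dic2bin: list[str] = []
    if any_set("DIC_ALL", "DIC_ENUS"):  # assumed guard of the first DIC2BIN block
        dic2bin += ["en_US.aff", "en_US.dic", "hyph_en_US.dic"]
    if any_set("DIC_ALL", "DIC_ENUS", "DIC_ENGB"):  # the new license block
        dic2bin.append("WordNet_license.txt")
    # target.pmk guard: .IF "$(DIC2BIN)"!="WordNet_license.txt"
    copies_lst_line = " ".join(dic2bin) != "WordNet_license.txt"
    return dic2bin, copies_lst_line


if __name__ == "__main__":
    print(en_us_outputs({"DIC_ENGB": "TRUE"}))  # (['WordNet_license.txt'], False)
    print(en_us_outputs({"DIC_ENUS": "TRUE"}))
```

In the ENGB-only case the license is shipped but the en_US dictionary.lst line is not, which matches the behaviour Volker describes in the next comment.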
I'm sorry for being such a pain in the ... You didn't really explain your dictionaries/util/target.pmk patch. I guess the only case where "$(DIC2BIN)"=="WordNet_license.txt" is when en_US is built as a dependency for en_GB. In this case, don't copy dictionary.lst. Please add a comment in target.pmk explaining why you put that line there; otherwise this gets lost. Secondly, can you please put a README_en_GB_thes.txt file with something like this in en_GB/: en_GB is using the WordNet thesaurus from the en_US directory. If you make me the QA rep for britishthesau, I'll immediately approve this CWS after the changes are in. Volker
Added the requested comments. nemeth->vq: thanks for your suggestions and the QA. Laci
Thanks, looks good now. -> VERIFIED
thesaurus en_GB found in 2.0.2 RC1 -> close