Issue 44793 - No Thesaurus for British English (UK, GB)
Summary: No Thesaurus for British English (UK, GB)
Status: CLOSED FIXED
Alias: None
Product: General
Classification: Code
Component: thesaurus
Version: 3.3.0 or older (OOo)
Hardware: All All
Importance: P2 Trivial with 3 votes
Target Milestone: OOo 2.0.2
Assignee: quetschke
QA Contact: issues@lingucomponent
URL:
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2005-03-11 11:16 UTC by komencanto
Modified: 2013-02-24 20:40 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Full test log (28.81 KB, text/plain)
2005-12-07 21:22 UTC, nemeth.lacko

Description komencanto 2005-03-11 11:16:06 UTC
Currently a glaring hole in OOo for the half of the English speaking population
that uses British English is that there is no thesaurus. None at all. Not even a
straight US English port. That is unless you're smart enough to hack it and copy
the thesaurus files, or download something from
http://www.8daysaweek.co.uk/downloads.htm#ukthesaurus.
This is pretty poor and is the main thing that prevents me and others from using
OOo. I use the thesaurus all the time, but I also need British English spelling.
It is a glaring hole that needs to be fixed for OOo to be adopted in India,
Britain and Australia. There are thesauruses for almost all languages except
British English, which is really very odd.
Comment 1 komencanto 2005-03-13 01:08:56 UTC
It turns out that the thesaurus being used in the 2.0 beta is in fact
ambidextrous and supports both British English and American English. The list
for color for example, also contains colour.
So, all that needs to be done to completely solve this problem is add the line
THES en GB th_en_US_new
to C:\Program Files\OpenOffice.org1.9.79\share\dict\ooo\dictionary.lst ,
presuming that th_en_US_new is still the filename once 2.0 leaves the beta stage.

This will solve the whole problem for half the English-speaking world and let us
use synonyms as much as we like!
Who can get this added in for 2.0?
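
A minimal sketch of the edit proposed above, assuming dictionary.lst is plain text with one space-delimited entry per line. The helper name ensure_thes_entry and the sample content are hypothetical, and th_en_US_new is only the root name seen in the 2.0 beta, so it may differ in a release build.

```python
def ensure_thes_entry(lst_text, lang, country, root):
    """Return lst_text with a THES entry for lang/country, adding one if absent."""
    wanted = ["THES", lang, country]
    for line in lst_text.splitlines():
        # Compare only the first three fields; the root name may vary.
        if line.split()[:3] == wanted:
            return lst_text  # this locale already has a thesaurus entry
    if lst_text and not lst_text.endswith("\n"):
        lst_text += "\n"
    return lst_text + f"THES {lang} {country} {root}\n"


# Hypothetical starting content for a British English install.
sample = "DICT en GB en_GB\nHYPH en GB hyph_en_GB\n"
updated = ensure_thes_entry(sample, "en", "GB", "th_en_US_new")
print(updated, end="")
```

Running the helper a second time on its own output leaves the text unchanged, so it should be safe to apply repeatedly.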
Comment 2 ooolist2007 2005-09-11 21:14:33 UTC
Am I right that the line "THES en GB th_en_US_v2" just needs to be added to 
this file?: external/addons/dictionaries/en_US/dictionary.lst.  Laci, do you 
have write access to that file? 
 
Comment 3 komencanto 2005-10-13 13:44:31 UTC
That is all that needs to be added I believe. This really has to go in before
the release next week!
Comment 4 grsingleton 2005-12-03 14:18:08 UTC
Likewise for the_en_CA and probably for all English incarnations.
Comment 5 grsingleton 2005-12-03 20:19:59 UTC
.
Comment 6 nemeth.lacko 2005-12-04 02:52:49 UTC
Fixed in the CWS "britishthesau". 

CWS description:

`Add the en_US thesaurus to the British English language. The new en_US thesaurus
is based on WordNet. ``WordNet is a lexical database for the English language.''
(http://wordnet.princeton.edu) Using this thesaurus for British English is a
reasonable request.'

Laurent: Could you add the en_US thesaurus to the other English regions in the
DicOOo installer, similar to French and German?

Thanks,

Laci
Comment 7 maison.godard 2005-12-04 09:15:15 UTC
stardiv has been updated
will be available after propagating

Laurent
Comment 8 nemeth.lacko 2005-12-07 21:18:19 UTC
Testing: ENGB now also uses the WordNet thesaurus.

Test log with comments:

# configure only for British (ENGB)

[root@lalilili OOo_2.0beta2] cd ../config_office
[root@lalilili OOo_2.0beta2] ./configure
--with-jdk-home=/usr/java/j2sdk1.4.2_07/ --disable-mozilla -with-dict=ENGB

# build dictionaries project

[root@lalilili config_office]# cd ..
[root@lalilili config_office]# . LinuxIntelEnv.Set.sh
[root@lalilili config_office]# cd dictionaries
[root@lalilili dictionaries]# build

# dictionary.lst with thesaurus

[root@lalilili dictionaries]# cat unxlngi4.pro/bin/dictionary.lst
# List of All Dictionaries to be Loaded by OpenOffice
# ---------------------------------------------------
# Each Entry in the list have the following space delimited fields
#
# Field 1: Entry Type "DICT" - spellchecking dictionary
#                     "HYPH" - hyphenation dictionary
#                     "THES" - thesaurus files
#
# Field 2: Language code from Locale "en" or "de" or "pt" ...
#
# Field 3: Country Code from Locale "US" or "GB" or "PT"
#
# Field 4: Root name of file(s) "en_US" or "hyph_de" or "th_en_US
#          (do not add extensions to the name)

DICT en GB en_GB
HYPH en GB hyph_en_GB
THES en GB th_en_US_v2

# check packaged files
[root@lalilili dictionaries]# unzip -l unxlngi4.pro/bin/writingaids.zip
Archive:  unxlngi4.pro/bin/writingaids.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
   526269  12-07-05 22:11   en_GB.dic
    79272  12-07-05 22:11   hyph_en_GB.dic
    27375  12-07-05 22:11   en_GB.aff
     1438  12-07-05 22:11   README_en_GB.txt
      637  12-07-05 22:11   dictionary.lst
  3044762  12-07-05 22:11   th_en_US_v2.idx
 18597793  07-24-05 00:32   th_en_US_v2.dat
   100272  12-07-05 22:11   DicOOo.sxw
    90508  12-07-05 22:11   FontOOo.sxw
 --------                   -------
 22468326                   9 files


# configure only for American English (ENUS)

[root@lalilili dictionaries]#
[root@lalilili dictionaries]# cd ../config_office
[root@lalilili dictionaries]# ./configure
--with-jdk-home=/usr/java/j2sdk1.4.2_07/ --disable-mozilla -with-dict=ENUS
[root@lalilili OOo_2.0beta2]# cd ..
[root@lalilili OOo_2.0beta2]# . LinuxIntelEnv.Set.sh
[root@lalilili OOo_2.0beta2]# cd dictionaries
[root@lalilili dictionaries]# rm -rf unxlngi4.pro/
[root@lalilili dictionaries]# build
[root@lalilili dictionaries]# cat unxlngi4.pro/bin/dictionary.lst
# List of All Dictionaries to be Loaded by OpenOffice
# ---------------------------------------------------
# Each Entry in the list have the following space delimited fields
#
# Field 1: Entry Type "DICT" - spellchecking dictionary
#                     "HYPH" - hyphenation dictionary
#                     "THES" - thesaurus files
#
# Field 2: Language code from Locale "en" or "de" or "pt" ...
#
# Field 3: Country Code from Locale "US" or "GB" or "PT"
#
# Field 4: Root name of file(s) "en_US" or "hyph_de" or "th_en_US
#          (do not add extensions to the name)

DICT en US en_US
HYPH en US hyph_en_US
THES en US th_en_US_v2
[root@lalilili dictionaries]# unzip -l unxlngi4.pro/bin/writingaids.zip
Archive:  unxlngi4.pro/bin/writingaids.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
   695728  12-07-05 21:47   en_US.dic
    36467  12-07-05 21:47   hyph_en_US.dic
     2731  12-07-05 21:47   en_US.aff
     1625  12-07-05 21:47   WordNet_license.txt
      637  12-07-05 21:47   dictionary.lst
  3044762  12-07-05 21:47   th_en_US_v2.idx
 18597793  07-24-05 00:32   th_en_US_v2.dat
   100272  12-07-05 21:47   DicOOo.sxw
    90508  12-07-05 21:47   FontOOo.sxw
 --------                   -------
 22570523                   9 files
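
The four-field format documented in the dictionary.lst header above can be read with a few lines of code. This is only a sketch under the assumptions stated in that header (space-delimited fields, "#" starting a comment); the names Entry, parse_dictionary_lst and thesaurus_root are hypothetical and are not part of OOo.

```python
from collections import namedtuple

# One dictionary.lst entry: type, language code, country code, file root name.
Entry = namedtuple("Entry", "kind lang country root")


def parse_dictionary_lst(text):
    """Parse dictionary.lst text into Entry tuples, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore lines that do not have exactly four fields
        entries.append(Entry(*fields))
    return entries


def thesaurus_root(entries, lang, country):
    """Return the THES root name for a locale, or None if there is none."""
    for entry in entries:
        if entry.kind == "THES" and (entry.lang, entry.country) == (lang, country):
            return entry.root
    return None


# The ENGB listing from the test log above.
listing = """\
# Field 4: Root name of file(s)
DICT en GB en_GB
HYPH en GB hyph_en_GB
THES en GB th_en_US_v2
"""
entries = parse_dictionary_lst(listing)
print(thesaurus_root(entries, "en", "GB"))
```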

Comment 9 nemeth.lacko 2005-12-07 21:22:32 UTC
Created attachment 32193 [details]
Full test log
Comment 10 nemeth.lacko 2005-12-07 21:24:55 UTC
[try reassign]
Comment 11 nemeth.lacko 2005-12-07 21:32:35 UTC
nemeth->mh: Martin, I have made the QA, but please, check it. Thanks.
Comment 12 nemeth.lacko 2005-12-07 21:33:38 UTC
.
Comment 13 nemeth.lacko 2005-12-09 06:45:37 UTC
---- from langdev mailing list ----
Hi Laurent,

Quoting Laurent Godard <lgodard@indesko.com>:

> Hi all, Hi nemeth
>
> I just read the update of the dictionary.lst file
> Thanks for that
>
> I've a question though
> As en_GB thesaurus uses the same file as en_US, why not fill the .lst
> file with the two lines independently of the language ?

I have made only a little 3-line patch:

--- dictionaries/en_GB/dictionary.lst   2005-02-21 13:07:04.000000000 +0100
+++ dictionaries.britishthesau/en_GB/dictionary.lst     2005-12-03 21:58:27.000000000 +0100
@@ -1,2 +1,3 @@
 DICT en GB en_GB
 HYPH en GB hyph_en_GB
+THES en GB th_en_US_v2
--- dictionaries/en_US/makefile.mk      2005-09-08 19:57:34.000000000 +0200
+++ dictionaries.britishthesau/en_US/makefile.mk        2005-12-04 03:26:45.000000000 +0100
-.IF "$(DIC_ALL)$(DIC_ENUS)"!=""
+.IF "$(DIC_ALL)$(DIC_ENUS)"!="" || "$(DIC_ALL)$(DIC_ENGB)"!=""

 ALLTAR : $(MISC)$/th_en_US_v2.don
--- dictionaries/prj/build.lst  2005-11-11 12:12:19.000000000 +0100
+++ dictionaries.britishthesau/prj/build.lst    2005-12-07 21:33:40.000000000 +0100
@@ -4,7 +4,7 @@
 di     dictionaries\cs_CZ      nmake   -       all     di_cs_CZ di_diclst NULL
 di     dictionaries\da_DK      nmake   -       all     di_da_DK di_diclst NULL
 di     dictionaries\de_DE      nmake   -       all     di_de_DE di_diclst NULL
-di     dictionaries\en_GB      nmake   -       all     di_en_GB di_diclst NULL
+di     dictionaries\en_GB      nmake   -       all     di_en_GB di_en_US di_diclst NULL
 di     dictionaries\en_US      nmake   -       all     di_en_US di_diclst NULL
 di     dictionaries\it_IT      nmake   -       all     di_it_IT di_diclst NULL
 di     dictionaries\ru_RU      nmake   -       all     di_ru_RU di_diclst NULL

Separating WordNet thesaurus is a good idea, but I didn't want to do
big changes in the source.

>
> Am I missing something ?
> What if --with-dict is not set ? Is everything written ?

Yes, it is, with the right en_GB and en_US thesaurus declaration:

HYPH da DK hyph_da_DK
HYPH de DE hyph_de_DE
DICT en GB en_GB
HYPH en GB hyph_en_GB
THES en GB th_en_US_v2
DICT en US en_US
HYPH en US hyph_en_US
THES en US th_en_US_v2
DICT it IT it_IT
HYPH it IT hyph_it_IT
HYPH ru RU hyph_ru_RU

Best regards,

Laci
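
As a rough illustration of why two THES lines can share one root name, the sketch below maps dictionary.lst entries to the files that would need packaging and lists each file once. The extension table is inferred only from the writingaids.zip listings in comment 8 (.dic/.aff for DICT, .dic for HYPH, .idx/.dat for THES); the function name files_to_package is hypothetical, and this is not how the OOo build actually assembles the archive.

```python
# File extensions per entry type, as seen in the zip listings in comment 8.
EXTENSIONS = {
    "DICT": (".dic", ".aff"),
    "HYPH": (".dic",),
    "THES": (".idx", ".dat"),
}


def files_to_package(lst_text):
    """Return the data files implied by dictionary.lst, without duplicates."""
    seen = []
    for line in lst_text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] not in EXTENSIONS:
            continue  # skip comments, blanks and unknown entry types
        kind, _lang, _country, root = fields
        for ext in EXTENSIONS[kind]:
            name = root + ext
            if name not in seen:
                seen.append(name)
    return seen


# The English entries from the merged list above.
merged = """\
DICT en GB en_GB
HYPH en GB hyph_en_GB
THES en GB th_en_US_v2
DICT en US en_US
HYPH en US hyph_en_US
THES en US th_en_US_v2
"""
files = files_to_package(merged)
print("\n".join(files))
```

Both locales declare th_en_US_v2, but the .idx and .dat files appear only once in the result.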
Comment 14 ooolist2007 2005-12-11 19:20:57 UTC
So does this only affect US and British English? What about Canadian English 
etc, shouldn't these be added, too? 
 
Comment 15 nemeth.lacko 2005-12-12 14:51:45 UTC
nemeth->dnaber: There is no Canadian English dictionary in the source.
I tried to implement an ANY region in OOo's thesaurus
(for example, THES en ANY th_en_US_v2), but I didn't finish it.
An ANY region could give a language a default thesaurus, but first of all
DicOOo and the OOo builds of the native-language projects need to set the
right thesaurus. There may be big differences between regions of the same
language. For example, `The Spanish version of Windows used the word Hembra -
meaning "woman" in Spain - for choosing gender. But in some Central American
republics, notably Nicaragua, the word is an insult meaning "bitch". The program
was changed.' (http://www.guardian.co.uk/uk_news/story/0,,1285890,00.html)

Laci
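
The unfinished ANY idea could look roughly like the lookup below. This is purely a hypothetical sketch: the comment above says the feature was never completed, so the ANY keyword, the function name find_thesaurus and the example root th_en_GB_custom are all assumptions and do not describe OOo behaviour.

```python
def find_thesaurus(entries, lang, country):
    """Pick a thesaurus root: an exact region match first, then an ANY entry.

    entries is a list of (kind, lang, country, root) tuples.
    """
    fallback = None
    for kind, entry_lang, entry_country, root in entries:
        if kind != "THES" or entry_lang != lang:
            continue
        if entry_country == country:
            return root  # a region-specific thesaurus always wins
        if entry_country == "ANY" and fallback is None:
            fallback = root  # remember the language-wide default
    return fallback


# Hypothetical entries: one region-specific thesaurus and one default.
entries = [
    ("THES", "en", "GB", "th_en_GB_custom"),
    ("THES", "en", "ANY", "th_en_US_v2"),
]
print(find_thesaurus(entries, "en", "GB"))
print(find_thesaurus(entries, "en", "CA"))
```

Preferring the exact match keeps room for the regional differences mentioned above, while a locale without its own entry still gets a thesaurus.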
Comment 16 quetschke 2006-01-19 04:26:46 UTC
Sorry to reopen, I was looking at CWS britishthesau and I found that for the
en_GB version the licence file is not included.

Something like this is needed in dictionaries/en_US/makefile.mk:

.IF "$(DIC_ALL)$(DIC_ENUS)$(DIC_ENGB)"!=""
DIC2BIN+= WordNet_license.txt
.ENDIF
Comment 17 nemeth.lacko 2006-01-19 12:28:43 UTC
nemeth->vq: Thanks for the patch, I've added it. To generate the right
dictionary.lst I have made a target.pmk patch, too. (It's not the best solution.
Separating WordNet would be better.)

I have tested with the
--with-dict=ENGB
--with-dict=ENUS
--with-dict=ENGB,ENUS
--with-dict=ITIT
--with-dict=ALL
config parameters. (BTW, DIC_ALL is a sticky environment variable; we need to
explicitly unset DIC_ALL to rebuild OOo without all dictionaries.)

Diff:

diff -u -r dictionaries.britishthesau/en_US/makefile.mk dictionaries/en_US/makefile.mk
--- dictionaries.britishthesau/en_US/makefile.mk	2005-12-04 03:26:45.000000000 +0100
+++ dictionaries/en_US/makefile.mk	2006-01-19 12:46:09.000000000 +0100
 
 DIC2BIN= \
 	en_US.aff \
 	en_US.dic \
-	WordNet_license.txt \
 	hyph_en_US.dic
 
 .ENDIF
 
+# add WordNet license to American and British English
+.IF "$(DIC_ALL)$(DIC_ENUS)$(DIC_ENGB)"!=""
+
+DIC2BIN+= \
+	WordNet_license.txt
+
+.ENDIF
+

diff -u -r dictionaries.britishthesau/util/target.pmk dictionaries/util/target.pmk
--- dictionaries.britishthesau/util/target.pmk	2005-09-08 19:58:44.000000000 +0200
+++ dictionaries/util/target.pmk	2006-01-19 12:46:09.000000000 +0100
 
 $(BIN)$/dictionary_$(DICT_LOCALE).line : dictionary.lst
+.IF "$(DIC2BIN)"!="WordNet_license.txt"
 	+$(TYPE) dictionary.lst > $(BIN)$/dictionary_$(DICT_LOCALE).line
-
+.ENDIF
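
One possible reading of the two hunks above, modelled in Python rather than dmake. It assumes the en_US spelling and hyphenation files sit inside an .IF on "$(DIC_ALL)$(DIC_ENUS)" (that condition is not shown in the diff), and it treats DIC2BIN as a simple list; the function names are hypothetical.

```python
def en_us_dic2bin(dic_all="", dic_enus="", dic_engb=""):
    """Model the DIC2BIN list built in en_US/makefile.mk after the patch."""
    files = []
    # Assumed condition for the en_US spelling and hyphenation files.
    if dic_all + dic_enus != "":
        files += ["en_US.aff", "en_US.dic", "hyph_en_US.dic"]
    # From the patch: the WordNet license goes to American and British English.
    if dic_all + dic_enus + dic_engb != "":
        files += ["WordNet_license.txt"]
    return files


def writes_dictionary_line(dic2bin):
    """Model the target.pmk guard: skip when only the license is packaged."""
    return " ".join(dic2bin) != "WordNet_license.txt"


engb_only = en_us_dic2bin(dic_engb="TRUE")
enus_only = en_us_dic2bin(dic_enus="TRUE")
print(engb_only, writes_dictionary_line(engb_only))
print(enus_only, writes_dictionary_line(enus_only))
```

Under these assumptions, a build configured only for ENGB packages just the license from the en_US directory, so the en_US dictionary.lst line is not written.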
Comment 18 nemeth.lacko 2006-01-19 12:29:24 UTC
.
Comment 19 quetschke 2006-01-26 21:40:06 UTC
I'm sorry for being such a pain in the ...

You didn't really explain your dictionaries/util/target.pmk patch.

I guess the only case where
"$(DIC2BIN)"=="WordNet_license.txt"
is when en_US is built as a dependency for en_GB. In this case don't copy
dictionary.lst.

Please add a comment in target.pmk why you put that line there, otherwise
this gets lost.

Secondly, can you please put a
README_en_GB_thes.txt
file with something like this in en_GB/:

en_GB is using the WordNet thesaurus from the en_US directory.

If you make me the QA rep for britishthesau I'll immediately approve this CWS
after the changes are in.

   Volker
Comment 20 nemeth.lacko 2006-01-31 01:28:57 UTC
Added the requested comments.

nemeth->vq: thanks for your suggestions and the QA. Laci
Comment 21 quetschke 2006-01-31 17:25:50 UTC
Thanks, looks good now. -> VERIFIED
Comment 22 andreschnabel 2006-02-19 09:25:01 UTC
thesaurus en_GB found in 2.0.2 RC1 -> close