Apache OpenOffice (AOO) Bugzilla – Issue 91750
Japanese word cannot be searched in Search tab in Help window
Last modified: 2017-05-20 11:13:40 UTC
Tested build: DEV300m23 (Build:9326) [CWS:localisation29]
Tested platform: Solaris SPARC, WinXP
In the Search tab of the Help window, no help topics are found when a Japanese word is entered. ASCII words can be searched, but no Japanese word can be. Please see the attached CWSl10n29_Help_ja.gif.
I also tried the same steps in a StarSuite Beta 2 build, where Japanese words can be searched, as shown in the attached SS-Beta2_Help_ja.gif. The Japanese word in this image is the translation of the English word "help".
Created attachment 55169: search result in localisation29 build
Created attachment 55170: search result in SS Beta2 build
ab, is this related to the new indexer?!?
jsc: Can you please take over, as ab is on vacation? Thanks.
accepted
The error must be in the Lucene part of the index search. I can debug our own code, and everything looks fine so far (the same as for the English version), but the Lucene search returns 0 hits; it finds nothing. I am still not able to debug the Lucene code and am investigating. Maybe the index is wrong, but simply rebuilding the index changed nothing.
Fixed on cws helpsearch. I integrated lucene-analyzers-2.3.jar and use the CJK analyzer for Japanese. For now this applies to Japanese only, because Korean and Chinese worked with the standard analyzer as well. It might be worth checking in the future whether language-specific analyzers give better search results; this has to be evaluated.
Changes:
- xmlhelp: introduce a new lang parameter in HelpSearch.java, HelpIndexer and resultsetforquery.cxx, and use the CJK analyzer for ja.
- helpcontent2: use the new lang parameter for HelpIndexer.
- lucene: simplify the makefile and now also build lucene-analyzers-XY.jar.
- scp2: add lucene-analyzers-2.3.jar.
- config_office: add the lucene-analyzers jar.
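To illustrate what a per-language analyzer switch changes, here is a minimal, self-contained Java sketch. It does not call Lucene: the class AnalyzerChoiceSketch and its methods are hypothetical. The "standard" branch only approximates StandardAnalyzer (split on non-letters, lower-case), and the "ja" branch assumes that the essential behavior of CJKAnalyzer is to keep Latin words whole while indexing runs of non-Latin characters as overlapping two-character tokens (bigrams).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical model of choosing a tokenizer from the help "lang" parameter.
 * Not Lucene code; it only approximates StandardAnalyzer and CJKAnalyzer.
 */
public class AnalyzerChoiceSketch {

    /** Selects the tokenizer by language: bigrams for "ja", word splitting otherwise. */
    static List<String> tokenize(String lang, String text) {
        if ("ja".equals(lang)) {
            return tokenizeCjkBigrams(text);
        }
        return tokenizeStandard(text);
    }

    /** Splits on anything that is not a letter or digit and lower-cases each word. */
    static List<String> tokenizeStandard(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Keeps ASCII runs as whole lower-cased words; turns other runs into overlapping bigrams. */
    static List<String> tokenizeCjkBigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{Nd}]+")) {
            int[] cps = word.codePoints().toArray();
            int start = 0;
            while (start < cps.length) {
                // Find the maximal run of code points that are all ASCII or all non-ASCII.
                boolean ascii = cps[start] < 0x80;
                int end = start;
                while (end < cps.length && (cps[end] < 0x80) == ascii) {
                    end++;
                }
                if (ascii) {
                    tokens.add(new String(cps, start, end - start).toLowerCase(Locale.ROOT));
                } else if (end - start == 1) {
                    // A lone CJK character is emitted as a single token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Overlapping bigrams: C1C2, C2C3, C3C4, ...
                    for (int i = start; i + 1 < end; i++) {
                        tokens.add(new String(cps, i, 2));
                    }
                }
                start = end;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("en", "Help Window"));
        System.out.println(tokenize("ja", "Help ヘルプ"));
        System.out.println(tokenize("ja", "カスタム"));
    }
}
```

In this model, カスタム is split into カス, スタ and タム, similar to the division into two-character pieces reported for Japanese search words in this issue. Real Lucene query parsing and scoring involve more steps than shown here.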
jsc: sorry, reopening. xmlhelp2 is missing $(PATH_SEPERATOR)$(LUCENE_ANALYZERS_JAR) in the SYSTEM_LUCENE case, and config_office is not committed to the cws at all.
Sorry, taking config_office back. xmlhelp has just been fixed by me.
I take it.
Seen ok in cws helpsearch. -> verified
Seen ok in current master -> closed
Searching for Japanese words does not work as expected in OOo 3.0 RC1, so I am reopening this issue and changing the target milestone to OOo 3.0.1. Only the first two characters are used as the search word. When the Japanese word グループ (English 'group') is entered, 'No topics found.' appears. グルー is not found either. However, グル shows some topics.
Examples:
English      Japanese (not found) -> (found)
Wizard       ウィザード -> ウィ
Japanese     日本語 -> 日本
Spellcheck   スペルチェック -> スペ
グル is not a valid Japanese word, so the topics for グループ need to be found. Also, the Japanese word 言語設定 (English 'Language Settings') shows some topics, but only 言語 is highlighted in the selected page, so again only the first two characters are used as the search word.
Adding ihi and ufi to CC.
ab, please have a look. It seems the Japanese hc2 search function still has problems.
STARTED for now. I will probably create a new issue for this, as reusing an old issue usually leads to confusion concerning cws assignment etc.
Please also see issue 38553 for problems with the English search. Some more information is in issue 61820.
I think this issue is different from issue 38554/61820, which concerns searching for the word "custom" versus the phrase "custom shape". This Japanese issue is about how the search word itself is handled: the word "custom" ("カスタム") is divided ("カス" and "タム"), and only the first two Japanese characters ("カス") are used as the search word.
Evaluation showed that this is the default behavior of Lucene's CJKAnalyzer. This cannot be changed for 3.0.1. -> 3.1 for now.
Not enough time left to check / use another analyzer -> 3.2
-> OOo 3.3
The same problem occurs in OOo_zh.
Li Meiying
Not enough time left for 3.3 -> 3.4
-> OOo 3.x
Reset assignee to the default "issues@openoffice.apache.org".