Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing

| Summary: | Japanese word cannot be searched in Search tab in Help window | | |
|---|---|---|---|
| Product: | Internationalization | Reporter: | yuko <yuko.ohsumi> |
| Component: | ui | Assignee: | AOO issues mailing list <issues> |
| Status: | ACCEPTED --- | QA Contact: | |
| Severity: | Trivial | | |
| Priority: | P2 | CC: | amy2008, issues, ivo.hinkelmann, ooo.redflag, uwefis, zhuangyuelin |
| Version: | DEV300m23 | Keywords: | CJK |
| Target Milestone: | --- | | |
| Hardware: | All | | |
| OS: | All | | |
| Issue Type: | DEFECT | Latest Confirmation in: | --- |
| Developer Difficulty: | --- | | |
| Attachments: | | | |

Description
yuko
2008-07-17 09:40:52 UTC
Created attachment 55169: search result in localisation29 build
Created attachment 55170: search result in SS Beta2 build

ab, is this related to the new indexer?!?

jsc: Can you please take over as ab is on vacation. Thanks.

accepted

The error must be in the lucene part of the index search. I can debug our own code and everything looks fine so far (the same as for the English version). But the lucene search returns 0; it finds nothing. I am still not able to debug the lucene code; I am investigating ... Maybe the index is wrong, but a simple rebuild of the index changed nothing.

fixed on cws helpsearch
I integrated lucene-analyzers-2.3.jar and use the CJK analyzer for Japanese. At the moment this is for Japanese only, because Korean and Chinese worked with the standard analyzer as well. It might be interesting to check whether it makes sense to use language-specific analyzers in the future to get better search results. This has to be evaluated.
Changes in
xmlhelp: introduce a new lang parameter in HelpSearch.java, HelpIndexer, resultsetforquery.cxx and use the CJK analyzer for ja.
helpcontent2: use the new lang parameter for HelpIndexer
lucene: simplify the makefile and now build the lucene-analyzers-XY.jar as well.
scp2: add lucene-analyzers-2.3.jar
config_office: add lucene-analyzers jar

jsc: sorry, reopening. xmlhelp2 missed the $(PATH_SEPERATOR)$(LUCENE_ANALYZERS_JAR) in the SYSTEM_LUCENE case. config_office is not committed to the cws at all.

sorry, taking config_office back...

xmlhelp just fixed by me

I take it.

Seen ok in cws helpsearch. -> verified

Seen ok in current master -> closed

Search for Japanese words does not work as expected in OOo 3.0 RC1, so I am reopening this and changing the target milestone to OOo 3.0.1.
Only the first two characters are used as the search word.
When entering the Japanese word グループ (English 'group'), 'No topics found.' appears. グルー is not found either. However, グル shows some topics.
Ex.

| English | Japanese (not found) | -> (found) |
|---|---|---|
| Wizard | ウィザード | ウィ |
| Japanese | 日本語 | 日本 |
| Spellcheck | スペルチェック | スペ |

グル is not a correct word in Japanese, so topics for グループ need to be found.
Also, the Japanese word 言語設定 (English 'Language Settings') shows some topics, but only 言語 is highlighted in the chosen page, so again the first two characters are used as the search word.

adding ihi and ufi in cc:

ab, please have a look. It seems the Japanese hc2 search function still has problems ....

STARTED for now
Probably I will create a new issue for this, as reusing an old issue usually leads to some confusion concerning cws assignment etc.

Please see also issue 38553 for problems with the English search.

Some more info also in issue 61820

I think that this issue is different from issue 38554/61820.
Issue 38554/61820 is a search issue between the word "custom" and the phrase "custom shape".
However, this ja issue is a search word issue: the word "custom" ("カスタム") is divided ("カス" and "タム"), and only the first two ja characters ("カス") are used as the search word.

Evaluation showed that this is the default behavior of Lucene's CJKAnalyzer. This cannot be changed for 3.0.1. -> 3.1 for now.

Not enough time left to check / use another analyzer -> 3.2

-> OOo 3.3

The same problem occurs in OOo_zh.
Li Meiying

Not enough time left for 3.3 -> 3.4

-> OOo 3.x

Reset assignee to the default "issues@openoffice.apache.org".
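
For readers unfamiliar with the two-character behavior evaluated above: Lucene's CJKTokenizer is documented as emitting overlapping two-character tokens for CJK text. The Python sketch below is not Lucene code and makes no claim about why the search ends up matching only the first token. `cjk_bigrams`, `reported` and `first_tokens` are hypothetical names used only for illustration, and the sketch assumes the input is a single run of CJK characters.

```python
def cjk_bigrams(text):
    """Split a run of CJK characters into overlapping two-character tokens.

    Illustration only: this mimics the documented bigram output of a
    CJK tokenizer, not its handling of Latin text, digits or punctuation.
    """
    if not text:
        return []
    if len(text) == 1:
        # A single character cannot form a pair, so it stays one token.
        return [text]
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Search words from the report, mapped to the shorter input that did
# return topics (taken from the examples in the reopening comment).
reported = {
    "グループ": "グル",
    "ウィザード": "ウィ",
    "日本語": "日本",
    "スペルチェック": "スペ",
}

# First bigram of each reported word, for comparison with the report.
first_tokens = {word: cjk_bigrams(word)[0] for word in reported}
```

In this sketch the first token of each reported word is the same two-character string that the report lists as "found", which is consistent with the observation that only the first two characters act as the search word. Whether the remaining tokens are dropped at query time or fail to match the index is not established in this issue.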