Bug 36807 - Search issue with non-English characters
Summary: Search issue with non-English characters
Status: ASSIGNED
Alias: None
Product: Lenya
Classification: Unclassified
Component: Lucene Integration
Version: 1.2.4
Hardware: Other All
Importance: P2 normal
Target Milestone: 2.0.1
Assignee: Lenya Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-25 22:00 UTC by solprovider
Modified: 2008-03-22 20:28 UTC
Description solprovider 2005-09-25 22:00:01 UTC
FILE: pubs\default\lenya\content\search\search-and-results.xsp
After line 162: 
String query = <xsp-request:get-parameter name="query" default=""/>;
ADD: 
query = new String(query.getBytes("ISO-8859-1"), "UTF-8");

This was discovered and fixed by John Cherouvim during the user mailing list
thread started on September 20, 2005.
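
The added line can be tried outside Lenya with a minimal sketch like the one below. It assumes the servlet container decoded the UTF-8 request bytes as ISO-8859-1 (this is what the fix implies, but it is not confirmed here). The class and method names are hypothetical and not part of search-and-results.xsp.

```java
import java.nio.charset.StandardCharsets;

class QueryRedecode {
    // Same idea as the line added to the XSP: take the string that was
    // wrongly decoded as ISO-8859-1, recover the raw bytes, and decode
    // them as UTF-8 instead.
    static String redecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Russian word for "search", written with escapes so the source
        // file encoding does not matter.
        String original = "\u043f\u043e\u0438\u0441\u043a";

        // Assumption: simulate a container that reads the UTF-8 bytes
        // sent by the browser as if they were ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        String repaired = redecode(misdecoded);
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

The round trip works because ISO-8859-1 maps every byte value to exactly one character, so no bytes are lost in the wrong decoding step. If the container already decodes the parameter as UTF-8, applying the same line would corrupt the query instead of repairing it.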
Comment 1 solprovider 2005-09-25 22:06:20 UTC
Or change the first line to:
String query = <xsp-request:get-parameter name="query" default=""
form-encoding="UTF-8"/>;

I am unable to test either solution.  The purpose is to make Java decode the
query parameter as UTF-8, so that it recognizes the full character set (rather
than setting certain characters to question marks).
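
The question marks can be reproduced in isolation. The sketch below is not taken from Lenya; it only shows how Java replaces characters that a charset cannot represent, which is one plausible source of the symptom. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {
    // Encode to ISO-8859-1 and decode back. String.getBytes replaces any
    // character the charset cannot represent with the charset's
    // replacement byte, which is '?' for ISO-8859-1.
    static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String german = "M\u00fcller";                      // contains u-umlaut
        String russian = "\u043f\u043e\u0438\u0441\u043a";  // five Cyrillic letters

        System.out.println("umlaut preserved: " + throughLatin1(german).equals(german));
        System.out.println("cyrillic result:  " + throughLatin1(russian));
    }
}
```

Once a character has been replaced this way, the original cannot be recovered, so the decoding has to be right before any such conversion happens.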
Comment 2 Andreas Hartmann 2008-03-22 14:48:50 UTC
Searching for umlauts works with LCR 636135. Can someone elaborate on the problem?
Comment 3 solprovider 2008-03-22 20:28:49 UTC
The original ML thread is at:
http://www.nabble.com/how-to-be-notified%2C-workflow-to860207.html#a907407

This preceded my becoming a Committer.  Lenya 1.2.4 contained my original version of Search.  Several patches to the Search system were added to the version on my website.  The last update to Search on my website was 2006-01-25. I do not know if anybody updated svn with those patches.  (I did not use svn until 2006-05.)

Most umlauts are in the standard 8-bit charset (ISO-8859-1), so they are not a true test of "characters not derived from the Latin alphabet."  I am unable to test whether search works with truly different alphabets (e.g. Russian, most oriental languages).
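
For anyone able to test, a small sketch like the following may help pick search terms that actually exercise the problem. It checks whether a candidate query fits in ISO-8859-1. The sample words and the class name are illustrative assumptions, not taken from the mailing list thread.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class SearchTestQueries {
    // True when every character of s can be represented in ISO-8859-1.
    // Queries for which this is true are weak test cases for this bug.
    static boolean fitsInLatin1(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u00fcber",                        // German, with u-umlaut
            "\u043f\u043e\u0438\u0441\u043a",   // Russian
            "\u691c\u7d22"                      // Japanese
        };
        for (String sample : samples) {
            System.out.println(sample + " fits in ISO-8859-1: " + fitsInLatin1(sample));
        }
    }
}
```

Only the queries that do not fit in ISO-8859-1 distinguish a correct UTF-8 pipeline from one that happens to work for Western European text.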