Bug 36807

Summary: Search issue with non-English characters
Product: Lenya
Reporter: solprovider <solprovider>
Component: Lucene Integration
Assignee: Lenya Developers <dev>
Status: ASSIGNED
Severity: normal    
Priority: P2    
Version: 1.2.4   
Target Milestone: 2.0.1   
Hardware: Other   
OS: All   

Description solprovider 2005-09-25 22:00:01 UTC
FILE: pubs\default\lenya\content\search\search-and-results.xsp
After line 162: 
String query = <xsp-request:get-parameter name="query" default=""/>;
ADD: 
query = new String(query.getBytes("ISO-8859-1"), "UTF-8");

This was discovered and fixed by John Cherouvim in the user mailing list
thread that started on September 20, 2005.
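The effect of the added line can be reproduced outside Lenya. The sketch below assumes the servlet container decoded the UTF-8 bytes of the query as ISO-8859-1, which is the situation the fix appears to target. The class and method names (QueryRedecode, misdecode, redecode) are hypothetical and not part of search-and-results.xsp.

```java
import java.nio.charset.StandardCharsets;

public class QueryRedecode {
    // Assumption: simulates a container that wrongly decodes UTF-8 request bytes as ISO-8859-1.
    public static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Same operation as the line added to the XSP: recover the raw bytes, then decode them as UTF-8.
    public static String redecode(String query) {
        return new String(query.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u043f\u043e\u0438\u0441\u043a"; // a Russian word, written with escapes
        String received = misdecode(typed);
        System.out.println(received.equals(typed));           // false
        System.out.println(redecode(received).equals(typed)); // true
    }
}
```

The round trip only helps when the parameter really was mis-decoded as ISO-8859-1. Applied to a string that was already decoded correctly, characters outside ISO-8859-1 would be turned into question marks.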
Comment 1 solprovider 2005-09-25 22:06:20 UTC
Or change the first line to:
String query = <xsp-request:get-parameter name="query" default=""
form-encoding="UTF-8"/>;

I am unable to test either solution.  The purpose is to make Java decode the
query as UTF-8, so that characters outside ISO-8859-1 are preserved instead of
being replaced with question marks.
Comment 2 Andreas Hartmann 2008-03-22 14:48:50 UTC
Searching for umlauts works with LCR 636135. Can someone elaborate on the problem?
Comment 3 solprovider 2008-03-22 20:28:49 UTC
The original ML thread is at:
http://www.nabble.com/how-to-be-notified%2C-workflow-to860207.html#a907407

This preceded my becoming a Committer.  Lenya 1.2.4 contained my original version of Search.  Several patches to the Search system were added to the version on my website.  The last update to Search on my website was 2006-01-25. I do not know if anybody updated svn with those patches.  (I did not use svn until 2006-05.)

Most umlauts are in the standard 8-bit character set (ISO-8859-1), so they are not a true test of "characters not derived from the Latin alphabet."  I am unable to test whether search works with truly different alphabets (e.g. Russian or most East Asian languages).
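The difference between the two kinds of test input can be checked in plain Java. This is only a sketch: the class and method names (Latin1Check, fitsLatin1, lossyLatin1) are hypothetical, and it says nothing about how Lenya or Lucene handle the query afterwards.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // True if every character of s can be represented in ISO-8859-1.
    public static boolean fitsLatin1(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(s);
    }

    // Encodes to ISO-8859-1 and back; unmappable characters come back as '?'.
    public static String lossyLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String german = "\u00fcber";                       // starts with u-umlaut
        String russian = "\u043f\u043e\u0438\u0441\u043a"; // Cyrillic letters
        System.out.println(fitsLatin1(german));   // true
        System.out.println(fitsLatin1(russian));  // false
        System.out.println(lossyLatin1(russian)); // ?????
    }
}
```

A query such as the Cyrillic one above is therefore a more meaningful test case for this bug than a word containing umlauts.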