Bug 36807

Summary:	Search issue with non-English characters
Product:	Lenya	Reporter:	solprovider <solprovider>
Component:	Lucene Integration	Assignee:	Lenya Developers <dev>
Status:	ASSIGNED ---
Severity:	normal
Priority:	P2
Version:	1.2.4
Target Milestone:	2.0.1
Hardware:	Other
OS:	All

Description solprovider 2005-09-25 22:00:01 UTC

FILE: pubs\default\lenya\content\search\search-and-results.xsp
After line 162: 
String query = <xsp-request:get-parameter name="query" default=""/>;
ADD: 
query = new String(query.getBytes("ISO-8859-1"), "UTF-8");

This was discovered and fixed by John Cherouvim during the User ML thread
started on September 20, 2005.
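
For readers who want to see why the added line helps, here is a minimal standalone sketch. It assumes the servlet container decoded the UTF-8 bytes of the query as ISO-8859-1, which is the situation the fix targets. The class and method names (QueryRedecode, misdecode, repair) are hypothetical and are not part of Lenya or Cocoon. StandardCharsets requires Java 7 or later; the line in the report uses the charset-name overloads instead.

```java
import java.nio.charset.StandardCharsets;

public class QueryRedecode {
    // Assumption: the container reads the UTF-8 bytes sent by the browser
    // but decodes them as ISO-8859-1, producing a garbled String.
    public static String misdecode(String typed) {
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Same idea as the line added in the report: recover the raw bytes
    // via ISO-8859-1, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Russian word for "search", written with escapes so the source
        // file encoding does not matter.
        String typed = "\u043f\u043e\u0438\u0441\u043a";
        String garbled = misdecode(typed);
        String repaired = repair(garbled);
        System.out.println(typed.equals(garbled));  // prints "false"
        System.out.println(typed.equals(repaired)); // prints "true"
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character. If the container already decodes the parameter correctly, applying the same repair would damage the query instead of fixing it.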

Comment 1 solprovider 2005-09-25 22:06:20 UTC

Or change the first line to:
String query = <xsp-request:get-parameter name="query" default="" form-encoding="UTF-8"/>;

I am unable to test either solution.  The purpose is to make Java decode the
query as UTF-8, so that characters outside ISO-8859-1 are preserved rather than
replaced with question marks.

Comment 2 Andreas Hartmann 2008-03-22 14:48:50 UTC

Searching for umlauts works with LCR 636135. Can someone elaborate on the problem?

Comment 3 solprovider 2008-03-22 20:28:49 UTC

The original ML thread is at:
http://www.nabble.com/how-to-be-notified%2C-workflow-to860207.html#a907407

This preceded my becoming a Committer.  Lenya 1.2.4 contained my original version of Search.  Several patches to the Search system were added to the version on my website.  The last update to Search on my website was 2006-01-25. I do not know if anybody updated svn with those patches.  (I did not use svn until 2006-05.)

Most umlauts are in the standard 8-bit charset (ISO-8859-1) and are not a true test of "characters not derived from the Latin alphabet."  I am unable to test whether search works with truly different alphabets (e.g., Russian, most oriental languages).
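
A small sketch of why umlauts are a weak test case, assuming ISO-8859-1 is the 8-bit charset in question. The class and method names (Latin1Coverage, fitsLatin1, throughLatin1) are hypothetical and exist only for illustration. A German word survives a pass through ISO-8859-1 unchanged, while a Cyrillic word is reduced to question marks.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Coverage {
    // True if every character of s can be represented in ISO-8859-1.
    public static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Encodes to ISO-8859-1 and back. String.getBytes substitutes '?'
    // for any character the charset cannot represent.
    public static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String german = "M\u00fcnchen";                         // contains u-umlaut
        String russian = "\u041c\u043e\u0441\u043a\u0432\u0430"; // six Cyrillic letters

        System.out.println(fitsLatin1(german));     // prints "true"
        System.out.println(fitsLatin1(russian));    // prints "false"
        System.out.println(throughLatin1(russian)); // prints "??????"
    }
}
```

A search test that uses only umlauts can therefore pass even when the query is handled as ISO-8859-1 throughout; a Cyrillic or CJK query is needed to exercise the UTF-8 path.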