36807 – Search issue with non-English characters

Summary: Search issue with non-English characters

Status:	ASSIGNED

Alias:	None

Product:	Lenya
Classification:	Unclassified
Component:	Lucene Integration
Version:	1.2.4
Hardware:	Other All

Importance:	P2 normal
Target Milestone:	2.0.1
Assignee:	Lenya Developers

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-09-25 22:00 UTC by solprovider
Modified:	2008-03-22 20:28 UTC
CC List:	0 users

Description solprovider 2005-09-25 22:00:01 UTC

FILE: pubs\default\lenya\content\search\search-and-results.xsp
After line 162: 
String query = <xsp-request:get-parameter name="query" default=""/>;
ADD: 
query = new String(query.getBytes("ISO-8859-1"), "UTF-8");

This was discovered and fixed by John Cherouvim during the User ML thread
started on September 20, 2005.
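
For readers unfamiliar with the one-line fix above, here is a minimal standalone sketch of what it does, assuming the servlet container decoded UTF-8 request bytes as ISO-8859-1. The class name QueryRedecode and the helper redecode are hypothetical and are not part of Lenya.

```java
import java.nio.charset.StandardCharsets;

public class QueryRedecode {
    // Hypothetical helper mirroring the proposed fix: recover the raw bytes
    // of a string that was wrongly decoded as ISO-8859-1, then decode them
    // as UTF-8.
    public static String redecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Cyrillic query, written with escapes to avoid source-encoding issues.
        String original = "\u043f\u043e\u0438\u0441\u043a";
        // Simulate the assumed problem: UTF-8 bytes read as ISO-8859-1.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.equals(original));           // prints false
        System.out.println(redecode(misdecoded).equals(original)); // prints true
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to a character. If the parameter was already decoded correctly as UTF-8, applying the same conversion would damage any character above U+00FF, so the fix is only appropriate when the mis-decoding really occurs.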

Comment 1 solprovider 2005-09-25 22:06:20 UTC

Or change the first line to:
String query = <xsp-request:get-parameter name="query" default=""
form-encoding="UTF-8"/>;

I am unable to test either solution.  The purpose is to force Java to recognize
the full character set (rather than replacing certain characters with question
marks) while keeping the query String decoded as UTF-8.
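
As an illustration of the question-mark symptom mentioned above, the sketch below shows one way it can arise in Java: encoding a string into a charset that cannot represent its characters. This is an assumption about the mechanism, not a confirmed trace of the Lenya search code; QuestionMarkDemo and throughLatin1 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Hypothetical helper: push a string through ISO-8859-1 and back.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // ('?' for ISO-8859-1) for every character it cannot encode.
    public static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String cyrillic = "\u043f\u043e\u0438\u0441\u043a"; // five Cyrillic letters
        String german = "M\u00fcnchen";                     // contains u-umlaut
        System.out.println(throughLatin1(cyrillic));              // prints ?????
        System.out.println(throughLatin1(german).equals(german)); // prints true
    }
}
```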

Comment 2 Andreas Hartmann 2008-03-22 14:48:50 UTC

Searching for umlauts works with LCR 636135. Can someone elaborate on the problem?

Comment 3 solprovider 2008-03-22 20:28:49 UTC

The original ML thread is at:
http://www.nabble.com/how-to-be-notified%2C-workflow-to860207.html#a907407

This preceded my becoming a Committer.  Lenya 1.2.4 contained my original version of Search.  Several patches to the Search system were added to the version on my website.  The last update to Search on my website was 2006-01-25. I do not know if anybody updated svn with those patches.  (I did not use svn until 2006-05.)

Most umlauts are in the standard 8-bit charset and are not a true test of "characters not derived from the Latin alphabet."  I am unable to test whether search works with truly different alphabets (e.g., Russian or most East Asian languages).
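
To pick test queries that actually exercise this bug, it may help to check which strings fit in an 8-bit charset. The sketch below assumes "standard 8-bit charset" means ISO-8859-1, the charset named in the description; Latin1Check and fitsLatin1 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Hypothetical helper: true if every character of s can be encoded
    // in ISO-8859-1. A new encoder is created per call because
    // CharsetEncoder instances are not thread-safe.
    public static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("\u00fc"));                         // u-umlaut: true
        System.out.println(fitsLatin1("\u0178"));                         // capital Y-diaeresis: false
        System.out.println(fitsLatin1("\u043f\u043e\u0438\u0441\u043a")); // Cyrillic: false
        System.out.println(fitsLatin1("\u691c\u7d22"));                   // Japanese: false
    }
}
```

Queries for which fitsLatin1 returns false are the ones that would distinguish a working UTF-8 search from one that only handles 8-bit characters.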