FILE: pubs\default\lenya\content\search\search-and-results.xsp
After line 162:
String query = <xsp-request:get-parameter name="query" default=""/>;
ADD:
query = new String(query.getBytes("ISO-8859-1"), "UTF-8");

This was discovered and fixed by John Cherouvim during the User ML thread started on September 20, 2005.
Or change line 162 itself to:
String query = <xsp-request:get-parameter name="query" default="" form-encoding="UTF-8"/>;

I am unable to test either solution. The purpose is to make Java read the query in the full character set (rather than turning certain characters into question marks) while keeping the String as UTF-8.
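The re-decoding idea behind the ADD line can be tried outside Lenya. The following is a minimal sketch, assuming the browser sent the query as UTF-8 bytes and the servlet container decoded those bytes as ISO-8859-1. The class name QueryEncoding and the method redecode are hypothetical and not part of Lenya or Cocoon. It uses StandardCharsets (Java 7 and later) instead of the charset-name strings in the patch above, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lenya or Cocoon.
final class QueryEncoding {

    // Recovers the original text from a string whose UTF-8 bytes were
    // wrongly decoded as ISO-8859-1 (assumed to be what the container did).
    static String redecode(String misdecoded) {
        // ISO-8859-1 maps chars 0x00-0xFF one-to-one onto bytes,
        // so this step gets the original request bytes back.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Cyrillic sample word, written with escapes so the source file encoding does not matter.
        String typed = "\u041f\u0440\u0438\u0432\u0435\u0442";
        // Simulate the assumed mis-decoding on the server side.
        String received = new String(typed.getBytes(StandardCharsets.UTF_8),
                                     StandardCharsets.ISO_8859_1);
        String repaired = redecode(received);
        System.out.println(repaired.equals(typed)); // prints true
    }
}
```

This only helps when the parameter really was mis-decoded. If the container already hands over a correctly decoded string, the same call would damage it, because characters outside ISO-8859-1 are replaced with question marks when converted to bytes. That is one reason to apply only one of the two fixes, not both.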
Searching for umlauts works with LCR 636135. Can someone elaborate on the problem?
The original ML thread is at: http://www.nabble.com/how-to-be-notified%2C-workflow-to860207.html#a907407

This preceded my becoming a Committer. Lenya 1.2.4 contained my original version of Search. Several patches to the Search system were added to the version on my website. The last update to Search on my website was 2006-01-25. I do not know whether anybody updated svn with those patches. (I did not use svn until 2006-05.)

Most umlauts are in the standard 8-bit charset (ISO-8859-1), so they are not a true test of "characters not derived from the Latin alphabet." I am unable to test whether search works with truly different alphabets (e.g. Russian, most oriental languages).