Bug 16912

Summary: c:param encodes the URL with the default URLEncoder.encode(str), but must use URLEncoder.encode(str, encoding)
Product: Taglibs
Reporter: Vit Timchishin <tivv>
Component: Standard Taglib
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED
Severity: major
CC: jakarta
Priority: P3    
Version: 1.0   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Vit Timchishin 2003-02-09 08:16:23 UTC
The summary states the problem: c:param does not work well with Cyrillic text.
The fix is:
ParamSupport.java (common/core), line 123:
        try {
            if (encode) {
                parent.addParameter(
                    URLEncoder.encode(name,
                        pageContext.getResponse().getCharacterEncoding()),
                    URLEncoder.encode(value,
                        pageContext.getResponse().getCharacterEncoding()));
            } else {
                parent.addParameter(name, value);
            }
        } catch (java.io.UnsupportedEncodingException e) {
            throw new JspException(e.toString());
        }
Comment 1 Pierre Delisle 2003-02-26 22:54:06 UTC
Thanks for the bug report.

The fix is more elaborate than the one suggested because
URLEncoder.encode(String, String) was only added in J2SE 1.4, and
JSTL 1.0 must also run on earlier releases of J2SE.
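
One way to call the two-argument overload only where it exists is reflection. The sketch below is an assumption about how such a fallback might look, not the code that was committed; the class name CompatEncoder is hypothetical. Note that on a pre-1.4 runtime it silently reverts to the platform default encoding.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URLEncoder;

// Hypothetical helper, not part of JSTL.
final class CompatEncoder {

    // Uses URLEncoder.encode(String, String) when the runtime provides it,
    // otherwise falls back to the one-argument form (platform default encoding).
    @SuppressWarnings("deprecation")
    static String encode(String s, String enc) {
        try {
            Method m = URLEncoder.class.getMethod(
                "encode", new Class[] { String.class, String.class });
            return (String) m.invoke(null, new Object[] { s, enc });
        } catch (NoSuchMethodException e) {
            // Pre-1.4 runtime: the requested encoding cannot be honored here.
            return URLEncoder.encode(s);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e.toString());
        } catch (InvocationTargetException e) {
            // Wraps e.g. UnsupportedEncodingException for an unknown charset name.
            throw new IllegalArgumentException(String.valueOf(e.getCause()));
        }
    }

    public static void main(String[] args) {
        // Cyrillic sample written with Unicode escapes so the source file encoding does not matter.
        System.out.println(encode("\u041F\u0440\u0438\u0432\u0435\u0442", "UTF-8"));
    }
}
```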
Comment 2 Stefan Kuehnel 2003-02-27 15:14:13 UTC
From my understanding of the documentation for URLEncoder and the implementation 
notes of the HTML spec 
(http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars), the parameter 
name and value should be encoded using UTF-8, not using the document encoding as 
the current fix does.  Looking at the Tomcat 4.x sources 
(http://cvs.apache.org/viewcvs.cgi/jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/connector/HttpRequestBase.java),
it seems it uses the document
encoding for parameter decoding, but shouldn't there at least be an option to 
specify the parameter encoding to use?
Comment 3 Vit Timchishin 2003-02-27 15:41:57 UTC
That would be more correct, but only once Tomcat parses such URLs correctly.
For now, take the following test:
<c:out value="${param.param}"/>
<p><a href='test3.jsp?param=<%= java.net.URLEncoder.encode("Привет", "UTF-8") %>'>Click</a>

(file is test3.jsp) gives test3.jsp?param=%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82
URL that is parsed correctly in Mozilla (displayed OK in status bar), but
incorrectly in Tomcat - &#1072;&#65533;&#1073;&#65533;&#1072;&#1048;&#1072;&#1042;&#1072;&#1045;&#1073;&#65533; is displayed by c:out instead of &#1055;&#1088;&#1080;&#1074;&#1077;&#1090;.
Note that the test3.jsp also has next statements (that allows me to use cyrillic
correctly):
<%@ page pageEncoding="windows-1251" %>
<% if (request.getCharacterEncoding() == null)
request.setCharacterEncoding(response.getCharacterEncoding());
%>
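
The mismatch described above can be reproduced outside a container. The following is a minimal sketch that assumes the decoding side uses windows-1251 (the page encoding from the directive above); the class name EncodingMismatchDemo is hypothetical, and the exact garbled characters depend on which charset the container actually applies.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of JSTL or Tomcat.
final class EncodingMismatchDemo {

    // Encodes text as a query-string value with one charset and decodes it with another.
    static String roundTrip(String text, String encodeWith, String decodeWith)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(URLEncoder.encode(text, encodeWith), decodeWith);
    }

    public static void main(String[] args) throws Exception {
        // Cyrillic sample written with Unicode escapes so the source file encoding does not matter.
        String hello = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Same bytes as in the URL quoted above: %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82
        System.out.println(URLEncoder.encode(hello, "UTF-8"));

        // Matching charsets on both sides restore the original text.
        System.out.println(roundTrip(hello, "UTF-8", "UTF-8").equals(hello));

        // Encoding as UTF-8 but decoding as windows-1251 does not.
        System.out.println(roundTrip(hello, "UTF-8", "windows-1251").equals(hello));
    }
}
```

Whichever encoding is chosen, the side that builds the query string and the side that parses it have to agree on it.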
Comment 4 Dmitry Andrianov 2003-03-05 15:11:36 UTC
The way you fixed this bug will work on JDK 1.4 but will fall back to the old
behavior on JDK 1.3. In effect, that means the bug is not fixed.

It would be better to implement your own urlEncode inside JSTL
and use it on JDK 1.3. The simplest way is to use the URLEncoder.encode source from 1.4.
Comment 5 Pierre Delisle 2003-04-30 14:33:15 UTC
*** Bug 19477 has been marked as a duplicate of this bug. ***
Comment 6 Pierre Delisle 2003-04-30 23:01:52 UTC
Stefan is right about the HTML spec. However, this part of the HTML
spec was apparently produced too late to have an impact on
reality. Browsers generally encode the query string using the
character encoding of the page containing the form. Moreover,
the JSP 2.0 spec also adopts this convention for internally
generated query strings.

It therefore seems wise to follow suit with what everyone else is doing.

I've updated the code to do the encoding as follows:
   Util.URLEncode(name, enc)
where the URLEncode method has been lifted from the Jasper2 source code
(we now use the same code for both J2SE 1.3 and J2SE 1.4),
and where enc is 'pageContext.getResponse().getCharacterEncoding()'.
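
For readers who want to see what an encoder of this shape involves, here is a minimal sketch. It is not the Jasper2 code; the class name QueryEncoder and the method name urlEncode are hypothetical. It relies only on String.getBytes(String), and deliberately uses StringBuffer and indexed loops so that nothing in it requires J2SE 1.4.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not the Util.URLEncode method mentioned above.
final class QueryEncoder {

    private static final String HEX = "0123456789ABCDEF";

    // Characters that java.net.URLEncoder also leaves untouched.
    private static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '_' || c == '.' || c == '*';
    }

    // Percent-encodes s using the bytes of the named charset; a space becomes '+'.
    static String urlEncode(String s, String enc) throws UnsupportedEncodingException {
        StringBuffer out = new StringBuffer(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isSafe(c)) {
                out.append(c);
                i++;
            } else if (c == ' ') {
                out.append('+');
                i++;
            } else {
                // Convert a whole run of unsafe characters at once so that
                // surrogate pairs and multi-byte sequences stay intact.
                int start = i;
                while (i < s.length() && !isSafe(s.charAt(i)) && s.charAt(i) != ' ') {
                    i++;
                }
                byte[] bytes = s.substring(start, i).getBytes(enc);
                for (int k = 0; k < bytes.length; k++) {
                    out.append('%');
                    out.append(HEX.charAt((bytes[k] >> 4) & 0xF));
                    out.append(HEX.charAt(bytes[k] & 0xF));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Cyrillic sample written with Unicode escapes so the source file encoding does not matter.
        String hello = "\u041F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(urlEncode(hello, "windows-1251"));
        System.out.println(urlEncode(hello, "UTF-8"));
    }
}
```

In the tag handler, enc would be the value of pageContext.getResponse().getCharacterEncoding(), as described above.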