Bug 25526 - tomcat parses the query string parameters as iso-8859-1
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 4
Classification: Unclassified
Component: Unknown
Version: 4.1.29
Hardware: PC All
Importance: P3 normal with 20 votes
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-12-15 08:23 UTC by Panagiotis Korros
Modified: 2005-03-20 17:06 UTC



Description Panagiotis Korros 2003-12-15 08:23:30 UTC
Tomcat always parses the query string parameters as ISO-8859-1 URL-encoded.
This happens even if the page that submits the data has UTF-8 encoding, like:
<%@ page contentType="text/html;charset=utf-8"%>

and even if the SetCharacterEncodingFilter is set to UTF-8 in web.xml:

<filter>
  <filter-name>SetCharacterEncoding</filter-name>
  <filter-class>filters.SetCharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>ignore</param-name>
    <param-value>false</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>SetCharacterEncoding</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

You have to do the following trick in the page that handles the request to get
the correct encoding:

String param = new String(
    request.getParameter("param").getBytes("iso-8859-1"), "utf-8");

Changing to POST works ok.
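
A minimal sketch of why the trick above works, using only the JDK. It assumes the query string carried UTF-8 percent-escapes (here %CE%B1, the Greek letter alpha) and simulates a container decoding them as ISO-8859-1. The class and method names are hypothetical and are not part of Tomcat or the servlet API.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tomcat or the servlet API.
final class QueryParamRecode {

    // Simulates a container that decodes percent-escapes as ISO-8859-1.
    // URLDecoder.decode(String, Charset) requires Java 10 or later.
    static String decodeAsLatin1(String rawValue) {
        return URLDecoder.decode(rawValue, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the report: turn the mis-decoded characters back
    // into the original bytes, then decode those bytes as UTF-8.
    static String recode(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = decodeAsLatin1("%CE%B1"); // two Latin-1 characters
        String fixed = recode(garbled);            // one character: U+03B1
        System.out.println(garbled.length() + " chars before, "
                + fixed.length() + " char after");
        // prints "2 chars before, 1 char after"
    }
}
```

The round trip loses nothing because ISO-8859-1 maps every byte value to exactly one character, so getBytes("iso-8859-1") recovers the bytes the browser originally sent.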
Comment 1 Mark Thomas 2004-06-25 22:55:40 UTC
The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to 
ISO-8859-1.
The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding field 
which defaults to the URIEncoding. It must be set before the parameters are 
parsed to have an effect.
Comment 2 Scott Farquhar 2004-06-26 02:31:59 UTC
So - if I do request.setCharacterEncoding() before I get the query parameters, then this will work?

This was closed as fixed.  Does this mean that this was a bug, and is now fixed?  From what you 
mention, the workaround is Tomcat specific, whereas this seems to be a bug with Tomcat not dealing 
with the spec properly?
Comment 3 Mark Thomas 2004-06-27 16:51:52 UTC
Depending on other settings (see below) yes it should. This has cropped up a 
number of times. I have put together some standard text which covers this area 
and have included it below.

REQUESTS
========

There are a number of situations where there may be a requirement to use
non-US-ASCII characters in a URI. These include:
- Parameters in the query string
- Servlet paths

There is a standard for encoding URIs
(http://www.w3.org/International/O-URL-code.html) but this standard is not
consistently followed by clients. This causes a number of problems.

The functionality provided by Tomcat (4 and 5) to handle this less than ideal 
situation is described below.

1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which 
if set to true will use the request body encoding to decode the URI query 
parameters.
  - The default value is true for TC4 (breaks spec but gives consistent 
behaviour across TC4 versions)
  - The default value is false for TC5 (spec compliant but there may be 
migration issues for some apps)
2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to 
ISO-8859-1.
3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding 
field which defaults to the URIEncoding. It must be set before the parameters 
are parsed to have an effect.
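
The interaction of the three points above can be sketched as a small model. This is a simplified reading of the description, not Tomcat source code: the class and method names are hypothetical, and bodyEncoding is assumed to be whatever encoding applies to the request body (ISO-8859-1 if none was set).

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the connector settings described above;
// this is NOT the actual Tomcat implementation.
final class QueryEncodingModel {

    // Picks the charset used to decode query string parameters.
    // bodyEncoding: the request body encoding (assumed non-null here).
    // uriEncoding:  the connector's URIEncoding attribute.
    static Charset queryCharset(boolean useBodyEncodingForURI,
                                Charset bodyEncoding,
                                Charset uriEncoding) {
        return useBodyEncodingForURI ? bodyEncoding : uriEncoding;
    }

    // Decodes one percent-encoded parameter value with the chosen charset.
    // URLDecoder.decode(String, Charset) requires Java 10 or later.
    static String decodeParam(String rawValue,
                              boolean useBodyEncodingForURI,
                              Charset bodyEncoding,
                              Charset uriEncoding) {
        Charset cs = queryCharset(useBodyEncodingForURI, bodyEncoding, uriEncoding);
        return URLDecoder.decode(rawValue, cs);
    }

    public static void main(String[] args) {
        Charset body = StandardCharsets.UTF_8;      // e.g. set via setCharacterEncoding()
        Charset uri = StandardCharsets.ISO_8859_1;  // the documented URIEncoding default

        // %CE%B1 is the UTF-8 encoding of the Greek letter alpha (U+03B1).
        String withBody = decodeParam("%CE%B1", true, body, uri);
        String withUri = decodeParam("%CE%B1", false, body, uri);
        System.out.println(withBody.length() + " vs " + withUri.length());
        // prints "1 vs 2"
    }
}
```

In this model, a UTF-8 body encoding only affects query parameters when useBodyEncodingForURI is true; otherwise URIEncoding decides, and its ISO-8859-1 default produces the garbled two-character result.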

Things to note regarding the servlet API:
1. HttpServletRequest.setCharacterEncoding() normally only applies to the 
request body NOT the URI.
2. HttpServletRequest.getPathInfo() is decoded by the web container.
3. HttpServletRequest.getRequestURI() is not decoded by the container.
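
The difference between a decoded and an undecoded path (points 2 and 3) can be illustrated by analogy with java.net.URI from the JDK. This is only an analogy: java.net.URI always decodes escaped octets as UTF-8, whereas a servlet container applies its own configured encoding. The class name and the example URL are hypothetical.

```java
import java.net.URI;

// Analogy only: java.net.URI stands in for the container here.
final class RawVersusDecodedPath {

    // Comparable to getRequestURI(): percent-escapes are left untouched.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Comparable to getPathInfo(): percent-escapes are decoded
    // (java.net.URI always uses UTF-8 for this).
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String uri = "http://example.com/app/%CE%B1";
        System.out.println(rawPath(uri));              // prints "/app/%CE%B1"
        System.out.println(decodedPath(uri).length()); // prints "6"
    }
}
```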

Other tips:
1. Use POST with forms to return parameters as the parameters are then part of 
the request body.


RESPONSES
=========

HTML META tags are ignored by Tomcat. You may use
<%@ page pageEncoding="..." %> for JSPs.
Comment 4 Anton 2004-07-09 02:03:35 UTC
Sorry - but I am still confused: in your comment, in point number 1 you say that
there is a useBodyEncodingForURI parameter for Coyote that defaults to true for TC4.

In the second point you say that Coyote has a URIEncoding parameter which is set
to ISO-8859-1, and in the third point you mention that the URIEncoding parameter
is used to parse the URL parameters (by default).

From your points I do not understand what the purpose of
useBodyEncodingForURI is and when it is used by Tomcat.

In TC4, the behaviour that we are seeing is that the URL parameters are parsed
using ISO-8859-1 even if request.setCharacterEncoding() is called with "UTF-8".

Is there a way to configure Tomcat 4 and 5 (hopefully without calling
Tomcat/Coyote specific methods on objects at runtime) which will force the URL
parameters to be parsed using the encoding of the body?

The reason I am after this, is that sometimes there is a need to send a browser
redirect passing parameters. A redirect always results in a GET HTTP request
from the browser. I would prefer not to use the session to store the parameters.

Thanks for all your help and information!
Comment 5 Mark Thomas 2004-07-09 17:38:59 UTC
Bugzilla is not a forum for support questions. Please ask questions such as
this on the tomcat-user mailing list.