Tomcat always decodes URL-encoded query string parameters as ISO-8859-1, even if the page that submits the data uses UTF-8 encoding, for example:

<%@ page contentType="text/html;charset=utf-8"%>

and even if the SetCharacterEncodingFilter is set to UTF-8 in web.xml:

<filter>
  <filter-name>SetCharacterEncoding</filter-name>
  <filter-class>filters.SetCharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>ignore</param-name>
    <param-value>false</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>SetCharacterEncoding</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

To get the correct encoding, you have to use the following trick in the page that handles the request:

String param = new String(request.getParameter("param").getBytes("iso-8859-1"), "utf-8");

Changing the form method to POST works OK.
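Why the re-decode trick works can be shown without a container. This is a minimal sketch that assumes the browser percent-encodes the UTF-8 bytes of the value (as it does for a UTF-8 page) and the connector decodes those bytes as ISO-8859-1. `URLEncoder` and `URLDecoder` stand in for the browser and the connector, and the `Charset` overloads (Java 10+) replace the string charset names used in the report. Because ISO-8859-1 maps every byte 0x00 to 0xFF to exactly one char, `getBytes` with ISO-8859-1 recovers the original bytes losslessly, and they can then be decoded as UTF-8.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {

    // Re-interpret a value that was decoded as ISO-8859-1 although the client
    // percent-encoded UTF-8 bytes (same idea as the getBytes("iso-8859-1") trick).
    static String redecode(String misdecoded) {
        // ISO-8859-1 maps every char U+0000..U+00FF back to exactly one byte,
        // so this recovers the raw bytes the browser sent, unchanged.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe"; // "Grüße"

        // Stand-in for the browser on a UTF-8 page: percent-encode the UTF-8 bytes.
        String wire = URLEncoder.encode(original, StandardCharsets.UTF_8);

        // Stand-in for a connector that decodes the query string as ISO-8859-1.
        String asReceived = URLDecoder.decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(wire);                                  // Gr%C3%BC%C3%9Fe
        System.out.println(asReceived.length());                   // 7 (one char per UTF-8 byte)
        System.out.println(redecode(asReceived).equals(original)); // true
    }
}
```

The trick only works because ISO-8859-1 is a lossless byte-to-char mapping; re-decoding a value that was first decoded with a charset that drops or substitutes bytes would not recover the original.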
The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to ISO-8859-1. The Parameters class (org.apache.tomcat.util.http.Parameters) has a QueryStringEncoding field which defaults to the URIEncoding. It must be set before the parameters are parsed to have an effect.
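The "must be set before the parameters are parsed" constraint can be modeled in a few lines. This is a hypothetical sketch, not Tomcat's Parameters class: `LazyQueryParameters` and its `setQueryStringEncoding` method are illustrative names. The sketch assumes, as described here, that the query string is parsed once, on first access, using whatever encoding is set at that moment, with ISO-8859-1 as the default.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parse-once parameter holder; NOT Tomcat's actual implementation.
public class LazyQueryParameters {
    private final String rawQuery;
    // Mirrors the URIEncoding default of ISO-8859-1.
    private Charset queryStringEncoding = StandardCharsets.ISO_8859_1;
    private Map<String, String> parsed; // null until the first getParameter() call

    public LazyQueryParameters(String rawQuery) {
        this.rawQuery = rawQuery;
    }

    // Only affects decoding if nothing has been parsed yet.
    public void setQueryStringEncoding(Charset cs) {
        this.queryStringEncoding = cs;
    }

    public String getParameter(String name) {
        if (parsed == null) {
            // Parse exactly once with the encoding in effect right now.
            parsed = new LinkedHashMap<>();
            for (String pair : rawQuery.split("&")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                // Keep the first value for repeated names, like getParameter().
                parsed.putIfAbsent(URLDecoder.decode(key, queryStringEncoding),
                                   URLDecoder.decode(value, queryStringEncoding));
            }
        }
        return parsed.get(name);
    }

    public static void main(String[] args) {
        String query = "name=Gr%C3%BC%C3%9Fe";
        String expected = "Gr\u00fc\u00dfe";

        LazyQueryParameters early = new LazyQueryParameters(query);
        early.setQueryStringEncoding(StandardCharsets.UTF_8); // set before first access
        System.out.println(expected.equals(early.getParameter("name"))); // true

        LazyQueryParameters late = new LazyQueryParameters(query);
        late.getParameter("name");                            // parses as ISO-8859-1
        late.setQueryStringEncoding(StandardCharsets.UTF_8);  // too late: already parsed
        System.out.println(expected.equals(late.getParameter("name"))); // false
    }
}
```

The same ordering applies to request.setCharacterEncoding(): any call that reads a parameter first fixes the decoding for the rest of the request.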
So, if I call request.setCharacterEncoding() before I get the query parameters, will this work? This was closed as fixed. Does this mean that this was a bug, and is now fixed? From what you mention, the workaround is Tomcat-specific, whereas this seems to be a bug in Tomcat not following the spec properly?
Depending on other settings (see below), yes, it should.

This has cropped up a number of times. I have put together some standard text which covers this area and have included it below.

REQUESTS
========

There are a number of situations where there may be a requirement to use non-US-ASCII characters in a URI. These include:
- Parameters in the query string
- Servlet paths

There is a standard for encoding URIs (http://www.w3.org/International/O-URL-code.html), but this standard is not consistently followed by clients. This causes a number of problems. The functionality provided by Tomcat (4 and 5) to handle this less than ideal situation is described below.

1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which, if set to true, will use the request body encoding to decode the URI query parameters.
   - The default value is true for TC4 (breaks the spec but gives consistent behaviour across TC4 versions).
   - The default value is false for TC5 (spec compliant, but there may be migration issues for some apps).
2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to ISO-8859-1.
3. The Parameters class (org.apache.tomcat.util.http.Parameters) has a QueryStringEncoding field which defaults to the URIEncoding. It must be set before the parameters are parsed to have an effect.

Things to note regarding the servlet API:
1. HttpServletRequest.setCharacterEncoding() normally only applies to the request body, NOT the URI.
2. HttpServletRequest.getPathInfo() is decoded by the web container.
3. HttpServletRequest.getRequestURI() is not decoded by the container.

Other tips:
1. Use POST with forms to return parameters, as the parameters are then part of the request body.

RESPONSES
=========

HTML META tags are ignored by Tomcat. You may use <%@ page pageEncoding="..." %> for JSPs.
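How the two connector attributes interact can be sketched as a single decision. This is a simplified, hypothetical model, not Tomcat source: `queryStringCharset` is an illustrative name, while `useBodyEncodingForURI` and `URIEncoding` are the connector attributes described here. The sketch assumes that when useBodyEncodingForURI is true and a body encoding has been set via setCharacterEncoding() before the parameters are read, that encoding wins; otherwise URIEncoding applies, defaulting to ISO-8859-1.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryCharsetChoice {

    // Hypothetical, simplified model of the connector's choice of query-string charset.
    // bodyEncoding: the value passed to request.setCharacterEncoding(), or null.
    // uriEncoding:  the connector's URIEncoding attribute, or null if not configured.
    static Charset queryStringCharset(boolean useBodyEncodingForURI,
                                      String bodyEncoding,
                                      String uriEncoding) {
        if (useBodyEncodingForURI && bodyEncoding != null) {
            return Charset.forName(bodyEncoding);
        }
        // Fall back to URIEncoding, which defaults to ISO-8859-1.
        return uriEncoding != null ? Charset.forName(uriEncoding) : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        String rawValue = "Gr%C3%BC%C3%9Fe"; // UTF-8 bytes sent by the browser
        String expected = "Gr\u00fc\u00dfe";

        // useBodyEncodingForURI=false (the TC5 default): setCharacterEncoding("UTF-8") does not help.
        Charset spec = queryStringCharset(false, "UTF-8", null);
        System.out.println(expected.equals(URLDecoder.decode(rawValue, spec))); // false

        // useBodyEncodingForURI=true plus setCharacterEncoding("UTF-8") before reading parameters.
        Charset body = queryStringCharset(true, "UTF-8", null);
        System.out.println(expected.equals(URLDecoder.decode(rawValue, body))); // true

        // Alternatively, URIEncoding="UTF-8" on the connector, independent of the body encoding.
        Charset uri = queryStringCharset(false, null, "UTF-8");
        System.out.println(expected.equals(URLDecoder.decode(rawValue, uri))); // true
    }
}
```

In this model the two attributes are not alternatives to each other: useBodyEncodingForURI decides whether the body encoding is consulted at all, and URIEncoding is the fallback when it is not (or when no body encoding was set).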
Sorry, but I am still confused. In point 1 of your comment you say that Coyote has a useBodyEncodingForURI parameter which defaults to true for TC4. In point 2 you say that Coyote has a URIEncoding parameter which defaults to ISO-8859-1, and in point 3 you say that the URIEncoding parameter is used (by default) to parse the URL parameters. From your points I do not understand what the purpose of useBodyEncodingForURI is, or when Tomcat uses it.

In TC4, the behaviour we are seeing is that the URL parameters are parsed using ISO-8859-1 even if request.setCharacterEncoding() is called with "UTF-8".

Is there a way to configure Tomcat 4 and 5 (ideally without calling Tomcat/Coyote-specific methods on objects at runtime) that will force the URL parameters to be parsed using the encoding of the body? The reason I am after this is that sometimes there is a need to send a browser redirect passing parameters. A redirect always results in a GET HTTP request from the browser, and I would prefer not to use the session to store the parameters.

Thanks for all your help and information!
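For the redirect case, the parameters have to be percent-encoded into the redirect URL with the same charset the receiving side will use to decode them. This is a minimal sketch under that assumption: `RedirectUrlBuilder` and `buildRedirectUrl` are hypothetical names, and UTF-8 only round-trips if the target is decoded as UTF-8 (for example via URIEncoding="UTF-8", or useBodyEncodingForURI with setCharacterEncoding("UTF-8") called before any parameter is read).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RedirectUrlBuilder {

    // Hypothetical helper: appends UTF-8 percent-encoded parameters to a base URL.
    static String buildRedirectUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        // Continue an existing query string if the base URL already has one.
        char sep = base.indexOf('?') < 0 ? '?' : '&';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // preserves insertion order
        params.put("name", "Gr\u00fc\u00dfe");
        params.put("q", "a b");
        System.out.println(buildRedirectUrl("/app/target.jsp", params));
        // /app/target.jsp?name=Gr%C3%BC%C3%9Fe&q=a+b
    }
}
```

In a servlet, the result would be passed to response.sendRedirect(). URLEncoder encodes a space as "+", which servlet containers decode back to a space when parsing query parameters.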
Bugzilla is not a forum for support questions. Please ask questions such as this on the tomcat-user mailing list.