After updating to Tomcat 5.0.14 Beta, an existing page shows encoding problems: request.setCharacterEncoding(String enc) does not seem to work. request.getCharacterEncoding() returns the encoding I set (GBK, a Chinese encoding), but the String returned by request.getParameter("field_name") is still decoded as iso-8859-1. I only get the correct String after converting it like this:

String newString = new String(request.getParameter("field_name").getBytes("iso-8859-1"), "GBK");

Here is my test JSP source:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<%@ page language = "java" session = "false" contentType = "text/html; charset=GBK" %><%request.setCharacterEncoding("GBK");%>
<HEAD>
<TITLE>Test page</TITLE>
<META HTTP-EQUIV="Content-Type" content="text/html; charset=GBK">
<META NAME="Author" CONTENT="Joachim">
<META NAME="Keywords" CONTENT="">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<FORM METHOD=POST ACTION="">
<TEXTAREA NAME="text" ROWS="6" COLS="60" wrap="off">
<%=new String(cl(request.getParameter("text")).getBytes("iso-8859-1"), "GBK")%>
</TEXTAREA>
<BR/><INPUT TYPE="submit" value="Submit">
</FORM>
<%=request.getCharacterEncoding()%>
</BODY>
</HTML>
<%!
String cl(String v) {
    return (null == v) ? "" : v;
}
%>
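For readers wondering why the getBytes() workaround recovers the text: ISO-8859-1 maps every byte to exactly one character, so a mis-decoded String still carries the original bytes. The following is a minimal standalone sketch of that round trip, not code from the report; the class and method names (EncodingRoundTrip, misdecode, recover) are made up for illustration, and it assumes the JVM ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Simulates a container that decoded the submitted bytes as ISO-8859-1
    // although the browser sent them in realCharset.
    public static String misdecode(String original, String realCharset) {
        byte[] wire = original.getBytes(Charset.forName(realCharset));
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the report: turn the String back into its raw
    // bytes via ISO-8859-1, then decode those bytes with the real charset.
    public static String recover(String misdecoded, String realCharset) {
        byte[] wire = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, Charset.forName(realCharset));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        String garbled = misdecode(original, "GBK");
        System.out.println(garbled.length());                         // 4: one char per GBK byte
        System.out.println(recover(garbled, "GBK").equals(original)); // true
    }
}
```

The round trip is only lossless because the wrong decoding step used ISO-8859-1; with other charsets, bytes that have no mapping would be replaced and could not be recovered.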
That method does not handle the encoding of URI parameters. See the URI encoding parameter. However, URI encoding can't be made to work reliably in all cases, because there is no standard for it. If you want i18n, use POST. Please do not reopen the report.
Remy, the example submitted with the bug report should produce the result the reporter expects without the getBytes() workaround (and it does use POST). I cannot verify this because I don't have a Tomcat 5 installation, but marking this as invalid for the reasons mentioned is just wrong.
I did look at it (it does indeed POST), and added traces into the o.a.tomcat.util.http.Parameters class, and (unsurprisingly) the correct encoding name is being used for character decoding (the input being a byte array). So the example should work fine.
I found the real cause, and it's my fault: I use a filter (Atlassian ProfilingFilter) that runs before the servlet's service method and calls request.getParameter() first, before the encoding is set. Thanks all, I will test and verify more carefully before reporting!
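The symptom described in this thread (getCharacterEncoding() reports the new encoding while getParameter() still returns ISO-8859-1 text) is what one would expect if parameters are decoded once, on first access, and then cached. The sketch below models that with a hypothetical FakeRequest class; it is not Tomcat's Parameters implementation, it handles a single parameter value only, and the names LazyParamDemo and read are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LazyParamDemo {
    /** Hypothetical stand-in for a servlet request; not a real container class. */
    public static class FakeRequest {
        private final byte[] body;
        private Charset encoding = StandardCharsets.ISO_8859_1; // assumed default
        private String parsed; // decoded value, cached on first access

        public FakeRequest(byte[] body) { this.body = body; }

        public void setCharacterEncoding(String name) { encoding = Charset.forName(name); }

        public String getCharacterEncoding() { return encoding.name(); }

        public String getParameter() {
            if (parsed == null) {
                parsed = new String(body, encoding); // decoded only once
            }
            return parsed;
        }
    }

    // Runs the page logic, optionally preceded by a filter that reads the
    // parameter before the page has set the encoding.
    public static String read(byte[] body, String enc, boolean filterReadsFirst) {
        FakeRequest request = new FakeRequest(body);
        if (filterReadsFirst) {
            request.getParameter(); // what the profiling filter did
        }
        request.setCharacterEncoding(enc); // what the JSP does
        return request.getParameter();
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        byte[] body = original.getBytes(Charset.forName("GBK"));
        System.out.println(read(body, "GBK", false).equals(original)); // true
        System.out.println(read(body, "GBK", true).length());          // 4: stale ISO-8859-1 decoding
    }
}
```

In a real application, the usual remedy is to make sure the encoding is set before anything reads a parameter, for example in a filter placed first in the chain.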
Sorry for bothering you again, but I don't completely agree. Although there is no standard for URI encodings, there is a servlet specification (see 2.4pfd3, chapter SRV.4.9). Also, in many cases the developer cannot impose POST requests on the application and is forced to read parameters from the URI. So this method SHOULD definitely work as before.

I have prepared a 'clean' test file, 'bug.jsp', which can be dropped into either Tomcat 5.0.9 or 5.0.14 to see the difference between these two points of development (the file contains some Russian text in Cp1251 as a sample). 5.0.9 works fine whereas 5.0.14 does not.

<%@ page pageEncoding="Cp1251" language="java" contentType="text/html; charset=utf-8" %>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<%
request.setCharacterEncoding("utf-8");
String test = request.getParameter("test");
if (test != null) {
    out.write("the length of the test value after decoding: " + test.length());
}
out.write("<br>");
out.write("the value of the test parameter: " + test);
out.write("<br>");
%>
<form action="bug.jsp" method="get">
<input type="text" name="test" value="тест">
<input type="submit" value="test this russian text (4 characters)!">
</form>
</body>
</html>

What do you think?
The previous behavior was breaking the HTTP spec. Of course, since you were using UTF-8, you were basically in the only situation that could work. You can use the URIEncoding attribute on the Connector to specify the URI encoding (so set it to UTF-8).
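To make the effect on the GET test case concrete: a browser submitting the four-character Russian sample from a UTF-8 page sends it percent-encoded as eight bytes. The standalone sketch below decodes that query value with two charsets using java.net.URLDecoder. It only illustrates the difference a URI encoding setting makes and says nothing about how the connector decodes internally; that ISO-8859-1 is what the container falls back to when URIEncoding is unset is an assumption here, and the class name QueryDecodeDemo is invented.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodeDemo {
    // Decodes a percent-encoded query value with the given charset name.
    public static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the 4-character Russian sample, percent-encoded.
        String raw = "%D1%82%D0%B5%D1%81%D1%82";
        System.out.println(decode(raw, "UTF-8").length());      // 4
        System.out.println(decode(raw, "ISO-8859-1").length()); // 8
    }
}
```

A length of 8 instead of 4 in the test page's output would therefore point to the query string having been decoded as a single-byte charset.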