|Summary:||Tomcat5.5.35＋Java1.5 cannot return proper value of a request parameter|
|Product:||Tomcat 5||Reporter:||Hiroki Hayashi <hayashihra>|
|Component:||Connector:HTTP||Assignee:||Tomcat Developers Mailing List <dev>|
JSP file to reproduce the matter
Test.java - Test Charset.decode()
new implementation of ByteChunk.toStringInternal()
Description Hiroki Hayashi 2012-02-02 10:03:26 UTC
Created attachment 28251 [details]
JSP file to reproduce the matter

(1) Overview
When I install Tomcat 5.5.35 + jdk1.5.0_22 and run the attached JSP, I cannot get the proper value of a request parameter. If I enter a multi-byte value (e.g. 10 or aa) into the textbox of the JSP, it runs correctly and I get the input value back (e.g. 10 or aa). But if I enter a single-byte value (e.g. "1" or "a"), it runs incorrectly and I get nothing. Please advise me. (Our customers are also waiting for the reason.) Thank you.

(2) Steps to Reproduce
[2-1] Install Tomcat 5.5.35 + jdk1.5.0_22.
[2-2] Deploy the JSP file in the following directory: /apache-tomcat-5.5.35/webapps/jsp-examples
[2-3] Enter a single-byte value (e.g. "1" or "0") into the textbox and push the ok button.

(3) Actual Results
The "message" shows nothing.

(4) Expected Results
The "message" shows the input character.

(5) Build Date & Platform
Build 2012-02-02 on Windows 7 (I suppose it does not depend on the platform.)

(6) Additional Information
Tomcat 5.5.34 + jdk1.5.0_22 runs correctly, so the following code may be the reason:
---
org.apache.tomcat.util.buf.ByteChunk.toStringInternal() # Line514
CharBuffer cb;
cb = charset.decode(ByteBuffer.wrap(buff, start, end-start));
return new String(cb.array(), cb.arrayOffset(), cb.length());
---
Comment 1 Konstantin Kolinko 2012-02-02 10:37:21 UTC
Similar recent discussion on users@: ("POST data (single character) cleared when using tomcat 6.0.33 and Character Encoding Filter")
http://marc.info/?t=132668010800001&r=1&w=2
http://markmail.org/message/o7l2p7ve5cpswnzl

You stumbled upon a bug in the charset implementation in Java 1.5:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6196991
Comment 2 Konstantin Kolinko 2012-02-02 11:00:11 UTC
Created attachment 28252 [details]
Test.java - Test Charset.decode()

I am attaching a test class that I wrote based on the reproduction scenario in bug 6196991 plus the charset enumeration code from r1140904. This test prints the names of charsets that cannot perform an encoding + decoding round trip for a single "A" character.

Here is the list of charsets that are affected by this issue, tested with 1.5.0_20-b02 on Windows:
---
Big5
Big5-HKSCS
EUC-JP
EUC-KR
GB2312
GBK
ISO-2022-JP
JIS_X0212-1990
Shift_JIS
windows-31j
+ two dozen non-standard charsets whose names start with "x-"
---

With 1.4.2_19-b04 on Windows the list is the same, minus GB2312, which is absent.

With 1.6.0_30-b12 on Windows the list contains only this charset:
----
JIS_X0212-1990
+ 4 non-standard charsets whose names start with "x-"
----

So:
1. The issue is indeed a bug in the JRE. It is present in the latest public versions of 1.4 and 1.5 that I have. I do not know anything about later "Java for business" versions.
2. The issue is absent in Oracle/Sun JDK 1.6.0_30.
3. The issue affects only certain encodings. If you can update your configuration and applications to use UTF-8, you would avoid this issue.
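The attached Test.java is not reproduced in this thread. As a rough illustration of the round-trip check described above, a minimal sketch might look like the following. The class name `CharsetRoundTrip` and its helper methods are hypothetical and are not taken from the attachment; only the idea (encode a single "A", decode it again, compare) comes from the comment.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the attached Test.java.
public class CharsetRoundTrip {

    /** True if encoding and then decoding the text yields the same text. */
    static boolean roundTrips(Charset charset, String text) {
        if (!charset.canEncode()) {
            return true; // decode-only charsets cannot be checked this way
        }
        ByteBuffer bytes = charset.encode(text);
        // Charset.decode() is the call that ByteChunk.toStringInternal() relies on.
        CharBuffer chars = charset.decode(bytes);
        return text.equals(chars.toString());
    }

    /** Names of all installed charsets that fail the round trip for the text. */
    static List<String> brokenCharsets(String text) {
        List<String> broken = new ArrayList<String>();
        for (Charset charset : Charset.availableCharsets().values()) {
            if (!roundTrips(charset, text)) {
                broken.add(charset.name());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        for (String name : brokenCharsets("A")) {
            System.out.println("Broken charset: " + name);
        }
    }
}
```

The list printed by such a check depends on the JRE version and on which charsets are installed, so it should be run on the exact JVM that Tomcat uses.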
Comment 3 Keiichi Fujino 2012-02-03 09:32:27 UTC
Created attachment 28257 [details]
new implementation of ByteChunk.toStringInternal()

Hi All. I am using a charset affected by this issue. Although I know this is an issue in Java, I propose a new implementation of ByteChunk.toStringInternal(). I will propose it in STATUS.txt (both 5.5.x and 6.0.x).
Comment 4 Hiroki Hayashi 2012-02-03 10:22:20 UTC
Thank you very much for the answer, Mr. Kolinko. And thank you for the patch for the issue, Mr. Fujino.

I ran the program from Mr. Kolinko and got "Broken charset" results such as Shift_JIS. I understand that the issue is a bug in the JRE, and that the support period for Java 5 is certainly over. Thank you, sir.

On the other hand, there is a message "Tomcat5.5.x requires 5.0 or later" on this page:
http://tomcat.apache.org/tomcat-5.5-doc/building.html#Download_and_install_a_Java_Development_Kit_1.4.x_or_later

So we hope the patch will be applied to the program. Thank you very much.
Comment 5 Konstantin Kolinko 2012-02-04 00:46:39 UTC
(In reply to comment #3)
> Created attachment 28257 [details]
> new implementation of ByteChunk.toStringInternal()
>

-1. There are two errors:

1) "return new String(buff, start, end-start);" is just wrong. It converts bytes to a String using the OS default encoding. As far as I understand, the "result.isUnderflow()" condition means that all input data has been processed. This "return new String" code just handles an unexpected state. I suggest replacing that code with "cr.throwException();".

2) "charset.newDecoder()" is expected to be an expensive operation. In the scenario of CVE-2012-0022 I expect it to have a notable impact on performance. Charset.decode() uses a ThreadLocal-based cache of decoders. Maybe we can implement something like that cache, or just use a simple ThreadLocal (or some other way) to pass a Decoder instance around while processing the same request.
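For readers without access to the attachment, point 1 can be sketched roughly as follows. This is not the proposed patch: the class name `StrictDecode` is hypothetical, the output buffer is sized from maxCharsPerByte() as an assumption, and the REPLACE error actions are chosen here only because Charset.decode() documents the same behaviour. The unexpected (non-underflow) state raises an exception instead of falling back to the platform default encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch of decoding with an explicit CharsetDecoder.
public class StrictDecode {

    static String decode(Charset charset, byte[] buff, int start, int end) {
        // Note: newDecoder() on every call is the cost that point 2 warns about.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        int length = end - start;
        ByteBuffer in = ByteBuffer.wrap(buff, start, length);
        // Worst-case size, so the output buffer cannot overflow.
        int capacity = (int) Math.ceil(length * (double) decoder.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(capacity);
        try {
            CoderResult result = decoder.decode(in, out, true);
            if (!result.isUnderflow()) {
                result.throwException(); // unexpected state: fail loudly
            }
            result = decoder.flush(out);
            if (!result.isUnderflow()) {
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            // Surface the checked exception as an unchecked one.
            throw new IllegalStateException(e);
        }
        out.flip();
        return out.toString();
    }
}
```

Because the buffer is allocated for the worst case, this approach trades memory for independence from the JRE's own size estimate.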
Comment 6 Konstantin Kolinko 2012-02-04 01:29:57 UTC
(In reply to comment #5)
> Maybe we can
> implement (...) just use a simple ThreadLocal
> to pass a Decoder instance around while processing the same request.

If a Decoder instance is obtained from a ThreadLocal, a quick way to test it against the required charset is to compare it with decoder.charset().

3) For large input data the current implementation that calls Charset.decode() is better than the proposed one, because it allocates less memory. The difference is between (size * averageCharsPerByte()) and (size * maxCharsPerByte()).

I think the threshold can be around 10 bytes.

The Java bug #6196991 occurs when the value of (input size * decoder.averageCharsPerByte()) coerced to integer is 0. In this case in Java 5 the CharsetDecoder#decode(ByteBuffer) method erroneously treats it as if no input data were available. If the input is > 10 bytes it should not trigger the bug #6196991.
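The arithmetic behind the bug can be illustrated with a small sketch. The class and method names are hypothetical, and the value 0.5 for averageCharsPerByte() is an assumed example for a double-byte charset, not a figure quoted in this report.

```java
// Hypothetical sketch of the two buffer-size estimates discussed above.
public class InitialBufferEstimate {

    /** The estimate described for bug #6196991: truncated toward zero. */
    static int estimatedChars(int inputBytes, float averageCharsPerByte) {
        return (int) (inputBytes * averageCharsPerByte);
    }

    /** The worst-case size that a maxCharsPerByte()-based decoder allocates. */
    static int worstCaseChars(int inputBytes, float maxCharsPerByte) {
        return (int) Math.ceil(inputBytes * (double) maxCharsPerByte);
    }

    public static void main(String[] args) {
        float average = 0.5f; // assumed value for illustration only
        for (int bytes = 1; bytes <= 3; bytes++) {
            System.out.println(bytes + " byte(s): estimate "
                    + estimatedChars(bytes, average));
        }
    }
}
```

Under that assumption a 1-byte input yields an estimate of 0, which is the case the Java 5 decoder mistakes for empty input, while a 2-byte input yields 1 and decodes normally. This matches the symptom in the original report, where "1" is lost but "10" survives.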
Comment 7 Keiichi Fujino 2012-02-06 08:20:18 UTC
Created attachment 28274 [details]
patch v2

Many thanks for the comments. I reimplemented ByteChunk.toStringInternal().

> I suggest to replace that code by "cr.throwException();".

The code was replaced by result.throwException(). CharacterCodingException is thrown as a RuntimeException.

> Charset.decode() uses a ThreadLocal-based cache of decoders. Maybe we can
> implement something like that cache, or just use a simple ThreadLocal (or other
> way) to pass a Decoder instance around while processing the same request.

A cache of Decoders was created using a simple ThreadLocal. This cache is very simple for now: only one Decoder instance is cached at a time. Caching two or more Decoder instances would require refactoring, and the code would become slightly more complicated.

> 3) For large input data the current implementation that calls Charset.decode()
> is better than the proposed one, because it allocates less memory. The
> difference is between (size * averageCharsPerByte()) and (size *
> maxCharsPerByte()).
>
> I think threshold can be around 10 bytes.

The threshold value was added.
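The patch itself is only available as an attachment. A single-slot ThreadLocal cache of the kind described above could look roughly like this; the class name `DecoderCache` and its method are hypothetical and are not taken from patch v2.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch: one cached decoder per thread, replaced on charset change.
public class DecoderCache {

    private static final ThreadLocal<CharsetDecoder> CACHE =
            new ThreadLocal<CharsetDecoder>();

    static CharsetDecoder get(Charset charset) {
        CharsetDecoder decoder = CACHE.get();
        // Compare against decoder.charset(), as suggested in comment 6.
        if (decoder == null || !decoder.charset().equals(charset)) {
            decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
            CACHE.set(decoder);
        } else {
            decoder.reset(); // clear state left over from the previous use
        }
        return decoder;
    }
}
```

With a single slot, a request that alternates between two charsets creates a new decoder on every switch, which is the limitation the comment above acknowledges.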
Comment 8 Mark Thomas 2012-02-17 18:29:36 UTC
I am still leaning heavily towards WONTFIX for this.

This issue affects a version of the JVM where fixes are no longer provided for free by Oracle. Users of such a JVM have two options:
1. Upgrade to a JVM release (minimum 1.6) where this is fixed and Oracle continue to make fixes freely available.
2. Pay for Oracle support.

I am extremely reluctant to start adding significant chunks of code into what is a very old Tomcat release in order to work around a bug in a JVM that no-one should be using unless they are paying for support.