Bug 55917 - Cookie parsing fails hard with ISO-8859-1 values
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Connectors
Version: 8.0.x-trunk
Hardware: All All
Importance: P2 normal
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
Depends on:
Blocks: 55951
Reported: 2013-12-20 20:22 UTC by Jeremy Boynes
Modified: 2014-09-02 15:13 UTC (History)

Fix to allow chars in the range 0xa0-0xff (5.12 KB, patch)
2013-12-20 20:30 UTC, Jeremy Boynes
Allow 0xa0-0xff in V0 values only (5.70 KB, patch)
2013-12-21 21:20 UTC, Jeremy Boynes

Description Jeremy Boynes 2013-12-20 20:22:13 UTC
Some popular JavaScript libraries have started to set cookie values in the browser directly and include ISO-8859-1 (Latin-1) characters in the range 0xA0-0xFF. When the Cookie header is parsed by Tomcat, the request fails with an IllegalArgumentException[1] from the connector without giving the application an opportunity to validate the cookie value received.

RFC2616 (HTTP/1.1) allows header field-values to contain ISO-8859-1 characters, which include the range 0xA0-0xFF. RFC2109 (cookies) allows "quoted-string" values, which can contain TEXT octets (and therefore those characters). This differs from cookie names, which are defined as the more restricted "token" and only allow US-ASCII characters. The original Netscape spec does not mention character encodings.

[1] http://svn.apache.org/viewvc/tomcat/tc7.0.x/trunk/java/org/apache/tomcat/util/http/CookieSupport.java?revision=1200183&view=markup#l190
Comment 1 Jeremy Boynes 2013-12-20 20:30:17 UTC
Created attachment 31139 [details]
Fix to allow chars in the range 0xa0-0xff

The patch allows characters in the range 0xA0-0xFF (so it continues to exclude controls, both <0x20 and 0x80-0x9F). I added a test case for a Latin-1 character, and the test suite passes.

To keep it simple, this patch does not attempt to differentiate between quoted and unquoted values. It also does not attempt to deal with values containing UTF-8 encoded data.
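
For illustration only, the ranges described above can be sketched as follows. This is not the attached patch: the class and method names are hypothetical, and the sketch ignores the separator characters that the real parser also rejects. It treats DEL (0x7F) as a control because RFC2616 defines CTL that way.

```java
final class Latin1RangeSketch {

    /** True for C0 controls (below 0x20), DEL (0x7F) and C1 controls (0x80-0x9F). */
    static boolean isControl(int c) {
        return c < 0x20 || c == 0x7f || (c >= 0x80 && c <= 0x9f);
    }

    /** True for the upper ISO-8859-1 range 0xA0-0xFF that the patch description allows. */
    static boolean isUpperLatin1(int c) {
        return c >= 0xa0 && c <= 0xff;
    }

    public static void main(String[] args) {
        // A few sample code points on both sides of each boundary.
        int[] samples = {0x1f, 0x41, 0x85, 0xa0, 0xe9, 0xff, 0x100};
        for (int c : samples) {
            System.out.printf("0x%02X control=%b upperLatin1=%b%n",
                    c, isControl(c), isUpperLatin1(c));
        }
    }
}
```

Keeping the two predicates separate makes it clear that 0x80-0x9F stays excluded even though 0xA0-0xFF becomes acceptable.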
Comment 2 Mark Thomas 2013-12-20 20:55:47 UTC
This simple patch is not acceptable as it does not retain the limitation that cookie names must be tokens.

Now might be the time to re-write the cookie parsing using the HttpParser.

Given the 'fun' we have had with cookie processing in the past we need to be very careful about any changes we introduce. Now could be a good time to do this in 8.0.x and then back-port it once it is stable.
Comment 3 Mark Thomas 2013-12-20 20:58:00 UTC
If we do revisit cookie parsing we should keep RFC6265 in mind as well as the fact that Tomcat moved to a strict adherence to the cookie specs in order to avoid a number of potential security issues.
Comment 4 Jeremy Boynes 2013-12-20 21:28:24 UTC
I agree that this would be a good time for a larger cleanup. To keep things incremental I'll start with refining the patch (against trunk) to handle names and values separately.
Comment 5 Jeremy Boynes 2013-12-21 21:20:43 UTC
Created attachment 31140 [details]
Allow 0xa0-0xff in V0 values only

Minimal patch allowing ISO-8859-1 characters in the range 0xa0-0xff for V0 values only.

This refactors the check when processing tokens to allow 8-bit characters just for V0 values. They will still trigger an IllegalArgumentException if they appear in a name or in a V1 unquoted value.

V1 quoted values already support them via a different code path. I discovered an issue (#55918) there where CTLs do not cause an IAE and appear in the returned value. I've marked the tests for that with @Ignore, to be resolved in a different fix.
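
As a rough sketch of the behaviour described in this comment (not the code in the attachment), a context-aware character check might look like the following. The `Context` enum, class name and method name are hypothetical, and separator handling is left out for brevity.

```java
final class CookieTokenCheckSketch {

    /** Where the character being checked appears (hypothetical classification). */
    enum Context { NAME, V0_VALUE, V1_UNQUOTED_VALUE }

    /**
     * Throws IllegalArgumentException for characters that are not acceptable
     * in the given context. Separators are NOT checked in this sketch.
     */
    static void checkChar(char c, Context ctx) {
        boolean upperLatin1 = c >= 0xa0 && c <= 0xff;
        if (upperLatin1) {
            if (ctx == Context.V0_VALUE) {
                return; // 8-bit characters are accepted for V0 values only
            }
            throw new IllegalArgumentException(
                    String.format("0x%02X not allowed in %s", (int) c, ctx));
        }
        // Everything else must be printable US-ASCII: this rejects C0 controls,
        // DEL, C1 controls (0x80-0x9F) and anything above 0xFF.
        if (c < 0x20 || c >= 0x7f) {
            throw new IllegalArgumentException(
                    String.format("0x%02X is not a valid cookie character", (int) c));
        }
    }

    public static void main(String[] args) {
        checkChar('\u00e9', Context.V0_VALUE); // accepted
        try {
            checkChar('\u00e9', Context.NAME); // rejected
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```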
Comment 6 Jeremy Boynes 2013-12-23 19:16:45 UTC
Patch applied to trunk as r1553187, to be included in release 8.0.0.
Comment 7 Jeremy Boynes 2013-12-24 15:36:55 UTC
The patch for this has been reverted from trunk.
Comment 8 Mark Thomas 2014-09-02 15:13:54 UTC
The new RFC6265 cookie parser (that also includes a new RFC2109 parser) correctly handles these values. I don't propose fixing the old parser.