Bug 55951 - HTML5 specifies UTF-8 encoding for cookie values
Status: RESOLVED FIXED
Product: Tomcat 8
Classification: Unclassified
Component: Connectors
Version: trunk
Hardware: All All
Importance: P2 enhancement
Target Milestone: ----
Assigned To: Tomcat Developers Mailing List
Depends on: 55917
Reported: 2014-01-04 21:34 UTC by Jeremy Boynes
Modified: 2014-10-10 14:29 UTC (History)
Description Jeremy Boynes 2014-01-04 21:34:01 UTC
The HTML5 specification is specifying that cookie values may contain characters that are not part of US-ASCII or ISO-8859-1 and that those codepoints should be UTF-8 encoded for display.

http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie

This will result in 8-bit high values in cookies that need to be accepted and set.

This will also require special handling to convert those values to the UTF-16 characters used by the Java String that represents the value in the Cookie class.
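
To illustrate the conversion concern above, here is a minimal sketch, assuming the cookie value arrives as raw header octets. The class and method names are hypothetical and are not part of Tomcat. It shows that the same octets yield different Strings depending on the charset used to decode them, and that a code point outside the Basic Multilingual Plane occupies two UTF-16 chars in a Java String.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo helper; not Tomcat code. */
final class CookieCharsetDemo {

    private CookieCharsetDemo() {}

    /** Encodes a cookie value to the octets that would go on the wire as UTF-8. */
    static byte[] toUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes header octets as UTF-8. */
    static String fromUtf8(byte[] octets) {
        return new String(octets, StandardCharsets.UTF_8);
    }

    /** Decodes header octets as ISO-8859-1, one char per octet. */
    static String fromLatin1(byte[] octets) {
        return new String(octets, StandardCharsets.ISO_8859_1);
    }
}
```

For example, U+00E9 becomes the two octets 0xC3 0xA9 in UTF-8. Decoding those octets as ISO-8859-1 produces two chars (U+00C3, U+00A9) rather than the original one.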
Comment 1 Mark Thomas 2014-10-02 11:37:18 UTC
This will be available in 8.0.15 onwards via the Rfc6265CookieProcessor.
Comment 2 Mark Thomas 2014-10-02 12:47:02 UTC
Re-opening as the unit test didn't cover the end-to-end process and there are still some issues to resolve.
Comment 3 Konstantin Kolinko 2014-10-02 14:33:20 UTC
(In reply to Jeremy Boynes from comment #0)
> The HTML5 specification is specifying that cookie values may contain
> characters that are not part of US-ASCII or ISO-8859-1 and that those
> codepoints should be UTF-8 encoded for display.
> 
> http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie
>

What is the exact wording?

The above link is broken: there is no "cookie" anchor in the current version of that document. All I see are references to the [COOKIES] document (#refsCOOKIES anchor), which is RFC 6265.

http://tools.ietf.org/html/rfc6265


RFC 6265 does not allow non-ASCII characters in the cookie value of a Set-Cookie header. Citing from its Section 4.1.1, Set-Cookie / Syntax:

 set-cookie-header = "Set-Cookie:" SP set-cookie-string
 set-cookie-string = cookie-pair *( ";" SP cookie-av )
 cookie-pair       = cookie-name "=" cookie-value
 cookie-name       = token
 cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash

The cookie-value is limited to US-ASCII, even when quoted.
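
As a rough illustration of the grammar cited above, the cookie-value rule can be checked with a few range comparisons. This is only a sketch: the class and method names are hypothetical, and it is not the check that Tomcat's cookie processing performs.

```java
/** Hypothetical helper sketching the RFC 6265 section 4.1.1 cookie-value rule. */
final class CookieValueSyntax {

    private CookieValueSyntax() {}

    /** cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E */
    static boolean isCookieOctet(char c) {
        return c == 0x21
                || (c >= 0x23 && c <= 0x2B)
                || (c >= 0x2D && c <= 0x3A)
                || (c >= 0x3C && c <= 0x5B)
                || (c >= 0x5D && c <= 0x7E);
    }

    /** cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE ) */
    static boolean isValidCookieValue(String value) {
        int start = 0;
        int end = value.length();
        // Strip one pair of surrounding double quotes, if present.
        if (end >= 2 && value.charAt(0) == '"' && value.charAt(end - 1) == '"') {
            start = 1;
            end--;
        }
        for (int i = start; i < end; i++) {
            if (!isCookieOctet(value.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, "abc123" and a quoted "abc" are accepted, while values containing a space, semicolon, comma, backslash, or any character above 0x7E are rejected.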

At the same time, attributes (cookie-av) do not have such limitation and as such may be UTF-8:

 path-av           = "Path=" path-value
 path-value        = <any CHAR except CTLs or ";">


For reference, the place where UTF-8 is mentioned in RFC 6265 is Section 5.4, The Cookie Header. Citing:

   NOTE: Despite its name, the cookie-string is actually a sequence of
   octets, not a sequence of characters.  To convert the cookie-string
   (or components thereof) into a sequence of characters (e.g., for
   presentation to the user), the user agent might wish to try using the
   UTF-8 character encoding [RFC3629] to decode the octet sequence.
   This decoding might fail, however, because not every sequence of
   octets is valid UTF-8.
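
The "decoding might fail" caveat in that note can be made concrete with a strict decoder. The following is a minimal sketch, assuming the cookie-string is available as a byte array. The class and method names are hypothetical. It uses the standard java.nio.charset API configured to report malformed input instead of silently substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper; not Tomcat code. */
final class CookieOctetDecoder {

    private CookieOctetDecoder() {}

    /** Returns the decoded text, or empty if the octets are not valid UTF-8. */
    static Optional<String> tryDecodeUtf8(byte[] octets) {
        try {
            return Optional.of(StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(octets))
                    .toString());
        } catch (CharacterCodingException e) {
            // Not every octet sequence is valid UTF-8.
            return Optional.empty();
        }
    }
}
```

A lone 0xE9 octet (the ISO-8859-1 encoding of U+00E9) is rejected by this decoder, whereas the two-octet sequence 0xC3 0xA9 decodes successfully.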
Comment 4 Mark Thomas 2014-10-02 14:43:47 UTC
(In reply to Konstantin Kolinko from comment #3)
> The cookie-value is limited to US-ASCII, even when quoted.
Agreed.

> At the same time, attributes (cookie-av) do not have such limitation and as
> such may be UTF-8:
> 
>  path-av           = "Path=" path-value
>  path-value        = <any CHAR except CTLs or ";">

Nope. CHAR is limited to USASCII. See the definition in section 2.2 of RFC 6265.
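
To make the point about CHAR concrete: RFC 5234 Appendix B.1 defines CHAR as %x01-7F and CTL as %x00-1F / %x7F, so a path-value may only contain 0x20 to 0x7E, excluding ";". A minimal sketch of that check follows. The class and method names are hypothetical, and this is not Tomcat's validation code.

```java
/** Hypothetical helper sketching the RFC 6265 path-value rule. */
final class PathValueSyntax {

    private PathValueSyntax() {}

    /** path-value = any CHAR except CTLs or ";" */
    static boolean isValidPathValue(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // CHAR minus CTLs leaves 0x20-0x7E; ';' is excluded explicitly.
            if (c < 0x20 || c > 0x7E || c == ';') {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule a path such as "/app/docs" is accepted, while a path containing U+00E9 or a semicolon is not.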
Comment 5 Mark Thomas 2014-10-02 15:42:32 UTC
Here is a patch that adds support for sending HTTP headers in character sets other than ISO-8859-1 and then uses that support for sending Set-Cookie headers.

Both AJP and HTTP needed changes. SPDY didn't, as it already used the approach the patch uses.

I still have some work to do to restore the filtering of CTLs.

http://people.apache.org/~markt/patches/2014-10-02-bug55951-tc8-v1.patch
Comment 6 Mark Thomas 2014-10-07 08:28:52 UTC
This is the completed patch:
http://people.apache.org/~markt/patches/2014-10-06-bug55951-tc8-v2.patch

I'll give folks a day or so to review and comment and then commit it.
Comment 7 Mark Thomas 2014-10-10 14:29:19 UTC
This has now been fixed in 8.0.x for 8.0.15 onwards.