The HTML5 specification states that cookie values may contain characters that are not part of US-ASCII or ISO-8859-1, and that those code points should be UTF-8 encoded for display. http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie This will result in cookies containing octets with the high bit set (0x80-0xFF), which need to be accepted and set. It will also require special handling to convert those octets to the UTF-16 characters of the Java String that represents the value in the Cookie class.
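To make the encoding point concrete, here is a minimal Java sketch (the class and method names are illustrative and not part of Tomcat) that encodes a String to UTF-8 octets and decodes it back. It shows that non-ASCII characters produce octets with the high bit set, and that a character outside the Basic Multilingual Plane takes two UTF-16 code units in the resulting Java String.

```java
import java.nio.charset.StandardCharsets;

public class CookieUtf8Demo {

    // Encode a cookie value (a Java String, i.e. UTF-16 code units) as UTF-8 octets.
    static byte[] toOctets(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 octets back into a Java String.
    static String fromOctets(byte[] octets) {
        return new String(octets, StandardCharsets.UTF_8);
    }

    // Count octets with the high bit set (0x80-0xFF); these never occur in US-ASCII.
    static int highBitOctets(byte[] octets) {
        int n = 0;
        for (byte b : octets) {
            if ((b & 0x80) != 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // "caf" + U+00E9 + "-" + U+1F36A (written as a UTF-16 surrogate pair)
        String value = "caf\u00e9-\uD83C\uDF6A";
        byte[] octets = toOctets(value);
        System.out.println(octets.length);                            // 10
        System.out.println(highBitOctets(octets));                    // 6
        System.out.println(value.length());                           // 7 (UTF-16 code units)
        System.out.println(value.codePointCount(0, value.length()));  // 6 (code points)
        System.out.println(fromOctets(octets).equals(value));         // true
    }
}
```

The round trip is lossless here because the input is well-formed UTF-16; arbitrary octet sequences received from a client are not guaranteed to be valid UTF-8.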
This will be available from 8.0.15 onwards via the Rfc6265CookieProcessor.
Re-opening, as the unit test didn't cover the end-to-end process and there are still some issues to resolve.
(In reply to Jeremy Boynes from comment #0)
> The HTML5 specification is specifying that cookie values may contain
> characters that are not part of US-ASCII or ISO-8859-1 and that those
> codepoints should be UTF-8 encoded for display.
>
> http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie
>

What is the exact wording? The above link is broken - there is no "cookie" anchor in the current version of that document. All I see are references to [COOKIES] document (#refsCOOKIES anchor) = RFC 6265.

http://tools.ietf.org/html/rfc6265

RFC 6265 does not allow non-ascii characters in cookie value in Set-Cookie header. Citing from its Chapter 4.1.1. Set-Cookie / Syntax,

 set-cookie-header = "Set-Cookie:" SP set-cookie-string
 set-cookie-string = cookie-pair *( ";" SP cookie-av )
 cookie-pair       = cookie-name "=" cookie-value
 cookie-name       = token
 cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash

The cookie-value is limited to US-ASCII, even when quoted.

At the same time, attributes (cookie-av) do not have such limitation and as such may be UTF-8:

 path-av           = "Path=" path-value
 path-value        = <any CHAR except CTLs or ";">

For reference, the place where UTF-8 is mentioned in RFC 6265 is in chapter 5.4. The Cookie Header. Citing:

 NOTE: Despite its name, the cookie-string is actually a sequence of
 octets, not a sequence of characters. To convert the cookie-string
 (or components thereof) into a sequence of characters (e.g., for
 presentation to the user), the user agent might wish to try using the
 UTF-8 character encoding [RFC3629] to decode the octet sequence. This
 decoding might fail, however, because not every sequence of octets is
 valid UTF-8.
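The two RFC 6265 rules quoted above can be checked mechanically. Below is a minimal Java sketch, with hypothetical class and method names, that validates raw cookie-value octets against the cookie-octet grammar from section 4.1.1, and decodes octets as UTF-8 strictly so that an invalid sequence raises an error instead of being silently replaced with U+FFFD (the "might fail" case in section 5.4).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Rfc6265CookieValue {

    // cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
    static boolean isCookieOctet(int c) {
        return c == 0x21
                || (c >= 0x23 && c <= 0x2B)
                || (c >= 0x2D && c <= 0x3A)
                || (c >= 0x3C && c <= 0x5B)
                || (c >= 0x5D && c <= 0x7E);
    }

    // cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
    static boolean isValidCookieValue(byte[] value) {
        int start = 0;
        int end = value.length;
        // Strip one pair of surrounding double quotes, if present.
        if (end >= 2 && value[0] == '"' && value[end - 1] == '"') {
            start = 1;
            end = end - 1;
        }
        for (int i = start; i < end; i++) {
            // Mask to 0-255 so octets >= 0x80 are not treated as negative.
            if (!isCookieOctet(value[i] & 0xFF)) {
                return false;
            }
        }
        return true;
    }

    // Section 5.4: decoding as UTF-8 "might fail", so report malformed
    // input instead of substituting U+FFFD.
    static String decodeUtf8Strict(byte[] octets) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(octets)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(isValidCookieValue("abc123".getBytes(StandardCharsets.US_ASCII)));  // true
        System.out.println(isValidCookieValue("\"abc\"".getBytes(StandardCharsets.US_ASCII))); // true
        System.out.println(isValidCookieValue("a;b".getBytes(StandardCharsets.US_ASCII)));     // false
        System.out.println(isValidCookieValue("caf\u00e9".getBytes(StandardCharsets.UTF_8)));  // false
        System.out.println(decodeUtf8Strict(new byte[] {(byte) 0xC3, (byte) 0xA9}).equals("\u00e9")); // true
    }
}
```

Note that any UTF-8 encoded non-ASCII character fails the cookie-octet check, which is exactly the conflict with the HTML5 claim being discussed.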
(In reply to Konstantin Kolinko from comment #3)
> The cookie-value is limited to US-ASCII, even when quoted.

Agreed.

> At the same time, attributes (cookie-av) do not have such limitation and as
> such may be UTF-8:
>
> path-av = "Path=" path-value
> path-value = <any CHAR except CTLs or ";">

Nope. CHAR is limited to US-ASCII. See the definition in section 2.2 of RFC 6265.
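Section 2.2 of RFC 6265 pulls in the RFC 5234 core rules, where CHAR is %x01-7F and CTL is %x00-1F / %x7F, so a path-value can only contain octets 0x20-0x7E other than ";". A minimal Java sketch of that check (class and method names are hypothetical) could look like this:

```java
public class Rfc6265PathValue {

    // RFC 5234 core rules, which RFC 6265 section 2.2 incorporates:
    //   CHAR = %x01-7F        (7-bit US-ASCII, excluding NUL)
    //   CTL  = %x00-1F / %x7F
    static boolean isChar(int c) {
        return c >= 0x01 && c <= 0x7F;
    }

    static boolean isCtl(int c) {
        return (c >= 0x00 && c <= 0x1F) || c == 0x7F;
    }

    // path-value = <any CHAR except CTLs or ";">
    // Works on UTF-16 code units; any non-ASCII character (including both halves
    // of a surrogate pair) is above 0x7F and is therefore rejected.
    static boolean isValidPathValue(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!isChar(c) || isCtl(c) || c == ';') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidPathValue("/shop/cart"));  // true
        System.out.println(isValidPathValue("/a b"));        // true (space is a CHAR, not a CTL)
        System.out.println(isValidPathValue("/a;b"));        // false
        System.out.println(isValidPathValue("/caf\u00e9"));  // false (0xE9 is not a CHAR)
    }
}
```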
Here is a patch that adds support for sending HTTP headers in character sets other than ISO-8859-1 and then uses that support for sending Set-Cookie headers. Both AJP and HTTP needed changes; SPDY did not, as it already used the approach the patch uses. I still have some work to do to restore the filtering of CTLs.

http://people.apache.org/~markt/patches/2014-10-02-bug55951-tc8-v1.patch
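The patch itself is only linked above. As a rough illustration of the general idea (not the patch's code), a hypothetical helper could encode a header value with a chosen charset instead of always using ISO-8859-1, and then filter CTL octets. Replacing CTLs (other than horizontal tab) with a space is an assumption made for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HeaderValueEncoder {

    // Hypothetical helper, not Tomcat's API: encode a header value with the
    // given charset, then replace CTL octets (other than horizontal tab) with a space.
    // Note: String.getBytes(Charset) replaces unmappable characters with the
    // charset's default replacement ('?' for ISO-8859-1).
    static byte[] encodeHeaderValue(String value, Charset charset) {
        byte[] bytes = value.getBytes(charset);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if ((b < 0x20 && b != 0x09) || b == 0x7F) {
                bytes[i] = (byte) 0x20;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String value = "caf\u00e9";
        System.out.println(Arrays.toString(encodeHeaderValue(value, StandardCharsets.ISO_8859_1)));
        // [99, 97, 102, -23]
        System.out.println(Arrays.toString(encodeHeaderValue(value, StandardCharsets.UTF_8)));
        // [99, 97, 102, -61, -87]
        System.out.println(Arrays.toString(encodeHeaderValue("a\r\nb", StandardCharsets.UTF_8)));
        // [97, 32, 32, 98]
    }
}
```

Filtering after encoding is safe for UTF-8 because every octet of a multi-byte UTF-8 sequence is 0x80 or above, so only genuine ASCII control characters are replaced.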
This is the completed patch: http://people.apache.org/~markt/patches/2014-10-06-bug55951-tc8-v2.patch I'll give folks a day or so to review and comment and then commit it.
This has now been fixed in 8.0.x for 8.0.15 onwards.