Bug 66196 - HTTP/1 connector doesn't blow-up when HTTP header contains non-ASCII characters
Summary: HTTP/1 connector doesn't blow-up when HTTP header contains non-ASCII characters
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Connectors
Version: 9.0.65
Hardware: PC Linux
Importance: P2 minor
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
Duplicates: 65802
Depends on:
Reported: 2022-08-02 11:16 UTC by Boris Petrov
Modified: 2023-03-14 20:26 UTC


Description Boris Petrov 2022-08-02 11:16:29 UTC
... unlike the HTTP/2 connector, which complains:

Caused by: java.lang.IllegalArgumentException: The Unicode character [Б] at code point [1,041] cannot be encoded as it is outside the permitted range of 0 to 255.
        at org.apache.coyote.http2.HPackHuffman.encode(HPackHuffman.java:452)
        at org.apache.coyote.http2.HpackEncoder.writeHuffmanEncodableValue(HpackEncoder.java:229)
        at org.apache.coyote.http2.HpackEncoder.encode(HpackEncoder.java:191)
        at org.apache.coyote.http2.Http2UpgradeHandler.doWriteHeaders(Http2UpgradeHandler.java:727)
        at org.apache.coyote.http2.Http2UpgradeHandler.writeHeaders(Http2UpgradeHandler.java:680)
        at org.apache.coyote.http2.Stream.writeHeaders(Stream.java:466)
        at org.apache.coyote.http2.StreamProcessor.prepareResponse(StreamProcessor.java:151)
        at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:379)
        at org.apache.coyote.Response.action(Response.java:211)
        at org.apache.coyote.Response.sendHeaders(Response.java:440)
        at org.apache.coyote.http2.Http2OutputBuffer.doWrite(Http2OutputBuffer.java:57)
        at org.apache.coyote.Response.doWrite(Response.java:615)
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:340)
        at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:784)
        at org.apache.catalina.connector.OutputBuffer.append(OutputBuffer.java:689)
        at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:388)
        at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:366)
        at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:96)

It would be nice for the HTTP/1 code to do the same, as I would have caught the bug in my tests rather than while debugging in production. :D
Comment 1 Mark Thomas 2022-08-15 10:16:33 UTC
Which header was this?
Comment 2 Boris Petrov 2022-08-15 10:18:44 UTC
Well, in my case it was the `ETag` header but I believe it's illegal to send non-ASCII characters in any header.
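A note for readers hitting the same problem with `ETag`: one way to guarantee an ASCII-only entity tag, whatever the content, is to hash the representation and hex-encode the digest. This is a minimal sketch, assuming the tag is computed from the full response body; `AsciiEtag` and `strongEtag` are hypothetical names, not part of Tomcat or the Servlet API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds a strong ETag that contains only US-ASCII characters.
final class AsciiEtag {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private AsciiEtag() {}

    // Hashes the representation bytes with SHA-256 and returns the lowercase hex
    // digest wrapped in double quotes. Hex digits are always US-ASCII, so the
    // header value is safe no matter what characters the content contains.
    static String strongEtag(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder sb = new StringBuilder(2 + digest.length * 2);
            sb.append('"');
            for (byte b : digest) {
                sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
            }
            return sb.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Cyrillic content (U+0411) still produces an ASCII-only tag.
        System.out.println(strongEtag("\u0411".getBytes(StandardCharsets.UTF_8)));
    }
}
```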
Comment 3 Mark Thomas 2022-08-16 00:09:06 UTC
Generally, Tomcat doesn't validate response headers unless it needs to process them for some reason. Applications are expected to set valid data. This also gives applications the flexibility to bend (or even break) the HTTP specs which can sometimes be useful when dealing with clients that don't follow the specs.

HTTP header values are not restricted to US-ASCII. Many - including ETag - allow obs-text, i.e. bytes in the range 0x80 to 0xFF. That said, RFC 9110 has a strong preference for US-ASCII. I suspect that preference may get stronger over time.
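The RFC 9110 field-value grammar (section 5.5) can be checked mechanically. This is a minimal sketch, assuming header values are held as Java `String`s whose chars are read as single octets (the ISO-8859-1 view); `FieldValues` is a hypothetical helper, and its `allowObsText` flag lets a caller choose between the permissive obs-text range and the US-ASCII-only preference.

```java
// Hypothetical validator for the RFC 9110 (section 5.5) field-value grammar:
//   field-value   = *field-content
//   field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
//   field-vchar   = VCHAR / obs-text   ; VCHAR = %x21-7E, obs-text = %x80-FF
final class FieldValues {
    private FieldValues() {}

    // When allowObsText is false, only US-ASCII VCHAR is accepted.
    static boolean isFieldVchar(char c, boolean allowObsText) {
        return (c >= 0x21 && c <= 0x7E) || (allowObsText && c >= 0x80 && c <= 0xFF);
    }

    // Valid if every char is SP, HTAB or a field-vchar and the value neither
    // starts nor ends with whitespace. Chars above 0xFF are always rejected.
    static boolean isValidFieldValue(String value, boolean allowObsText) {
        int n = value.length();
        if (n == 0) {
            return true; // *field-content permits an empty value
        }
        if (!isFieldVchar(value.charAt(0), allowObsText)
                || !isFieldVchar(value.charAt(n - 1), allowObsText)) {
            return false;
        }
        for (int i = 1; i < n - 1; i++) {
            char c = value.charAt(i);
            if (c != ' ' && c != '\t' && !isFieldVchar(c, allowObsText)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidFieldValue("\"abc123\"", true));    // true
        System.out.println(isValidFieldValue("\"caf\u00e9\"", true));  // true: U+00E9 is obs-text
        System.out.println(isValidFieldValue("\"caf\u00e9\"", false)); // false: not US-ASCII
        System.out.println(isValidFieldValue("\"\u0411\"", true));     // false: U+0411 is above 0xFF
    }
}
```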

To add to the "fun" some cookies - despite RFC 6265 and the HTTP RFCs - have been observed to use UTF-8 values.
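To see why UTF-8 cookie values are awkward: if the sender emits UTF-8 bytes and the receiver decodes them as ISO-8859-1, every non-ASCII character turns into two or more unrelated characters. A JDK-only illustration; the class name is hypothetical and no Tomcat code is involved.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of a charset mismatch between sender and receiver.
final class CookieCharsetDemo {
    private CookieCharsetDemo() {}

    // Encodes a value as UTF-8 (as some cookies in the wild do) and decodes the
    // resulting bytes as ISO-8859-1 (how most other header bytes are treated).
    static String utf8ReadAsLatin1(String value) {
        byte[] wire = value.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 is 0xC3 0xA9 in UTF-8; ISO-8859-1 maps those bytes to U+00C3 and U+00A9.
        String garbled = utf8ReadAsLatin1("caf\u00e9");
        System.out.println(garbled.equals("caf\u00c3\u00a9")); // true
    }
}
```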

And then there is RFC 8187 that I don't think I have ever seen in real world usage.

So, in short, things are not at all clear cut.

For HTTP/2, it depends on whether the implementation decides to use Huffman encoding or not. The specification doesn't define when to use Huffman and when not. You could argue that this imposes a requirement that the characters in header names and values must fall in the range 0x00 to 0xFF (other requirements limit this further), which is stricter than HTTP/1.1.

My thinking at this point was whether or not it was practical to add a similar limit to HTTP/1.1. Reviewing the code, I think it is. Only cookie headers are treated as UTF-8. All other headers are treated as ISO-8859-1. If you try passing in a String that uses characters above code point 255, they will get corrupted. On that basis, I think it is better to trigger an error early rather than passing corrupted data to the client.
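The kind of silent corruption described here is easy to reproduce with the JDK alone: `String.getBytes(StandardCharsets.ISO_8859_1)` replaces every unmappable character with `?` (0x3F). This is an illustration only, not a claim about the exact code path Tomcat used before the fix; `Latin1Corruption` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of what happens to chars above code point 255 in ISO-8859-1.
final class Latin1Corruption {
    private Latin1Corruption() {}

    // Round-trips a header value through ISO-8859-1 bytes. The JDK encoder
    // silently replaces each unmappable char with '?'.
    static String roundTrip(String value) {
        byte[] wire = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Returns the index of the first char that ISO-8859-1 cannot represent
    // (code point above 255), or -1 if the whole value can be encoded.
    static int firstUnencodable(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String etag = "\"\u0411-v1\"";                           // U+0411 is code point 1,041
        System.out.println(roundTrip(etag).equals("\"?-v1\"")); // true
        System.out.println(firstUnencodable(etag));              // 1
    }
}
```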

Triggering an error in the form of an exception is problematic. Applications that appear to work at the moment would start failing once this change was applied. I think a reasonable compromise for HTTP/1.1 would be to log a warning (including the problematic String) and ignore the header.
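A minimal sketch of that compromise, assuming headers are collected in a simple `Map` before being written (Tomcat's real connector code is structured differently): any header whose name or value contains a character above code point 255 is logged at WARNING, including the offending string, and dropped instead of failing the response. `HeaderFilter` is a hypothetical class, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical filter illustrating "log a warning and ignore the header".
final class HeaderFilter {
    private static final Logger LOG = Logger.getLogger(HeaderFilter.class.getName());

    private HeaderFilter() {}

    // Copies headers whose name and value fit in ISO-8859-1 (code points 0-255).
    // Anything else is logged with the problematic value and left out, so the
    // response is still sent instead of failing with an exception.
    static Map<String, String> filter(Map<String, String> headers) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (fitsLatin1(e.getKey()) && fitsLatin1(e.getValue())) {
                safe.put(e.getKey(), e.getValue());
            } else {
                LOG.warning("Ignoring header [" + e.getKey() + "] with value [" + e.getValue()
                        + "]: contains characters outside the range 0 to 255");
            }
        }
        return safe;
    }

    private static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/plain");
        headers.put("ETag", "\"\u0411\"");
        System.out.println(filter(headers).keySet()); // [Content-Type]
    }
}
```

Dropping rather than throwing keeps existing applications working while still making the bad value visible to anyone monitoring the logs.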

Comment 4 Boris Petrov 2022-08-16 06:25:40 UTC
Logging a warning (or even logging at the ERROR level) sounds great to me. Everyone should be monitoring their logs for warnings and errors so this should be visible to most. And would save people time as they won't have to debug to try to figure out what their problem is.

Comment 5 Mark Thomas 2022-09-02 11:03:14 UTC
Fixed in 10.1.x for 10.1.0-M18 onwards

I intend to delay back-porting for a few releases in case the changes to MessageBytes trigger regressions.
Comment 6 ttera 2022-10-14 06:55:55 UTC
When will it be backported to v8.5, 9 and 10.0?
Comment 7 Mark Thomas 2022-11-01 09:15:30 UTC
The first approved release with this change was 10.1.0.

My current thinking is to allow another 2 to 3 releases before back-porting.
Comment 8 Mark Thomas 2022-11-25 12:30:58 UTC
*** Bug 65802 has been marked as a duplicate of this bug. ***
Comment 9 Mark Thomas 2023-01-05 15:19:02 UTC
Back-ported to:
-  9.0.x for  9.0.71 onwards
-  8.5.x for  8.5.85 onwards
Comment 10 Julian Reschke 2023-01-05 17:27:03 UTC
> And then there is RFC 8187 that I don't think I have ever seen in real world usage.

Content-Disposition. Trust me.
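For reference, this is what RFC 8187 looks like in `Content-Disposition`: the `filename*` parameter (RFC 6266) carries the name as an `ext-value`, i.e. a charset, an optional language tag, and the percent-encoded UTF-8 bytes, so the header itself stays US-ASCII. This is a minimal sketch; `ContentDisposition`, `encodeExtValue` and `attachment` are hypothetical names, and the caller is assumed to supply an ASCII fallback `filename` that contains no quotes or backslashes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper producing an RFC 8187 ext-value for Content-Disposition.
final class ContentDisposition {
    // attr-char = ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "."
    //           / "^" / "_" / "`" / "|" / "~"
    private static final String ATTR_CHAR_PUNCT = "!#$&+-.^_`|~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private ContentDisposition() {}

    // Percent-encodes the UTF-8 bytes of value, leaving attr-chars as-is,
    // and prefixes the charset with an empty language tag.
    static String encodeExtValue(String value) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean attrChar = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || ATTR_CHAR_PUNCT.indexOf(c) >= 0;
            if (attrChar) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return sb.toString();
    }

    // Builds an attachment value with a plain ASCII fallback plus filename*.
    // asciiFallback is inserted verbatim into a quoted-string.
    static String attachment(String asciiFallback, String filename) {
        return "attachment; filename=\"" + asciiFallback + "\"; filename*=" + encodeExtValue(filename);
    }

    public static void main(String[] args) {
        System.out.println(attachment("B.txt", "\u0411.txt"));
    }
}
```

For a file named `Б.txt` (U+0411, UTF-8 bytes 0xD0 0x91) this produces `attachment; filename="B.txt"; filename*=UTF-8''%D0%91.txt`.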
Comment 11 gabriel.hollies 2023-03-13 17:33:31 UTC
Looking over the thread here, it sounded like the path forward for HTTP/1.1 was to log a warning and potentially drop the header.

However, the same change wasn't made for the AJP processor, so responses with invalid headers suddenly started failing after a Tomcat upgrade (discovered in our 8.5.x usage):

java.lang.IllegalArgumentException: The Unicode character [–] at code point [8,211] cannot be encoded as it is outside the permitted range of 0 to 255
                at org.apache.tomcat.util.buf.MessageBytes.toBytesSimple(MessageBytes.java:290)
                at org.apache.tomcat.util.buf.MessageBytes.toBytes(MessageBytes.java:261)
                at org.apache.coyote.ajp.AjpMessage.appendBytes(AjpMessage.java:172)
                at org.apache.coyote.ajp.AjpProcessor.prepareResponse(AjpProcessor.java:1121)
                at org.apache.coyote.ajp.AjpProcessor$SocketOutputBuffer.doWrite(AjpProcessor.java:1511)
                at org.apache.coyote.Response.doWrite(Response.java:602)
                at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:356)
                at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:846)
                at org.apache.catalina.connector.OutputBuffer.realWriteChars(OutputBuffer.java:470)
                at org.apache.catalina.connector.OutputBuffer.flushCharBuffer(OutputBuffer.java:851)
                at org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:250)
                at org.apache.catalina.connector.CoyoteWriter.close(CoyoteWriter.java:107)

Is there any chance the same handling used for HTTP/1.1 could be applied to AJP?
Comment 12 Mark Thomas 2023-03-13 17:40:36 UTC
Re-opening for visibility
Comment 13 Mark Thomas 2023-03-14 20:26:58 UTC
See bug 66512 for the AJP aspect of this.