... unlike the HTTP/2 connector which complains:

Caused by: java.lang.IllegalArgumentException: The Unicode character [Б] at code point [1,041] cannot be encoded as it is outside the permitted range of 0 to 255.
    at org.apache.coyote.http2.HPackHuffman.encode(HPackHuffman.java:452)
    at org.apache.coyote.http2.HpackEncoder.writeHuffmanEncodableValue(HpackEncoder.java:229)
    at org.apache.coyote.http2.HpackEncoder.encode(HpackEncoder.java:191)
    at org.apache.coyote.http2.Http2UpgradeHandler.doWriteHeaders(Http2UpgradeHandler.java:727)
    at org.apache.coyote.http2.Http2UpgradeHandler.writeHeaders(Http2UpgradeHandler.java:680)
    at org.apache.coyote.http2.Stream.writeHeaders(Stream.java:466)
    at org.apache.coyote.http2.StreamProcessor.prepareResponse(StreamProcessor.java:151)
    at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:379)
    at org.apache.coyote.Response.action(Response.java:211)
    at org.apache.coyote.Response.sendHeaders(Response.java:440)
    at org.apache.coyote.http2.Http2OutputBuffer.doWrite(Http2OutputBuffer.java:57)
    at org.apache.coyote.Response.doWrite(Response.java:615)
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:340)
    at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:784)
    at org.apache.catalina.connector.OutputBuffer.append(OutputBuffer.java:689)
    at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:388)
    at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:366)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:96)

It would be nice for the HTTP/1 code to do the same as I would have caught a bug with my tests rather than debugging in production. :D
Which header was this?
Well, in my case it was the `ETag` header but I believe it's illegal to send non-ASCII characters in any header.
Generally, Tomcat doesn't validate response headers unless it needs to process them for some reason. Applications are expected to set valid data. This also gives applications the flexibility to bend (or even break) the HTTP specs which can sometimes be useful when dealing with clients that don't follow the specs.

HTTP header values are not restricted to US-ASCII. Many - including ETag - allow obs-text which is bytes in the range 0x80 to 0xFF. That said, RFC 9110 has a strong preference for US-ASCII. I suspect that requirement may get stronger over time.

To add to the "fun" some cookies - despite RFC 6265 and the HTTP RFCs - have been observed to use UTF-8 values. And then there is RFC 8187 that I don't think I have ever seen in real world usage.

So, in short, things are not at all clear cut.

For HTTP/2, it will depend whether the implementation decides to use Huffman encoding or not. The specification doesn't define when to use Huffman and when not. You could argue that this imposes a requirement that the characters in header names and values must fall in the range 0x00 to 0xFF (other requirements limit this further), which is stricter than HTTP/1.1.

My thinking at this point was whether or not it was practical to add a similar limit to HTTP/1.1. Reviewing the code, I think it is. Only cookie headers are treated as UTF-8. All other headers are treated as ISO-8859-1. If you try passing in a String that uses characters above code point 255, they will get corrupted. On that basis, I think it is better to trigger an error early rather than passing corrupted data to the client.

Triggering an error in the form of an exception is problematic. Applications that appear to work at the moment would start failing once this change was applied. I think a reasonable compromise for HTTP/1.1 would be to log a warning (including the problematic String) and ignore the header.

Thoughts?
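
For illustration only, the "log a warning and ignore the header" compromise described above could look roughly like the sketch below. This is not Tomcat code: the class `HeaderValueCheck` and its methods are hypothetical names, and it assumes headers are held in a plain `Map` rather than in Tomcat's internal structures. The only rule it applies is the one from the comment: a `char` above code point 255 cannot be written as ISO-8859-1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper, not part of Tomcat: sketches "warn and skip" for
// headers that cannot be written as ISO-8859-1.
final class HeaderValueCheck {

    private static final Logger LOG =
            Logger.getLogger(HeaderValueCheck.class.getName());

    // Returns the index of the first char above code point 255, or -1 if none.
    // Surrogate halves are also above 255, so supplementary characters are caught.
    static int firstUnencodable(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0xFF) {
                return i;
            }
        }
        return -1;
    }

    // Copies the headers, logging and dropping any whose name or value
    // contains a character that ISO-8859-1 cannot represent.
    static Map<String, String> dropUnencodable(Map<String, String> headers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (firstUnencodable(e.getKey()) >= 0
                    || firstUnencodable(e.getValue()) >= 0) {
                LOG.warning("Ignoring header [" + e.getKey() + "] with value ["
                        + e.getValue()
                        + "]: contains a character above code point 255");
                continue;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

An application could run the same kind of check in its own tests before setting a header such as `ETag`, which would surface the problem without relying on the connector in use.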
Logging a warning (or even logging at the ERROR level) sounds great to me. Everyone should be monitoring their logs for warnings and errors so this should be visible to most. And would save people time as they won't have to debug to try to figure out what their problem is. Thanks!
Fixed in 10.1.x for 10.1.0-M18 onwards.

I intend to delay back-porting for a few releases in case the changes to MessageBytes trigger regressions.
When will it be backported to v8.5, 9 and 10.0?
The first approved release with this change was 10.1.0. My current thinking is to allow another 2 to 3 releases before back-porting.
*** Bug 65802 has been marked as a duplicate of this bug. ***
Back-ported to:
- 9.0.x for 9.0.71 onwards
- 8.5.x for 8.5.85 onwards
> And then there is RFC 8187 that I don't think I have ever seen in real world usage.

Content-Disposition. Trust me.
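
As a rough illustration of why RFC 8187 shows up with Content-Disposition: it lets a non-ASCII file name travel in a header using only ASCII, via the `filename*` parameter with a `UTF-8''` prefix and percent-encoded bytes. The sketch below is a minimal encoder under that reading of the RFC; the class `Rfc8187` and its methods are hypothetical names, not a Tomcat or Servlet API, and it omits the optional language tag.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes a value in the RFC 8187 ext-value form
// (charset'language'percent-encoded-bytes) with an empty language tag.
final class Rfc8187 {

    // Non-alphanumeric characters RFC 8187 allows unescaped (attr-char).
    private static final String ATTR_CHAR_EXTRA = "!#$&+-.^_`|~";

    static String encodeExtValue(String text) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean attrChar = (v >= 'a' && v <= 'z')
                    || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9')
                    || (v < 0x80 && ATTR_CHAR_EXTRA.indexOf(v) >= 0);
            if (attrChar) {
                sb.append((char) v);
            } else {
                // Percent-encode every other byte as two upper-case hex digits.
                sb.append('%');
                sb.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                sb.append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return sb.toString();
    }

    // Builds a Content-Disposition value that contains only ASCII characters.
    static String contentDisposition(String filename) {
        return "attachment; filename*=" + encodeExtValue(filename);
    }
}
```

Because the result is pure ASCII, a value built this way would not trip the 0 to 255 range check discussed in this bug, whichever connector writes it.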
Looking over the thread here, it sounded like the path forward was to log, and potentially drop, the header for HTTP/1.1. However, that same change wasn't made for the AJP processor, causing invalid headers to suddenly break upon Tomcat upgrade (discovered in our 8.5.x usage):

java.lang.IllegalArgumentException: The Unicode character [–] at code point [8,211] cannot be encoded as it is outside the permitted range of 0 to 255
    at org.apache.tomcat.util.buf.MessageBytes.toBytesSimple(MessageBytes.java:290)
    at org.apache.tomcat.util.buf.MessageBytes.toBytes(MessageBytes.java:261)
    at org.apache.coyote.ajp.AjpMessage.appendBytes(AjpMessage.java:172)
    at org.apache.coyote.ajp.AjpProcessor.prepareResponse(AjpProcessor.java:1121)
    at org.apache.coyote.ajp.AjpProcessor$SocketOutputBuffer.doWrite(AjpProcessor.java:1511)
    at org.apache.coyote.Response.doWrite(Response.java:602)
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:356)
    at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:846)
    at org.apache.catalina.connector.OutputBuffer.realWriteChars(OutputBuffer.java:470)
    at org.apache.catalina.connector.OutputBuffer.flushCharBuffer(OutputBuffer.java:851)
    at org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:250)
    at org.apache.catalina.connector.CoyoteWriter.close(CoyoteWriter.java:107)

Is there any chance the same handling for HTTP/1.1 could apply to AJP?
Re-opening for visibility.
See bug 66512 for the AJP aspect of this.