|Summary:||Bug in Tomcat HTTP2 POST - with big payload|
|Product:||Tomcat 9||Reporter:||Lukasz Z <lz.zukowski>|
|Component:||Catalina||Assignee:||Tomcat Developers Mailing List <dev>|
|Attachments:||debug log and exception|
Description Lukasz Z 2020-03-03 11:48:46 UTC
Created attachment 37056 [details] debug log and exception I found a probably bug with http2 handling in tomcat. When I sending HTTP2 POST request with big payload I see below exception. This request is OK when I use Undertow for example or Tomcat with HTTP1.1 I prepared test where you can recreate this problem. Here is link https://github.com/zulk666/demo-http2 To start the demo you have to run com.example.demo.DemoApplication first then run test DemoApplicationTests. Exception which I received is org.apache.coyote.http2.ConnectionException: Invalid frame type [HEADERS] I tested also with curl, Chrome and Firefox and result is the same as in test.
Comment 1 Mark Thomas 2020-03-03 17:39:45 UTC
If you set useAsyncIO="false" on the Connector this should start working. We need to look into where things are going wrong in the asynIO code. I've narrowed it down to reading one less byte than is actually present. This triggers an off-by-one error when reading the next frame which causes it to be read as a HEADERS frame rather than a SETIINGS frame.
Comment 2 Remy Maucherat 2020-03-03 20:26:17 UTC
I didn't expect a real problem so I hadn't really investigated, so thanks for finding that out. That odd 16383 payload size in the logs is hard to miss. That's where that off by one would occur, but I haven't found out why.
Comment 3 Mark Thomas 2020-03-03 21:55:58 UTC
I think that might be a red herring. I think it is an artefact of client buffering. I've seen similar behaviour with Chrome. I'm currently looking at SecureNioChannel.read(ByteBuffer,int,int). I've been adding various debug logging and I think I have tracked it down to this method. Still looking for a root cause.
Comment 4 Mark Thomas 2020-03-03 22:23:30 UTC
There is a bug there (in the number of bytes returned) but not the root cause of this issue. I have a quick fix for SecureNioChannel that I'll look at refining once I've tracked down the root cause.
Comment 5 Remy Maucherat 2020-03-04 11:17:13 UTC
(In reply to Mark Thomas from comment #4) > There is a bug there (in the number of bytes returned) but not the root > cause of this issue. I have a quick fix for SecureNioChannel that I'll look > at refining once I've tracked down the root cause. I added a test, and I cannot reproduce the discrepancy SecureNioChannel.read using it (the read count returned matches the amount of data that is added into the buffers). My test also fails to reproduce the POST issue described so far, despite appearing identical. I do get some flow control errors though (non async, mostly). So not a very good result so far ...
Comment 6 Mark Thomas 2020-03-04 11:36:53 UTC
I think I have got to the bottom of this. It relates to reading HTTP/2 frames using asyncIO over TLS. TLS sometimes needs to decrypt more data that the caller requests. This is because data is decrypted in 'blocks' and the call may only have requested part of the next block. This excess data is placed in the read buffer so it is available for the next read. HTTP/2 async reads HTTP/2 frames using a gathering read. It uses one ByteBuffer for the 9 byte header and another for the frame payload. At this point the size of the frame payload is unknown so a bufter is used that is big enough for the largest frame. It is possible that more data is read into the payload buffer than is required. This excess data is returned to the read buffer so it is available for the next read. The problem is that we have two components writing to the read buffer and the code that returns this data assumes the read buffer is empty. Consider the following (thinking purely in terms of decrypted data): - There are 4 TLS "blocks" (A, B, C and D) of data in the socket input buffer - HTTP/2 async triggers a read - The payload buffer is big enough to hold all of blocks A and B and half of block C (C1) - The other half of C (C2) is placed into the read buffer - The HTTP/2 parser only needed block A so block B and C1 are returned to the read buffer - The order of data in the read buffer is now "C2, B, C1". It should be "B, C1, C2" In this instance, "B C1" is a single byte - hence why it looked like an off-by-one error. But it isn't. Returning the correct number of bytes for the TLS read is a separate issue. It is also the case that, if the TLS code puts some data into the read buffer, it will try and put as much as possible into the read buffer. I think this needs to change else there is a risk that the amount of data returned by the HTTP/2 parser will be in excess of the space available in the read buffer. I'm working on a fix. I need to be careful as fixing one of these issues may be sufficient to mask the others and I want to try and avoid that.
Comment 7 Mark Thomas 2020-03-04 15:11:15 UTC
Thanks for the test case. Being able to recreate the issue in an environment I can debug was a huge help. Fixed in: - master for 10.0.0-M2 onwards - 9.0.x for 9.0.32 onwards 8.5.x and earlier are not affected. I made the following changes: a) fixed the case of returning data to the read buffer when it was non-empty b) stopped TLS reads once they started to overflow into the readBuffer c) corrected the value returned (bytes read) by the gathering TLS read Fixing either a) or b) was sufficient for the test case to pass because the additional TLS reads from b) were what was causing the read buffer to be non-empty. It may be the case that either a) or b) is sufficient to fix all potential issues but I opted to fix both.
Comment 8 Remy Maucherat 2020-03-04 16:22:14 UTC
Perfect, I did not get anywhere with my test case [I really don't like working with SB on Tomcat dev, so ...], I still don't understand why it failed to reproduce the issue so feel free to improve it as needed. Heaving a heavy HTTP/2 POST test with TLS cannot hurt apparently, that's the kind of issue that will keep coming back otherwise.
Comment 9 Mark Thomas 2020-03-04 16:52:25 UTC
The test fails for me if I remove the fixes I made for this issue so I think all is well. I just made a minor tweak to prevent the test hanging on failure.
Comment 10 Lukasz Z 2020-03-05 07:34:58 UTC
Thanks for quick fix. This error has caused many strange problems for my application since we use http2.