Bug 65137 - Invalid chunk encoding in the tomcat answer
Summary: Invalid chunk encoding in the tomcat answer
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Connectors
Version: 8.5.63
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2021-02-11 16:18 UTC by barmand
Modified: 2021-02-19 17:24 UTC
Description barmand 2021-02-11 16:18:27 UTC

When Tomcat sends a very large response (which is therefore chunk-encoded), the response may become invalid at some point if the client reads it slowly.

- create a small JSP that outputs a lot of data:
<%
StringBuilder s = new StringBuilder();
for (int i = 0; i < 6300000; i++)
//for( int i = 0; i < 630; i++ )
  s.append("<p>line ");
out.print(s); // assumed: the snippet as posted omits the line that writes the buffer to the response
%>
- start Tomcat (default config)
- fetch the response with limited bandwidth (no issue otherwise): curl --limit-rate 70k http://localhost:8080/test.jsp -o /dev/null
- after some time (around 2-3 min), curl raises an error:
curl: (56) Malformed encoding found in chunked-encoding

According to a network capture of this issue, Tomcat appears to restart sending a chunk while that same chunk is still being sent (and is nearly finished). The next chunk seems to be sent correctly, but the response is no longer valid at that point.
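To illustrate why a restarted chunk makes the whole response unparseable, here is a minimal sketch of a chunked-body decoder. It is not curl's or Tomcat's code: the class name ChunkedCheck, its methods, and the sample byte streams are made up for illustration, and trailers and most error handling are ignored. A client that has been told to expect 5 bytes of data consumes the re-sent size line as payload and then fails to find the CRLF that must follow the chunk data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedCheck {

    /** Decodes a chunked body; throws IllegalArgumentException on malformed framing. */
    public static byte[] decode(byte[] wire) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            // Each chunk starts with a hexadecimal size line terminated by CRLF.
            int eol = indexOfCrlf(wire, pos);
            if (eol < 0) {
                throw new IllegalArgumentException("missing CRLF after chunk size");
            }
            String sizeLine = new String(wire, pos, eol - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';'); // chunk extensions are ignored in this sketch
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size;
            try {
                size = Integer.parseInt(hex, 16);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("bad chunk size line: " + sizeLine);
            }
            if (size < 0) {
                throw new IllegalArgumentException("negative chunk size");
            }
            pos = eol + 2;
            if (size == 0) {
                return body.toByteArray(); // last chunk; trailers are not parsed here
            }
            if (pos + size + 2 > wire.length) {
                throw new IllegalArgumentException("truncated chunk");
            }
            body.write(wire, pos, size);
            pos += size;
            // The chunk data must be followed by CRLF before the next size line.
            if (wire[pos] != '\r' || wire[pos + 1] != '\n') {
                throw new IllegalArgumentException("chunk data not followed by CRLF");
            }
            pos += 2;
        }
    }

    private static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] valid = ascii("5\r\nhello\r\n5\r\nworld\r\n0\r\n\r\n");
        System.out.println(new String(decode(valid), StandardCharsets.US_ASCII));

        // Made-up stream: the first chunk is cut off after 3 bytes and then sent again
        // from its size line, similar to what the capture suggests.
        byte[] restarted = ascii("5\r\nhel" + "5\r\nhello\r\n" + "0\r\n\r\n");
        try {
            decode(restarted);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Once the framing is off by even a few bytes, the decoder has no way to resynchronise, which is consistent with the following chunk looking correct in the capture while the response as a whole is rejected.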

I can reproduce this on Tomcat 9 too (but not on Tomcat 7). I can reproduce it on Debian 10 (with the default OpenJDK 11) and Debian 9 (with the default OpenJDK 8).

Comment 1 Mark Thomas 2021-02-18 09:37:01 UTC
Can you confirm which HTTP connector you used for this test? I am going to assume NIO (the default) but confirmation would be helpful.
Comment 2 barmand 2021-02-18 09:58:42 UTC

Thanks for the answer.

I use the NIO connector (and I can reproduce on Tomcat 7 with the NIO connector too). I can also reproduce on Tomcat 8.5 with the APR and NIO2 connectors.

I did have time to investigate a little: it seems that the timeout detection in NioBlockingSelector.write() is quite unreliable. MBs of data are written to the socket initially; after some point, the code may have to wait more than 20 s to write 8192 bytes (the default chunk size), and we get a timeout even though the socket buffer is still quite full and data is still being sent.
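The effect described above can be sketched with a toy model that uses a simulated clock. This is not Tomcat's NioBlockingSelector code and not the OS's actual wake-up policy: the class WriteTimeoutModel, the SlowSocket stand-in, and all parameter values (4 MiB buffered, a 2 MiB or 64 KiB wake-up threshold, a 70 KiB/s reader, a 20 s timeout) are assumptions chosen only to show how a write can time out while data is still draining.

```java
public class WriteTimeoutModel {

    /** Hypothetical stand-in for a socket whose peer reads slowly; not a real API. */
    public static final class SlowSocket {
        final long capacity;      // bytes the OS is willing to buffer (assumed fixed)
        final long wakeupFree;    // free space needed before "writable" is reported (assumption)
        final long drainPerSec;   // bytes per second the peer actually consumes
        public long queued;       // bytes buffered but not yet delivered
        public long clockMillis;  // simulated time, advanced only while waiting

        public SlowSocket(long capacity, long wakeupFree, long drainPerSec) {
            if (wakeupFree <= 0 || wakeupFree > capacity || drainPerSec <= 0) {
                throw new IllegalArgumentException("invalid model parameters");
            }
            this.capacity = capacity;
            this.wakeupFree = wakeupFree;
            this.drainPerSec = drainPerSec;
        }

        /** Non-blocking write: accepts as much as fits, possibly nothing. */
        int write(int len) {
            int n = (int) Math.min(len, capacity - queued);
            queued += n;
            return n;
        }

        /** Waits until writable or until the timeout elapses; returns false on timeout. */
        boolean awaitWritable(long timeoutMillis) {
            long needed = wakeupFree - (capacity - queued);
            if (needed <= 0) {
                return true;
            }
            long waitMillis = (needed * 1000 + drainPerSec - 1) / drainPerSec; // round up
            if (waitMillis > timeoutMillis) {
                // The peer kept reading for the whole timeout, just not enough to wake us.
                clockMillis += timeoutMillis;
                queued -= Math.min(queued, drainPerSec * timeoutMillis / 1000);
                return false;
            }
            clockMillis += waitMillis;
            queued -= needed;
            return true;
        }
    }

    /** Writes total bytes in chunks; returns the bytes written before completion or timeout. */
    public static long writeAll(SlowSocket s, long total, int chunk, long timeoutMillis) {
        long written = 0;
        while (written < total) {
            int remaining = (int) Math.min(chunk, total - written);
            while (remaining > 0) {
                int n = s.write(remaining);
                if (n > 0) {
                    remaining -= n;
                    written += n;
                    continue; // progress was made, so the wait below starts from zero again
                }
                if (!s.awaitWritable(timeoutMillis)) {
                    return written; // write timeout
                }
            }
        }
        return written;
    }

    public static void main(String[] args) {
        long total = 16L << 20;
        SlowSocket coarse = new SlowSocket(4L << 20, 2L << 20, 70L * 1024);
        long w1 = writeAll(coarse, total, 8192, 20_000);
        System.out.println("coarse wake-up: written=" + w1 + " stillQueued=" + coarse.queued);

        SlowSocket fine = new SlowSocket(4L << 20, 64L << 10, 70L * 1024);
        long w2 = writeAll(fine, total, 8192, 20_000);
        System.out.println("fine wake-up:   written=" + w2 + " simulatedMillis=" + fine.clockMillis);
    }
}
```

In this model the first run gives up after the initial burst with data still queued and draining, while the second run completes because each wait is short. That mirrors the observation that smaller, more frequent writes keep the timeout from firing.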

What I have tried:
1 - increase the timeout. It works but the timeout is used for other things and it does not seem a very good idea to change it.

2 - reduce socket.txBufSize to 8191. This forces smaller writes in NioBlockingSelector.write(), so the time is updated more often and the timeout does not trigger.

It seems to work correctly on Debian 10 (OpenJDK 11), but:

- it has no effect on OpenJDK 8 (Debian 9; MBs of data are still accepted by the socket even though SO_SNDBUF is under 8192 bytes).
- there is a performance degradation (around 10% for a 100 MB reply).

3 - I have tried to tweak the code so that it waits a little longer before writing data to the socket:
keycount is initially set to 0 (instead of 1).
I replaced the following code:
if (cnt > 0) {
  time = System.currentTimeMillis(); //reset our timeout timer
  continue; //we successfully wrote, try again without a selector
}
with:
if (cnt > 0) {
  if (!buf.hasRemaining())
  time = System.currentTimeMillis(); //reset our timeout timer
}
(the continue has been removed)
The behavior is much the same as with 2: it works on OpenJDK 11 but not on OpenJDK 8. I did not measure the performance impact (I would bet it is less than 10%).
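The difference between platforms seen in option 2 can be probed directly. The sketch below is a hypothetical test program (the class SendBufferProbe and its method are not part of Tomcat): it requests a small SO_SNDBUF on a loopback connection whose peer never reads, then counts how many bytes non-blocking writes accept before the socket stops taking data. The Java API treats SO_SNDBUF as a hint, so the effective size and the amount accepted depend on the JVM and OS; no particular output is implied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SendBufferProbe {

    /**
     * Returns how many bytes non-blocking writes accept while the peer never reads.
     * The cap bounds the probe in case the platform keeps accepting data.
     */
    public static long bytesAcceptedWithoutReader(int requestedSndBuf, long cap) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open()) {
                client.setOption(StandardSocketOptions.SO_SNDBUF, requestedSndBuf);
                client.connect(server.getLocalAddress());
                try (SocketChannel peer = server.accept()) { // accepted but never read from
                    System.out.println("effective SO_SNDBUF: "
                            + client.getOption(StandardSocketOptions.SO_SNDBUF));
                    client.configureBlocking(false);
                    ByteBuffer buf = ByteBuffer.allocate(8192);
                    long total = 0;
                    int consecutiveZeroWrites = 0;
                    while (total < cap && consecutiveZeroWrites < 5) {
                        buf.clear();
                        int n = client.write(buf);
                        if (n > 0) {
                            total += n;
                            consecutiveZeroWrites = 0;
                        } else {
                            // A zero-length write may be transient; retry a few times.
                            consecutiveZeroWrites++;
                            try {
                                Thread.sleep(50);
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                break;
                            }
                        }
                    }
                    return total;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long accepted = bytesAcceptedWithoutReader(8191, 64L * 1024 * 1024);
        System.out.println("bytes accepted without a reader: " + accepted);
    }
}
```

Running this on the two setups mentioned above would show whether the requested buffer size actually limits how much data a write can hand to the OS in one go. Note that on loopback the receiver's buffer also absorbs data, so the number is only an upper-level indication.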

Comment 3 Remy Maucherat 2021-02-18 10:05:55 UTC
Got it. In the case of NIO2, however, I'm afraid this is expected, known behavior. Increase the timeout or reduce the buffer (which is what you are doing).
Comment 4 Mark Thomas 2021-02-19 15:39:29 UTC
The connection is going to get closed due to the write timeout regardless. The best we can do here is avoid the additional corruption at the end of the truncated response.

Tomcat is at the mercy of the JVM and the OS here. If the OS/JVM takes longer than a write timeout to allow Tomcat to write a chunk of the output (typically 8k) then the timeout is going to happen.

You might also want to experiment with setting socket.txBufSize, but do note the warnings in the docs about using too small a value.

Fixed in 10.0.x for 10.0.3 onwards where NIO and APR/Native were impacted.
Back-ports for 9.0.x and 8.5.x to follow.
Comment 5 Mark Thomas 2021-02-19 17:24:06 UTC
Fixed in:
- 9.0.x for 9.0.44 onwards
- 8.5.x for 8.5.64 onwards