Bug 64574 - Some HTTP/1.1 responses are broken (missing headers, empty chunks)
Summary: Some HTTP/1.1 responses are broken (missing headers, empty chunks)
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 8.5.55
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2020-07-02 21:20 UTC by Martin Bonato
Modified: 2020-07-02 22:10 UTC (History)
0 users


Note You need to log in before you can comment on or make changes to this bug.
Description Martin Bonato 2020-07-02 21:20:44 UTC
We are experiencing a very strange behaviour with our application running on Tomcat 8.5.55, since we changed our frontend reverse proxy from Nginx to Traefik. Traefik access logs show HTTP result code 500 for responses from Tomcat where Tomcat was reporting them with HTTP 200 in its own logs. This was only concerning a few requests per day on our production servers and we where not able to reproduce the effect on test systems. So we started a network capture on the production servers and where able to capture a few broken responses from Tomcat.

It turns out, that Tomcat is producing malformed HTTP responses every now and then. The HTTP headers are missing and the response immediately starts with a chunked response:

GET / HTTP/1.1
Host: ***
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
Accept: image/webp,image/apng,image/*,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Cookie: JSESSIONID=****
Referer: https://***
Sec-Fetch-Dest: image
Sec-Fetch-Mode: no-cors
Sec-Fetch-Site: same-origin
X-Forwarded-For: ***
X-Forwarded-Host: ***
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Forwarded-Server: ***
X-Real-Ip: ***




We also captured responses where only an empty chunk and no headers were in the response.

Since the issue first appeared with the change from Nginx to Traefik, the most obvious change was that Nginx is using HTTP/1.0 in its requests to Tomcat and Traefik is using HTTP/1.1. So we where suspecting keep-alive and pipelining to trigger the issue. We disabled keep-alive in Traefik and still had the same issues. We found only one bug report on Stackoverflow describing a similar issue: https://stackoverflow.com/questions/53496598/tomcat-occasionally-returns-a-response-without-http-headers. (In the bug report Nginx was used with HTTP/1.1.) But the proposed solution was not appropriate for our situation. 

Code reading and further experimenting showed, that setting the system property org.apache.catalina.connector.RECYCLE_FACADES to 'true' is preventing the issue. The property name is actually misleading since setting it to 'true' is effectively disabling recycling of facade objects: https://github.com/apache/tomcat/blob/8.5.x/java/org/apache/catalina/connector/Connector.java#L171

It seems when facades (especially response' OutputSteam) is recycled there may be some kind of concurrency issue or the code obtaining the OutputStream from the response is not flushing or closing it correctly before the same OutputStream is handed over to another thread in processing the next request. 

We have not yet fully understand what is actually triggering the issue - it may also be in the framework and our own code processing the request - however, we would like to report for other users experiencing the same issue.

Also we are wondering, if recycling the OutputStream has any relevant performance advantage or for what other reason it is done by default. Since it may cause issues not easy to track down, it may be a better idea to disable it by default.

Interestingly it only causes broken responses when HTTP/1.1 is used, but there are differences in the request processing in HTTP11Processor depending on the protocol used.
Comment 1 Mark Thomas 2020-07-02 22:04:43 UTC
If using org.apache.catalina.connector.RECYCLE_FACADES fixes the issue then this is an application issue, not a Tomcat bug.

Somewhere in the application it is retaining a reference to a response object and continuing to use it after the response has been written to the client.

Hopefully, you should be seeing some NPEs in the logs which should help you track down the root cause.

The users list is the place to seek further assistance with this.
Comment 2 Remy Maucherat 2020-07-02 22:10:10 UTC
I agree the RECYCLE_FACADES name for the system property was misleading, and that's why it was changed when it was replaced by the discardFacades Connector attribute.