Bug 63766 - Resource leak: under certain conditions, request objects related to WebSockets are not freed
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: WebSocket
Version: 8.5.46
Hardware: Macintosh All
Importance: P2 major
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-24 00:49 UTC by Francis VAN AEKEN
Modified: 2019-09-26 21:00 UTC



Attachments
Code to reproduce the problem (see description) (185.19 KB, application/zip)
2019-09-24 00:49 UTC, Francis VAN AEKEN
Application code, using embedded Tomcat 8.5.46 (10.41 KB, application/zip)
2019-09-24 21:46 UTC, Francis VAN AEKEN

Description Francis VAN AEKEN 2019-09-24 00:49:45 UTC
Created attachment 36794
Code to reproduce the problem (see description)

Resource leak: under certain conditions, request objects related to WebSockets are not freed

When Tomcat 8.5.38 is setting up a WebSocket (WS) connection with a client (or has just finished setting it up; we are not sure which) and then receives a TCP RST on that connection, the associated objects may never be freed. They are instances of the classes below.

org.apache.tomcat.websocket.server.WsHandshakeRequest
org.apache.catalina.connector.Request
org.apache.coyote.Request
org.apache.coyote.RequestInfo
org.apache.catalina.connector.RequestFacade

We saw this happen in production, and we were able to reproduce it with test code running against both our application and an out-of-the-box (OOB) embedded Tomcat.

I have attached the stack traces for the two cases (our application and the OOB Tomcat). Interestingly, the stack traces differ.

To reproduce the problem in a test environment, we modified a TCP proxy to send an RST packet to the server shortly after the WebSocket upgrade HTTP request is sent. When many WS connections are opened and each one is interrupted by an RST packet, after a while a number of these objects remain stuck in memory (see screenshot requests_objects.png). The objects stay in memory even after the proxy and the client are shut down.
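The attached proxy (ProxyMain) and client (SadPath) are not reproduced here. As a rough alternative for anyone trying to trigger the same condition, the sketch below has the client abort the connection itself instead of going through a proxy: it sends a WebSocket upgrade request, waits briefly, and closes the socket with SO_LINGER set to 0, which on common TCP stacks makes `close()` emit an RST rather than a normal FIN. The class name `RstAfterUpgrade`, the defaults (`localhost`, port 8080, path `/ws`, 1000 connections) and the 10 ms delay are all hypothetical. Because the client never reads the 101 response, whether the RST arrives before or after Tomcat completes the handshake depends on timing.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical test client: not the attached SadPath client or proxy.
public class RstAfterUpgrade {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds a minimal RFC 6455 opening-handshake request. */
    static String buildUpgradeRequest(String hostHeader, String path) {
        byte[] nonce = new byte[16]; // Sec-WebSocket-Key is 16 random bytes, base64-encoded
        RANDOM.nextBytes(nonce);
        String key = Base64.getEncoder().encodeToString(nonce);
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + hostHeader + "\r\n"
                + "Upgrade: websocket\r\n"
                + "Connection: Upgrade\r\n"
                + "Sec-WebSocket-Key: " + key + "\r\n"
                + "Sec-WebSocket-Version: 13\r\n"
                + "\r\n";
    }

    /** Sends the upgrade request, waits, then aborts the connection with an RST. */
    static void sendUpgradeThenReset(String host, int port, String path, long delayMillis)
            throws IOException, InterruptedException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5_000);
            OutputStream out = socket.getOutputStream();
            out.write(buildUpgradeRequest(host + ":" + port, path)
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            Thread.sleep(delayMillis);
            // Linger enabled with a 0 s timeout: close() discards unsent data and sends RST.
            socket.setSoLinger(true, 0);
        } // try-with-resources calls close() here
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        String path = args.length > 2 ? args[2] : "/ws";
        int connections = args.length > 3 ? Integer.parseInt(args[3]) : 1000;
        for (int i = 0; i < connections; i++) {
            try {
                sendUpgradeThenReset(host, port, path, 10);
            } catch (IOException e) {
                System.err.println("connection " + i + " failed: " + e);
            }
        }
    }
}
```

Run it only against a test server, for example `java RstAfterUpgrade localhost 8080 /ws 1000`, then take a heap dump of the server and look for the classes listed above.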

Thank you for having a look at this. This failure mode is rare, but when it happens it can eventually bring the JVM down because of memory pressure.

ATTACHMENT

The attachment contains:

tomcat-webserver: an OOB (embedded) Tomcat with a WS endpoint
websockets/tcp-proxy: a TCP proxy, modified to send RST packets - run ProxyMain to start the proxy
websockets/websockets-client: a simple WS client, opening many connections - run SadPath to reproduce the problem
requests_objects.png: a VisualVM screenshot showing stuck objects
web_socket_connection_reset.txt: two stack traces (the first when reproducing the problem with our application, the second when reproducing the problem with tomcat-webserver)
Comment 1 Mark Thomas 2019-09-24 18:36:11 UTC
Please re-test this with the latest 8.5.x release (8.5.46 as I type this) and confirm whether or not this resolves the issue.

If it does not resolve the issue please update the test case and we will investigate.
Comment 2 Francis VAN AEKEN 2019-09-24 21:46:28 UTC
Created attachment 36798
Application code, using embedded Tomcat 8.5.46
Comment 3 Francis VAN AEKEN 2019-09-24 21:47:19 UTC
We have retested using Tomcat 8.5.46 (new code attached) and could still reproduce the problem by following the steps below.

Start the Tomcat application
Start the proxy
Start the test client (SadPath)
Wait for 20 minutes
Shut down the client
Shut down the proxy
Take a heap dump of the Tomcat application
In the heap dump, observe 100+ instances of each of the classes below

org.apache.coyote.Request
org.apache.catalina.connector.Request
org.apache.coyote.RequestInfo
org.apache.tomcat.websocket.server.WsHandshakeRequest
org.apache.catalina.connector.RequestFacade
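The report used VisualVM to inspect the heap. For an embedded Tomcat, one possible way to capture the heap dump from inside the same JVM is the HotSpot-specific `HotSpotDiagnosticMXBean`. This is a sketch assuming a HotSpot-based JVM; the `HeapDumper` class and the file name are hypothetical, and the method has to be called from code running in the Tomcat JVM (calling it from a separate process would only dump that process).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Hypothetical helper for an embedded Tomcat application.
public class HeapDumper {

    /**
     * Writes a heap dump of the current JVM to {@code target}.
     * With live=true only objects that are still reachable are written, so any
     * WsHandshakeRequest / Request instances found in the dump are still referenced.
     */
    static Path dumpHeap(Path target, boolean live) throws IOException {
        // HotSpot-specific MXBean; not available on every JVM implementation.
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IOException if the file already exists; use a file name ending in .hprof.
        bean.dumpHeap(target.toString(), live);
        return target;
    }
}
```

For example, after shutting down the client and proxy, the embedded application could call `HeapDumper.dumpHeap(java.nio.file.Paths.get("tomcat-heap.hprof"), true)` and the resulting file can be opened in VisualVM to check the instance counts of the five classes.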
Comment 4 Mark Thomas 2019-09-26 19:39:11 UTC
Thanks for the report and the test case. It makes it so much easier to track down and fix bugs when you have this much information to start from.

For the record, this was not what would normally be called a resource leak; it was more a failure to clean up in a timely manner. Given enough further (error-free) processing, the objects would have been cleaned up.

Fixed in:
- master for 9.0.27 onwards
- 8.5.x for 8.5.47 onwards
- 7.0.x for 7.0.97 onwards
Comment 5 Francis VAN AEKEN 2019-09-26 21:00:37 UTC
Thank you for addressing this issue so swiftly. We are impressed. Keep up the good work!