Bug 66841 - Memory leak from cancelled async http/2 streams
Summary: Memory leak from cancelled async http/2 streams
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 9.0.76
Hardware: PC Linux
: P2 regression (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-03 17:06 UTC by Andrew Murphy
Modified: 2023-09-20 14:25 UTC (History)
2 users (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Murphy 2023-08-03 17:06:04 UTC
Regression caused by fixed for https://bz.apache.org/bugzilla/show_bug.cgi?id=63816

Have application using Spring server sent events for notifying logged in users of application events. SSE emitters have timeout of 1 day and so are active for the whole user session. Application tests registered SSE emitters at regular intervals to confirm the emitter is still alive and perform application cleanup if the emitter connection has been closed for any reason. Test is done by sending heartbeat message through each SSE emitter. 

If the SSE connection is cancelled from the remote client, the stream for the SSE emitter is set to a state of CLOSED_RST_RX which causes the heartbeat test message to receive a CloseNowException, this is expected based on our understanding of http/2 and tomcat. However the stream does not actually get closed because the asyncStateMachine is not updated and remains in a state of STARTED. This keeps the stream, stream processor, and associated resources from being closed and garbage collected. Streams in this state are only cleaned up when their asyncTimeout expires and the AbstractProtocol's timeoutFuture calls the processor's timeoutAsync method.

The memory used by the closed streams is leaked until they timeout.

The streams appear to be stuck in this state because when the application attempts to send the heartbeat message the Http2UpgradeHandler calls doStreamCancel when reserving the window size because the stream state does not allow writing. doStreamCancel sets the error flag on the stream's coyoteResponse (Stream.java line 275) and returns the CloseNowException. After the CloseNowException is returned the AbstractProcessor attempts to set the error state to CLOSE_NOW (AbstractProcessor line 241) but this fails to update the asyncStateMachine to an ERROR state because the if condition depends on the setError flag being true. But the setError flag can never be true in this scenario because the response error flag had already been set in doStreamCancel. This means the asyncStateMachine is never set to an error state or told to process an ERROR SocketEvent.

boolean setError = response.setError();
        boolean blockIo = this.errorState.isIoAllowed() && !errorState.isIoAllowed();
        this.errorState = this.errorState.getMostSevere(errorState);
        // Don't change the status code for IOException since that is almost
        // certainly a client disconnect in which case it is preferable to keep
        // the original status code http://markmail.org/message/4cxpwmxhtgnrwh7n
        if (response.getStatus() < 400 && !(t instanceof IOException)) {
            response.setStatus(500);
        }
        if (t != null) {
            request.setAttribute(RequestDispatcher.ERROR_EXCEPTION, t);
        }
        if (blockIo && isAsync() && setError) { <- setError is always false
            if (asyncStateMachine.asyncError()) {
                processSocketEvent(SocketEvent.ERROR, true);
            }
        }

We deployed a patched version of AbstractProcessor with the if condition like this:

if (blockIo && isAsync() && !asyncStateMachine.isAsyncError()) {

Our assumption is that whether the asyncStateMachine should process an ERROR event should depend on if the asyncStateMachine is already in an error state not on if the coyote Response is already in an error state.

Connector settings used:
<Connector port="8443" protocol="org.apache.coyote.http11.Http11Nio2Protocol"
               connectionTimeout="20000" maxThreads="150"
               SSLEnabled="true" scheme="https" secure="true" useAsyncIO="false" >
        <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
        <SSLHostConfig>
			<Certificate certificateKeystoreFile="conf/certs/devkeystore"
						 type="RSA"/>
        </SSLHostConfig>
    </Connector>

Note: issue occurred with NIO and APR protocol as well. useAsyncIO had to be set to false because we encountered a different bug when it was set to true.
Comment 1 Andrew Murphy 2023-08-03 17:16:24 UTC
Did some additional testing to confirm that the registered Spring SSE emitter onError callbacks are not being called due to this bug.
Comment 2 Mark Thomas 2023-08-04 11:05:01 UTC
Thanks of the report and the research. That all makes sense.

While previous refactoring of error handling has improved things, there are still aspects that I don't like and one of them is CoyoteResponse.setError() returning a boolean value. It looks like that is going to have to be addressed as part of the fix for this issue.

My plan is to proceed as follows:
- create a test case for this bug based on the description
- fix this bug by refactoring AsyncStateMachine.asyncError to be a NO-OP for the second and subsequent calls within an async cycle (cycle == generation in the code)

If time allows, I may complete so other refactoring I have in mind for error handling.
Comment 3 Mark Thomas 2023-08-06 10:50:00 UTC
Fixed in:
- 11.0.x for 11.0.0-M10 onwards
- 10.1.x for 10.1.12 onwards
-  9.0.x for  9.0.79 onwards
-  8.5.x for  8.5.92 onwards
Comment 4 Mark Thomas 2023-09-20 14:25:54 UTC
*** Bug 66875 has been marked as a duplicate of this bug. ***