Bug 63003

Summary: Tomcat is closing async connection prematurely when shutdown is initiated.
Product: Tomcat 7 Reporter: i.mochurad
Component: ConnectorsAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 7.0.68   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Proposed fix for Tomcat 9.0.x v1

Description i.mochurad 2018-12-11 17:27:50 UTC
I am heavily using asynchronous API in my application, which is running on Tomcat 7. One of the requirements that I have -- permit all requests that are currently in flight to complete their work before termination.

I have unloadDelay set in my context.xml:
<Context unloadDelay="60000">

Also I have the following properties set for each of my connectors in server.xml:
               asyncTimeout="40000"
               executorTerminationTimeoutMillis="60000"

I have thrown a 10 seconds sleep into one of my services and I place a curl request against my server, then immediately I request a shutdown of a tomcat service.

Curl returns with:
* Recv failure: Connection reset by peer
* Closing connection 1
curl: (56) Recv failure: Connection reset by peer

And in the catalina.out I see the following output:

Dec 11, 2018 5:14:46 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["http-nio-8443"]
Dec 11, 2018 5:14:46 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["http-nio-9082"]
Dec 11, 2018 5:14:46 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["http-nio-9080"]
Dec 11, 2018 5:14:46 PM org.apache.catalina.core.StandardService stopInternal
INFO: Stopping service Catalina
Dec 11, 2018 5:14:46 PM org.apache.catalina.core.StandardWrapper unload
INFO: Waiting for 1 instance(s) to be deallocated for Servlet [mvc-dispatcher]
Dec 11, 2018 5:15:07 PM org.apache.catalina.connector.CoyoteAdapter checkRecycled
INFO: Encountered a non-recycled request and recycled it forcedly.
org.apache.catalina.connector.CoyoteAdapter$RecycleRequiredException
	at org.apache.catalina.connector.CoyoteAdapter.checkRecycled(CoyoteAdapter.java:608)
	at org.apache.coyote.http11.AbstractHttp11Processor.recycle(AbstractHttp11Processor.java:1811)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.release(Http11NioProtocol.java:218)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:708)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1761)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1720)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

Dec 11, 2018 5:15:27 PM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [FileWatchdog] but has failed to stop it. This is very likely to create a memory leak.
Dec 11, 2018 5:15:27 PM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [Thread-8] but has failed to stop it. This is very likely to create a memory leak.
Dec 11, 2018 5:15:27 PM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [threadDeathWatcher-2-1] but has failed to stop it. This is very likely to create a memory leak.

It seems that unloadDelay and other timeout properties are not being honored.
Comment 1 Mark Thomas 2018-12-17 14:15:03 UTC
It is the unloadDelay that is most relevant in this use case. If that is honoured then there should be enough time for the other timeouts to be honoured. Let me dig into what is going on.
Comment 2 Mark Thomas 2018-12-18 17:32:10 UTC
Created attachment 36335 [details]
Proposed fix for Tomcat 9.0.x v1

This is going to be a little trickier to solve than might be expected. The main cause of the complication is that there isn't a simple mapping between web applications and in-flight async requests.

There are other complications:
- requests can move between sync and async multiple times during their execution
- the unload delay is currently applied to each Servlet in turn

The proposed solution attempts to solve this by:
- providing a mechanism for the timeout task to timeout any async requests for stopped web apps
- tracking the number of in-flight async requests per web application so that the web application stop process can be paused (for up to unloadDelay milliseconds) to allow those asyncRequests to complete

Async requests that continue past the unload delay will be timed out. Given that the web application may have been stopped by that point, the timeout is likely to experience exceptions.

I'm providing this patch to solicit feedback. I'll probably come back to this after the holidays.

My current expectation is that this would be back-ported to 7.0.x and 8.5.x.
Comment 3 Mark Thomas 2019-01-23 21:09:31 UTC
Fixed in:
- trunk for 9.0.15 onwards
- 8.5.x for 8.5.38 onwards
- 7.0.x for 7.0.93 onwards

Note: The 7.0.x fix required rather more substantial changes than 8.5.x/9.0.x because it doesn't have the refactoring present in 8.5.x onwards. The community may decide that the changes are too much for 7.0.x.