Bug 62515 - Tomcat should perform a graceful shutdown
Summary: Tomcat should perform a graceful shutdown
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Catalina
Version: 8.0.x-trunk
Hardware: PC All
Importance: P2 enhancement
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 62562
Depends on:
Blocks:
 
Reported: 2018-07-02 09:55 UTC by carbattles
Modified: 2018-07-23 08:05 UTC
CC List: 1 user



Attachments
First attempt at a patch (9.28 KB, patch)
2018-07-05 21:17 UTC, Mark Thomas
Updated patch based on feedback (7.27 KB, patch)
2018-07-06 07:25 UTC, Mark Thomas

Description carbattles 2018-07-02 09:55:18 UTC
Tomcat does not shut down the way other Java servers do.

Usual Java server shutdown practice:
1. Stop listening on the inbound port
2. Finish executing all existing requests, subject to some timeout X
3. Shut down the server
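
For illustration, a minimal sketch of this pattern in plain Java (a hypothetical server built on a ServerSocket and an ExecutorService; this is not Tomcat code):

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.TimeUnit;

    // Minimal sketch of the usual graceful shutdown pattern; not Tomcat code.
    public class GracefulShutdownSketch {

        static void shutdown(ServerSocket serverSocket,
                             ExecutorService requestPool,
                             long timeoutSeconds) throws IOException {
            // 1. Stop listening: new connection attempts are refused from
            //    here on, so an upstream can fail over immediately.
            serverSocket.close();

            // 2. Finish executing all existing requests, subject to a timeout.
            requestPool.shutdown();
            try {
                if (!requestPool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                    requestPool.shutdownNow(); // give up on stragglers
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }

            // 3. Shut down the rest of the server.
        }
    }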

Tomcat does this:
1. Finish executing all existing requests, based on the unloadDelay timeout
2. Stop listening on the inbound port
3. Shut down the server

Because Tomcat is still listening on the inbound port, it accepts new requests right up to the end. Those requests are then terminated, as they do not have sufficient time to finish, regardless of the value set for the unloadDelay attribute.

Furthermore, it is impossible for any load balancer, reverse proxy, or other upstream component to know that Tomcat is shutting down. With other Java servers, because the inbound port is closed, the upstream cannot connect and behaves according to its purpose and configuration (for example, sending new requests to another server).

In conclusion, by closing the inbound port first, we can ensure that existing requests complete successfully, while also letting upstream components know that the server is not available for new requests.

To reproduce this problem, you need a simple servlet with a long response time. Invoke it, request shutdown of the server, then invoke it again.
Depending on the unloadDelay value, either both requests will fail or only the last request will fail.
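
A reproduction servlet can be as simple as the following sketch (a hypothetical SlowServlet mapped to /slow; any long-running request will do):

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical servlet used only to reproduce the issue: it simply
    // takes a long time to respond.
    @WebServlet("/slow")
    public class SlowServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            try {
                Thread.sleep(30_000); // simulate 30 seconds of work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            resp.getWriter().println("done");
        }
    }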
Comment 1 Mark Thomas 2018-07-05 13:10:07 UTC
There are a couple of points worth clarifying before starting a discussion of any potential changes.

Currently, the point where the socket is bound and unbound is controlled by bindOnInit. By default, the socket is bound when the connector is initialised and unbound when it is destroyed.
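
For reference, bindOnInit can be set on the Connector element in server.xml; a minimal fragment (the port and protocol values here are only illustrative):

    <!-- With bindOnInit="false" the server socket is bound on start and
         unbound on stop, rather than on init/destroy (the default is
         bindOnInit="true"). -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               bindOnInit="false" />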

On stop, the connector is first paused. Any new connections from that point will sit in the TCP backlog. Shutdown continues with the web applications being stopped followed by the host and engine. Then the connectors are stopped. Everything is then destroyed (same order). The connections in the backlog will be dropped when the server socket is unbound.
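
The backlog behaviour is easy to demonstrate outside Tomcat; a small sketch (hypothetical port and backlog values):

    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Demonstrates the TCP backlog behaviour described above: while the
    // server socket stays bound but accept() is never called, a new
    // connection still completes the TCP handshake and queues in the
    // kernel backlog instead of being refused.
    public class BacklogDemo {
        public static void main(String[] args) throws IOException {
            ServerSocket server =
                    new ServerSocket(8081, 2, InetAddress.getLoopbackAddress());

            // No accept() call: the listener is effectively "paused".
            try (Socket client =
                    new Socket(InetAddress.getLoopbackAddress(), 8081)) {
                // connect() succeeded anyway; the connection now sits in
                // the backlog waiting for an accept() that never comes.
                System.out.println("Connected: " + client.isConnected());
            }

            // Unbinding the server socket drops anything still queued.
            server.close();
        }
    }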

The behaviour reported in the description is consistent with the current implementation.

Effectively, what is being requested is that the server socket is unbound earlier in the process.

Some investigation is required to see what might be possible.
Comment 2 Mark Thomas 2018-07-05 21:17:09 UTC
Created attachment 36012 [details]
First attempt at a patch

The attached patch attempts to address this problem by closing the server socket at the start of the service stop() sequence if bindOnInit is set to false. Local testing has been positive. Feedback welcome.
Comment 3 Remy Maucherat 2018-07-05 21:33:42 UTC
+1

Maybe you can avoid the bindOnInit getProperty hack by having AbstractEndpoint.closeServerSocket call an abstract doCloseServerSocket method only if bindOnInit is false.
Comment 4 Mark Thomas 2018-07-05 22:02:19 UTC
Good idea. I'll take a look at that and see what can be done.
Comment 5 Mark Thomas 2018-07-06 07:25:59 UTC
Created attachment 36013 [details]
Updated patch based on feedback

An updated patch using Rémy's suggested approach.
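
For readers not opening the attachment, the rough shape of Rémy's suggestion, reconstructed from comments 2-5 (the real patch may differ in detail, and anything beyond the closeServerSocket/doCloseServerSocket/bindOnInit names is an assumption):

    // Sketch of the suggested shape only; not the actual committed code.
    public abstract class AbstractEndpoint {

        enum BindState { UNBOUND, BOUND_ON_INIT, BOUND_ON_START }

        protected boolean bindOnInit = true;
        protected volatile BindState bindState = BindState.UNBOUND;

        // Called at the start of the service stop() sequence so the port
        // is released before the web applications are stopped.
        public final void closeServerSocket() {
            if (!bindOnInit) {
                doCloseServerSocket();
                bindState = BindState.UNBOUND;
            }
        }

        // Each connector implementation (NIO, NIO2, APR) closes its own
        // server socket.
        protected abstract void doCloseServerSocket();
    }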
Comment 6 Remy Maucherat 2018-07-06 11:18:37 UTC
Also +1, you should commit it as it's a useful change.

But then we'll probably refactor it, since maybe this should simply go into pause/resume (resume could reopen the server socket?). I'm pretty sure the bindState is inconsistent as well and new states could be added (?).
Comment 7 Remy Maucherat 2018-07-06 11:20:48 UTC
(In reply to Remy Maucherat from comment #6)
> But then we'll probably refactor it, since maybe this should simply go into
> pause/resume (resume could reopen the server socket?). I'm pretty sure the
> bindState is inconsistent as well and new states could be added (?).

Bad future refactoring idea anyway; pause/resume is not on the right class.
Comment 8 Mark Thomas 2018-07-06 20:25:50 UTC
Good catch on bindState. I've tweaked that part and also fixed a bug in the NIO handling.

I don't think this can be added as part of pause() and resume() as my understanding of those methods is that they are meant to pause processing but not close the socket - i.e. let stuff build up in the TCP backlog.

I have a few final checks to run and then I should be able to commit this.
Comment 9 Mark Thomas 2018-07-09 12:05:01 UTC
Fixed in:
- trunk for 9.0.11 onwards
- 8.5.x for 8.5.33 onwards

Note that EOL for 8.0.x was 30 June 2018, so no further updates are expected for 8.0.x.
Comment 10 Mark Thomas 2018-07-23 08:05:21 UTC
*** Bug 62562 has been marked as a duplicate of this bug. ***