37529 – Tomcat takes a long time to stop when configured for clustering

Summary: Tomcat takes a long time to stop when configured for clustering

Status:	RESOLVED FIXED

Alias:	None

Product:	Tomcat 5
Classification:	Unclassified
Component:	Catalina:Cluster
Version:	5.5.12
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	Tomcat Developers Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-11-16 18:21 UTC by Chris Walker
Modified:	2005-11-28 13:05 UTC
CC List:	0 users

Description Chris Walker 2005-11-16 18:21:58 UTC

On my Linux system (Fedora Core 3) running Java 1.4.2_06, when I configure
Tomcat to use SimpleTcpCluster and then try to shut Tomcat down, the shutdown
takes a long time to complete and finally stops with errors, as shown in this log
excerpt:

2005-11-15 13:48:44,202 INFO  Pausing Coyote HTTP/1.1 on http-8888
2005-11-15 13:48:44,202 INFO  Pausing Coyote HTTP/1.1 on http-8444
2005-11-15 13:48:45,205 INFO  Stopping service Catalina
2005-11-15 13:48:45,206 INFO  Manager [/flexnet] expiring sessions upon shutdown
2005-11-15 13:48:45,781 INFO  Stopped ClusterSender at cluster Catalina:type=Cluster,host=localhost with name Catalina:type=ClusterSender,host=localhost
2005-11-15 13:50:50,440 INFO  Stopping Coyote HTTP/1.1 on http-8888
2005-11-15 13:50:50,440 INFO  Stopping Coyote HTTP/1.1 on http-8444
2005-11-15 13:50:50,448 ERROR Unable to process request in ReplicationListener
java.nio.channels.ClosedSelectorException
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:55)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:70)
        at org.apache.catalina.cluster.tcp.ReplicationListener.listen(ReplicationListener.java:130)
        at org.apache.catalina.cluster.tcp.ClusterReceiverBase.run(ClusterReceiverBase.java:394)
        at java.lang.Thread.run(Thread.java:534)
2005-11-15 13:50:50,472 ERROR Unable to start cluster listener.
java.lang.NullPointerException
        at org.apache.catalina.cluster.tcp.ReplicationListener.listen(ReplicationListener.java:182)
        at org.apache.catalina.cluster.tcp.ClusterReceiverBase.run(ClusterReceiverBase.java:394)
        at java.lang.Thread.run(Thread.java:534)

Notice the delay of about two minutes (13:48:45 to 13:50:50) between stopping
the ClusterSender and stopping Coyote.

This is apparently caused by a bug in Java 1.4.2 with closing a Selector when
there are selects active on it.

There is a simple fix to
org.apache.catalina.cluster.tcp.ReplicationListener.stopListening:

--- ReplicationListener.java    2005-11-16 09:02:50.055300180 -0800
+++ ReplicationListener.java.fix        2005-11-16 09:02:45.017605588 -0800
@@ -187,8 +187,11 @@
      * @see org.apache.catalina.cluster.tcp.ClusterReceiverBase#stopListening()
      */
     protected void stopListening(){
+        doListen = false;
         if ( selector != null ) {
             try {
+                for ( int i = 0; i < getTcpThreadCount(); i++ )
+                    selector.wakeup();
                 selector.close();
             } catch ( Exception x ) {
                 log.error("Unable to close cluster receiver selector.",x);
@@ -196,7 +199,6 @@
                 selector = null;
             }
         }
-        doListen = false;
     }

Basically, move 'doListen = false' to the top of the method to avoid a race
condition that causes the exceptions: selector.select() may be called while the
close is in progress, and selector may be set to null while the listener
threads are still looping. The loop that calls selector.wakeup() once for each
thread before calling selector.close() works around the Java bug with closing
while selects are in progress.
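
For anyone who wants to try the ordering outside Tomcat, here is a minimal
standalone sketch of the same idea: clear the flag first, wake the blocked
select, then close. It is not the Tomcat code. The class name
SelectorShutdownSketch and its start()/stop() methods are made up for
illustration, it uses a single listener thread, and it is written for a current
JDK (lambdas, multi-catch) rather than 1.4.2. On recent JDKs close() should
already wake a blocked select, so the explicit wakeup() is probably redundant
there, but it is harmless.

```java
import java.io.IOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

// Hypothetical standalone sketch; not part of Tomcat.
public class SelectorShutdownSketch {
    // volatile so the listener thread sees the flag change made by stop()
    private volatile boolean doListen = true;
    private volatile Selector selector;
    private Thread listener;

    public void start() throws IOException {
        selector = Selector.open();
        // The loop uses a local reference, so it never dereferences a field
        // that stop() may have set to null in the meantime.
        final Selector sel = selector;
        listener = new Thread(() -> {
            while (doListen) {
                try {
                    sel.select(1000);           // block for up to one second
                    sel.selectedKeys().clear(); // no channels registered here
                } catch (ClosedSelectorException | IOException e) {
                    break;                      // selector closed during shutdown
                }
            }
        }, "listener");
        listener.setDaemon(true);
        listener.start();
    }

    // Returns true if the listener thread exited within joinMillis.
    public boolean stop(long joinMillis) throws InterruptedException {
        doListen = false;          // 1. tell the loop to stop before anything else
        Selector sel = selector;
        if (sel != null) {
            try {
                sel.wakeup();      // 2. kick the thread out of a blocked select
                sel.close();       // 3. only now close the selector
            } catch (IOException e) {
                System.err.println("Unable to close selector: " + e);
            } finally {
                selector = null;
            }
        }
        listener.join(joinMillis);
        return !listener.isAlive();
    }
}
```

Calling start(), waiting briefly, and then calling stop(2000) should return
true well before the timeout. With several listener threads sharing one
selector, as in ReplicationListener, wakeup() would be called once per thread,
as in the patch above.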

Comment 1 Yoav Shapira 2005-11-28 22:05:53 UTC

Looks like a good catch, thanks for reporting it.