Bug 42689

Summary: no way to timeout new connect attempts for replication sockets
Product: Tomcat 5
Reporter: Casey Lucas <clucas>
Component: Catalina:Cluster
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 5.5.23   
Target Milestone: ---   
Hardware: Sun   
OS: Solaris   
Attachments: diff to use Socket.connect with timeout parameter

Description Casey Lucas 2007-06-18 06:46:37 UTC
While testing clustering in our lab, we noticed that when connectivity to one of
the cluster members was lost by pulling the network cable that carries
replication traffic, the entire cluster became unresponsive.  We pulled the
network cable to simulate a catastrophic switch port or interface failure.
We were testing under load, using synchronous replication.  We found that
existing replication sockets honored our timeout (ackTimeout)
configuration, but new connections established because of pool growth or
retries did not time out their socket connect attempts.  Without a connect
timeout, requests backed up and effectively brought the cluster down.

Theoretically, this connection establishment problem exists for all users of the
DataSender class.

Comment 1 Casey Lucas 2007-06-18 06:50:37 UTC
Created attachment 20366
diff to use Socket.connect with timeout parameter

Our fix was to change DataSender.createSocket to use the ackTimeout for
connection establishment.  This fix works only with JDK 1.4 or higher, because
Socket.connect with a timeout parameter was introduced in that release.
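
For readers without the attachment, the general technique can be sketched as
follows.  This is a minimal illustration and not the attached diff: the class
ConnectTimeoutSketch and the helper openSocket are hypothetical names, and the
timeout value stands in for whatever ackTimeout is configured.  The idea is to
create an unconnected Socket and call Socket.connect(SocketAddress, int), so a
dead peer produces a SocketTimeoutException instead of blocking for the
operating system's default connect timeout.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectTimeoutSketch {

    // Hypothetical helper, not the actual DataSender.createSocket code.
    // timeoutMillis plays the role of the configured ackTimeout (assumption).
    static Socket openSocket(InetAddress address, int port, int timeoutMillis)
            throws IOException {
        Socket socket = new Socket(); // unconnected socket
        try {
            // Bounded connect: throws java.net.SocketTimeoutException if the
            // connection is not established within timeoutMillis.
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Local demonstration against a loopback listener on an ephemeral port.
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = openSocket(server.getInetAddress(),
                                        server.getLocalPort(), 1000)) {
            System.out.println("connected: " + client.isConnected());
        }
    }
}
```

By contrast, the Socket(InetAddress, int) constructor connects immediately and
takes no timeout argument, which matches the blocking behavior described in
this report.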

Comment 2 Peter Rossbach 2007-06-20 05:58:00 UTC
Thanks for the report. This has been fixed in svn and will be in 5.5.25.
Peter