Bug 42689 - no way to timeout new connect attempts for replication sockets
Summary: no way to timeout new connect attempts for replication sockets
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina:Cluster
Version: 5.5.23
Hardware: Sun Solaris
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2007-06-18 06:46 UTC by Casey Lucas
Modified: 2007-06-20 05:58 UTC

diff to use Socket.connect with timeout parameter (1001 bytes, patch)
2007-06-18 06:50 UTC, Casey Lucas

Description Casey Lucas 2007-06-18 06:46:37 UTC
While testing clustering in our lab, we noticed that when connectivity to one of
the cluster members was lost by pulling the network cable that carries
replication traffic, the entire cluster became unresponsive.  We pulled the
network cable to simulate a catastrophic switch port or interface failure.
We were testing under load, using synchronous replication.  We found that
existing replication sockets honored our timeout (ackTimeout) configuration,
but new connections established because of pool growth or retries did not
time out their socket connect attempts.  Because those connect attempts had no
timeout, requests backed up and effectively brought the cluster down.

In principle, this connection-establishment problem affects every user of the
DataSender class.
Comment 1 Casey Lucas 2007-06-18 06:50:37 UTC
Created attachment 20366
diff to use Socket.connect with timeout parameter

Our fix was to change DataSender.createSocket to use the ackTimeout value as
the timeout for connection establishment.  This fix only works with JDK 1.4 or
higher, where Socket.connect(SocketAddress, int) was introduced.
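
For readers without access to the attachment, here is a minimal sketch of the
general technique: create an unconnected socket and call
Socket.connect(SocketAddress, int) with a timeout.  This is not the attached
patch or the actual DataSender code; the class name ConnectTimeoutSketch and
the parameter timeoutMs are hypothetical and stand in for wherever ackTimeout
would be supplied.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not Tomcat's DataSender.
final class ConnectTimeoutSketch {

    // Opens a TCP connection, giving up after timeoutMs milliseconds.
    // Per the Socket.connect Javadoc, a timeout of 0 is an infinite timeout.
    static Socket createSocket(InetAddress address, int port, int timeoutMs)
            throws IOException {
        Socket socket = new Socket(); // unconnected socket
        try {
            // Throws java.net.SocketTimeoutException (an IOException)
            // if the connection is not established within timeoutMs.
            socket.connect(new InetSocketAddress(address, port), timeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket on a failed connect
            throw e;
        }
    }
}
```

With this shape, a caller that previously did new Socket(address, port) would
instead call createSocket(address, port, timeoutMs) and treat a
SocketTimeoutException like any other failed connect.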
Comment 2 Peter Rossbach 2007-06-20 05:58:00 UTC
Thanks for the report. This has been fixed in svn and will be in 5.5.25.