|Summary:||no way to timeout new connect attempts for replication sockets|
|Product:||Tomcat 5||Reporter:||Casey Lucas <clucas>|
|Component:||Catalina:Cluster||Assignee:||Tomcat Developers Mailing List <dev>|
|Attachments:||diff to use Socket.connect with timeout parameter|
Description Casey Lucas 2007-06-18 06:46:37 UTC
While testing clustering in our lab, we noticed that when connectivity to one cluster member was lost (we pulled the network cable carrying replication traffic), the entire cluster became unresponsive. We pulled the cable to simulate a catastrophic switch port or interface failure. We were testing under load with synchronous replication. Existing replication sockets honored our timeout (ackTimeout) configuration, but new connections opened because of pool growth or retries had no timeout on the socket connect attempt. Without a connect timeout, requests backed up and effectively brought the cluster down. In principle, this connection establishment problem affects all users of the DataSender class.
Comment 1 Casey Lucas 2007-06-18 06:50:37 UTC
Created attachment 20366: diff to use Socket.connect with timeout parameter
Our fix was to change DataSender.createSocket to use the ackTimeout for connection establishment. This fix only works with JDK 1.4 or higher.
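For readers without the attachment, here is a minimal sketch of the general technique, not the attached diff itself. It assumes the timeout is given in milliseconds; the class name ConnectTimeoutSketch and the parameter name timeoutMillis are made up for illustration and do not come from the Tomcat source. Socket.connect(SocketAddress, int) was added in JDK 1.4, which is why the fix needs that version.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical illustration only; this is not the DataSender code.
public class ConnectTimeoutSketch {

    // Opens a socket, giving up on the connect attempt after timeoutMillis.
    // Note: a timeout of 0 means "wait indefinitely" for both calls below.
    public static Socket createSocket(InetAddress address, int port, int timeoutMillis)
            throws IOException {
        // Create the socket unconnected so a connect timeout can be supplied.
        // new Socket(address, port) would block with no timeout of ours.
        Socket socket = new Socket();
        try {
            // Throws SocketTimeoutException if the peer does not answer in time.
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            // Reuse the same value as the read timeout (an assumption made here).
            socket.setSoTimeout(timeoutMillis);
            return socket;
        } catch (IOException e) {
            // Do not leak the half-open socket when the connect fails.
            socket.close();
            throw e;
        }
    }
}
```

With a bounded connect attempt, a caller blocked on an unreachable member gets an exception after the timeout instead of waiting on the operating system's default, so requests no longer queue up behind it indefinitely.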
Comment 2 Peter Rossbach 2007-06-20 05:58:00 UTC
Thanks for the report. This has been fixed in svn and will be in 5.5.25. Peter