The following code is in the memberDisappeared method of org.apache.catalina.tribes.tipis.AbstractReplicatedMap:

    public void memberDisappeared(Member member) {
        boolean removed = false;
        synchronized (mapMembers) {
            removed = (mapMembers.remove(member) != null );
        }
        Iterator i = super.entrySet().iterator();
        while (i.hasNext()) {
            ** omitted: relocation of sessions **

This means the sessions are relocated every time memberDisappeared runs, regardless of whether the member was actually removed from mapMembers (that is, regardless of whether removed is true or false). I think that if the member has already been removed, the relocation does not need to be done.

The case where this matters most is stopping a Tomcat instance in a cluster (with TcpFailureDetector configured) under high load, when many requests are being processed at the same time. In that case the relocation is performed for every request on which TcpFailureDetector detects memberDisappeared. Relocating sessions is fairly heavy processing, so IMHO this is not a good thing.

I made a patch for AbstractReplicatedMap:

Index: /tomcat6-trunk/java/org/apache/catalina/tribes/tipis/AbstractReplicatedMap.java
===================================================================
--- /tomcat6-trunk/java/org/apache/catalina/tribes/tipis/AbstractReplicatedMap.java (revision 577691)
+++ /tomcat6-trunk/java/org/apache/catalina/tribes/tipis/AbstractReplicatedMap.java (working copy)
@@ -713,6 +713,7 @@
         boolean removed = false;
         synchronized (mapMembers) {
             removed = (mapMembers.remove(member) != null );
+            if (!removed) return;
         }
         Iterator i = super.entrySet().iterator();

Regards.
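For readers following the patch, here is a minimal, self-contained sketch of the early-return guard it proposes. This is not the Tribes code: GuardSketch, memberAdded, the String member keys, and the relocations counter are hypothetical stand-ins, and the counter only marks the point where the real method would walk the entries and relocate sessions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AbstractReplicatedMap; only the guard is modeled.
class GuardSketch {
    // Plays the role of mapMembers (member -> time it was added).
    private final Map<String, Long> mapMembers = new HashMap<>();
    // Counts how often the expensive relocation step would have run.
    private int relocations = 0;

    void memberAdded(String member) {
        synchronized (mapMembers) {
            mapMembers.put(member, System.currentTimeMillis());
        }
    }

    void memberDisappeared(String member) {
        boolean removed;
        synchronized (mapMembers) {
            removed = (mapMembers.remove(member) != null);
        }
        // The guard from the patch: nothing to do if the member was
        // already removed (or was never a map member).
        if (!removed) return;
        relocations++; // placeholder for the session relocation loop
    }

    int getRelocations() {
        return relocations;
    }

    public static void main(String[] args) {
        GuardSketch map = new GuardSketch();
        map.memberAdded("tcp://node-b:4001");
        map.memberDisappeared("tcp://node-b:4001"); // removes and relocates
        map.memberDisappeared("tcp://node-b:4001"); // returns early
        System.out.println("relocations: " + map.getRelocations());
    }
}
```

In this sketch the return sits just after the synchronized block rather than inside it as in the patch; the effect is the same, since removed is a local variable.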
Fixed, since memberDisappeared is called when any member goes away, not just map members. If you see scenarios where memberDisappeared is called multiple times with the same member, please let me know, as that should not happen.
(In reply to comment #1)
> Fixed.

Thanks for the correction.

> Since memberDisappeared is called when any member goes away, not just map members.
> If you see scenarios where the memberDisappeared is called multiple times with
> the same member, please let me know, as that should not happen

I think that AbstractReplicatedMap.memberDisappeared is called in the following cases:

1: McastServiceImpl$run -> ... -> AbstractReplicatedMap.memberDisappeared
2: GroupChannel.heartbeat() -> ... -> AbstractReplicatedMap.memberDisappeared
3: GroupChannel.send -> ReplicationTransmitter.sendMessage -> ... -> ChannelException occurs -> TcpFailureDetector.memberDisappeared -> AbstractReplicatedMap.memberDisappeared

In case 3, when a Tomcat instance in the cluster goes down, every thread that is sending a replication message to that instance gets a ChannelException and calls TcpFailureDetector.memberDisappeared, which in turn calls AbstractReplicatedMap.memberDisappeared. So if multiple threads are sending replication messages to the downed Tomcat, AbstractReplicatedMap.memberDisappeared is called multiple times with the same member.

The following log entries are output repeatedly; that is, this statement is reached:

    log.debug("Member["+member+"] disappeared, but was not present in the map.");

***********log*************
...
Sep 25, 2007 5:27:10 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared [org.apache.catalina.tribes.membership.MemberImpl[tcp://XXXXXXXX:4001,XXXXXXXX,4001, alive=22797,id={74 41 4 115 77 -55 69 21 -68 -127 79 110 -55 45 -36 -45 }, payload={}, command={}, domain={}, ]] message. Will verify.
Sep 25, 2007 5:27:10 PM org.apache.catalina.tribes.tipis.AbstractReplicatedMap memberDisappeared
FINE: Member[org.apache.catalina.tribes.membership.MemberImpl[tcp://XXXXXXXX:4001,XXXXXXXX,4001, alive=22797,id={74 41 4 115 77 -55 69 21 -68 -127 79 110 -55 45 -36 -45 }, payload={}, command={}, domain={}, ]] disappeared, but was not present in the map.
Sep 25, 2007 5:27:10 PM org.apache.catalina.tribes.tipis.AbstractReplicatedMap replicate
SEVERE: Unable to replicate data.
org.apache.catalina.tribes.ChannelException: Send failed, attempt:2 max:1; Faulty members:tcp://XXXXXXXX:4001;
    at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:172)
    at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:78)
    at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
    at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
    at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:78)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
    at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:87)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
    at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:73)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
    at org.apache.catalina.tribes.tipis.AbstractReplicatedMap.replicate(AbstractReplicatedMap.java:421)
    at org.apache.catalina.ha.session.BackupManager.requestCompleted(BackupManager.java:131)
    at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:548)
    at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:535)
    at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:517)
    at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:428)
    at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:362)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    ... omit ...
Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:525)
    at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:88)
    at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:130)
    ... 26 more
...

Regards.
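To illustrate case 3 outside of Tomcat, the sketch below simulates several sender threads that all report the same member as gone at once. It is only a model under stated assumptions: ConcurrentDisappearSketch, simulate, the member string, and the counter are hypothetical, and no Tribes classes are used. With the removed check in place, only the thread whose remove succeeds reaches the relocation step; the others correspond to the "disappeared, but was not present in the map" log lines above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of many sender threads hitting memberDisappeared.
class ConcurrentDisappearSketch {
    private final Set<String> mapMembers = new HashSet<>();
    private final AtomicInteger relocations = new AtomicInteger();

    void memberAdded(String member) {
        synchronized (mapMembers) {
            mapMembers.add(member);
        }
    }

    void memberDisappeared(String member) {
        boolean removed;
        synchronized (mapMembers) {
            removed = mapMembers.remove(member);
        }
        if (!removed) return;          // duplicate notification: skip the work
        relocations.incrementAndGet(); // placeholder for session relocation
    }

    // Starts `threads` workers that all report the same member as gone
    // and returns how many of them reached the relocation step.
    static int simulate(int threads) {
        ConcurrentDisappearSketch map = new ConcurrentDisappearSketch();
        String downed = "tcp://node-b:4001"; // made-up member name
        map.memberAdded(downed);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.memberDisappeared(downed);
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return map.relocations.get();
    }

    public static void main(String[] args) {
        System.out.println("relocation passes: " + simulate(8));
    }
}
```

Without the early return, every one of the worker threads in this model would run the relocation step, which is the load problem described in the original report.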