Bug 43435 - AbstractReplicatedMap.memberDisappeared is executed more than the necessity.
Status: RESOLVED FIXED
Product: Tomcat 6
Classification: Unclassified
Component: Cluster
Version: 6.0.14
Hardware: Other other
Importance: P2 normal
Target Milestone: default
Assigned To: Tomcat Developers Mailing List
Reported: 2007-09-20 04:51 UTC by Keiichi Fujino
Modified: 2007-09-25 03:03 UTC (History)
Description Keiichi Fujino 2007-09-20 04:51:07 UTC
The following code is in the memberDisappeared method of 
org.apache.catalina.tribes.tipis.AbstractReplicatedMap:

  public void memberDisappeared(Member member) {
        boolean removed = false;
        synchronized (mapMembers) {
            removed = (mapMembers.remove(member) != null );
        }
        
        Iterator i = super.entrySet().iterator();
        while (i.hasNext()) {
        ** omit Relocate of session. **
        
This means the sessions are relocated every time memberDisappeared runs, 
regardless of whether the member was actually removed from mapMembers 
(that is, regardless of whether removed is true or false). 

I think that if the member has already been removed, 
the sessions do not need to be relocated again. 

The impact is greatest when a Tomcat instance in a cluster 
(configured with TcpFailureDetector) is stopped under high load, 
with many requests being processed at the same time.

In that case, the sessions are relocated for every request 
whose failure TcpFailureDetector reports through memberDisappeared. 

Relocating the sessions is fairly expensive. 
IMHO, this is not a good thing.

I made a patch for AbstractReplicatedMap.

Index: /tomcat6-trunk/java/org/apache/catalina/tribes/tipis/AbstractReplicatedMap.java
===================================================================
--- /tomcat6-trunk/java/org/apache/catalina/tribes/tipis/AbstractReplicatedMap.java	(revision 577691)
+++ /tomcat6-trunk/java/org/apache/catalina/tribes/tipis/AbstractReplicatedMap.java	(working copy)
@@ -713,6 +713,7 @@
         boolean removed = false;
         synchronized (mapMembers) {
             removed = (mapMembers.remove(member) != null );
+            if (!removed) return;
         }
         
         Iterator i = super.entrySet().iterator();
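
To illustrate the effect of the guard outside of Tomcat, here is a minimal standalone sketch. It is not the real AbstractReplicatedMap: the class GuardedMemberMap, the String member keys, and the relocation counter are all hypothetical stand-ins, and relocate() only counts calls instead of moving sessions. The sketch also returns after leaving the synchronized block rather than inside it, which has the same effect.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the map; not the Tribes implementation.
public class GuardedMemberMap {
    private final Map<String, Long> mapMembers = new HashMap<>();
    private int relocations = 0;

    public void memberAdded(String member) {
        synchronized (mapMembers) {
            mapMembers.put(member, System.currentTimeMillis());
        }
    }

    public void memberDisappeared(String member) {
        boolean removed;
        synchronized (mapMembers) {
            removed = (mapMembers.remove(member) != null);
        }
        // Guard: skip the expensive work if the member was already gone.
        if (!removed) return;
        relocate(member);
    }

    // Placeholder for the session relocation loop; it only counts calls.
    private void relocate(String member) {
        relocations++;
    }

    public int getRelocations() {
        return relocations;
    }

    public static void main(String[] args) {
        GuardedMemberMap map = new GuardedMemberMap();
        map.memberAdded("tcp://node-b:4001");
        map.memberDisappeared("tcp://node-b:4001"); // removed, relocates
        map.memberDisappeared("tcp://node-b:4001"); // already gone, skipped
        System.out.println("relocations = " + map.getRelocations()); // prints "relocations = 1"
    }
}
```

Without the guard, the second call would run the relocation loop again even though nothing changed in mapMembers.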

Regards.
Comment 1 Filip Hanik 2007-09-21 11:26:36 UTC
Fixed, since memberDisappeared is called when any member goes away, not just map members.
If you see scenarios where memberDisappeared is called multiple times with
the same member, please let me know, as that should not happen.
Comment 2 Keiichi Fujino 2007-09-25 03:03:50 UTC
(In reply to comment #1)
> Fixed.

Thanks for the correction.

> Since memberDisappeared is called when any member goes away, not just map 
> members.
> If you see scenarios where the memberDisappeared is called multiple times 
> with the same member, please let me know, as that should not happen

I think that AbstractReplicatedMap.memberDisappeared is called in the 
following cases.

1:McastServiceImpl$run -> ... -> AbstractReplicatedMap.memberDisappeared
2:GroupChannel.heartbeat() -> ... -> AbstractReplicatedMap.memberDisappeared
3:GroupChannel.send  -> ReplicationTransmitter.sendMessage -> ...
   -> ChannelException occurs 
   -> TcpFailureDetector.memberDisappeared
   -> AbstractReplicatedMap.memberDisappeared

Case 3 works as follows.
When a Tomcat instance in the cluster goes down, 
every thread sending a replication message to that instance throws 
ChannelException and calls TcpFailureDetector.memberDisappeared.
TcpFailureDetector.memberDisappeared then calls 
AbstractReplicatedMap.memberDisappeared.

If multiple threads send replication messages to the downed Tomcat, 
AbstractReplicatedMap.memberDisappeared is called multiple times with the same 
member.
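
Case 3 can be imitated without a cluster. The following is only a rough sketch under assumptions: ConcurrentDisappearDemo, its counters, and the String member key are hypothetical, and the worker threads simply call memberDisappeared directly instead of going through GroupChannel.send and TcpFailureDetector. It shows several threads reporting the same member at once, with the guard letting only the first one do the relocation work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Tomcat or Tribes.
public class ConcurrentDisappearDemo {
    private final Map<String, Long> mapMembers = new HashMap<>();
    final AtomicInteger relocations = new AtomicInteger();
    final AtomicInteger skipped = new AtomicInteger();

    void memberAdded(String member) {
        synchronized (mapMembers) {
            mapMembers.put(member, System.currentTimeMillis());
        }
    }

    void memberDisappeared(String member) {
        boolean removed;
        synchronized (mapMembers) {
            removed = (mapMembers.remove(member) != null);
        }
        if (!removed) {
            // Corresponds to the "disappeared, but was not present in the map" case.
            skipped.incrementAndGet();
            return;
        }
        relocations.incrementAndGet(); // stands in for the relocation loop
    }

    // Returns {relocations, skipped} after all threads report the same member.
    static int[] run(int threads) {
        ConcurrentDisappearDemo map = new ConcurrentDisappearDemo();
        String member = "tcp://node-b:4001";
        map.memberAdded(member);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.memberDisappeared(member);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { map.relocations.get(), map.skipped.get() };
    }

    public static void main(String[] args) {
        int[] result = run(8);
        System.out.println("relocations=" + result[0] + ", skipped=" + result[1]);
        // prints "relocations=1, skipped=7"
    }
}
```

Because the removal happens under the mapMembers lock, exactly one thread sees a non-null result from remove(); the other threads take the early return.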

The following log entries are output repeatedly. 
# log.debug("Member["+member+"] disappeared, but was not present in the map.");
# is called each time.

***********log*************
...
Sep 25, 2007 5:27:10 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://XXXXXXXX:4001,XXXXXXXX,4001, alive=22797,id={74 41 4 115 77 -55 69 21 -68 -127 79 110 -55 45 -36 -45 }, payload={}, command={}, domain={}, ]] message. Will verify.
Sep 25, 2007 5:27:10 PM org.apache.catalina.tribes.tipis.AbstractReplicatedMap memberDisappeared
FINE: Member[org.apache.catalina.tribes.membership.MemberImpl[tcp://XXXXXXXX:4001,XXXXXXXX,4001, alive=22797,id={74 41 4 115 77 -55 69 21 -68 -127 79 110 -55 45 -36 -45 }, payload={}, command={}, domain={}, ]] disappeared, but was not present in the map.
Sep 25, 2007 5:27:10 PM org.apache.catalina.tribes.tipis.AbstractReplicatedMap replicate
SEVERE: Unable to replicate data.
org.apache.catalina.tribes.ChannelException: Send failed, attempt:2 max:1; Faulty members:tcp://XXXXXXXX:4001; 
     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:172)
     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:78)
     at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
     at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
     at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:78)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
     at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:87)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
     at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:73)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
     at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
     at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
     at org.apache.catalina.tribes.tipis.AbstractReplicatedMap.replicate(AbstractReplicatedMap.java:421)
     at org.apache.catalina.ha.session.BackupManager.requestCompleted(BackupManager.java:131)
     at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:548)
     at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:535)
     at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:517)
     at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:428)
     at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:362)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    ... omit ...
Caused by: java.net.ConnectException: Connection refused: no further information
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:525)
     at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:88)
     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:130)
     ... 26 more
...

Regards.