49051 – Decrease in response by TcpFailureDetector.

Status:	RESOLVED FIXED

Alias:	None

Product:	Tomcat 6
Classification:	Unclassified
Component:	Cluster
Version:	6.0.26
Hardware:	All All

Importance:	P2 normal
Target Milestone:	default
Assignee:	Tomcat Developers Mailing List

Reported:	2010-04-06 09:12 UTC by Keiichi Fujino
Modified:	2010-04-09 08:47 UTC
CC List:	0 users

Attachments
TcpFailureDetector's patch (924 bytes, text/plain) 2010-04-06 09:13 UTC, Keiichi Fujino

Description Keiichi Fujino 2010-04-06 09:12:14 UTC

[Configuration]
Cluster configuration.
TcpFailureDetector is used. 
Synchronous replication

A ChannelException is thrown when the destination node goes down during session replication.
TcpFailureDetector catches the ChannelException
and verifies the member in TcpFailureDetector#memberDisappeared.

In the TcpFailureDetector#memberAlive method,
the member that failed replication is checked to see whether it is really down.
Because the member is already gone, TcpFailureDetector#memberAlive times out after 1 sec (the default).
The member is then removed from the membership by membership#removeMember,
and super.memberDisappeared(member) is called.

TcpFailureDetector#memberDisappeared is as follows. 
===
public void memberDisappeared(Member member) {
...skip
    synchronized (membership) {
        //check to see if the member really is gone
        //if the payload is not a shutdown message
        if (shutdown || !memberAlive(member)) {
            //not correct, we need to maintain the map
            membership.removeMember( (MemberImpl) member);
            removeSuspects.remove(member);
            notify = true;
        } else {
            //add the member as suspect
            removeSuspects.put(member, new Long(System.currentTimeMillis()));
        }
    }
...skip
}
===
Every thread waiting to acquire the membership lock calls the memberAlive method once it gets the lock,
and each of those calls times out after 1 sec.
As a result,
under high concurrency, response times can degrade severely.

For instance,
when 100 threads are waiting for the membership lock,
the last thread to acquire the lock cannot return its response for 100 sec.
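
The 100 sec figure follows from the lock serializing the timeouts: each waiting thread holds the lock for one full timeout before the next can start. A minimal sketch of that arithmetic, where the class and method names are hypothetical and the per-call cost is assumed to equal the timeout exactly:

```java
/** Hypothetical helper, not part of Tomcat: estimates the serialized wait. */
final class LockQueueEstimate {

    /**
     * Worst-case delay in milliseconds for the last of {@code threads} threads,
     * assuming each one holds the lock for a full {@code timeoutMillis}.
     */
    static long worstCaseWaitMillis(int threads, long timeoutMillis) {
        return (long) threads * timeoutMillis;
    }

    public static void main(String[] args) {
        // 100 waiting threads, 1 sec timeout each
        System.out.println(worstCaseWaitMillis(100, 1000L) / 1000 + " sec"); // prints "100 sec"
    }
}
```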

If the member no longer exists in the membership, the TcpFailureDetector#memberAlive method need not be called.
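
The idea can be sketched as a guard at the top of the synchronized block. This is only an illustration under assumptions, not the attached patch: the class below is a hypothetical stand-in that uses a plain Set of names for the membership, and its memberAlive stub just counts calls instead of waiting for a timeout.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the detector; not Tomcat's actual class. */
final class FailureDetectorSketch {
    private final Set<String> membership = new HashSet<>();
    final AtomicInteger aliveChecks = new AtomicInteger();

    FailureDetectorSketch(String... members) {
        for (String m : members) {
            membership.add(m);
        }
    }

    // Stub: the real check would block until the timeout expires.
    // Here it only counts the call and reports the member as dead.
    private boolean memberAlive(String member) {
        aliveChecks.incrementAndGet();
        return false;
    }

    /** Returns true if this call removed the member from the membership. */
    boolean memberDisappeared(String member) {
        synchronized (membership) {
            // Guard: another thread already removed the member,
            // so skip the expensive liveness check.
            if (!membership.contains(member)) {
                return false;
            }
            if (!memberAlive(member)) {
                membership.remove(member);
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FailureDetectorSketch detector = new FailureDetectorSketch("nodeB");
        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> detector.memberDisappeared("nodeB"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Only the first thread to get the lock pays for the check.
        System.out.println("alive checks: " + detector.aliveChecks.get()); // prints "alive checks: 1"
    }
}
```

With the guard in place, only the first thread pays the timeout; the other 99 return as soon as they acquire the lock.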

I made a patch.

Best regards.

Comment 1 Keiichi Fujino 2010-04-06 09:13:50 UTC

Created attachment 25233
TcpFailureDetector's patch

I made a patch.

Comment 2 Keiichi Fujino 2010-04-06 09:39:29 UTC

Fixed in trunk and proposed for 6.0.x.

Comment 3 Keiichi Fujino 2010-04-09 08:47:12 UTC

This fix has been applied to 6.0 and will be included in 6.0.27 onwards.