Bug 45261 - Concurrent node failure leads to inconsistent views.
Summary: Concurrent node failure leads to inconsistent views.
Alias: None
Product: Tomcat 6
Classification: Unclassified
Component: Cluster
Version: 6.0.16
Hardware: PC Linux
Importance: P2 normal
Target Milestone: default
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2008-06-23 14:13 UTC by Robert Newson
Modified: 2014-02-17 13:56 UTC

Demonstrate view inconsistency. (848 bytes, text/x-java)
2008-06-23 14:13 UTC, Robert Newson
An alternative coordinator that makes local decisions based on membership service (2.18 KB, text/x-java)
2008-06-26 10:16 UTC, Robert Newson

Description Robert Newson 2008-06-23 14:13:00 UTC
Created attachment 22166 [details]
Demonstrate view inconsistency.

In a four node cluster, using NonBlockingCoordinator, if two nodes fail at the same time, the remaining two nodes get different views and never converge.

When the other nodes restart, they never install a view at all.

I've attached the relevant demo code. Run it on 4 machines, wait for view installation, then CTRL-C two of them. The other two will never print the same UniqueId. Start a new node, view is always null.

Immediately after the two node failure, one of the surviving nodes issues this stack trace;

WARN - Member send is failing for:tcp://{-64, -88, -91, 34}:4000 ; Setting to suspect and retrying.
ERROR - Error processing coordination message. Could be fatal.
org.apache.catalina.tribes.ChannelException: Send failed, attempt:2 max:1; Faulty members:tcp://{-64, -88, -91, 34}:4000;
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(Par
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessag
        at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMes
        at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessa
        at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(Chann
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(C
        at org.apache.catalina.tribes.group.interceptors.NonBlockingCoordinator.
Comment 1 Robert Newson 2008-06-25 13:50:40 UTC
So, I understand this better now and have a proposed fix.

Here's the procedure to reproduce the problem.

1) start four nodes.
2) see a view installation with four members.
3) kill two non-coordinator nodes in quick succession (a second or two)

From this point onwards, until it is killed, the coordinator oscillates between two states. It recognizes that the state is inconsistent as it receives heartbeats from the other node whose view's UniqueId does not match the coordinator's. It then forces an election, which fails because it believes an election is already running. This cycle repeats forever.

When the first node crashed, memberDisappeared() was called on the coordinator, which then started sending messages as part of an election. The send method threw with a connection timeout (it was attempting to send to the second node, which had just crashed). This case is never handled, leaving the 'election in progress' flag on. Forever.
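The failure mode described above can be sketched as a small state machine (hypothetical class and method names, not the actual Tribes API): an exception during the election path leaves the in-progress flag set, so every later election attempt is refused.

```java
// Minimal sketch of the stuck-election bug described in this comment.
// Class and method names are hypothetical; only the control flow mirrors
// the reported behavior of NonBlockingCoordinator.
public class ElectionFlagDemo {
    private boolean electionInProgress = false;

    // Simulates starting an election when a member disappears. If sending
    // to an already-dead member throws, the flag is never cleared.
    public boolean startElection(boolean sendFails) {
        if (electionInProgress) {
            return false; // "an election is already running" -- refused forever
        }
        electionInProgress = true;
        try {
            if (sendFails) {
                throw new RuntimeException("Send failed: connection timeout");
            }
            electionInProgress = false; // only cleared on the happy path
            return true;
        } catch (RuntimeException x) {
            // Bug: the flag is left set on this exit path.
            return false;
        }
    }

    public boolean isStuck() {
        return electionInProgress;
    }
}
```

Once the first election fails this way, every forced election returns immediately, which is the oscillation seen in the logs.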

Clearing suggestedviewId when the ChannelException is thrown is the fix:

@@ -500,6 +500,7 @@ public class NonBlockingCoordinator extends ChannelInterceptorBase {
                 processCoordMessage(cmsg, msg.getAddress());
             }catch ( ChannelException x ) {
                 log.error("Error processing coordination message. Could be fatal.",x);
+                suggestedviewId = null;                

This probably should only be done under some circumstances, so it isn't obviously a safe patch. Hopefully the author will have a better fix!
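A more defensive variant of the same idea, sketched below with hypothetical names (not the actual patch), resets the election state in a finally block so that every exit path, exceptional or not, allows a later election to start:

```java
// Hypothetical sketch of the defensive pattern behind the patch above:
// reset election state on every exit path, not just the happy one.
public class ElectionFlagFixed {
    private boolean electionInProgress = false;
    private Object suggestedviewId = null;

    public boolean startElection(boolean sendFails) {
        if (electionInProgress) {
            return false;
        }
        electionInProgress = true;
        try {
            suggestedviewId = new Object(); // propose a view
            if (sendFails) {
                throw new RuntimeException("Send failed");
            }
            return true;
        } catch (RuntimeException x) {
            suggestedviewId = null; // the proposed fix: drop the stale suggestion
            return false;
        } finally {
            electionInProgress = false; // a later election can always start
        }
    }

    public boolean canElectAgain() {
        return !electionInProgress;
    }
}
```

With this shape, a send failure still aborts the current election, but the node is free to run another one when the next membership event arrives.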

Comment 2 Filip Hanik 2008-06-26 08:05:07 UTC
hi Rob, 
the non-blocking coordinator is still a work in progress. It's one piece of code that got a bit overcomplicated once I started developing it, and I think it can be greatly simplified.

I will take a look at this at the beginning of next week.

Comment 3 Robert Newson 2008-06-26 08:16:13 UTC
I made my own coordinator which simply uses a sorted list of getMembers() + getLocalMember(), though it only installs views if the membership remains unchanged for a few seconds, to avoid a little storm of view changes. Obviously it's a much weaker form of view management than you're attempting, but it's probably good enough for my purposes.
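The approach described here can be sketched roughly as follows (hypothetical class, not the attached patch): every node sorts the membership locally, treats the smallest ID as coordinator, and only installs a view once membership has been stable for a quiet period.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of a coordinator that makes local decisions from the membership
// service: sort the members, pick the smallest as coordinator, and only
// install a view after membership has been quiet for a while, damping
// a storm of view changes. Names and the String-ID model are assumptions.
public class SortedMembershipCoordinator {
    private final long quietPeriodMs;
    private List<String> lastSeen = new ArrayList<>();
    private long stableSince;
    private List<String> installedView = null;

    public SortedMembershipCoordinator(long quietPeriodMs) {
        this.quietPeriodMs = quietPeriodMs;
    }

    // Feed the current membership (getMembers() + getLocalMember()) and a clock.
    public void onHeartbeat(List<String> members, long nowMs) {
        List<String> sorted = new ArrayList<>(members);
        Collections.sort(sorted); // every node derives the same order locally
        if (!sorted.equals(lastSeen)) {
            lastSeen = sorted;
            stableSince = nowMs;    // membership changed: restart the quiet period
        } else if (nowMs - stableSince >= quietPeriodMs) {
            installedView = sorted; // unchanged long enough: install the view
        }
    }

    public String coordinator() {
        return installedView == null ? null : installedView.get(0);
    }

    public List<String> view() {
        return installedView;
    }
}
```

Because the decision is purely local and deterministic over the same membership list, no election messages are needed; the trade-off, as noted above, is weaker consistency guarantees than a real view-agreement protocol.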

Let me know when you get to this, I can test it out.
Comment 4 Robert Newson 2008-06-26 10:16:13 UTC
Created attachment 22179 [details]
An alternative coordinator that makes local decisions based on membership service

Happy to release this class under the Apache License. Let me know what you need from me.
Comment 5 Filip Hanik 2008-06-26 10:38:33 UTC
Just submit a CLA and email a scanned copy to
secretary [at) apache [dot] org
Comment 6 Mark Thomas 2008-10-01 08:46:18 UTC
For a contribution of a single class, the statement in comment #4 is more than enough. No need for a CLA.
Comment 7 Mark Thomas 2008-12-28 16:35:29 UTC
Many thanks for the patch. I have applied it to trunk and proposed it for 6.0.x. I made the following changes:
- changed package to org.apache.catalina.tribes.group.interceptors
- changed class name to SimpleCoordinator
- added the AL2 text to the beginning of the file
Comment 8 Robert Newson 2008-12-30 06:35:08 UTC
Thanks. I have since moved on to use a custom stack for group membership. I found an excellent paper which describes a robust mechanism for leader election. The paper also extends that algorithm to make a robust group membership protocol too.


Comment 9 Mark Thomas 2008-12-30 09:59:01 UTC
An updated patch is always welcome.
Comment 10 Robert Newson 2008-12-30 11:00:48 UTC
My comment was misleading. The "custom stack" in question is not based on Tribes at all.
Comment 11 Mark Thomas 2009-01-14 16:16:05 UTC
This has been applied to 6.0.x and will be included in 6.0.19 onwards.