Summary: | Race condition / out of order operation in session replication at node startup | ||
---|---|---|---|
Product: | Tomcat 6 | Reporter: | David Johle <djohle> |
Component: | Cluster | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | gbalogh |
Priority: | P2 | ||
Version: | 6.0.35 | ||
Target Milestone: | default | ||
Hardware: | PC | ||
OS: | Linux |
Description
David Johle
2012-07-05 21:56:22 UTC
In case it's helpful, here's the Cluster configuration... fairly basic stuff:

    <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
      <Manager className="my.deltamanager.extension.CustomManager"
               expireSessionsOnShutdown="false"
               notifyListenersOnReplication="true"/>
      <Channel className="org.apache.catalina.tribes.group.GroupChannel">
        <Membership className="org.apache.catalina.tribes.membership.McastService"
                    address="239.1.1.1" port="45564" frequency="500" dropTime="3000"/>
        <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                  address="auto" port="4000" autoBind="100"
                  selectorTimeout="5000" maxThreads="6"/>
        <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
          <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
        </Sender>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
      </Channel>
      <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="\*\.page"/>
      <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
      <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
      <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
    </Cluster>

Thanks for the report. I think there is a problem with the behavior of the DeltaManager. As you know, DeltaManager is responsible for synchronizing sessions on startup. A node that receives the EVT_GET_ALL_SESSIONS message serializes all of its sessions and sends back an EVT_ALL_SESSION_DATA message. Once the EVT_ALL_SESSION_DATA message has been sent, the node sends an EVT_ALL_SESSION_TRANSFERCOMPLETE message. If channelSendOptions is asynchronous (the default), the EVT_ALL_SESSION_DATA message is sent asynchronously. As a result, there is a race condition between the processing of the message containing the actual session data and the "transfer complete" message.

I'm going to fix this behavior. I intend to make the EVT_ALL_SESSION_DATA message always be sent in synchronous mode. In the meantime, the workaround is to set channelSendOptions to 6 (sync + ack).

Best Regards.

Fixed in trunk and 7.0.x and will be included in 7.0.30 onwards. Proposed for 6.0.x.

Note: With this fix, the EVT_ALL_SESSION_DATA message is sent in synchronous mode. The sender therefore waits for the message to complete, up to Sender#timeout (default 3000 milliseconds). If a timeout occurs while sending the EVT_ALL_SESSION_DATA message, you can adjust the following attributes:

    Sender#timeout
    DeltaManager#sendAllSessions
    DeltaManager#sendAllSessionsSize
    DeltaManager#sendAllSessionsWaitTime

Moving to Tomcat 6 since it has been fixed in 7.

Fixed in 6.0.x and will be included in 6.0.36 onwards.
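
For anyone landing here later, the workaround and the timeout-related attributes mentioned in this report could look roughly like the fragment below. This is only a sketch, not a recommended configuration: the numeric values are illustrative assumptions, the stock DeltaManager class name stands in for the reporter's custom manager, and placing timeout on the Transport element is an assumption based on the cluster sender documentation. The remaining child elements of the reporter's configuration are left out for brevity.

```xml
<!-- channelSendOptions="6" = synchronized ack (4) + use ack (2); the default of 8 is asynchronous -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6">

  <!-- sendAllSessions="false" sends sessions in blocks of sendAllSessionsSize,
       pausing sendAllSessionsWaitTime milliseconds between blocks.
       The values shown are illustrative, not tuned recommendations. -->
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"
           sendAllSessions="false"
           sendAllSessionsSize="500"
           sendAllSessionsWaitTime="1000"/>

  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <!-- timeout is in milliseconds (default 3000); 10000 is an illustrative value -->
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                 timeout="10000"/>
    </Sender>
  </Channel>
</Cluster>
```

On versions that include the fix, channelSendOptions should not need to be changed for this issue; the Manager and Transport attributes only matter if the synchronous EVT_ALL_SESSION_DATA send times out.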