Bug 57532

Summary:          Session expire message sent to cluster nodes even with DeltaSession configuration not to
Product:          Tomcat 7
Component:        Catalina
Version:          7.0.42
Hardware:         PC
OS:               Linux
Status:           RESOLVED INVALID
Severity:         major
Priority:         P2
Keywords:         APIBug
Reporter:         andrew jardine <andrew.jardine>
Assignee:         Tomcat Developers Mailing List <dev>
CC:               andrew.jardine
Target Milestone: ---

Description andrew jardine 2015-02-04 00:23:56 UTC
Hi - I suppose whether or not this is a bug depends on how you interpret it. I am using Tomcat 7.0.42, but I have also tried building everything up to Tomcat 7.0.57 to see whether the behaviour had changed; to date it has not.

My cluster configuration uses the DeltaManager for session replication, with multicast membership rather than fixed nodes. My manager is configured as follows --

<Manager className="org.apache.catalina.ha.session.DeltaManager"
         expireSessionsOnShutdown="false"
         notifyListenersOnReplication="true"/>

.. the most important setting being that I don't want sessions to expire on shutdown. My interpretation is that in a controlled shutdown scenario, where I initiate a proper shutdown of Node A in an A, B, C cluster, the server shuts down and the sessions from A are retained on B and C. This works as expected for any session that is not marked as primary on the node being shut down. Sessions flagged as primary, however, cause a message to be sent to all nodes in the cluster, and those sessions are lost. If the server process is killed instead, no sessions are lost.

I traced this issue back to the following scenario.

When a shutdown event occurs, StandardManager calls session.expire(true). Since we have configured DeltaSession as our implementation class, I logically went there. The expire(boolean) method in that class simply calls an overloaded expire(boolean, boolean) version, where the second argument is statically passed as true. That second argument is the flag that determines whether or not to notify the cluster.
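
For reference, the unpatched method is effectively the following (my paraphrase of the call chain described above, not the verbatim 7.0.x source):

    @Override
    public void expire(boolean notify) {
        // The second argument (notifyCluster) is statically true, so the
        // expiry is always broadcast to the rest of the cluster.
        expire(notify, true);
    }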

I forked the project (Tomcat 7.0.57 branch) and updated the DeltaSession.expire( boolean ) method to use the following logic instead --

    /**
     * Perform the internal processing required to invalidate this session,
     * without triggering an exception if the session has already expired.
     *
     * @param notify Should we notify listeners about the demise of this session?
     */
    @Override
    public void expire(boolean notify) {
        boolean notifyCluster = true;
        if (manager instanceof DeltaManager) {
            notifyCluster = ((DeltaManager) manager).isExpireSessionsOnShutdown();
        }
        expire(notify, notifyCluster);
    }

.. which lets me preserve the same behaviour for every session manager except the Delta configuration. I built this code, replaced my binaries, and confirmed that with this logic the session expiry is not broadcast to the cluster during shutdown events.

To me this seems like a defect, which is why I am submitting it as an issue.
Comment 1 Mark Thomas 2015-02-04 19:37:08 UTC
StandardManager should be irrelevant since it isn't used for clustered web applications. The web application should be using DeltaManager.

On shutdown, DeltaManager calls the two-argument version of Session.expire(), with the second argument set depending on expireSessionsOnShutdown.
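
In other words, the intended shutdown path amounts to the following (a paraphrase of the behaviour just described, not the literal 7.0.x source):

    // Paraphrase, not the literal source: on shutdown the broadcast flag
    // follows the expireSessionsOnShutdown setting instead of being
    // hard-coded to true.
    for (Session session : findSessions()) {
        ((DeltaSession) session).expire(true, isExpireSessionsOnShutdown());
    }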

I'd like to see a stack trace that shows where the call to DeltaSession.expire() is coming from.
Comment 2 andrew jardine 2015-02-06 02:48:51 UTC
Hey Mark,

I'll try to grab those details for you. All I can say for the moment is that if I set a breakpoint on the expire(boolean) method in the DeltaSession class, it is hit during shutdown. The only way I was able to preserve my session replication was to modify that method. Perhaps I have configured my cluster incorrectly? -- here is my configuration. I'll try to find some time this weekend to get a thread dump for you.


server.xml --

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6">
    <Manager className="org.apache.catalina.ha.session.DeltaManager"
             expireSessionsOnShutdown="false"
             notifyListenersOnReplication="true"/>
    <Channel className="org.apache.catalina.tribes.group.GroupChannel">
        <Membership className="org.apache.catalina.tribes.membership.McastService"
                    address="224.5.0.1"
                    port="45564"
                    frequency="500"
                    dropTime="3000"/>
        <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                  address="auto"
                  port="4000"
                  selectorTimeout="5000"
                  maxThreads="25"/>
        <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
            <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
        </Sender>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
    </Channel>
    <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
Comment 3 Christopher Schultz 2015-02-06 03:28:27 UTC
(In reply to andrew jardine from comment #2)
> Perhaps I have configured my cluster incorrectly? -- here is my configuration.
>
> [snip]

You need N > 1 nodes to make a cluster. Are they all configured identically? Maybe you have one of them still running BackupManager.
Comment 4 andrew jardine 2015-02-06 14:50:14 UTC
I have tried this with anywhere from 2 to 5 nodes -- all configured the same, all with the same results. The odd thing is that it only seems to affect sessions that are marked as primary on the node being shut down. For example:

Node A                         Node B
session 1 (primary)            session 1
session 2                      session 2 (primary)
session 3                      session 3 (primary)
session 4 (primary)            session 4 
session 5 (primary)            session 5


If at this point I shut down Node B, then session 2 and session 3 on Node A are destroyed, leaving me with --

Node A                         Node B
session 1 (primary)            
session 4 (primary)            
session 5 (primary)            

... but sessions that were not primary on the node being shut down do not appear to trigger the "SESSION EXPIRE" message to the cluster.

I'll try to get that thread dump for you when I get a bit of breathing room at work.
Comment 5 Mark Thomas 2015-02-11 21:21:24 UTC
You are missing:

<ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>

I've just confirmed with my 4-node test cluster and the latest 7.0.x code that, provided this listener is present, sessions fail over correctly when the current primary node is shut down gracefully.
Comment 6 andrew jardine 2015-03-09 13:59:29 UTC
Hey Mark,

Sorry for the delay -- other priorities came up. I'm trying this again right now, though I think I already tried that as well. It could be, though, that I did not have session replication configured correctly when I tried it. If memory serves, this is what we found (I am double-checking this morning):

1. We set the jvmRoute attribute on the <Engine /> element. Node 01 had a value of jvmRoute="01", Node 02 a value of jvmRoute="02", etc.

2. We configured Apache to use sticky sessions based on the JSESSIONID.

3. This worked, and my first request went to Node 01, so I ended up with a session ID similar to ABCD1234567890-01.

4. All my subsequent traffic was routed to the 01 server.

5. We shut down 01.

6. Apache started directing my requests to 02 -- but my JSESSIONID was now suffixed with the 02 jvmRoute value, so I had ABCD1234567890-02.

7. Node 02 did not find a session with that ID, so it created a new session (see the sketch below).
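
To make steps 3 through 7 concrete, the ID handling I observed amounts to this (a hypothetical sketch using the values above, not Tomcat code):

    // Hypothetical sketch of steps 3-7 (not Tomcat source). The jvmRoute
    // suffix is rewritten on failover, but the replicated session on
    // Node 02 is apparently still keyed under the old ID, so the lookup
    // misses and a fresh session is created.
    String baseId         = "ABCD1234567890";
    String stickyToNode01 = baseId + "-01"; // step 3: first request lands on Node 01
    String afterFailover  = baseId + "-02"; // step 6: suffix now carries Node 02's route
    // step 7: Node 02 looks up afterFailover, finds no match, creates a new session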

-- again, perhaps I did not have replication working properly, so I'll do some more testing today and update this ticket with anything I find.
Comment 7 andrew jardine 2015-03-09 14:59:15 UTC
Hey Mark,

UPDATE:

I am seeing the same behaviour, even with the JvmRouteBinderValve. At first I enabled the jvmRoute, but that was problematic because, as previously mentioned, the route was appended to the session ID. I removed the jvmRoute but LEFT the JvmRouteBinderValve in place and restarted everything.

Session replication works. I have my 3 nodes behind an Apache proxy that does round-robin load balancing across the nodes. When I shut down a node in the cluster, sometimes the session survives; other times it is lost. Again, it appears that shutting down the primary node is the problem.

For my requirements, the only time I would want a session expiration to be broadcast across the cluster is when a user logs out. The application already handles this for me, so on shutdown Tomcat need only expire its own local list of sessions. Perhaps my scenario is outside the norm of what others/Tomcat expect. Either way, the only solution I have found to date is to modify that method so that it does not default to broadcasting across the cluster.