Bug 43343 - Loss of data and concurrency issue with Catalina session persistent storage
Summary: Loss of data and concurrency issue with Catalina session persistent storage
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 5.5.27
Hardware: Other other
: P2 normal (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-10 10:54 UTC by Wade Chandler
Modified: 2009-07-01 06:19 UTC (History)
0 users



Attachments
Reproducer, part 1: web application (1.70 KB, application/zip)
2009-05-01 19:17 UTC, Konstantin Kolinko
Details
Reproducer, part 2: client (6.56 KB, application/zip)
2009-05-01 19:40 UTC, Konstantin Kolinko
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Wade Chandler 2007-09-10 10:54:45 UTC
A user was asking questions on the tomcat users mailing list about TC and how it
handled concurrency in sessions and session passivation/object caching of
sessions when they are inactive etc. I thought, surely this is using locks etc,
so had responded as such. Then I got curious. I started looking in the code and
found that indeed TC has a concurrency issue when the session is to pushed to
disk if it is inactive or the max number of live sessions is too high.

The problem is a user could have a session getting ready to become inactive. The
server is beginning to write the session out to disk. The user then comes in on
a request and gets the current session. They set an attribute. The web
application and the user think all is OK. But in reality the data they just put
into the session will be lost. The server just dumped their session to disk
containing the old data as they updated the copy. So, the next time they come in
the session will be pulled from disk with the old values and anything they put
into it will have been lost. Depending on the application this could be very bad.

To see the issue go to the file:
java/org/apache/catalina/session/PersistentManagerBase.java

methods:
swapIn
swapOut
writeSession
findSession
(other related)
then the different stores load and save methods. There is nothing keeping this
from happening.

If a lock is on a per session level this should keep it snappy for all other
requests etc. Either the session could be used as the lock or an object instance
variable on the session instance.

I noticed there is a comment in the source code for
PersistentManagerBase.processMaxIdleSwaps which reads:
// Swap out all sessions idle longer than maxIdleSwap
// FIXME: What's preventing us from mangling a session during
// a request?

So, apparently someone thought of this. Nothing is keeping it from mangling a
session currently.

What might be a fix is a session is given an instance variable which can be used
for a synchronization lock at the session level. In
PersistentManagerBase.findSession, this lock would be used before the session
can be returned. After the session map is accessed it should then return null if
the session had been passivating before it was asked to be found. The session
would then be loaded from storage again before it was ever given back to be
accessed had it been passivating. It would also be used in
PersistentManagerBase.* which call swapOut.

The code would look like this (might explain it better):

StandardSession:

/**
 Used for locking the session during persistence operations.
**/
Integer persistence_locker = new Integer(0);

PersistentManagerBase:

    /**
     * Return the active Session, associated with this Manager, with the
     * specified session id (if any); otherwise return <code>null</code>.
     * This method checks the persistence store if persistence is enabled,
     * otherwise just uses the functionality from ManagerBase.
     *
     * @param id The session id for the session to be returned
     *
     * @exception IllegalStateException if a new session cannot be
     *  instantiated for any reason
     * @exception IOException if an input/output error occurs while
     *  processing this request
     */
    public Session findSession(String id) throws IOException {

        Session session = super.findSession(id);
        //OK, at this point, we're not sure if another thread is trying to
        //remove the session or not so the only way around this is to lock
        //it (or attempt to) and then try to get it by this session id again.
        //If the other code ran swapOut, then we should get a null back during
        //this run, and if not, then by doing this we lock it out and then can
        //access the session safely and will call access on it to update the
        //access time and hopefully keep the processes from running swapOut
        //so, we're adding two accesses to the hashmap instead of one for
        //each request. But, we know we get the session or null back after
        //the other process has had a chance to remove it or not.
        if(session!=null{
	        synchronized(session.persistence_locker){
				session = super.findSession(session.getIdInternal());
				if(session!=null){
				   //we need to do this here
				   //to keep any external calling code from messing up the
				   //concurrency.
				   session.access();
				}
			}
		}
        if (session != null)
            return (session);

        // See if the Session is in the Store
        session = swapIn(id);
        return (session);

    }

    /**
     * Swap idle sessions out to Store if they are idle too long.
     */
    protected void processMaxIdleSwaps() {

        if (!isStarted() || maxIdleSwap < 0)
            return;

        Session sessions[] = findSessions();
        long timeNow = System.currentTimeMillis();

        // Swap out all sessions idle longer than maxIdleSwap
        // FIXME: What's preventing us from mangling a session during
        // a request?
        if (maxIdleSwap >= 0) {
            for (int i = 0; i < sessions.length; i++) {
                StandardSession session = (StandardSession) sessions[i];
                synchronized(session.persistence_locker){
            	    if (!session.isValid())
            	        continue;
            	    int timeIdle = // Truncate, do not round up
            	        (int) ((timeNow - session.getLastAccessedTime()) / 1000L);
            	    if (timeIdle > maxIdleSwap && timeIdle > minIdleSwap) {
            	        if (log.isDebugEnabled())
            	            log.debug(sm.getString
            	                ("persistentManager.swapMaxIdle",
            	                 session.getIdInternal(), new Integer(timeIdle)));
            	        try {
            	            swapOut(session);
            	        } catch (IOException e) {
            	            ;   // This is logged in writeSession()
            	        }
            	    }
				}
            }
        }

    }


    /**
     * Swap idle sessions out to Store if too many are active
     */
    protected void processMaxActiveSwaps() {

        if (!isStarted() || getMaxActiveSessions() < 0)
            return;

        Session sessions[] = findSessions();

        // FIXME: Smarter algorithm (LRU)
        if (getMaxActiveSessions() >= sessions.length)
            return;

        if(log.isDebugEnabled())
            log.debug(sm.getString
                ("persistentManager.tooManyActive",
                 new Integer(sessions.length)));

        int toswap = sessions.length - getMaxActiveSessions();
        long timeNow = System.currentTimeMillis();

        for (int i = 0; i < sessions.length && toswap > 0; i++) {
		synchronized(sessions[i].persistence_locker){
                    int timeIdle = // Truncate, do not round up
            	    (int) ((timeNow - sessions[i].getLastAccessedTime()) / 1000L);
	            if (timeIdle > minIdleSwap) {
	                if(log.isDebugEnabled())
	                    log.debug(sm.getString
	                        ("persistentManager.swapTooManyActive",
	                         sessions[i].getIdInternal(), new Integer(timeIdle)));
	                try {
	                    swapOut(sessions[i]);
	                } catch (IOException e) {
	                    ;   // This is logged in writeSession()
	                }
	                toswap--;
	            }
			}
        }

    }


    /**
     * Back up idle sessions.
     */
    protected void processMaxIdleBackups() {

        if (!isStarted() || maxIdleBackup < 0)
            return;

        Session sessions[] = findSessions();
        long timeNow = System.currentTimeMillis();

        // Back up all sessions idle longer than maxIdleBackup
        if (maxIdleBackup >= 0) {
            for (int i = 0; i < sessions.length; i++) {
                StandardSession session = (StandardSession) sessions[i];
                synchronized(session.persistence_locker){
                    if (!session.isValid())
                	    continue;
                	int timeIdle = // Truncate, do not round up
                    	(int) ((timeNow - session.getLastAccessedTime()) / 1000L);
                	if (timeIdle > maxIdleBackup) {
                    	if (log.isDebugEnabled())
                    	    log.debug(sm.getString
                    	        ("persistentManager.backupMaxIdle",
                    	        session.getIdInternal(), new Integer(timeIdle)));

                    	try {
                        	writeSession(session);
                    	} catch (IOException e) {
                        	;   // This is logged in writeSession()
                    	}
                	}
            	}
        	}
	    }

    }



So, currently one could very easily have a session and have some issues. Your
user would have a time when they thought they set something or your code could
think this then immediately the changes are lost. Anyways, that is what looks to
be the issue and a workable fix. The only place I saw other issues was inside of:
java/org/apache/catalina/valves/PersistentValve.java

where it incorrectly grabs the store from the PersistentManager and uses it
directly instead of using the manager API. To me this is bad in that the manager
is not able to be the manager and this other logic is accessing the store
directly and should never happen...unless it is used only in test cases etc.

The only way you could possibly *sort of* workaround this issue would be to use
a HttpSessionActivationListener, but even then it would be hard to code up a
100% reliable solution. You can see how this would *sort of* be a way to work
around the issue by looking at the class StandardSession method passivate , and
see the events will be thrown to alert code the session will passivate, but it
would be pretty complicated and ugly to protect the session completely.
Comment 1 Wade Chandler 2007-09-10 10:56:10 UTC
I thought about marking this as a P1 as it could cause some very hard to track
down data loss in a web application. What is the difference between P1 and P2
for the Tomcat project?
Comment 2 Chuck Caldarale 2007-09-10 11:09:30 UTC
Using a plain Object to synchronize on rather than an Integer(0) would be very
slightly more efficient.
Comment 3 Christopher Schultz 2007-09-10 13:51:32 UTC
Any reason not to use the Session object itself as the monitor?
Comment 4 Wade Chandler 2007-09-10 14:23:51 UTC
No real reason I can find. We might want to use an instance variable as the lock
just in case there are ever any other synchronized methods or sections added to
the session where the entire session is locked. That would help if those
synchronized session methods would need to be used by other threads and could be
accessed even during passivation. I only find one place where the session itself
is locked and that is in the expires method where a synchronized (this) is used.
 So, using the session itself would be OK for now, and it may logically work out
to be the case that no other action should ever be able to actually be made or
ever wanted on the session while it is being cached/passivated. I mentioned it
could be used as the lock in the my description some where. So, at least for
now, or always if deemed to always be wanted, I don't see any reason why it
itself can not be the lock.
Comment 5 Mark Thomas 2008-05-01 14:15:49 UTC
I have applied a patch to trunk based on the discussion here and proposed it for 6.0.x.
Comment 6 Mark Thomas 2008-05-15 10:28:56 UTC
The patch has been applied to 6.0.x and will be in 6.0.17 onwards.

Thanks for the suggested patch.
Comment 7 Konstantin Kolinko 2009-05-01 19:17:56 UTC
Created attachment 23592 [details]
Reproducer, part 1: web application

Web application for TC 5.5, 6.0 to reproduce the issue
Comment 8 Konstantin Kolinko 2009-05-01 19:40:29 UTC
Created attachment 23593 [details]
 Reproducer, part 2: client

HTTPClient.
commons-httpclient-3.1.jar and its dependencies are not included.

To run:
1. Deploy bug43343.war on a web server
2. Unpack this bug43343_client.zip and put the following libraries into lib subfolder:
commons-httpclient-3.1.jar
commons-codec-1.3.jar
commons-logging-1.1.1.jar
3. Run client.bat if you are on Windows

The client creates several (100) instances of HttpClient that connect to 
 http://localhost:8080/bug43343/index.jsp?number=nn&loop=ll

Each HttpClient remembers its cookies, and thus connects to a separate session.
The number parameter in URL is unique for each HttpClient instance, and loop starts with 0 and increments by 1 for each request from the same client.

The context.xml file of the web application configures PersistentManager with some low values, to that this issue is more visible.

The requests are being sent serially, with an interval of at least 4 seconds between request of the same client.

The client prints out the response of web application, and terminates if status code 500 is received.

Expected result:
Successful run of the client. That is, 150 loops, which needs about 10 minutes to complete.
Comment 9 Konstantin Kolinko 2009-05-01 20:27:25 UTC
I am reopening this issue, because:
1. The fix was not applied to TC 5.5, and the issue can be reproduced with TC 5.5.27.
2. The fix [1], as applied to 6.0 is not complete, though one may argue.

I am reducing severity to "normal".
The issue does not apply to the default configuration, where StandardManager is used.

[1] http://svn.apache.org/viewvc?rev=656751&view=rev

On the issue:
When PersistentManagerBase swaps out a session it performs two actions (see its implementation of swapOut()):
1. writing session data out
2. calling session.recycle() which invalidates the session

When session that is being used by an active request is being swapped out at the same time (and what is the need for doing that? - you will not be able to free the memory, because there is a reference), two races can occur:

Case A: request writes data to session at the same time as it is being written out and before the recycle() call.
Result: the update is lost, next request will see the old values.
This is the case that is fixed by PersistentManagerBase patch in rev=656751 [1].


Case B: request accesses session after session.recycle(), e.g. after swapOut() completes.
Result: java.lang.IllegalStateException: setAttribute: Session already invalidated,
though the session was not invalidated, but just swapped out.


---------------------------------------------
Sample output from the client when Case A is observed (with TC 5.5.27):

Method failed: HTTP/1.1 500 Internal Server Error

Error (number=-1509748992, loop=140):
java.lang.IllegalStateException: Attribute loop is not equal to the expected val
ue.
Expected: 139
Actual: 138

---------------------------------------------

Case B can be reproduced more easily if you replace
s/ if (false) { / if (true) { /
on line 57 in file index.jsp of the web application.

Sample output from the client when Case B is observed (current tc6.0.x, with the mentioned change of line 57):

Method failed: HTTP/1.1 500 Internal Server Error

Error (number=-1309785265, loop=0):
java.lang.IllegalStateException: setAttribute: Session already invalidated


I have also seen the following NullPointerException when running against TC 5.5.27, and I think that it is the Case B:

The server encountered an internal error () that prevented it from fulfilling this request.
exception:
   org.apache.jasper.JasperException
        org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServle
tWrapper.java:460)
        org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.ja
va:373)
        org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
        org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
root cause:
   java.lang.NullPointerException
        org.apache.catalina.session.StandardSession.setAttribute(StandardSession
.java:1309)
        org.apache.catalina.session.StandardSession.setAttribute(StandardSession
.java:1248)
        org.apache.catalina.session.StandardSessionFacade.setAttribute(StandardS
essionFacade.java:130)
        org.apache.jsp.index_jsp._jspService(index_jsp.java:88)
        org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.ja
va:331)
        org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
        org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
---------------------------------------------

Case B can be solved by avoiding swapping out sessions that are owned by a request. Synchronizing on a session is not sufficient.

A workaround is to avoid enabling IdleSwap in the configuration (it is disabled by default).
Comment 10 Mark Thomas 2009-05-25 15:37:40 UTC
Case B fixed in trunk and proposed for 6.0.x
Complete fix proposed for 5.5.x.
Comment 11 Mark Thomas 2009-06-23 11:32:52 UTC
This has been fixed for 6.0.x and will be included in 6.0.21 onwards.

Transferring to TC5 so it can be used to track the fix there.
Comment 12 Mark Thomas 2009-07-01 06:19:40 UTC
This has been fixed in 5.5.x and will be included in 5.5.28 onwards.