Summary: | Tomcat does not invalidate sessions after session-timeout period has passed. | ||
---|---|---|---|
Product: | Tomcat 5 | Reporter: | Eddie Wynn <eddiew> |
Component: | Catalina | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | amasson, kldavis, suzuki.yuichiro |
Priority: | P2 | ||
Version: | Nightly Build | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | other | ||
Attachments: |
Stress Test Framework
A JSP That can be used to see the accessCount of a session |
Description
Eddie Wynn
2005-11-04 15:38:08 UTC
Is something holding a reference to those sessions? And does this happen with 5.0.30 or 5.5.12? (In reply to comment #1) > Is something holding a reference to those sessions? > And does this happen with 5.0.30 or 5.5.12? Thanks for the interest. This happens with 5.0.28 in production - I cannot reproduce it anywhere else (using 5.0.28) and I currently cannot migrate the production system to 5.0.30 either. So, in short I cannot say if it happens on anything other than 5.0.28 - if someone could say that this behaviour was different in 5.0.30 then I would be able to move forwards but without something more definitive I can't. I am not sure what you are getting at with the reference holding - if you mean is something keeping the session alive i.e. polling or such like then the answer is no. If you mean is a Java object keeping a reference to it - then aside from what Tomcat does internally (where I expect it must do...) then the answer is also no. Hope this helps. Eddie We have a similar probelm - but only when tomcat (5.0.28 as well) is connected through ajp13. As a difference to the original poster we do keep a reference to the Session Objects from within the servlet. This list is updated when we see a new session or when the valueUnbound() of an object (which is placed in every session) is called. This problem occurs on Windows (with IIS as frontend) as well as on Linux with Apache-httpd, but not if Tomcat is beeing run stand-alone (Windows or Linux). I have found one problem about the accessCount. In some times, it will not decrease to 0 after one request. So the session manger will never invalidate it. The Tomcat version is 5.0.19. (In reply to comment #4) > I have found one problem about the accessCount. > In some times, it will not decrease to 0 after one request. So the session > manger will never invalidate it. > The Tomcat version is 5.0.19. This sounds like you might be onto something - how did you find that? Can you reproduce this at will? It's very difficult to recure this problem. I have write a very complex web application. and mostly it will work fine. but some time, it will occur this problem. This problem mostly ocurred when two browser access the same page. And the web application will do following things: 1. APP A will include some pages in other applications. 2. There is a request wrapper. when include the page in other applications and try to get the session object using request.getSession(), it will get the session which is generated in App A. 3. The session objects generated in A, will be managed by our system. (of cuase , there is one listener to remove the session when it timeout) Sorry, I can only provide these info for you. (In reply to comment #6) > > This problem mostly ocurred when two browser access the same page. > Do you mean that wo browsers use the same session to access a page, or just the ( quite usual ) case that two people use the web application? > And the web application will do following things: > 1. APP A will include some pages in other applications. > 2. There is a request wrapper. when include the page in other applications and > try to get the session object using request.getSession(), it will get the > session which is generated in App A. That means in the included application B and C you get the session from application A? Application != context for you? > 3. The session objects generated in A, will be managed by our system. (of > cuase , there is one listener to remove the session when it timeout) > Sorry, I can only provide these info for you. So it seems that the common ground for all of us here is, that we have some objects in the session that implement the HttpSessionBindingListener or HttpSessionActivationListener. Maybe the cause lies within these? Do you mean that wo browsers use the same session to access a page, or just the ( quite usual ) case that two people use the web application? A: not the same session. Different IE browser will generate differnt session ID. That means in the included application B and C you get the session from application A? Application != context for you? A: yes. So it seems that the common ground for all of us here is, that we have some objects in the session that implement the HttpSessionBindingListener or HttpSessionActivationListener. A: I did not put any object implement the HttpSessionBindingLisener or HttpSessionActivationListener in the session. I think with different thread ( when include the page in other application), when it will operate the session object, it will not synchronize the session object (or its filed accessCount). Please check this defect: 28846 I think it should be the same problem. :-) (In reply to comment #8) > Do you mean that wo browsers use the same session to access a page, or just the > ( quite usual ) case that two people use the web application? > > A: not the same session. Different IE browser will generate differnt session > ID. > > That means in the included application B and C you get the session from > application A? > > Application != context for you? > > A: yes. Ok, so you can replicate the error if: 1. you have two or more simultaneous users accessing a page in your web application. 2. you have more than one application (servlet / jsp) within the same context, so the session stays the same through all requests. 3. one of your servlets includes the output from other servlets/jsps Did I get that right? > > > So it seems that the common ground for all of us here is, that we have some > objects in the session that implement the HttpSessionBindingListener or > HttpSessionActivationListener. > > A: I did not put any object implement the HttpSessionBindingLisener or > HttpSessionActivationListener in the session. > Ah - I had thought so, because you mentioned you remove the objects from the session when it expires. The only way known to me to do this is to implement one of those Interfaces (can't remember which one currently) > > I think with different thread ( when include the page in other application), > when it will operate the session object, it will not synchronize the session > object (or its filed accessCount). > That seems reasonable. If the access count is not properly synchronized it will indeed become screwed eventually. Yet it is strange that it will only occur in our application if we use the ajp13 connector with SSL requests. SSL on tomcat standalone - everyting OK. Plain http through Apache httpd (Linux) with ajp13 - everything OK SSL through Apache httpd (Linux) with ajp13 - No Session timeouts SSL through IIS with ajp13 - No Session timeouts Basically the same application on all machines... I'll try and stress-test my testserver a little and see if I can reproduce it with enough simultaneous requests (In reply to comment #9) > Please check this defect: 28846 > > I think it should be the same problem. :-) But that bug applies to the tapestry project. Is the engine mentioned in that bug the tapestry engine or the tomcat engine? (In reply to comment #11) > (In reply to comment #9) > > Please check this defect: 28846 > > > > I think it should be the same problem. :-) > > But that bug applies to the tapestry project. > Is the engine mentioned in that bug the tapestry engine or the tomcat engine? We're seeing this same issue, as well. We're using tomcat 5.5.4, and we appear to only see it in environments under heavy load when using the ajp13 connector. We have another customer redirecting from IIS and they're not experiencing this issue. I spent a couple hours looking at the code and I didn't see anything weird. What I can tell you is that for the sessions this occurs on, their inactivity time IS properly set, but they still don't get invalidated. When I call invalidate on those sessions, they ARE invalidated. We're not using the tapestry project at all--just Tomcat and Apache (or Tomcat and IIS). Those sessions which fail to invalidate stay around forever--or seemingly so. I've seen inactivity times in the 2500+ minutes range. (In reply to comment #10) > (In reply to comment #8) > > Do you mean that wo browsers use the same session to access a page, or just the > > ( quite usual ) case that two people use the web application? > > > > A: not the same session. Different IE browser will generate differnt session > > ID. > > > > That means in the included application B and C you get the session from > > application A? > > > > Application != context for you? > > > > A: yes. > > Ok, so you can replicate the error if: > 1. you have two or more simultaneous users accessing a page in your web application. > 2. you have more than one application (servlet / jsp) within the same context, > so the session stays the same through all requests. > 3. one of your servlets includes the output from other servlets/jsps > > Did I get that right? > > > > > > > So it seems that the common ground for all of us here is, that we have some > > objects in the session that implement the HttpSessionBindingListener or > > HttpSessionActivationListener. > > > > A: I did not put any object implement the HttpSessionBindingLisener or > > HttpSessionActivationListener in the session. > > > Ah - I had thought so, because you mentioned you remove the objects from the > session when it expires. The only way known to me to do this is to implement one > of those Interfaces (can't remember which one currently) > > > > > I think with different thread ( when include the page in other application), > > when it will operate the session object, it will not synchronize the session > > object (or its filed accessCount). > > > That seems reasonable. If the access count is not properly synchronized it will > indeed become screwed eventually. > Yet it is strange that it will only occur in our application if we use the ajp13 > connector with SSL requests. > SSL on tomcat standalone - everyting OK. > Plain http through Apache httpd (Linux) with ajp13 - everything OK > SSL through Apache httpd (Linux) with ajp13 - No Session timeouts > SSL through IIS with ajp13 - No Session timeouts > Basically the same application on all machines... > > I'll try and stress-test my testserver a little and see if I can reproduce it > with enough simultaneous requests We're seeing this issue on our NON-SSL Apache ajp13 system... so I don't think SSL is a requirement for this issue to occur. > We're seeing this issue on our NON-SSL Apache ajp13 system... so I don't think > SSL is a requirement for this issue to occur. All sessions to my instance of Tomcat are HTTP - they all come from a Proxy (Apache) which translates HTTPS to HTTP - so I produce this bug with no SSL and no ajp13. > We're seeing this same issue, as well. We're using tomcat 5.5.4, and we appear
> to only see it in environments under heavy load when using the ajp13 connector.
> We have another customer redirecting from IIS and they're not experiencing this
> issue.
>
> I spent a couple hours looking at the code and I didn't see anything weird.
> What I can tell you is that for the sessions this occurs on, their inactivity
> time IS properly set, but they still don't get invalidated. When I call
> invalidate on those sessions, they ARE invalidated.
> [...]
> Those sessions which fail to invalidate stay around forever--or seemingly so.
> I've seen inactivity times in the 2500+ minutes range.
>
This all fits into the threads issue theory.
Tomcat maintians a reference count for each session which is increased at the
beginning of a request and decreased at he end of the request.
If - for whatever reason - this reference count does not reach zero after all
requests have finished (e.g. because one thread was increasing while another was
decreasing, or an exception occurred which is not propably handled or... )
tomcat will not invalidate the session because it thinks there is still some
request currently running (which basically is a good thing. I do generally NOT
want my session to expire while a request is still running: e.g. a large download )
But this is still only a theory - I need to find a few free minutes to implement
a test case...
(In reply to comment #15) > > We're seeing this same issue, as well. We're using tomcat 5.5.4, and we appear > > to only see it in environments under heavy load when using the ajp13 connector. > > We have another customer redirecting from IIS and they're not experiencing this > > issue. > > > > I spent a couple hours looking at the code and I didn't see anything weird. > > What I can tell you is that for the sessions this occurs on, their inactivity > > time IS properly set, but they still don't get invalidated. When I call > > invalidate on those sessions, they ARE invalidated. > > [...] > > Those sessions which fail to invalidate stay around forever--or seemingly so. > > I've seen inactivity times in the 2500+ minutes range. > > > This all fits into the threads issue theory. > Tomcat maintians a reference count for each session which is increased at the > beginning of a request and decreased at he end of the request. > > If - for whatever reason - this reference count does not reach zero after all > requests have finished (e.g. because one thread was increasing while another was > decreasing, or an exception occurred which is not propably handled or... ) > tomcat will not invalidate the session because it thinks there is still some > request currently running (which basically is a good thing. I do generally NOT > want my session to expire while a request is still running: e.g. a large download ) > > But this is still only a theory - I need to find a few free minutes to implement > a test case... I don't mean to throw a wrench in the works, but if that's true, wouldn't that mean calling invalidate on the session later would NOT invalidate the session, or does invalidate ignore the session access count? Also, I just wanted to confirm that there are no large files to download in our system, so all of these users definately have an inactive connection--we're not seeing normal system behavior waiting on files. (In reply to comment #16) > I don't mean to throw a wrench in the works, but if that's true, wouldn't that > mean calling invalidate on the session later would NOT invalidate the session, > or does invalidate ignore the session access count? > > Also, I just wanted to confirm that there are no large files to download in our > system, so all of these users definately have an inactive connection--we're not > seeing normal system behavior waiting on files. > Most of all this happens in org.apache.catalina.session.StandardSession and org.apache.catalina.session.StandardManager. Calling invalidate() on the session more or less directly maps through to expire(), which will simply set the accessCount to 0 and remove the session from the sessionManager. After spending the whole of yesterday afternoon debugging I can also confirm, that it is indeed the accesssCount running out of sync. Alas, I still can not reproduce the problem. ( we found the wrong accessCount by running the failing production machine in debug mode and connecting to it with jdb) On my local (Windows XP) development machine with Sun-jdk 1.5.0_06, tomcat 5.0.28 and apache 2.0.55 connected through ajp13 (same setup as production, except for the OS) this simply does not occur. I also have second thoughts about the race condition theory, because one would expect, that such a thing would happen only once in a while - but on the servers we are seeing the issue on, it happens all the time (if not every time)... Has there been any update on this bug? I have the same problem, some sessions never expire (tomcat 5.5.12, jdk 1.5.0). It seems that the session accessCount is not correctly decremented. In my application, a single web browser can send several asynchronous XmlHttpRequests at the same time, so there are concurrent accesses on the server sesssion. I have a look at tomcat source code and it seems that the session validy management is not always 'synchronized', so I agree with the "race condition theory" ... (In reply to comment #19) > > I have a look at tomcat source code and it seems that the session validy > management is not always 'synchronized', so I agree with the "race condition > theory" ... > Yes, that were my thoughts as well, but the ++ and -- operations (as in sessionCount++) are atomic because sessionCount is an int. They should not need to be synchronized according to the virtual machine specifications. On another note: We also recently moved one perfectly well working installation to a new server, and suddntly we encounter this issue there as well. Previous installation: Single-Intel-CPU SuSE 9.0 Linux with kernel 2.6.11 Tomcat 5.0.29 - standalone jdk1.5.0_02 current installation: Dual AMD-x64 CPUs SuSE 9.3 (mostly out-of the box) with kernel 2.6.14 Tomcat 5.0.30 Apache httpd mod_jk jdk1.5.0_06 (x64) So the trigger has to be either the 64bit CPU / Java or Tomcat and mod_jk, or something in my configuration (the application is the same .war file) I do not have local access to any x64 machine, so I cannot debug that here, maybe someone else can jump in? (In reply to comment #20) > Yes, that were my thoughts as well, but the ++ and -- operations (as in > sessionCount++) are atomic because sessionCount is an int. They should not need > to be synchronized according to the virtual machine specifications. Generally this is not enough to have thread-safe code. For instance this kind of code is not thread-safe: this.x--; if (this.x == 0) ... In this case AtomicInteger.decrementAndGet can be used. > > So the trigger has to be either the 64bit CPU / Java or Tomcat and mod_jk, or > something in my configuration (the application is the same .war file) Threading problems (ie lack of synchronization) are more visible on multi-CPU or multi-core configurations. > > Generally this is not enough to have thread-safe code. For instance this kind of > code is not thread-safe: > this.x--; > if (this.x == 0) ... > In this case AtomicInteger.decrementAndGet can be used. Of course, but as far as I could tell, in the tomcat session-management there is a sessionCount-- for every sessionCount++, so in the end there should be a total sessionCount of 0. Maybe one of the checks for sessionCount==0 does not always immediately give the 'correct' result - but in the long run it has always to return to zero. > > > > > So the trigger has to be either the 64bit CPU / Java or Tomcat and mod_jk, or > > something in my configuration (the application is the same .war file) > > Threading problems (ie lack of synchronization) are more visible on multi-CPU or > multi-core configurations. > I'll have to check if all machines we see this on are multi-core... (In reply to comment #19) > I have the same problem, some sessions never expire (tomcat 5.5.12, jdk 1.5.0). > It seems that the session accessCount is not correctly decremented. > > In my application, a single web browser can send several asynchronous > XmlHttpRequests at the same time, so there are concurrent accesses on the server > sesssion. > > I have a look at tomcat source code and it seems that the session validy > management is not always 'synchronized', so I agree with the "race condition > theory" ... > It's possible but rare. -1 for adding syncs, though (feel free to use a patched Tomcat), as accessCount is a gimmick to support stupid usage scenarios (basically, people who were using really long running requests with really short expiration times). Either a smarter way of implementing this could be used, or the session could simply be expired if it becomes inactive for a really long period (say, one hour, or 5 times the regular timeout, whichever is greater). Proposed patch: in org.apache.catalina.session.StandardSession line 284 (tomcat 5.0.28) /** * The access count for this session. */ protected transient int accessCount = 0; has to be replaced by /** * The access count for this session. */ protected volatile transient int accessCount = 0; Explanation: The expression accessCount++ is not thread-safe because each thread may hold a local copy of the variable which does not have to be synchronized with main memory immediately. The volatile keyword forces this synchronization. See the discussion on http://forum.java.sun.com/thread.jspa?threadID=604831&start=30&tstart=0 [quote] 2) Local vs. main mem: Threads have--or can have--their own local copies of shared variables. When T1 writes a value to a variable, if the variable is not volatile and there's no syncing, that new value may live only in T1's local copy and may never get written to main mem, which means other threads may never see that new value. Entering and leaving sync blocks forces a reconciliation between the thread's local memory and main mem. Declaring a variable volatile forces every read and write of that variable to go to main mem. [end quote] Another solution would be to synchronize the code segments. I have to apologize for my claim that ++ and -- were atomic for int datatypes. This seems to be only true for single-CPU systems (without hyperThreading). (In reply to comment #24) > Proposed patch: -1. Pretend you read my comment. (In reply to comment #25) > (In reply to comment #24) > > Proposed patch: > > -1. Pretend you read my comment. Well, apart from the fact that I wrote my comment while you posted yours (mid-air-collision) I don't see any obvious reason against syncing the accessCount with the volatile keyword. The accesscount obviously needs to be synchronized in some way (or be removed, which I don't fancy because of large, time consuming, downloads). Of course I would be happy to improve my understanding, so please explain. For the "rare" issue. We see quite some of these stale sessions. (appx. 2-10 a day) I'm not saying that this is a major security issue, but it over time it gives an attacker quite a chance to guess some sessionId. Many people will not even be aware of this issue, because you can only see it if you keep track of sessions yourself. All others might take a look into the manager application and enjoy the number of concurrent users they usually have, not knowing those sessions should have expired a long time ago. (In reply to comment #26) > Well, apart from the fact that I wrote my comment while you posted yours > (mid-air-collision) > > I don't see any obvious reason against syncing the accessCount with the volatile > keyword. And you verified it actually fixed this "issue" ? > For the "rare" issue. We see quite some of these stale sessions. (appx. 2-10 a day) > I'm not saying that this is a major security issue, but it over time it gives an > attacker quite a chance to guess some sessionId. > > Many people will not even be aware of this issue, because you can only see it if > you keep track of sessions yourself. All others might take a look into the > manager application and enjoy the number of concurrent users they usually have, > not knowing those sessions should have expired a long time ago. As usual, 2-3 useless paragraphs as soon as there's a disagreement with a user. Special bonus for the security fud ;) (In reply to comment #24) > Proposed patch: > > in org.apache.catalina.session.StandardSession > line 284 (tomcat 5.0.28) > > /** > * The access count for this session. > */ > protected transient int accessCount = 0; > > has to be replaced by > > /** > * The access count for this session. > */ > protected volatile transient int accessCount = 0; > > Explanation: > The expression accessCount++ is not thread-safe because each thread may hold a > local copy of the variable which does not have to be synchronized with main > memory immediately. > The volatile keyword forces this synchronization. > > See the discussion on > http://forum.java.sun.com/thread.jspa?threadID=604831&start=30&tstart=0 > > [quote] > 2) Local vs. main mem: Threads have--or can have--their own local copies of > shared variables. When T1 writes a value to a variable, if the variable is not > volatile and there's no syncing, that new value may live only in T1's local copy > and may never get written to main mem, which means other threads may never see > that new value. Entering and leaving sync blocks forces a reconciliation between > the thread's local memory and main mem. Declaring a variable volatile forces > every read and write of that variable to go to main mem. > [end quote] > > Another solution would be to synchronize the code segments. > > I have to apologize for my claim that ++ and -- were atomic for int datatypes. > This seems to be only true for single-CPU systems (without hyperThreading). Volatile may not be the best idea. From the article: http://cephas.net/blog/2003/02/17/using_the_volatile_keyword_in_java.html "Careful, volatile is ignored or at least not implemented properly on many common JVM's, including (last time I checked) Sun's JVM 1.3.1 for Windows. There was an article on DDJ that could demonstrate this even on single-processor machines..." And from the article: http://www.javaperformancetuning.com/tips/volatile.shtml "Note however that volatile has been incompletely implemented in most JVMs. Using volatile may not help to achieve the results you desire (yes this is a JVM bug, but its been low priority until recently). " Our environment: We are only seeing this issue for sure on one machine, which is a 4 way (with HT--8 virtual CPU's) box. We see the issue pretty frequently occur there... maybe 20-100 times a day for a total of 2,000-4,000 total sessions. Also, this environment is NOT 64-bit. These are 32-bit HT Xeons. They are running the Sun 1.4.2_06 or the 1.4.2_08 JDK... So we know this issue occurs on 32 bit systems as well. (In reply to comment #28) > Volatile may not be the best idea. From the article: > http://cephas.net/blog/2003/02/17/using_the_volatile_keyword_in_java.html -1 for sync, and feel free to test volatile (I'm only -0 for it). So the only option you have left is simple check to force expiration of sessions which are way past the normal timeout (like, ten times). Per the previous comment: "-1 for sync, and feel free to test volatile (I'm only -0 for it). So the only option you have left is simple check to force expiration of sessions which are way past the normal timeout (like, ten times).", isn't this a violation of the Servlet specification? The specification states "The session-timeout element defines the default session timeout interval for all sessions created in this web application. The specified timeout must be expressed in a whole number of minutes. If the timeout is 0 or less, the container ensures the default behaviour of sessions is never to time out. If this element is not specified, the container must set its default timeout period." So in other words, the session-timeout says how many minutes before the session is expired. To expire in more or less minute than that is a violation of the specification, IMO. Also, as a Tomcat user, let me tell you where our users are noticing and complaining about this bug. We have a Current User Activity hyperlink in the application that lets the system administrator view which users are online and how long they have been idle. This lets them see if, for example, see if any users are online so that they know whether they can safely bounce Tomcat. (In reply to comment #30) You know, I do not care about what you think. I am proposing this, and that's the way it will be, I will never accept a fix based on syncs (of course, it's OSS, so you are more than welcome to use a custom patch). BTW, this may not have anything to do with accessCount thread safety anyway. I'm not trying to say this should be fixed with syncs. I'm just trying to let you know that there are users out there using this feature and that we would like a fix if possible :) It doesn't matter to me how the fix is implemented. and it will be fixed if you can provide us with a test case to reproduce the problem. That way we can fix the real problem, whether it requires sync or not sync is not relevant until we have identified the scenario where it happens. without a test case we could patch something different, leading to other problems, and that is what Remy not so eloquently might be trying to explain to you. (In reply to comment #33) > and it will be fixed if you can provide us with a test case to reproduce the > problem. That way we can fix the real problem, whether it requires sync or not > sync is not relevant until we have identified the scenario where it happens. > > without a test case we could patch something different, leading to other > problems, and that is what Remy not so eloquently might be trying to explain to you. > > If we can get a reproducable test case, I will provide one to you. As is, we are running the exact same software in hundreds of locations and we have only seen this at one site. Attempts to reproduce it on nearly identical hardware with a nearly identical configuration have been unsuccessful. Our ability to reproduce the exact Thanks for your help so far. If possible, I will look into patching the system with a volatile accessCount, but I don't think this is going to be possible. Spare your time. Volatile is not going to save you. While "x++" looks somehow atomic it simply isn't. Read as "x = x+1". Whether x is volatile or not simply doesn't matter. In order to do correct access counting, you need to synchronize or use the atomics provided in Java 5. And anyone who opposes that should remove the variable altogether as it doesn't do any good then. And should get some consulting about his fear of synchronization. Since this a severe bug for our customers (because multiple concurrent sessions are not allowed for a single user ID), I use the following simple workaround: 1) A servlet Filter mapped to "/*" manages 2 session attributes (*): - the end date of the last service() call, which is more interesting than the access date - the count of running service() calls 2) An HttpSessionListener maintains a list of all valid sessions. It has a timer that checks at a fixed rate if any session in this list has expired. The inactivity duration is computed from the end date of the last service(), so long running requests are correctly handled. Sessions that have at least one running service() are not invalidated, they are checked later. (*) the order of attribute updates is important to avoid a race condition that can occur if the timer checks the session list between the two calls to session.setAttribute(). As an aside it is possible to use atomics from Java 5 in lower versions over Java through the use of this open source library http://dcl.mathcs.emory.edu/util/backport-util-concurrent/ Alright, since a workaround has been suggested and the original poster has not come up with a reproducible test case, I'm closing this item for now. If the original poster or anyone else comes up with a way to reproduce this, please feel free to reopen this item, attach your new test case, and we will be glad to look at it. (In reply to comment #38) > Alright, since a workaround has been suggested and the original poster has not > come up with a reproducible test case, I'm closing this item for now. If the > original poster or anyone else comes up with a way to reproduce this, please > feel free to reopen this item, attach your new test case, and we will be glad to > look at it. I respectfully disagree. A number of people (myself included) have reproduced this bug, quite repeatedly. I have a very reproducable testcase onsite, I am just unable to provide our whole application to a third party. What we have been unable to provide is a self-contained testcase to reproduce this issue. I don't feel this is an issue that only one person has been dealing with--that would justify closing it. This issue has been seen on a number of environments, using entirely different applications. The workaround is to: 1) Make a container object to hold the tomcat session and track the inactivity manually 2) Write a background thread to invalidate the session once it times out. I view this as a very serious bug in tomcat, even if it's difficult to reproduce. I'm not blaming anyone or saying that it needs to get fixed--you can't very well fix an issue you can't reproduce, but closing it when it still exists doesn't seem like a good idea. I am re-opening this bug as I do not believe that marking it as 'works for me - fixed' is satisfactory to any of the people who are suffering this bug. If it was just one person suffering from the bug then this might be an acceptable reply but given that there are multiple people who have reported it - and some people who have managed to debug Tomcat and suggest what the problem might be - to just close it without proper investigation is unacceptable. I am aware that I have not provided a reproducable test case but like many other contributors I only see this issue in a live well loaded environment - in my case in a clients production environment - I simply cannot send you my clients entire application to test with and cannot reprodcue it here on our development machines - but this does not mean that there is not a problem. I look forward to some more in depth investigation and proposals for solving this problem. Thanks, Eddie We are seeing this issue in our production environment as well. We are running 5.5.15 on a dual xeon processor ibm blade. We have seen it for atleast the last year and found the same problem occurring on 5.0.28. We are running an identical staging configuration which we are unable to reproduce, presumably because it is only occurring under significant load. We discovered the problem when we implemented a session monitoring utility for our app. (In reply to comment #38) > Alright, since a workaround has been suggested and the original poster has not > come up with a reproducible test case, I'm closing this item for now. If the > original poster or anyone else comes up with a way to reproduce this, please > feel free to reopen this item, attach your new test case, and we will be glad to > look at it. It sounds to me like the reproducible test case is the problem. It is pretty near impossible to provide the application, hardware configuration AND load necessary to reproduce this on a running instance of tomcat. In lieu of this, I propose a simple test case which reproduces in separate test code which closely mimics the behavior of tomcat. This behavior is essentially multiple requests in multiple threads accessing the int accessCount field in a session object. Each thread will increment the accessCount and then decrement it when the request completes. Here is some sample code to which basically does that: import java.util.concurrent.atomic.AtomicInteger; public class SharedIntTest { public AtomicInteger execCount = new AtomicInteger(); public AtomicInteger countAtomic = new AtomicInteger(); public int count; public static void main( String[] args ) { final SharedIntTest shared = new SharedIntTest(); int maxThreads = 1000; while ( true ) { //Initialize counters shared.countAtomic.set( 0 ); shared.execCount.set( 0 ); shared.count = 0; for ( int i = 0; i < maxThreads; i++ ) { new Thread( new Runnable() { public void run() { //increment counters shared.countAtomic.incrementAndGet(); shared.count++; //decrement counters shared.countAtomic.decrementAndGet(); --shared.count; //increment execution counter shared.execCount.incrementAndGet(); } } ).start(); } while ( shared.execCount.get() < maxThreads ) { try { Thread.sleep( 10 ); } catch( InterruptedException e ) { } } if ( shared.count != 0 ) System.out.println( "Count Error: " + shared.count + "/" + shared.countAtomic.get() + " " + shared.execCount.get() ); } } } The variable count basically represents the accessCount field in the session. In the body of the main method there is a while loop which repeatedly runs the test code. The test code initalizes all of the variables, and then launches a number of threads, which represent the requests in tomcat which are all incrementing and decrementing accessCount on a given session. All that the threads do is to increment the count variable and then immediately decrement it, basically what the request threads are doing in tomcat. The threads also increment a variable called execCount which is used to determine the number of completed threads. In the main thread, there is a loop which monitors the execCount to determine when all of the threads are completed. To make things interesting I have put an AtomicInteger counter in, countAtomic, to determine whether or not the Atomic solution mentioned in comment #21 would resolve the issue. When run on a single processor, this could runs continuously and never prints any errors, as we would hope. When I run this code on a dual processor, within a couple of minutes a test will run where the final result of the count variable is not zero. The countAtomic variable always has a final result of zero. In conclusion, I believe that Matthias Ernst in comment #35 is exactly right, either a sync must be done or Atomics must be used. Otherwise the session management code as it exists now will be very broken on multiprocessor machines. Created attachment 18432 [details]
Stress Test Framework
Here is complete working WAR application with source included that might be
useful to build a real testcase from (or any test cases that only appear under
stress).
There is a web-app side, which you can either modify my DefaultServlet or add a
new servlet to provide additional code to help trigger the problem. It would
be worth file you building a simple interface into your Session Management
classes
There is a HTTP traffic generator based around commons-httpclient,
multi-threaded. You can theoretically extract the WAR onto a number of client
machines and/or use the same machine thats running Tomcat.
Invokation suggestion:
$ jar -xvf TCBug37356.war
$ cd WEB-INF
$ java -cp
classes:lib/commons-httpclient-3.0.1.jar:lib/commons-codec-1.3.jar:lib/commons-logging-1.0.4.jar:lib/log4j-1.2.13.jar
testpackage/TestClientTomcat
Places to edit:
testpackage/TestClientTomcat: Edit the defaults at the top of the class, for
URL, requests, simultaneous thread count.
TestClientTomcat#startNextThread(): just before the call to #sweepOnce() you
might need to modify requests, like every 1000 inject a HTTP request to poll
your Session manager (through the servlet) to look for problems and report.
TestClientRunnable#run(): ~around line 177~ Modify this to suit your HTTP
request setup and response parsing.
Maybe someone else could attach a cut down version of their session management
code for analying the state of TC from within.
Feel free to email me privately for any assistance with it.
(In reply to comment #42) > In conclusion, I believe that Matthias Ernst in comment #35 is exactly right, > either a sync must be done or Atomics must be used. Otherwise the session > management code as it exists now will be very broken on multiprocessor machines. Very funny. Adding two syncs part of every request just to provide support for a really stupid feature is not acceptable. The actual options are: - remove the "no expire while requests are in progress" feature by default, with an option to use the feature with full syncs for the two people that actually need it - force expire of abnormally long sessions regardless of the accessCount value (many times over the session expiration time, with a minimum amount of time of a few hours) There seems to be an argument about two separate things. The bug is that session management is broken. The poster wants it fixed and I think all the respondents want it fixed. Plain and simple. The second thing is that there is this feature to allow long requests with short session expirations, and it happens to be the thing that doesn't work and is breaking the session management. You have made it abundantly clear that you thing the feature is utterly idiotic. That's nice, I am glad you can share your emotions so freely, but as far as I am concerned, the feature is just breaking something which should fundamentally just work and it needs to be fixed. The bottom line is, I don't think myself or anyone else on this thread wants to add unnecessary syncs just to fix this problem. You keep responding as if that is what everyone wants. What people want is a fix. If the "no expiration while request is in progress" feature is an absolute must have, then you must sync. If it is not, then kill the feature or make it optional. I am not fighting for this feature. I just want session management to work! (In reply to comment #44) > (In reply to comment #42) > > In conclusion, I believe that Matthias Ernst in comment #35 is exactly right, > > either a sync must be done or Atomics must be used. Otherwise the session > > management code as it exists now will be very broken on multiprocessor machines. > > Very funny. Adding two syncs part of every request just to provide support for a > really stupid feature is not acceptable. The actual options are: > - remove the "no expire while requests are in progress" feature by default, with > an option to use the feature with full syncs for the two people that actually > need it > - force expire of abnormally long sessions regardless of the accessCount value > (many times over the session expiration time, with a minimum amount of time of a > few hours) > (In reply to comment #44) > Very funny. Adding two syncs part of every request just to provide support for a > really stupid feature is not acceptable. I don't know about calling it "really stupid". Non-standard and not well tested, certainly. Plus, since Sun points at Tomcat as the reference implementation of the Servlet spec, extensions like this should be eliminated as a matter of course. In my humble opinion, of course. =) > The actual options are: > - remove the "no expire while requests are in progress" feature by default, with > an option to use the feature with full syncs for the two people that actually > need it > - force expire of abnormally long sessions regardless of the accessCount value > (many times over the session expiration time, with a minimum amount of time of a > few hours) Both of those options are also "very funny". Please don't clutter up the code with yet another config option. Kill the feature and make folks who want it resort to a Filter or a Valve or something. The thought of adding code to look at sessions with nonzero accessCounts and arbitrarily killing off sessions after "many times over the expiration time" has passed ... yuck. A bug to fix a bug, now that's non acceptable. =P Remy: I see two possible options. 1) Remove the code to keep sessions alive if in the middle of downloading and past their timeout. While this is not standard--this very well may be needed in many customer environments, and changing this behavior could break applications that rely on this functionality. 2) Properly synchronize the broken code. I realize that you object to the performance overhead of a couple synchronize blocks. I'd like to reference the following pages: http://www-128.ibm.com/developerworks/java/library/j-jtp04223.html http://www-128.ibm.com/developerworks/java/library/j-threads1.html Both of these pages state that synchronization overhead is much less than most developers believe, and that modern JVM's are highly optimized to provide fast synchronization. Furthermore, synchronization is a fixed overhead and barely noticable unless tested with empty or mostly empty methods which are called repeatedly. Tomcat's request handling does not resemble anything close to an empty method, and most likely the overhead of synchronization would be negligable. The only way to know the overhead of adding two synchronized blocks for sure would be to add the relevant code and do performance testing. Perhaps if you agreed to consider a proposed patch to properly synchronize based on performance results, someone would be willing to give you the relevant fix. As things currently are, session expiration in Tomcat is broken on multi-processor systems. This is a SEVERE bug and worth a small amount of performance loss to fix. Furthermore, the synchronization should be performed on the session rather than globally. The vast majority of all requests would never contend for the synchronization lock in this case--further reducing the overhead even more. Only multiple requests on the same session would be serialized, and this would most likely not limit scalability of Tomcat. I personally would rather see the appropriate synchronization added and performance testing done to validate that the overhead is minimal. I believe that this solution would be better than possibly creating a regression in behavior for people who use Tomcat to perform long operations with a "short" timeout (for instance: hosting very large files to users on slow connections). Either way, the problem that needs to be fixed is session management. Having a broken session management system which does not properly invalidate connections is unacceptable. The workaround for terming very long term sessions also seems like a very bad idea to me. Reading and posting to bugzilla is really a waste of time. Oh well ... So I will implement option 1. To Remy: Please, don't do this, or provide an option to switch it on! Long running requests are common, like: 1) Downloading large files or items 2) Uploading large files The session MUST be synchronized, please think about multiple processor systems, where parallel requests will be executed within different threads! I can't see, why you are so ignorant, and why you are insisting on a real "stupid" (using our words) insynchronzed solution! Peter *** Bug 40305 has been marked as a duplicate of this bug. *** I have a simple test case that demonstrates this issue. I create two threads that repeatedly call session.access() session.endAccess() on the same session object. After running them for a five seconds, they are stopped and I manually check session.accessCount. With 1 thread it is zero. With two, it is typically in the low 1000s. This demonstrates conclusively that this is a thread-safety issue with session.accessCount Using volatile sends it through the roof to 100,000s. I am afraid, therefore, that this is not a viable option. Using syncs does fix it but slows the access() and endAccess() calls by several orders of magnitude (60ns total to 5.6ms total). This would have a serious performance impact. I am looking at alternatives. All suggestions welcome. (In reply to comment #51) > I have a simple test case that demonstrates this issue. I create two threads > that repeatedly call session.access() session.endAccess() on the same session > object. After running them for a five seconds, they are stopped and I manually > check session.accessCount. With 1 thread it is zero. With two, it is typically > in the low 1000s. This demonstrates conclusively that this is a thread-safety > issue with session.accessCount I'm unclear why there was a question in the first place. We have unsynchronized access to a variable that depends on sane values in a multi-threaded environment. Oddly enough, sessions in Tomcat (in general) appear to be impossible to make threadsafe. Since StandardSession is wrapped in a StandardSessionFacade before being exposed to servlet code, and the facade is not guaranteed to have a 1-to-1 mapping with the /real/ session, users cannot even synchronize on the "session" object they get back from calling HttpServletRequest.getSession(). I realize that I have brought up a different issue entirely, but the entire discussion above plus this fact lead me to the conclusion that the maintainers of Tomcat place a higher value on speed than thread safety. Which is worse... adding a few ns* to calls to a few methods in StandardSession, or have sessions piling up on top of one another until you run out of memory? (*) Note that in the test Mark describes, the object locks are under contention. He doesn't mention if his timings (sync'd vs. unsync'd) were in his single-thread or dual-thread runs. Uncontended locks are a lot faster than contended locks, so I would expect many sessions to continue with less overhead than those that are apparently getting many simultaneous accesses. For completeness, with two threads the average difference was 60ns to 50ms. For a single thread it was 75ns to 225ns. Bear in mind all of these figures are somewhat artificial. The real figure will vary with webapp, hardware etc. This has been fixed in SVN and will be in 5.5.21 onwards. In summary: - accessCount is disabled by default - when enabled, access to accessCount is synchronized Created attachment 19631 [details] A JSP That can be used to see the accessCount of a session For people having doubts and still investigating about this bug, I put in attachment a small JSP that you can put at the root of your web application to see if the bug is happening. You can log into your application and call this JSP (e.g. http://localhost:8080/myWebApp/testSession.jsp) that will display the accessCount of your session. Just do some refresh into your application ( on a real page ) and refresh the testSession.jsp. If you have no request pending, the displayed accessCount should be (1), if it's greater, you triggered the bug. On some of my applications, the accessCount takes 10 per each page, probably trigggered by numerous GET request made by the browser for getting CSS/Javascript files (even in pessimistic cache mode). if (Globals.STRICT_SERVLET_COMPLIANCE) { synchronized (lock) { accessCount++; } } In 5.5.25, it add some synchronized code for the accessCount when ++ and -- cases, but did not add synchronized code for reading operation in isValid() and recycle() method. public boolean isValid() { if (this.expiring) { return true; } if (!this.isValid ) { return false; } if (accessCount > 0) { return true; } if (maxInactiveInterval >= 0) { long timeNow = System.currentTimeMillis(); int timeIdle = (int) ((timeNow - thisAccessedTime) / 1000L); if (timeIdle >= maxInactiveInterval) { expire(true); } } return (this.isValid); } In tomcat 6.x, it use the AutomicInteger to fix this issue. Please check! As a minimum, you need to provide an explanation of why this needs further work and ideally a test case that demonstrates this issue. I don't see any immediate issues with the reads not being synchronised. |