I'm not quite sure if this is a bug or a spec flaw, but I talked to Craig McClanahan and he encouraged me to submit it. The session attribute handling methods in 5.0.x aren't thread safe: org.apache.catalina.session.StandardSession and StandardSessionFacade do not synchronize in getAttribute/setAttribute/removeAttribute. The result is the following: if you write to and read from the session simultaneously with multiple threads, the getter/setter methods corrupt the underlying HashMap. The HashMap's entries get circularly linked, and "any thread using a get() on such a HashMap will spin forever chasing its tail" (quoted from Larry Isaacs). Here is what Josh Bloch and co. say in the javadoc for HashMap:

  <b>Note that this implementation is not synchronized.</b> If multiple
  threads access this map concurrently, and at least one of the threads
  modifies the map structurally, it <i>must</i> be synchronized externally.
  (A structural modification is any operation that adds or deletes one or
  more mappings; merely changing the value associated with a key that an
  instance already contains is not a structural modification.) This is
  typically accomplished by synchronizing on some object that naturally
  encapsulates the map. If no such object exists, the map should be
  "wrapped" using the <tt>Collections.synchronizedMap</tt> method. This is
  best done at creation time, to prevent accidental unsynchronized access
  to the map:

      Map m = Collections.synchronizedMap(new HashMap(...));

The bug is quite easy to fix, by making the map synchronized (as stated above) or by explicitly synchronizing the places where HashMap.get() or put() is called. I could provide a patch if desired.
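For illustration, here is a minimal sketch of the first option (wrapping at creation time, as the javadoc recommends). The field name attributes follows StandardSession; everything else is simplified and is not the actual Tomcat code:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class SessionAttributesSketch {
        // Wrapping at creation time makes every get/put/remove on the map
        // synchronize on the wrapper, so single operations need no external
        // locking (iteration still needs an explicit synchronized block).
        private final Map attributes = Collections.synchronizedMap(new HashMap());

        public Object getAttribute(String name) {
            return attributes.get(name);          // synchronized by the wrapper
        }

        public void setAttribute(String name, Object value) {
            attributes.put(name, value);          // synchronized by the wrapper
        }

        public void removeAttribute(String name) {
            attributes.remove(name);              // synchronized by the wrapper
        }
    }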
And to prevent the REOPEN ... SRV.7.7.1 Threading Issues: "Multiple servlets executing request threads may have active access to a single session object at the same time. *The Developer has the responsibility for synchronizing access to session resources as appropriate.*"
Sorry, but this means that the developer is responsible for the objects he puts in the session, not for the session itself. To quote Craig: "the underlying spec language (SRV.7.7.1) was originally intended to remind application users that they must make their *own* objects threadsafe if they are stored as a session attribute. To use that language as an excuse for the *container* not having to make its own collections threadsafe means that the language is broken." So, reopen it now?
(In reply to comment #2)
> Sorry, but this means that the developer is responsible for the objects he
> puts in the session, not for the session itself.
> To quote Craig: "the underlying spec language (SRV.7.7.1) was originally
> intended to remind application users that they must make their *own* objects
> threadsafe if they are stored as a session attribute. To use that language
> as an excuse for the *container* not having to make its own collections
> threadsafe means that the language is broken."
> So, reopen it now?

Don't bother ;)
(In reply to comment #2)

Heheh ... I think you answered the question yourself ... fix the spec first, then the container implementations second.
(In reply to comment #4)
> Heheh ... I think you answered the question yourself ... fix the spec first,
> then the container implementations second.

Interesting: servlet spec 2.3 has exactly the same wording in SRV.7.7.1 Threading Issues, and Tomcat 4.1.31, which is the official implementation of the 2.3 spec, has a SYNCHRONIZED session. So either Tomcat 4.x is buggy, or Tomcat 5.x is. You say it's 4.x, I say it's 5.x.
(In reply to comment #5)

Actually I'd say neither is buggy, since they both implement the spec as it's written. If that's not what was intended, then, as you quoted Craig saying, "the language is broken", and the spec needs to be changed.
(In reply to comment #6)
> Actually I'd say neither is buggy, since they both implement the spec as
> it's written. If that's not what was intended, then, as you quoted Craig
> saying, "the language is broken", and the spec needs to be changed.

I would agree that neither implementation is buggy -- it is entirely legal for a servlet container to avoid letting an application corrupt its internal data structures. It's too bad that the current Tomcat developers care more about performance than they care about reliability. If you aren't going to change it back to the 4.1 implementation (with synchronization locks around the accesses), please take my name out of the @author tag for org.apache.catalina.session.StandardSession -- this code does *not* represent anything I wish to be associated with.
(In reply to comment #7)
> I would agree that neither implementation is buggy -- it is entirely legal
> for a servlet container to avoid letting an application corrupt its internal
> data structures. It's too bad that the current Tomcat developers care more
> about performance than they care about reliability.

Of course. Thankfully, you guys have Glassfish now for good reliability, so you don't have to deal with these loony Tomcat developers anymore ;)

> If you aren't going to change it back to the 4.1 implementation (with
> synchronization locks around the accesses), please take my name out of the
> @author tag for org.apache.catalina.session.StandardSession -- this code
> does *not* represent anything I wish to be associated with.

Sure, I have no problem with that if it's your wish. These changes were made as an experiment (the spec allows this to be non-synced) and were included in 5.0.19+ and, as a consequence, in all the most popular 5.0.x releases. In the end, it would seem it worked reasonably well. However, I did get a few reports of corruption a while after this, and I added back synchronization on write operations to the map in the 5.5 branch. I never ported this back to 5.0.x given the lack of demand. Apparently, you didn't look at it. All that got discussed in the past, including the re-adding of some of the syncs.
After we finish clarifying whose is bigger (and mine is best-case average), we could return to the bug.

1. The bug isn't invalid, so I reopen it. You can set it to WONTFIX, but not to INVALID.

2. Applications working well with Tomcat 4, Resin, or probably any other servlet container do not work with Tomcat 5.0.x or 5.5.x. Thus Tomcat 5 seems to be incompatible with the servlet spec (or to be the only compatible one), if not with the language of the specification itself, then with its intention. So Tomcat 5 can be considered BROKEN.

3. The bug applies to Tomcat 5.5.x as well, because "... added back synchronization on write operations to the map in the 5.5 branch ..." doesn't fix anything, since the read operations are the problem and MUST be synchronized.

4. The effort to provide a workaround for this problem is enormous compared to the effort to fix the bug. Each and every framework out there now MUST provide two versions, the normal version and the Tomcat 5 compatible version. You can't use Struts, Tapestry, JSTL, actually NOTHING existing with Tomcat 5.

5. The idea of removing synchronization to gain performance is absurd. Who needs the performance? People like us, who have 2000-3000 concurrent users on each webserver. But people who have 2000-3000 concurrent users also have many concurrent requests, even from the same user, so instead of gaining performance we are gaining CRASHES. This bug killed 8 (!!!) webservers yesterday.

6. Consider the flurry in the Tomcat user community if the above points, plus your refusal to provide a three-LOC fix, make Tomcat 5 UNUSABLE.
Look, the basic problem is that the underlying implementation used to support attribute holders is being used in error according to its documentation (the java.util.HashMap documentation). It doesn't just leave the data in an inconsistent state, it crashes systems.

If the developer has to synchronize access to the session, application, or context everywhere attributes are being set, then why not change the base application (Tomcat) to do this already? Basically the application, context, and session are useless if nothing can be stored in them. So they have to be synchronized anyway, and this should be done at the Tomcat level. 5.5.x and 5.0.x really need to be fixed because it doesn't make sense not to.

Basically I'm getting this from the conversation (and anyone else should be): the spec says it's up to the developer to synchronize; the developer has to synchronize everywhere attributes are set and get; so what is the difference if Tomcat does it or the developer does it? Plus, does the spec also say it's up to the developer to synchronize request, context, and session attributes? It's been a while since I read the servlet spec, but I have the 2.3 version on disk and will skim over it again. Even if the spec says it, it doesn't make sense not to synchronize code that has to be synchronized one way or another anyway.
Something I'm curious about: if EL is used in a JSP and a session object is accessed and manipulated, is this synchronized? What package should I look in to check that?
So I verified that at least the 5.5.9 code only synchronizes setAttribute and not also getAttribute ... why?
(In reply to comment #12)
> So I verified that at least the 5.5.9 code only synchronizes setAttribute
> and not also getAttribute ... why?

To quote Remy Maucherat: it was an experiment and he "added back synchronization on write operations to the map in the 5.5 branch". Actually that doesn't make any sense, because you need both get and set to be synchronized to have the desired effect, so I think we should consider this part of the "experiment".
Well, at least ApplicationContext.java and ApplicationRequest.java are synchronizing their attributes in both cases, put and get. So that is one thing not to worry about, but there are other issues. I'm betting the EL accessing code isn't synchronizing either: looking at the code for Jasper 2 and at the runtime file PageContextImpl.java, the scoped getAttribute, which calls the scoped doGetAttribute, does not synchronize the code. So basically, if a JSP developer uses jsp:useBean and creates a session bean, there are going to be problems, because as soon as this happens an unsynchronized call is taking place.

The thing that makes this debate over whether to change it silly is that in the scoped calls, the application and request code is already synchronized at the class level, so leaving only the session unsynchronized at this point seems weird. Regardless, if this tiny three-line change doesn't take place, then there are more lines in Tomcat that apparently need to be changed, because this bug causes other issues.
Why not keep everybody happy: simply make the session management class a configurable option at both the Server and Context level. If the TC developers wish to "experiment", they can configure the unsynchronized access they desire.
I strongly agree with Craig et al.: err on the side of a more robust implementation by synchronizing on the Tomcat side. I think doing otherwise would be doing a high-wire act without a net. There are too many places for a developer to miss handling the threading correctly to leave this as is.
I've been following this with a lot of interest ever since Leon raised the issue yesterday, and I discussed it with some folks at work today a bit. Looking at this strictly from a developer's point of view, I couldn't care less what the spec says, and I couldn't care less why the Tomcat code is the way it is. The bottom line is that if I call session.getAttribute() or setAttribute() in a servlet or Struts Action or whatnot, I do not expect there to be any chance of the internal data structures getting corrupted, or of me getting back inconsistent data. I most definitely do not expect there to be any chance of a server hang or crash, and I do not expect to have to do anything myself to ensure any of this. Any other answer is, to me and to those I spoke to, just plain wrong. I am in absolute agreement with those saying this needs to be fixed. I do not recall ever having been bitten by this problem, but it's just subtle enough that I might not have known if I did.

I don't think anyone is looking to place blame here. I don't care what was originally in the Tomcat code, what is there now, or why it was changed. This simply comes down to a legitimate issue that needs to be resolved. Frankly, it's not even a minor issue, it's a pretty substantial one, regardless of the fact that it apparently hasn't caused huge problems for everyone.

If the spec needs to be fixed, no problem: contact whoever needs to be contacted and let them know. But that DOES NOT mean you serialize and wait for them to do their thing. There is a solution here that, to my understanding, isn't contrary to the spec as it exists today anyway, so not following through with it is kind of silly. I'm sure any number of people would be willing to submit a patch for this if it's an issue of not having time, but arguing about whether it should be fixed at all doesn't seem reasonable on this one.
(In reply to comment #17)
> I don't think anyone is looking to place blame here. I don't care what was
> originally in the Tomcat code, what is there now, or why it was changed.
> This simply comes down to a legitimate issue that needs to be resolved.
> Frankly, it's not even a minor issue, it's a pretty substantial one,
> regardless of the fact that it apparently hasn't caused huge problems for
> everyone.

I'm not even sure if I've been bitten by this either, but I have seen numerous people on the list speaking of running out of Tomcat threads and setting their connections to the max. They might be having this issue and not even realize it.

> If the spec needs to be fixed, no problem: contact whoever needs to be
> contacted and let them know. But that DOES NOT mean you serialize and wait
> for them to do their thing. There is a solution here that, to my
> understanding, isn't contrary to the spec as it exists today anyway, so not
> following through with it is kind of silly.

I agree: synchronizing these calls isn't going to be contrary to the spec in any way at all.
I wonder if the new java.util.concurrent classes could be used instead of a plain HashMap?

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html

That would mean total dependence on J2SE 1.5, which would be a problem for supporting J2SE 1.4, though a backport is being worked on here:

http://www.mathcs.emory.edu/dcl/util/backport-util-concurrent/

Other reading here:

http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html

I think that sort of thing would provide a nice solution for speed + reliability. Using this as the underlying base would/should fix issues with EL access too.
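To make the idea concrete, here is a minimal sketch of what the swap could look like (simplified names, not the actual StandardSession code; note that ConcurrentHashMap rejects null keys and values, so a null value has to be mapped to a remove):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ConcurrentAttributesSketch {
        // Sketch only: a ConcurrentHashMap in place of the plain HashMap.
        // Reads never block; writes lock only one internal segment.
        private final Map attributes = new ConcurrentHashMap();

        public Object getAttribute(String name) {
            return attributes.get(name);
        }

        public void setAttribute(String name, Object value) {
            if (value == null) {
                // ConcurrentHashMap does not allow null values, so treat
                // setAttribute(name, null) as a removal, like the spec does.
                attributes.remove(name);
            } else {
                attributes.put(name, value);
            }
        }
    }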
As discussed on tomcat-dev after I re-examined the StandardSession code further, adequate synchronization seems to be in place in the current Tomcat 5.5 code, so the infinite loop situation described in this issue does not occur. Please do not reopen the report.
(In reply to comment #15)
> Why not keep everybody happy: simply make the session management class a
> configurable option at both the Server and Context level. If the TC
> developers wish to "experiment", they can configure the unsynchronized
> access they desire.

Actually, the session management code *is* already configurable, albeit not trivially. You can include a <Manager> element inside your <Context> element and create a custom implementation of the Manager interface that returns session instances (probably subclassed from StandardSession) that do the locking for you. The key question remains what the default behavior should be, and/or whether there should be an easy boolean setting to turn this particular capability on or off.
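To sketch the shape of such a session subclass (purely hypothetical code; it only covers access routed through these three methods, and the exact Manager hook that hands out the instances differs between 5.0 and 5.5):

    import org.apache.catalina.Manager;
    import org.apache.catalina.session.StandardSession;

    // Hypothetical: a session that serializes all attribute access on the
    // session instance itself. A custom Manager configured via <Manager>
    // in the Context would return these instead of plain StandardSessions.
    public class LockingSession extends StandardSession {

        public LockingSession(Manager manager) {
            super(manager);
        }

        public synchronized Object getAttribute(String name) {
            return super.getAttribute(name);
        }

        public synchronized void setAttribute(String name, Object value) {
            super.setAttribute(name, value);
        }

        public synchronized void removeAttribute(String name) {
            super.removeAttribute(name);
        }
    }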
(In reply to comment #8)
> Of course. Thankfully, you guys have Glassfish now for good reliability, so
> you don't have to deal with these loony Tomcat developers anymore ;)

Apparently it is indeed a good thing :-). The appropriate patch was just committed to the Glassfish repository. If people want an open source implementation of the servlet spec where the developers listen to users on issues like this, you might want to browse over to:

https://glassfish.dev.java.net

and take a look.

> > If you aren't going to change it back to the 4.1 implementation (with
> > synchronization locks around the accesses), please take my name out of the
> > @author tag for org.apache.catalina.session.StandardSession -- this code
> > does *not* represent anything I wish to be associated with.
>
> Sure, I have no problem with that if it's your wish.

This (removing my name from the @author tag on StandardSession), both here and everywhere else in the Tomcat code base, would indeed be my wish.
(In reply to comment #20)
> As discussed on tomcat-dev after I re-examined the StandardSession code
> further, adequate synchronization seems to be in place in the current Tomcat
> 5.5 code, so the infinite loop situation described in this issue does not
> occur. Please do not reopen the report.

Remy, sorry, but you are wrong. The read is the problem, not the write. During the write process the HashMap is in flux, so if a read occurs at this time and is not synchronized, the thread which reads is killed. So 5.5.x is as broken as 5.0.x is.
Hmm, I also think the spec interpretation in the current StandardSession inside Tomcat 5.0 and 5.5 is wrong. We had the same problem in the Cluster DeltaSession a year ago. We now sync the session attribute read operations as well. Without this fix the Cluster setup is useless for production servers.

I vote for changing the StandardSession implementation to:

    public Object getAttribute(String name) {
        if (!isValid())
            throw new IllegalStateException(sm
                    .getString("standardSession.getAttribute.ise"));
        synchronized (attributes) {
            return (attributes.get(name));
        }
    }

    public Enumeration getAttributeNames() {
        if (!isValid())
            throw new IllegalStateException(sm
                    .getString("standardSession.getAttributeNames.ise"));
        synchronized (attributes) {
            return (new Enumerator(attributes.keySet(), true));
        }
    }

===
Peter
Created attachment 16339 [details]: compiled fix for 5.0.25

A compiled patch for 5.0.19+.
Created attachment 16340 [details]: patch for 5.0.19+, source code
I submitted a patch for 5.0.19+. To install it: extract catalina.jar (jar -xf catalina.jar), replace StandardSession.class with the one attached, and make a new jar. Alternatively you can rebuild Tomcat completely, so I supplied the source code too :-)

It would be cool to provide a download location for the patched catalina.jar; I don't think submitting a 700K file to Bugzilla is the correct behaviour, so I'm not doing it.
(In reply to comment #24)
> Hmm, I also think the spec interpretation in the current StandardSession
> inside Tomcat 5.0 and 5.5 is wrong. We had the same problem in the Cluster
> DeltaSession a year ago. We now sync the session attribute read operations
> as well. Without this fix the Cluster setup is useless for production
> servers.
>
> I vote for changing the StandardSession implementation to...

Ok. I don't know yet, since the repository is inconclusive and doesn't match what you wrote. I see DeltaSession (introduced along with 5.0.15+) has always had (from revision 1.1) syncs on everything (reads and writes), so I don't see any conclusive information showing that the current 5.5.x code does produce the bug described here. Obviously, if 5.5.x really is still bad for this issue, it'll have to be fixed.

(In reply to comment #23)
> Remy, sorry, but you are wrong. The read is the problem, not the write.
> During the write process the HashMap is in flux, so if a read occurs at this
> time and is not synchronized, the thread which reads is killed.
> So 5.5.x is as broken as 5.0.x is.

This means you've tested with 5.5.x and reproduced an issue? Or written a microbenchmark showing a problem with reads? All I can see is that you're saying "issue on read, 5.5.x same problem" like a broken record.

(In reply to comment #22)
> Apparently it is indeed a good thing :-). The appropriate patch was just
> committed to the Glassfish repository. If people want an open source
> implementation of the servlet spec where the developers listen to users on
> issues like this, you might want to browse over to:
>
> https://glassfish.dev.java.net
>
> and take a look.

Lol, whatever. The web tier of Glassfish is really an ugly behind-the-back fork. All that was required to avoid the ill feelings was a bit of respect, being informed just a little bit, which would have allowed reasonable planning for development resources. I'm glad you're happy about Glassfish, but personally, I'll use it in the long run to point out that there are problems with large companies like Sun and IBM, their interactions with the ASF, and how they should not be trusted (as in, accept whatever they contribute, but there's no need to be thankful for it - and obviously, don't ever install them as "despot" on a project, though at the ASF it's a bit hard to get into this situation).

> This (removing my name from the @author tag on StandardSession), both here
> and everywhere else in the Tomcat code base, would indeed be my wish.

I wasn't really serious. In the end, it's not really my decision, so you should send a request to the PMC once it is correctly set up.
It should be possible to install such patches by simply placing the class file at the correct position. In this case that would be jakarta-tomcat-5.0.28/server/classes/org/apache/catalina/session.
(In reply to comment #29)
> It should be possible to install such patches by simply placing the class
> file at the correct position. In this case that would be
> jakarta-tomcat-5.0.28/server/classes/org/apache/catalina/session.

Yes, the classloader setup has been designed to easily allow such patching, but the folder structure can get a bit messy (you got it right, though).
(In reply to comment #28)
> This means you've tested with 5.5.x and reproduced an issue? Or written a
> microbenchmark showing a problem with reads? All I can see is that you're
> saying "issue on read, 5.5.x same problem" like a broken record.

No, this means that the statement of the HashMap authors:

  <b>Note that this implementation is not synchronized.</b> If multiple
  threads access this map concurrently, and at least one of the threads
  modifies the map structurally, it <i>must</i> be synchronized externally.

also applies to N readers, one writer, so synchronizing "writes only" is a misuse of the HashMap according to the documentation.

On the other hand, I made some measurements and calculated how much performance you gain by removing synchronization from the get/set methods. I compared the performance of synchronized HashMap access against unsynchronized (single-threaded, P IV 2.8 GHz, HT) and calculated that you gain 230 milliseconds on 3,000,000 operations! That is 0.00008 milliseconds per operation. Even if you had 100 accesses to the session from each of 100 parallel threads, it would cost you an additional 0.8 milliseconds (100 x 100 x 0.00008 ms). According to alexa.com we have an average response time of 0.8 seconds (for the user) and are faster than 80% of the net (Google's average is 0.7). I don't know how many sites are faster; let's assume the fastest are at about 0.5 seconds (and sites making 100 session accesses in one request surely do not belong in this category). So if your average request duration is 500 millis, how important is it for you to gain 0.00008 milliseconds, or even 0.8 milliseconds?
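For reference, the measurement I describe is of this crude single-threaded kind (a sketch, not my exact test program; numbers will vary with JIT warm-up):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Crude comparison: plain vs. synchronized-wrapper HashMap reads.
    // JIT warm-up, dead-code elimination, and lock optimizations can skew
    // such numbers, so treat the result as an order-of-magnitude hint only.
    public class SyncCost {
        public static void main(String[] args) {
            Map plain = new HashMap();
            Map synced = Collections.synchronizedMap(new HashMap());
            plain.put("key", "value");
            synced.put("key", "value");

            long t0 = System.currentTimeMillis();
            for (int i = 0; i < 3000000; i++) plain.get("key");
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < 3000000; i++) synced.get("key");
            long t2 = System.currentTimeMillis();

            System.out.println("plain:  " + (t1 - t0) + " ms");
            System.out.println("synced: " + (t2 - t1) + " ms");
        }
    }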
(In reply to comment #31)
> No, this means that the statement of the HashMap authors ... also applies to
> N readers, one writer, so synchronizing "writes only" is a misuse of the
> HashMap according to the documentation.

It's cool, but I have not asked for the nth lecture on this portion of the javadoc. How about trying to answer my question? It doesn't seem that hard.

> On the other hand, I made some measurements and calculated how much
> performance you gain by removing synchronization from the get/set methods.
> [...]

In a microbenchmark, the JIT could be playing tricks on you, so I don't know ;) Obviously, one single read by itself is not going to cost much. Now multiply that by the number of reads you could be making during a single request, and also imagine what it could be if useless syncs were added in plenty of other places inside the container "just to be safe". Syncs should be added wherever needed, but not more than needed. If you like microbenchmarks, you could compare (let's say with 1/3 writes, 2/3 reads): HashMap without sync, HashMap with syncs on writes, Hashtable, and ConcurrentHashMap. I think there could be some more tuning done for the attributes map too (like setting a good initial size).

Besides, this is a bit OT and doesn't answer my question. I have just looked at two other popular servers' source code, and some don't do any syncing for this, just like Tomcat 5.0.x. Overall, it means it's not portable, and the webapp really should plan on syncing on the session externally wherever needed. What I am willing to provide (this is the intent of the code in 5.5.x right now), by default, is making sure the HashMap cannot get corrupted, and that the infinite loop described in this report cannot occur.
> Overall, it means it's not portable, and the webapp really should plan on
> syncing on the session externally wherever needed.

If you use different techniques this might become a pain. It might be hard to tell everyone to synchronize against the session, and in the end you have the same as synchronizing in the base class. Well, not really the same: Tomcat can synchronize against the map, while we have to synchronize on a wider context - the session.

> What I am willing to provide (this is the intent of the code in 5.5.x right
> now), by default, is making sure the HashMap cannot get corrupted, and that
> the infinite loop described in this report cannot occur.

I wonder how this can be done? You might have to introduce your own map implementation, no? Is it possible to create a thread-safe hash map without synchronization? Ok, if two threads put in an element with the same key it might not be deterministic which of the two is really set, but this is not the problem we have to solve.
(In reply to comment #33)
> If you use different techniques this might become a pain. It might be hard
> to tell everyone to synchronize against the session, and in the end you have
> the same as synchronizing in the base class. Well, not really the same:
> Tomcat can synchronize against the map, while we have to synchronize on a
> wider context - the session.

You indeed have to sync on a wider context in many cases.

> I wonder how this can be done? You might have to introduce your own map
> implementation, no?

No, because at this point I believe making sure that the HashMap does not get corrupted (using syncs on put and remove) is enough to guarantee that the get method doesn't enter an infinite loop (by returning null, or the result when there's a problem - it will be unpredictable, but that seems to me equivalent to the higher level concurrency issues you get if you mix and match reads/writes in the webapp for critical data without being careful). Other than this, it doesn't look like the collection being used with its default parameters is that optimal.

> Is it possible to create a thread-safe hash map without synchronization?
> Ok, if two threads put in an element with the same key it might not be
> deterministic which of the two is really set, but this is not the problem
> we have to solve.

Maybe using a fully array-based structure rather than a linked list would make it more robust (ie, reads could be bad, writes could be lost, but the structure would remain consistent). There is stuff like ConcurrentHashMap, but it of course does synchronization of its own. I did agree previously (hence the current code in the 5.5 branch) that robustness is good and that the HashMap structure should be protected, as there's no way to restore it, so there are syncs for put and remove. Since there's a real demand and you guys are quite persistent, I now agree on adding an extra configuration parameter on the context allowing you to define the sweet spot for the collection size, as well as its synchronization policy (the default being the current 5.5 behavior, unless/until it is shown to still be able to cause the infinite loop originally described in the report).
How do we approach the specification committee for official clarification of this point? Who is going to take charge and actively do that? Here's hoping to provoke all vendors to take a look at the issue and report back to the committee for a verdict in the coming months.
(In reply to comment #34)
> No, because at this point I believe making sure that the HashMap does not
> get corrupted (using syncs on put and remove) is enough to guarantee that
> the get method doesn't enter an infinite loop (by returning null, or the
> result when there's a problem - it will be unpredictable, but that seems to
> me equivalent to the higher level concurrency issues you get if you mix and
> match reads/writes in the webapp for critical data without being careful).

I understand the point that others are making about all access needing the same synchronization, but I don't understand your logic that read/get doesn't need synchronization. The infinite loop on read stems from the fact that two different threads access the map at the same time, one for read and one for write: while the writer is modifying the map, it corrupts the pointers the reader is using to traverse the data structures, and the reader thus enters an infinite loop (due to the HashMap design; other collection classes can detect some situations and throw a ConcurrentModificationException). This is because there is no synchronization between get and put operations, only between puts.

As I pointed out on the TC user mailing list, it would be possible to use a ReadWriteLock to allow multiple readers to proceed in parallel. But while one writer is working on the map, you need to be sure no reader is using the map. Introducing a ReadWriteLock might introduce an unwanted JDK 5 dependency; I think it's a new JDK 5 concurrency class? It might also be slower than a regular synchronized lock, but without benchmarks we won't know.
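A minimal sketch of that ReadWriteLock idea, assuming JDK 5's java.util.concurrent.locks (Doug Lea's older util.concurrent package offers an equivalent for earlier JDKs); names are illustrative, not actual Tomcat code:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Attribute map guarded by a read/write lock: any number of readers
    // proceed in parallel, but a writer gets exclusive access.
    public class RWLockedAttributes {
        private final Map attributes = new HashMap();
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        public Object get(String name) {
            lock.readLock().lock();
            try {
                return attributes.get(name);
            } finally {
                lock.readLock().unlock();
            }
        }

        public void put(String name, Object value) {
            lock.writeLock().lock();
            try {
                attributes.put(name, value);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }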
(In reply to comment #36)
> One for read and one for write: while the writer is modifying the map, it
> corrupts the pointers the reader is using to traverse the data structures,
> and the reader thus enters an infinite loop.

Sorry to reply to my own post, but my use of the word "corrupts" was a bad choice. In the normal operation of a write modification to the map, the internal data structures are altered into a temporarily inconsistent state. This inconsistency is part of the normal working of the write operation; when the write operation returns to the application, the map integrity is always consistent. This basic contract is true of all object design, unless the object is stated to be thread safe, which we all agree HashMap is not. If a read operation happens to bump into this moment of temporary inconsistency, the infinite loop can occur.
(In reply to comment #37)
> If a read operation happens to bump into this moment of temporary
> inconsistency, the infinite loop can occur.

All the entry objects will be mutated. While it may be inconsistent and might loop for an instant (although I am not convinced this could really be the case; I think the trouble without any sync could only occur if there was more than one concurrent unsynced write, and in particular two "remove"s), the pointer value will be corrected and the loop should exit. That's my interpretation from looking at the code. I think I'll write a small program to test this.
(In reply to comment #38)
> All the entry objects will be mutated. While it may be inconsistent and
> might loop for an instant ..., the pointer value will be corrected and the
> loop should exit. That's my interpretation from looking at the code.

Let me ask you another question then: where is the written specification for HashMap that states your usage is safe? It sounds like you are working on the presumption that no implementation will cause an infinite loop (or set fire to the computer), but you don't have any API contract to back that presumption up. I read the specification to state that some put/remove operations (the ones that modify the map structurally) are explicitly not threadsafe.

You can't make threadsafe and non-threadsafe calls to an API at the same time. The threadsafe calls are only threadsafe with respect to other threadsafe calls on the same API. My understanding of this: you can make threadsafe API calls at the same time; any time you want to make a non-threadsafe one, you have to serialize it with respect to the whole API interface, not just with respect to itself or other similar operations (unless otherwise stated).
(In reply to comment #39)
> Let me ask you another question then: where is the written specification for
> HashMap that states your usage is safe? It sounds like you are working on
> the presumption that no implementation will cause an infinite loop (or set
> fire to the computer), but you don't have any API contract to back that
> presumption up.

I guess if it goes back again to lawyerspeak rather than logic, then there's nothing to talk about. It is all related to reasonable reliability and robustness, and I don't believe the algorithm of a hashmap can become that weird (the Sun structure is already not particularly safe). I mean, there could even be bugs in the collection implementation too. I'd like to remind you once more that this synchronization in the container is not mandatory from what I can see, and at least one other popular container apparently behaves like Tomcat 5.0. It's the end of this discussion thread as far as I am concerned :) I'll also add a way to configure the size and sync of the collection (assuming I confirm the behavior to be acceptable using a test program).
(In reply to comment #38)
> All the entry objects will be mutated. While it may be inconsistent and
> might loop for an instant ..., the pointer value will be corrected and the
> loop should exit. That's my interpretation from looking at the code. I think
> I'll write a small program to test this.

You can certainly get null returned for a value that was previously, validly added to the map, because of the inconsistency. The hash index used to locate the first Entry is based on the hash and the length of the table (I guess we should say it selects the bucket). Suppose the value has been added to the map; then a get occurs concurrently with a write, the write triggers a resize, and the resize changes the location of the bucket. Then the entry reference in HashMap.get's traversal loop can match an Entry from another chain, and you start getting into some looping. In a test I saw a get iterate 4 times when this happened, and considering all of the possible shifting that can occur, I'm not convinced that an infinite loop could not occur. Given the bucket index algorithm and all of the possible values for a hash and a table size, I don't think one can safely say it's impossible without synchronizing. Only if there were a way to determine that the bucket chosen for any given hash is always the same for any table size could you safely leave the code unsynchronized.
(In reply to comment #40)
> You can't make threadsafe and non-threadsafe calls to an API at the same
> time. The threadsafe calls are only threadsafe with respect to other
> threadsafe calls on the same API.

There is nothing weird about any of this; it's basic program design. If you write a program that uses another API and you make presumptions about how you can use it, then you will come unstuck, no matter what any servlet specification says. If, however, you base it on what is written into the contract the API presents to your application, then you can legitimately point the finger at that API when you find a problem. Maybe you should write your own Map implementation to get the behaviour you want; I think a generic ReadWriteLock-protected Map would be a good thing for the wider Java community to have access to.

Your approach of writing a test case to prove the problem is valid, because it's a rare concurrency case; maybe if you sprinkled Thread.sleep() calls throughout the HashMap code to slow it down, you might be able to make the problem 100% reproducible. But I don't believe Sun's implementation needs to be fixed, as this problem does not contradict the written contract the HashMap API presents.

> I'd like to remind you once more that this synchronization in the container
> is not mandatory from what I can see

And I'm not disputing you. I'm disputing the more fundamental usage problem: there is no way for a web-app to override the Session object in a portable way across containers.

> I'll also add a way to configure the size and sync of the collection
> (assuming I confirm the behavior to be acceptable using a test program).

This would be welcomed by me.
(In reply to comment #38)
> I think the trouble without any sync could only occur if there was more than
> one concurrent unsynced write, and in particular two "remove"s; the pointer
> value will be corrected and the loop should exit.

If I understand this statement correctly, you are saying that syncing the put and remove should be enough to stop the infinite loop, and that this is why it was "FIXED" in 5.5. If an example could be provided showing that an unsynced get can lead to an infinite loop, would this change anything concerning the status of this problem?

A second question following on from this: when would anyone want to use a "thread-unsafe" session set/getAttribute? As we have seen, NOT syncing this causes hangs, so the developer needs to do it (if he uses these methods). Wouldn't it make more sense to just sync it further down in Tomcat, as suggested here, rather than needing to sync around the call every single time you use it in your code?
I guess I should add that it seems logical that if one web page calls Session.setAttribute, it shouldn't be possible for a later call to getAttribute to not find it. I.e. a minute earlier I made a call to set, then a minute later, during my get, a resize occurs and I get a null even though I should not get a null. This can occur because the bucket distribution varies with certain hash and table size combinations. I had one hash and table combination yield index 9; when a resize occurred this index moved to 11, and on the resize after that it moved to 3. So: start at 9, resize, then 11, resize, then 3. You can imagine what might happen between the time I get the index from indexFor and the time I access the array: I might get null. This is all in HashMap.get.
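A toy illustration of the moving bucket (assuming a modulo-style index as in older HashMap implementations; power-of-two tables use hash & (length - 1) instead, but the effect is the same):

    // Toy demo: the bucket chosen for one fixed hash changes as the table
    // grows, which is why an unsynced get() racing a resize can look in the
    // wrong bucket and come back with null.
    public class BucketDemo {
        public static void main(String[] args) {
            int hash = 0x7A3F19;                 // arbitrary fixed hash code
            int[] tableSizes = { 11, 23, 47 };   // table grows on each resize
            for (int i = 0; i < tableSizes.length; i++) {
                int bucket = (hash & 0x7FFFFFFF) % tableSizes[i];
                System.out.println("table size " + tableSizes[i]
                        + " -> bucket " + bucket);
            }
        }
    }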
(In reply to comment #44)
> I guess I should add that it seems logical that if one web page calls
> Session.setAttribute, it shouldn't be possible for a later call to
> getAttribute to not find it. ... You can imagine what might happen between
> the time I get the index from indexFor and the time I access the array: I
> might get null. This is all in HashMap.get.

And this is even without speaking of the infinite loop issue.
(In reply to comment #43)
> When would anyone want to use a "thread-unsafe" session set/getAttribute?
> As we have seen, NOT syncing this causes hangs, so the developer needs to do
> it (if he uses these methods). Wouldn't it make more sense to just sync it
> further down in Tomcat, as suggested here, rather than needing to sync
> around the call every single time you use it in your code?

One possible case is for a really simple dumb cache, where you expect the result to be present in the session every time except the first (eg where the read:write ratio is 10000:1). In such a case, if the get is unsynchronized, it's safe to say:

    Object cached = session.getAttribute(key);
    if (cached == null) {
        synchronized (session) {
            cached = session.getAttribute(key);
            if (cached == null) {
                cached = "blah";
                session.setAttribute(key, cached);
            }
        }
    }

I understand there is potential waste in doing two gets (one inside and one outside the synchronized block), but this only gets executed on the first hit, so it decreases in significance as the read:write ratio goes up. Anyway, that's one case I've found where a non-threadsafe get helps performance-wise, especially on lower spec machines short on memory (only through subjective observations though - I don't have any numbers).
I guess the point of the above would be that the desired synchronization behaviour is usage-dependent, so giving the developer the freedom/responsibility to decide it is not necessarily a bad thing. Personally, my reading of the spec was that this was the reason it was left to the developer: because only the developer has a good idea of what the desired sync behaviour is.
(In reply to comment #47)
> I guess the point of the above would be that the desired synchronization
> behaviour is usage-dependent, so giving the developer the
> freedom/responsibility to decide it is not necessarily a bad thing.

It would be fine if all the other places in Tomcat were fixed to allow you to do that. Case in point: jsp:useBean with scope="session" can't be synchronized, not that I know of, without changing the TC 5.0 and 5.5 code. The same goes for the JSTL libraries and many other Jakarta libraries.
(In reply to comment #46)
> One possible case is for a really simple dumb cache, where you expect the
> result to be present in the session every time except the first (eg where
> the read:write ratio is 10000:1). In such a case, if the get is
> unsynchronized, it's safe to say: ...

I understand, but how can you guarantee that the 'second' request does not overtake the first put request? Not very likely, agreed, but still possible. Users have a great way of pressing F5 very quickly!
(In reply to comment #49)
> I understand, but how can you guarantee that the 'second' request does not
> overtake the first put request? Not very likely, agreed, but still possible.
> Users have a great way of pressing F5 very quickly!

That was the purpose of the second get check inside the sync block. The second request will, if the cache is null, block on the sync'd session, and only proceed into the sync block after the put. The first thing it does is check again, which returns non-null, and it continues happily.
(In reply to comment #50)
> That was the purpose of the second get check inside the sync block. The
> second request will, if the cache is null, block on the sync'd session, and
> only proceed into the sync block after the put. The first thing it does is
> check again, which returns non-null, and it continues happily.

The situation can occur, IF the second request overtakes the first, that the initial unsynchronized session.getAttribute(key) simply never comes back, because it hangs inside the HashMap's get, and so your check for null will never be reached.
Bear in mind that if you are using clustering, you have to put the modified session attribute back after you have finished modifying the object if you want the new state to persist for the next request, as clustering replication only takes place at setAttribute() time. So the 10000:1 read:write ratio may only be realistic for users who aren't clustering.

Also, on the other issue of getting a null back when you expected to see an object: session data does not stay around forever, it expires, so all web-apps must deal with the no-session situation. I'd agree that as long as your servlet still holds the exact Session object instance you did a setAttribute() on, you can reasonably expect a getAttribute() on it to succeed. But between requests you let go of the instance and leave it up to the container session manager to hold onto. If, as comment #44 implies, you are talking minutes later, then I read into that that you mean across requests. Well, there is no guarantee your session still exists, so your web-app must deal with that situation. But you should not need to deal with seeing an old overwritten value turn up again; that would be a problem.
(In reply to comment #35)
> How do we approach the specification committee for official clarification of
> this point? Who is going to take charge and actively do that?

The formal mechanism to do that is to send email to the feedback address listed on the spec (for 2.4, that's <servletapi-feedback AT eng.sun.com>). I've done that, and corresponded privately with the spec lead for Servlet 2.5 (Greg Murray) as well, who will then discuss it with the expert group to see what (if any) language changes they might want to make in Servlet 2.5.
(In reply to comment #48)
> It would be fine if all the other places in Tomcat were fixed to allow you
> to do that. Case in point: jsp:useBean with scope="session" can't be
> synchronized, not that I know of, without changing the TC 5.0 and 5.5 code.
> The same goes for the JSTL libraries and many other Jakarta libraries.

Also, if you access the PageContext in a JSP and use the scoped methods, the ones accessing APPLICATION and REQUEST scope are already synchronized, yet the session ones are not. So now you have to synchronize not only session calls but also the calls to the PageContext in a JSP. Instead of some kind of performance gain, you end up with double the monitors for those objects plus the one for the session.

I think these are the points people are making: there is so much to change now that a patch, and maybe a refactoring, could be done along with an announcement that the functionality is changing. But before TC decides to do that, maybe they should see how all of the other servers are handling this. Because if the other servers are handling this with synchronization, and people are writing code without worrying about synchronizing, they are not going to rewrite it to use Tomcat later. And what about code targeted for a container like Oracle or Sun ONE that someone is using in TC, which is not synchronized? Point being, if the user base gets run off, what's the point, and who will be left donating to the project?

I really think this is too subtle a thing to justify with "it's by the spec", at least for the time being, and maybe it should always and forever remain synchronized, depending on the real-world usage of the spec in other containers. There are other situations where you don't even have access to synchronize the call: again, tags, and I'm sure some other API calls directly manipulating the session. I have found nowhere else in the Tomcat code that synchronizes session access, so either there are other issues or this one should be fixed. I haven't even seen comments on all of the other places in the TC code accessing the session, or on the other Jakarta projects, from anyone but those arguing this needs to be fixed. So, what's the story? Is this one thing changing, are all of the others changing, or is nothing changing (being fixed)?
(In reply to comment #38)
> All the entry objects will be mutated. While it may be inconsistent and
> might loop for an instant ..., the pointer value will be corrected and the
> loop should exit. That's my interpretation from looking at the code. I think
> I'll write a small program to test this.

Ok, I wrote a program which can reproduce the bug even under Tomcat 5.5 conditions, with synchronized put/remove and unsynchronized get. I must correct myself: it doesn't produce an infinite loop, but a very LONG-LASTING FINITE loop, and I am talking about hours or days of execution.

I do the following: with X writer and X reader threads, modify (set or remove with 50% chance) and read 10 mappings concurrently. The abovementioned bug occurs pretty soon (500,000 operations); the read thread hangs around for 5-10 seconds and gets fixed by a writer thread some time later. I've measured that a read operation lasts on average 0.01-0.02 milliseconds, so if one of the threads detects that its last execution lasted longer than 5 seconds, it is safe to say that this thread was hanging and got fixed by another writer.

Speaking of which, we could probably "reproduce" this on the production servers by not restarting them and letting them run for some hours or days instead (maybe as long as the user is still online, I don't know) and hoping they will get fixed. However, this is not really a solution.

The reason I can't reproduce the infinite loop is that I can't stop the writers in time. Performing thousands of writes per second in multiple threads makes it hard to stop all writers at the same point before they unintentionally fix the reader. Still, I assume there is a good chance that an infinite loop happens on a production server if the user leaves the server and no further write operations are performed on his session.
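A stripped-down sketch of that kind of test program (synced writes as in 5.5, unsynced reads, and a timer that flags a read taking abnormally long; this is a simplification of my program, not the exact code, and it runs until interrupted):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    // Writers sync on the map (mimicking 5.5's synced put/remove); readers
    // do not. A read that takes seconds instead of microseconds indicates
    // the looping-get pathology. Stop with Ctrl-C.
    public class HashMapStress {
        static final Map map = new HashMap();

        public static void main(String[] args) {
            for (int i = 0; i < 4; i++) {
                final int id = i;
                new Thread(new Runnable() {       // writer: synced put/remove
                    public void run() {
                        Random r = new Random();
                        while (true) {
                            String key = "key" + r.nextInt(10);
                            synchronized (map) {
                                if (r.nextBoolean()) map.put(key, "v");
                                else map.remove(key);
                            }
                        }
                    }
                }).start();
                new Thread(new Runnable() {       // reader: UNSYNCED get
                    public void run() {
                        Random r = new Random();
                        while (true) {
                            long t = System.currentTimeMillis();
                            map.get("key" + r.nextInt(10));
                            long d = System.currentTimeMillis() - t;
                            if (d > 5000) {
                                System.out.println("reader " + id
                                        + " was stuck for " + d + " ms");
                            }
                        }
                    }
                }).start();
            }
        }
    }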
Looks like the problem will be hard to reproduce, as it requires a situation where a write to the HashMap causes a table resize/reindex to occur while another thread is reading from the HashMap.

I would have thought it would return null when it couldn't find the value, but there is no guarantee about what happens. Developers could sync on the session every time they want to read, but why should they? It adds to the complexity of writing web apps, and as Wade Chandler points out, it may not even be possible with the JSTL tags. Some have suggested using a ReadWriteLock or some other tricky method of reading, checking if null, and then doing a synced read. Which is all very good, but there is a simpler solution (for Tomcat 5.5): use ConcurrentHashMap. The package documentation says that "Special-purpose data structures such as the concurrent hash tables in this package have far less overhead, and typically much better performance than placing ReadWriteLocks around most sequential data structures." Also, ConcurrentHashMap already implements the second suggestion of syncing after a null is returned (see the link at the end of this comment). Here you have someone who has already done all the hard work of figuring out when you can read/write and who deals with the locking. It will give you consistent results, and it takes away the opportunity for developers to make mistakes when deciding whether or not to sync a session read. Plus JSTL and everywhere else will benefit by not getting inconsistent results. Next stop, application scope...

Some more reading about ConcurrentHashMap internals: http://www-128.ibm.com/developerworks/java/library/j-jtp08223/
(In reply to comment #56)
> Looks like the problem will be hard to reproduce, as it requires a situation
> where a write to the HashMap causes a table resize/reindex to occur while
> another thread is reading from it.
>
> I would have thought it would return null when it couldn't find the value,
> but there is no guarantee on what happens.

It should return null when you simply look at the code. I can't reproduce the infinite loop myself, but Leon says he can reproduce the problem, and if so then it's a problem (Murphy's Law). And I definitely see that it is possible to add an attribute and then, later, while a write is occurring, to... for instance... be logged in to some application that relies on session variables, have the system think you are not logged in, and get kicked out because of concurrency issues with the unsynchronized HashMap: the system thinks a session attribute/variable isn't set when it is. The original set could have occurred hours before; then, just as the read is occurring, a write takes place which resizes the HashMap, and an issue arises which makes web apps unpredictable in Tomcat. You could predict it by knowing the default size of the map and not using more session variables than the map's resize threshold, I suppose, but give me a break.

I can't say for certain myself whether the infinite loop will occur with only the writes synced, but given the way the get method works, with the index being calculated for the hash bucket (array index), you can certainly say it's unpredictable and can definitely cause goofy errors. So an INVALID resolution seems irresponsible for this issue.
I assume that everyone has read the tomcat mailing list regarding this issue and has seen and read the link pointing to

http://blogs.opensymphony.com/plightbo/archives/000175.html

So are we still arguing about whether or not getAttribute can cause an infinite loop?

If we all agree that getAttribute can cause an infinite loop IF called in a multithreaded environment, then we (I assume) all agree that the call to it MUST be synced SOMEWHERE. This also includes the example in comment 46, as it cannot be guaranteed that the getAttribute (inside the session.get) in the first line does not happen in the middle of the putAttribute inside session.put.

Now the last remaining question is: whose responsibility is it to sync it?

If the end developers need to sync it, they will need to sync much more than just the 'hashmap' itself.
(In reply to comment #58)
> If we all agree that getAttribute can cause an infinite loop IF called in a
> multithreaded environment, then we (I assume) all agree that the call to it
> MUST be synced SOMEWHERE.
>
> Now the last remaining question is: whose responsibility is it to sync it?

Again though, when you look at all of the places accessing Session.getAttribute, you can't synchronize all of them, so the last question that I see is: what is going to be fixed? Either all of these other places in TC need to be fixed, or this one does. But I think a real-world study should take place before all the other places are fixed and this one isn't. I mean, if all other containers behave like TC 4.x and synchronize these calls, then it just makes sense for TC to follow suit.
(In reply to comment #58)
> This also includes the example in comment 46, as it cannot be guaranteed
> that the getAttribute (inside the session.get) in the first line does not
> happen in the middle of the putAttribute inside session.put.

Actually, after looking at this a bit more, I need to correct myself: that example works because the HashMap does not need to be resized on the first and only putAttribute call. Still not very pretty, though.
Too much has been said already, but I also think simply synchronizing is not the only option. Most of the time the different webapps/technologies do not use the same content of such a session map; e.g. JSF and your home-grown app mostly won't use the same keys in this HashMap. So why introduce a bottleneck? (I don't mean a real performance one; it's more philosophical.) Why should one technology influence another? And there is an impact: in the worst case this synchronized map acts like a "sequencer".

What I would like to say is: it's somehow reasonable not to simply synchronize, but to find a hash map which can handle this case. Such a map should be able to insert keys concurrently. If one inserts the same key in parallel, the result won't be deterministic, ok, fine. And for sure, such a map should not lock (and not even wait for some "free me" event) on get. Isn't there a "cool" Java collections developer here?
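One shape I could imagine for such a structure is copy-on-write: writers clone the map under a lock and publish the copy through a volatile reference, while readers see an immutable snapshot without locking. Just a sketch (assuming the Java 5 memory model for safe publication; reasonable only when reads vastly outnumber writes, as with session attributes; not a proposed patch):

import java.util.HashMap;
import java.util.Map;

public class CopyOnWriteMap {

    // Readers never lock: they just dereference the current snapshot.
    private volatile Map snapshot = new HashMap();

    public Object get(Object key) {
        return snapshot.get(key);           // snapshot is never mutated
    }

    public synchronized void put(Object key, Object value) {
        Map copy = new HashMap(snapshot);   // clone, mutate, publish
        copy.put(key, value);
        snapshot = copy;
    }

    public synchronized void remove(Object key) {
        Map copy = new HashMap(snapshot);
        copy.remove(key);
        snapshot = copy;
    }
}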
(In reply to comment #61) java.util.ConcurrentHashMap
(In reply to comment #62)
> java.util.ConcurrentHashMap

You meant java.util.concurrent.ConcurrentHashMap, didn't you? Too bad it was only introduced in Java 5.0.
Another question:

http://java.sun.com/j2se/javadoc/writingdoccomments/index.html

According to Sun's javadoc principles (above link):

# The Java Platform API Specification is a contract between callers and implementations. The Specification describes all aspects of the behavior of each method on which a caller can rely. It does not describe implementation details, such as whether the method is native or synchronized. The specification should describe (textually) the thread-safety guarantees provided by a given object. In the absence of explicit indication to the contrary, all objects are assumed to be "thread-safe" (i.e., it is permissible for multiple threads to access them concurrently). It is recognized that current specifications don't always live up to this ideal.

Since the javadoc of HttpSession (http://java.sun.com/j2ee/1.4/docs/api/index.html) doesn't mention anything about the session being not thread-safe, shouldn't the developer be able to rely on the implementation being thread-safe?
(In reply to comment #64)
> According to Sun's javadoc principles (above link)
> # The Java Platform API Specification is a contract between callers and
> implementations.

Leon,

Can you publish the TC+HashMap findings you based your comment #55 on?

I've always tried to program for the worst-case scenario: assume thread-unsafe unless otherwise stated. Probably my 'C' language background thinking here. So it interests me to hear this stated in reverse for Java (not a bad thing at all). Where within the reference is that implicit thread-safe notion stated?

Is a HashMap really needed for session attributes? Exactly how many attributes does an average application set? Replacing the collection with a bespoke TC-internal class based on something as silly as a linked list, or a fixed hash-bucket redirect into linked lists, should serve most users well. Then make the exact collection implementation class a configurable item, so those users who would benefit from a safe HashMap for scalability of key count could tune their performance with a config change.

It would please me greatly to see a more proper collection class for the job, with a safe write operation that still allows simultaneous multiple reads. But trying to bend the HashMap API into something it's not is wrong.

Java Motto #1: "You program to interfaces"
(In reply to comment #65)
> Where within the reference is that implicit thread-safe notion stated?

http://java.sun.com/j2se/javadoc/writingapispecs/index.html#top-level
(In reply to comment #66)
> http://java.sun.com/j2se/javadoc/writingapispecs/index.html#top-level

That reference is invalid in the context of this bug report. I read it as a hypothetical example of what to put inside javadoc information, nothing concrete to underpin all Java programming design.

My comments fall back to what is written (and not written) in the documentation for the HashMap implementation that TC has elected to use for the job. It does not say you can call get() while you are also doing a put() (when the put() may cause internal rearrangement); it explicitly warns you of this problem. And since we have no way of knowing whether a put will cause a rearrangement, to achieve TC's goal of robustness we have to synchronize our usage.
Created attachment 16347 [details]
bug reproduction eclipse ready project.

to start:
jar -xf synchtest.jar
cd synchtest
java -cp classes leon.synch.StartTest
(In reply to comment #65)
> Leon,
>
> Can you publish your TC+HashMap finding you based your comment #55 ?

It's a simple program which actually shows the bug (with synchronized put/remove and unsynchronized get). I am too stupid to program the writers to stop as soon as the loop occurs, so they fix the problem some time after it occurs (that's why it's a long loop instead of an infinite one). After each read the reader measures how long the execution lasted; if it lasted longer than 5 seconds, it cries. The average execution time is 0.05 milliseconds, so we can safely assume that executions lasting 5 seconds or more aren't normal. In fact I had it running all night, with 714 "too long" executions, the longest of them being 70 seconds. If I add synchronized(map) to the getAttribute method of the Storage class, it doesn't occur anymore.
(In reply to comment #67)
> That reference is invalid in the context of this bug report. I read it as a
> hypothetical example of what to put inside javadoc information, nothing
> concrete to underpin all Java programming design.

I think what Leon is trying to say is that this is what *should* be done. So either the documentation for HttpSession has a problem, in that it is missing the information about it not being thread-safe, or the implementation in TC is wrong.

According to what I have read here, it has been decided that TC says HttpSession is NOT thread-safe, meaning that all developers who use HttpSession will, should they call 'put', need to ensure that they sync the session; otherwise nasty things can happen. This means that the documentation should definitely be changed to warn developers, and that all 3rd-party projects should fix their code so that it does not crash/hang Tomcat 5.0. (I haven't had a proper look to check whether the synced puts in 5.5 are enough to stop the loops.)

IMHO, anyone who doesn't care whether or not the data is consistent really doesn't need the data, and shouldn't bother calling the routine.
Guys please .. Remy already said way back in #40 that he would look into allowing this to be configurable :)
(In reply to comment #71)
> Guys please .. Remy already said way back in #40 that he would look into
> allowing this to be configurable :)

I don't really see anything that says the real issue will be addressed. An out-of-the-box Tomcat has to be configured not to have a bug? Plus, he said he would look into it, and that's fine, but nowhere does it reflect that he believes there is an issue, and right now a bug caused by an uninformed decision remains.

One: how do you synchronize calls with jsp:useBean with scope of session? Where else is the session accessed in the Tomcat code without synchronizing, that developers should be aware of? Is JSESSIONID stored in the session as well, and where is it accessed? We need to make sure that access is synchronized. If we are going to talk about specifications and responsibilities, let's talk about them, but logically and truthfully. There is more to this issue than can be addressed by "It's the developer's responsibility." Do we need to file bugs for all of the other issues if this is going to remain INVALID? I say that must be the case if this is to remain INVALID.

Personally I don't like the idea that "this was done as an experiment" (written in one of the comments above) in a system said to be "production quality" (Tomcat home page). The point is this: if it were an experiment, it should have been behind an option to turn it on, and should not have been the default by any means. If a system has an apparent flaw that reduces its stability and reliability, especially when it can be solved with a couple of lines of code, a patch is a no-brainer. This entire issue makes Tomcat itself currently invalid, as there are sections of standard usage which cannot be safeguarded against concurrency issues. If I have an issue where customers can get kicked out of the site after being logged in and the client asks me to fix it, I can't without changing the entire application, and by that I mean not using jsp:useBean calls in my JSP pages without wrapping them with other calls. So what's the point of using the tags in JSP if they can't be safeguarded without more work? I thought the point of the tags, EL, and other things was to make development quicker and to provide shortcuts.

My opinion:
1) A decision to remove synchronization from something which was previously synchronized, when the documentation for the underlying implementation explains there are issues with not synchronizing, should not only have been discussed as to whether it was safe, logical, and correct, but should have been proved.
2) If it were to be an experiment, then it should have been behind an option to turn it on, with the default being the more correct implementation for the given situation.
3) When a problem is identified in a piece of software being promoted as production-ready, it should be handled, not shaken off as INVALID.

Where is the proof that unsynchronized reads don't have issues? Examine the HashMap source code and write a few tests. You'll see, with the bouncing index, that a valid value can exist in the map for a given key and yet that Entry not be located, because null is returned.
Thus a program relying on a session variable being set will not be able to retrieve that session variable and will take an entirely different path because of this. It's very simple... just read the code:

public Object get(Object key) {
    Object k = maskNull(key);
    int hash = hash(k);
    int i = indexFor(hash, table.length);
    Entry e = table[i];
    while (true) {
        if (e == null)
            return e;
        if (e.hash == hash && eq(k, e.key))
            return e.value;
        e = e.next;
    }
}

If, between the time i is computed by indexFor and the array access, the table is resized, with all values null or all items transferred, and the entry at index i is null, then null is returned as soon as we step into the while loop. What that means is: if the HashMap is resized by a concurrent write between the indexFor call and the table lookup, there is a concurrency issue which affects the consistency of the application.

How can this issue currently be worked around when using tags? If we need to file bugs on all the other places, please let us know. If you would like some help with the other issues, I would be willing to help, but my feeling is that this should be changed back for the time being, and we need a new release based on the latest release with only this change to the source. In the meantime all the other issues can be worked on, as all of that will take more time than this change. Once those issues are dealt with, change it back and make sure the issue is documented on the front page of the Tomcat documentation.
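For plain servlet code (not tags), the only workaround I know of looks something like the sketch below, with the big caveat that the spec does not guarantee concurrent requests see the same facade instance to lock on, and nothing like this is possible for jsp:useBean, which is my point ("user" is a made-up attribute key):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class WorkaroundServlet extends HttpServlet {

    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession session = request.getSession();
        // Locking on the session is a best-effort convention only: it works
        // solely if every other piece of code accessing the session does the
        // same, and tag-generated code does not.
        synchronized (session) {
            Object user = session.getAttribute("user");
            if (user == null) {
                session.setAttribute("user", "anonymous");
            }
        }
    }
}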
Ok, I'm going to *explain* why I'm so upset about this.

"There is heavily used public API in this heavily multithreaded application that is not thread safe and not documented clearly as such."

You might as well put that on the front page; you've got a page of comments from various ASF people trying to explain to the world how that sentence (which is what it all boils down to) makes a hair of sense. What other parts of the most commonly used API in the system, parts which are not only used in multithreaded applications but happen to be *heavily* used with a high risk of threading problems, does this apply to? Because I'm looking at your response and wondering WTF you expect of developers: to walk through the spec and GUESS at which things you've decided to willfully misinterpret for the sake of shaving off a few hundred milliseconds, at the cost of putting my live applications through hell?

I've been wondering what caused this for ages now, and just *happened* to stumble across this. How dare *ANYONE* at the ASF claim a holier-than-thou attitude about a fork when something as simple and basic as this gets explained away, marked invalid, and ignored. You're experimenting with people's live applications, people whom everyone's been encouraging to trust software written by authors who think a few hundred milliseconds on Joe's web app are more important than stability. Plain and simple: WTF are you doing to Tomcat? Is there anything more important than its reputation as a stable platform? If the ASF worked so bloody well, we wouldn't be seeing someone asking us to misread sections of the spec. There is *no* defence for this kind of behaviour in a server like this; there's loads of defence for the existence of the bug, to be sure, but none for this kind of response. So all the pointless crying about someone forking off looks a lot more, now, like you're getting exactly what you deserve for decisions which quite clearly diverge from the sane by several kilometers.

Even rereading this, I can't get over it: a major bug in the support for multithreading, ignored by its developers for a range of differently pointless reasons, resulting in a "hey, we'll make it configurable". Configurable? What, now I *want* to selectively cause data corruption? I didn't realise it was such a beneficial option to have on hand. Sure, some of our readers get 'teh suck', but hey, it's 8 milliseconds faster for the other guy.

This is one of those bugs that goes out in e-mail to just about everyone I know, with a note that says maybe we should be looking at alternative solutions to Tomcat in our live environment, because if this is the kind of judgement being used to build the application we're depending on to serve content, you can keep it. This all works so long as I don't have to question fundamental decisions about whether the software is well made, and up until now I've never even considered the possibility of whether or not I should trust what comes out of the Tomcat dev team. Now I find out that a major bug that's been affecting our platforms was someone's little 'experiment'. I haven't got two fingers big enough to stick up at the moron who thought *that* explanation was a good idea.

Fix it. Resolved my arse. And while you're at it, tell me what other areas of the "spec" I should be misinterpreting and working around in my code. Crying about a fork? Reading this, you practically *deserve* one.
I reopened this ticket because it pretty clearly should not be closed, or at least not with an INVALID resolution. I would have thought others would have done this (again, it seems it has already been closed and reopened), but so be it, I'll do it.

* 18 people voted for this bug, which alone indicates it is not INVALID.
* Comments continue to be made on it which reflect people wanting this addressed in some way other than sweeping it under the rug (see the rather heated comment right before this one; that is not the first time I have personally heard similar sentiments from someone recently). Even if all the committers vehemently disagree with that thinking (and it appears to mainly be one, frankly), it is improper to close this as INVALID because that does not accurately reflect reality.
* There continues to be discussion on the mailing lists, and elsewhere, about this. That too indicates the INVALID resolution is not correct.

If someone wants to verbally slap me for doing this, go for it. I for one do not believe this whole situation is being handled properly, and this is my act of civil disobedience in the matter. If you want to close it again, at least DO NOT do the Tomcat user community the disservice of marking it as INVALID, because that resolution will do more to hurt people than not fixing it would (i.e., some may conclude there really is nothing to worry about and get burnt in production). Mark it as what it truly is: WONTFIX... which would *still* be the wrong answer IMO, but would at least provide the proper information to anyone looking at this ticket after the fact, scratching their head and asking what the hell is going on.
(In reply to comment #73)

The great thing about the ASF is that it's open source and you can fix it yourself. I would prefer it fixed in Tomcat, but I think your rant, Wade, at a Tomcat developer isn't going to make that more likely; this is a volunteer organization.

I've created a patch, based on Craig's idea (see comment #21), that can be "plugged" in to Tomcat 5.0.x and creates "thread safe" sessions, available here: http://www.niallp.pwp.blueyonder.co.uk/TomcatBug36541.html
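Roughly, the idea is to plug in a custom Manager whose sessions use a synchronized attribute map. A simplified sketch of the general shape (NOT the actual plug-in code; it assumes the Manager interface's createEmptySession() hook and that StandardSession exposes its attribute map as a protected Map field, which varies between 5.x sources):

import java.util.Collections;
import java.util.HashMap;
import org.apache.catalina.Session;
import org.apache.catalina.session.StandardManager;
import org.apache.catalina.session.StandardSession;

public class ThreadSafeSessionManager extends StandardManager {

    // Catalina asks the Manager for new sessions, so overriding this is
    // enough to swap in a session with a synchronized attribute map.
    public Session createEmptySession() {
        return new ThreadSafeSession(this);
    }

    static class ThreadSafeSession extends StandardSession {
        ThreadSafeSession(StandardManager manager) {
            super(manager);
            // Assumes 'attributes' is a protected Map field here; in some
            // 5.x sources it is typed HashMap, in which case this line
            // would need adjusting.
            attributes = Collections.synchronizedMap(new HashMap());
        }
    }
}

The jar is then dropped into server/lib and selected with a <Manager className="..."/> element in the context configuration.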
Just realize that some of us work for companies that don't allow us to create our own patched version--we're required to use the binary, versioned downloads. So while a patch may be sufficient for some of the userbase, it's not the answer for everyone. So let's just fix it. Jay
(In reply to comment #76)

OK, "patch" is probably misleading; it's really a "plug-in". You still use the Tomcat binary download unchanged. It's just a case of dropping in an extra jar and configuring Tomcat to use it.

(In reply to comment #75)
> would prefer it fixed in Tomcat, but I think your rant, Wade, at a Tomcat

Sorry Wade, meant Gregory :-(
Guys, we're in active discussion with the Servlet Spec expert group about resolving this. It's an open issue in the center of everyone's radar and highest on the priority list. It'll get fixed in the best way possible as soon as possible. Let's try to stay professional about it.
(In reply to comment #75)
> The great thing about the ASF is that it's open source and you can fix it
> yourself. I would prefer it fixed in Tomcat, but I think your rant, Wade,
> at a Tomcat developer isn't going to make that more likely; this is a
> volunteer organization.
>
> I've created a patch, based on Craig's idea (see comment #21), that can
> be "plugged" in to Tomcat 5.0.x and creates "thread safe" sessions.

That is the great thing about open source software. Unfortunately, I didn't write comment #73. It certainly is a volunteer organization. I have tried to help and give advice on the Tomcat list for years. I have even tried to help out with a few open source projects on some of my own time; iReport on sourceforge.net is one of them. When I work on any open source project, I accept others' criticism and am conscious of the fact that I did not write all of the source code in the project, nor am I the only person using and relying on the software. I am also aware that just because an issue isn't affecting me directly doesn't mean it isn't an issue, and these types of problems require being looked into when clear and relevant evidence is provided that a real and valid problem exists.

The last paragraph is written because I have not been bitten by this specific bug as of yet. I saw the issue come across the Tomcat users list and evaluated the situation. As I was evaluating it, I noticed that while I didn't see how an infinite loop could occur, I definitely could see an issue, because not synchronizing the reads can allow incorrect returns from a call to getAttribute (null when the key clearly points to a relevant value), and the while loop can be entered and iterated over a different number of times before returning. I even went as far as to take java.util.HashMap, copy it into its own class so I could access its internals from code, and then wrote tests to validate what I thought could happen. I then noticed areas in the Tomcat source code which access the session without synchronization and which developers cannot directly synchronize. So, clearly there is a real issue which needs to be addressed at the release level. A patch is fine, and thank you for it, but it doesn't solve the scenario where a new user who doesn't know about this issue, the patch, or anything else associated with it downloads and installs Tomcat. When they run into it, we'll have this same thing over and over again until it's fixed. I have a feeling it's being worked on, though.

Also, I think some of the more hateful comments could have been avoided had there not been antagonizing language used by certain folks. If you look at the comment you are referring to, posted by another user, they seem agitated by some of the responses to the issue. So, yes, it's a volunteer organization, and the ASF should be respected by users and developers alike. It should be considered a privilege to be allowed to be a committer here, IMO. I have always felt that way about anything I have been able to be a part of, even when I may feel like debating an issue or getting a little heated. So, I have written what I just wrote, and I would like to send a sincere thank you to all of the Apache developers for their efforts and time: no hard feelings from me.
Really, I have worked in this business long enough to have been in heated debates time and time again with co-workers and fellow developers about programming issues. Sorry for the confusion, and I want to reiterate that another frustrated individual besides myself wrote comment #73. I don't want to argue. I just want to contribute to software and make it the best it can be, and certainly as good as it should be. So, I'll continue to point out things I see as issues, and try to grin and bear it when they get pointed out to me.
(In reply to comment #77)
> Sorry Wade, meant Gregory :-(

Thanks for the apology, Niall.

Wade
Just to give a short update on the patch (for those who doubt and/or are interested): we installed the patch on Friday morning, so it has been in the production (live) system for nearly 5 days now. Since then we haven't had a single Tomcat crash or hangup. In the same period a week earlier, it was about 15 servers. We haven't changed anything else, and the system has served about 500,000,000 (real) requests with the patch, so I assume it can safely be said that the patch works and solves the bug.
(In reply to comment #73)
> Fix it. Resolved my arse. And while you're at it, tell me what other areas
> of the "spec" I should be misinterpreting and working around in my code.
> Crying about a fork? Reading this, you practically *deserve* one.

Yes, unfortunately, you need to find yourself a replacement brain really fast so you could actually comprehend the issue. Syncing reads in addition to writes (which already prevent data corruption, and are synced again in the entire 5.5 branch) provides very little, since your reads are going to be dirty anyway unless you sync yourself at the application level. Basically, ConcurrentHashMap, which uses the same algorithm as the regular HashMap, just does an additional synced read if the initial unsynced read returned null.

The problem should be solved soon, however, and you'll be able to stop writing pointless rants, as the spec wording is going to be heavily changed.

BTW, I am not crying about the Glassfish fork; I am mentioning that the way it was done was, as usual with companies doing forks, inappropriate.

> its reputation as a stable platform

Lol, most of the time, people (like you) say it's crap :D
(In reply to comment #78)
> Guys, we're in active discussion with the Servlet Spec expert group about
> resolving this.

Thanks Yoav, great news! I hope this issue gets solved, and that we all get our "I found a bug in the spec and all I got was this lousy t-shirt" t-shirts and, most importantly, stable production systems soon.

regards
Leon
1) Take the rant for what it is: a demand that the pointless excuses for why this is marked INVALID end.
2) We'll all feel a lot better now that there's a fix on the cards instead of a lot of excuses.
3) "People like me" quietly use and support Tomcat in our community, and don't show up in forums because, up until reading this, issues like this didn't even *cross our minds*.

Just get it fixed. Don't care how; it's *not* an invalid bug, and there's no excuse for that.
Ok, so completely innocently here (despite what people may happen to think; I'm *still* incensed by reading the history on this bug, and still feel well and truly PO'd...)

Why does the following not represent an issue in DeltaSession 1.35?

http://cvs.apache.org/viewcvs.cgi/jakarta-tomcat-catalina/modules/cluster/src/share/org/apache/catalina/cluster/session/DeltaSession.java?rev=1.35&view=markup

Access to attributes.put() remains unsynchronized; synchronizing the remainder, especially when other methods are synchronized, strikes me as an oversight. It should cause the exact same problem (albeit less often, because reads are being serialized), as a put can still cause the HashMap to resize and therefore still hit the problem; those puts are also allowed during retrieval of attributes.keySet() and other long-running actions as a result. removeAttributeInternal() also fails to synchronize the removal of the attribute from the map, and thus also creates the issue; attribute removal and insertion can both cause a right-timed reader to fail.

So it's not just limited to StandardSession, which...

http://cvs.apache.org/viewcvs.cgi/jakarta-tomcat-catalina/catalina/src/share/org/apache/catalina/session/StandardSession.java?rev=1.60&view=markup

...also fails to synchronize its session data in the current 5.0.11 branch, although it, too, does half the job: performing unsynchronized reads, synchronizing writes, failing to synchronize keySet retrieval, etc.

So how is this not a 5.5 bug? StandardSession will definitely exhibit the behavior a lot more often, but the bug is still fundamentally *there* in both.
(In reply to comment #86)

I have fixed the unsynchronized session attribute write operations in DeltaSession.

Peter
(In reply to comment #87)
> I have fixed the unsynchronized session attribute write operations in
> DeltaSession.

There are other places that have this issue, which I noticed while going through the Tomcat code, such as the StandardSession get/setNote methods, which are not synchronized and are used throughout the code without being synchronized. They are used in the authenticators, I know, and I'm sure they are used in other places because of their nature. Should I file a separate bug for each, or is it enough to file a generic bug stating that there are many cases of unsynchronized HashMap use in Tomcat, so they can be tracked down? I would work on a patch for it, as long as I'm not working on it at the same time as anyone else. Are all instances being looked at as a result of issue 36541?
I am sorry if I am too late to push (if the issue is already fixed), but in case it's not, I have something to say:

1) SRV.15.1.7 says nothing about HttpSession being not thread-safe, and AFAIR this means the interface should be thread-safe.

2) SRV.7.7.1 talks about synchronizing access to "session resources", but it says nothing about the session object itself. If one thinks the session itself was meant, he should agree that a more exact method of synchronizing than "as appropriate" ought to be given in the spec: the Developer can choose any way he wishes to synchronize access to his own resources, but the method of synchronizing access to an object (HttpSession) that is shared between different Developers (if such synchronization is needed) MUST be defined. If you read "session resources" as "session", this means anyone can choose any "appropriate" method of synchronizing the session object, which is totally incorrect.

As for me, this means that neither 7.7.1 nor any other part of the spec says that the session object is unsafe, and therefore in any implementation it MUST be safe.
> Syncing reads in addition to writes (which already prevent data corruption,
> and are synced again in the entire 5.5 branch) provides very little, since
> your reads are going to be dirty anyway unless you sync yourself at the
> application level. Basically, ConcurrentHashMap, which uses the same
> algorithm as the regular HashMap, just does an additional synced read if
> the initial unsynced read returned null.

I don't want to flog a dead horse, but I do not think the above is correct. Normally "dirty read" means that you can read an uncommitted transaction. In this case, it means that you can get an incorrect read when looking at a piece of data that no one else is touching. E.g., if someone is adding item X, you may be prevented from seeing item Y. This is mostly caused by the resize issue alluded to above, and it is addressed by ConcurrentHashMap much differently than it is addressed in HashMap.

If you look at the get code for ConcurrentHashMap:

V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

HashEntry<K,V> getFirst(int hash) {
    HashEntry[] tab = table;
    return (HashEntry<K,V>) tab[hash & (tab.length - 1)];
}

The code in getFirst first copies the table to a local variable. This is probably meant to get around the issue brought up in comment #72, where the table can change between getting the table length and accessing an index in that table.

In addition to this, there are a lot of other differences between ConcurrentHashMap and HashMap that address a number of these subtle issues. Look at the differences between ConcurrentHashMap$Segment.rehash() and HashMap.transfer(). HashMap.transfer() simply transfers all the entries over to the new list. ConcurrentHashMap$Segment.rehash() needs to determine which entries can be moved over and which need to be copied, so that it can keep the table currently used by the ConcurrentHashMap valid.

The overall point here is that concurrency issues are subtle, and using data structures that are not meant to be used concurrently in a concurrent manner can be dangerous.
Actually, for completeness, I thought I would mention some of the other, more subtle differences between HashMap and ConcurrentHashMap that make this dangerous. In both HashMap and ConcurrentHashMap, adding a new entry comes down to a line that looks something like:

table[bucketIndex] = new Entry<K,V>(hash, key, value, e);

Looks safe, right? The problem is that Entry<K,V>.next is not final, so it need not actually be flushed to memory by the time table[bucketIndex] is set. For ConcurrentHashMap, HashEntry<K,V>.next is declared final, so it must be flushed to memory before the constructor finishes. This means that in the HashMap case, when doing a get, you can see this partially constructed instance, not look down the rest of the chain, and thus not find your element. In the ConcurrentHashMap case this cannot happen, because of the above-mentioned final variable.

For more info on this, read section 17.5 of the JLS as revised for 1.5:
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.5
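To make the publication hazard concrete, a minimal sketch (hypothetical class, not JDK code) showing the two kinds of fields under the 1.5 memory model:

class Node {
    final int hash;   // final: guaranteed visible once the reference is published
    Object value;     // no such guarantee
    Node next;        // not final: a racing reader may still see null here and
                      // lose the rest of the chain, which is exactly the HashMap case

    Node(int hash, Object value, Node next) {
        this.hash = hash;
        this.value = value;
        this.next = next;
    }
}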
As mandated by the next Servlet specification (the change seems to be accepted), the session-internal maps are now fully synchronized, by using a Hashtable.
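In StandardSession terms, the change amounts to something along these lines (shown only to illustrate; assuming the field is declared as a plain Map):

// before: unsynchronized, subject to the corruption discussed above
protected Map attributes = new HashMap();

// after: every get/put/remove is synchronized on the Hashtable itself
protected Map attributes = new Hashtable();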
Could you post a link to the new spec? I can't find any changes under:

http://www.jcp.org/aboutJava/communityprocess/maintenance/jsr154/index3.html
If session attributes are now going to be synchronized as per spec, what about servlet context attributes? They will face the same issue.
(In reply to comment #94)
> If session attributes are now going to be synchronized as per spec, what
> about servlet context attributes? They will face the same issue.

The proposed (and now committed) change protects the attribute collection *inside* the HttpSession object, eliminating the opportunity for an application to cause that collection to become corrupted. But the fact that a session attribute can be accessed from multiple threads, and therefore might need thread-safety protections internal to itself, is still the application designer's issue to solve. Indeed, it's not anything a container could do anything about, even if it wanted to.

The attribute collection inside Tomcat's ServletContext implementation was already synchronized, and not subject to corruption. Only the technique used to represent *session* attributes was in question here.
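To make the division of labor concrete (a hypothetical example, not Tomcat code): the container now guards the attribute map itself, but a mutable object the application stores in it still needs its own synchronization:

import java.io.Serializable;

public class HitCounter implements Serializable {
    private int hits;
    // Without this synchronization, two requests in the same session could
    // interleave and lose increments, and no container change can prevent it.
    public synchronized void increment() { hits++; }
    public synchronized int current() { return hits; }
}

// usage from a servlet, with "hits" as a made-up attribute name:
// session.setAttribute("hits", new HitCounter());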
(In reply to comment #95)
> The attribute collection inside Tomcat's ServletContext implementation was
> already synchronized, and not subject to corruption.

Yeah, but what does the spec say (now)? If it is about as unclear as it was with the session attribute collection, history will repeat itself: Tomcat or some other container will implement the ServletContext attribute collection unsynchronized (because it's faster), some people will whine about lockups, the spec will need to be changed one more time, and the container will need to be changed in order to be compliant with the (new) spec.