Summary: | RecycleRequiredException and missing AsyncListener.onTimeout call | ||
---|---|---|---|
Product: | Tomcat 7 | Reporter: | cg.throwaway |
Component: | Catalina | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 7.0.59 | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: |
Servlet which starts asynchronous processing and intentionally waits for a timeout before dispatching to a static HTML file.
JMeter script that continuously does a GET on the test servlet and verifies the response is 200. Zip containing Tomcat 7.0.x logs after running JMeter script for an hour. WAR file for servlet used to recreate this bug. |
Description
cg.throwaway
2015-07-20 13:32:13 UTC
Just to add: have also seen this issue in 7.0.63. Something that may take 30 minutes to reproduce is going to be pretty much impossible to debug. It looks like a rare timing problem. We can try and find it by code inspection but anything you can do to provide an easier to reproduce test case would be appreciated. I think what you are seeing is a symptom of bug 57943 where the same socket ends up in the Poller twice. That would explain the stack trace you see. Are you able to build Tomcat 7.0.x from svn and test it? If not, you'll need to wait for the next 7.0.x release. I'm leaving this as NEEDINFO for now as we need to know if this issue is still reproducible with the current 7.0.x trunk. Tried the test servlet on the latest Tomcat 7.0.x trunk build and still seeing the checkRecycled exceptions in the Tomcat console. Created attachment 32978 [details]
JMeter script that continuously does a GET on the test servlet and verifies the response is 200.
For info - the checkRecycled exception was, on average, being generated within 1 minute of the JMeter script being run. Hi. Could you give 7.0.x trunk another try please. I've fixed an issue that might be the cause of this bug. I don't think it is but it is worth checking and I haven't been able to reproduce this yet. Hi, built the latest 7.0.x trunk and tried it today. Still seeing the checkRecycled exception - first one 4 minutes after starting the JMeter script. I will upload the Tomcat logs if they help at all - I ran the JMeter script against the test servlet for an hour. Created attachment 32988 [details]
Zip containing Tomcat 7.0.x logs after running JMeter script for an hour.
Created attachment 32989 [details]
WAR file for servlet used to recreate this bug.
Attaching the WAR file I have been running the JMeter script against.
Another thing to note is that it takes much longer to reproduce this exception using a JavaScript client versus using JMeter (where it can be seen within a few minutes). Thanks! Thanks for testing. I'll be taking another look at this over the next few days. Good news. I think I have managed to track down the root cause of this. It is a race condition between the Poller timing out the socket and new data arriving. I haven't - yet - figured out how to fix it. The test case only triggered one failure in a hour for me but this is going to be *very* sensitive to timing so that isn't a surprise. I think I can trigger the correct execution sequence in a debugger so - assuming that is the case - I'll be able to confirm any fix fairly easily. I've found and fixed a couple of concurrency issues that could have contributed to this but I'm still seeing the occassional failure. I haven't yet managed to reproduce the same exception with a debugger. Investigations continue. Hi Mark, just thought I'd try out your latest update - still seeing the exception (as you have stated above). Thanks. A quick update before the weekend. I have better grip on what is going wrong. The issue is that async dispatches are being processed multiple times. Depending on the timing, this can go wrong. The fix (for 7.0.x at least) is non-trivial. I've tried various small patches but all introduce a different problem. At this point, I think a reafctoring of the async dispatch handling will be required. I haven't looked at 9.0.x or 8.0.x as yet. The refactoring that has already taken place in those versions may have already addressed this. Or not. I'll continue to look at this next week. I've spent some more time looking at this and - having found the 'right' place to fix this - the patch was actually fairly small. I have fixed the issue in 9.0.x, 8.0.x (for 8.0.25 onwards) and 7.0.x (for 7.0.64 onwards). We have built the trunk and we can confirm that we haven't seen the bug appearing during our tests with the minimal servlet, and also with our full application. We'll continue testing for more time but it's looking really good. Thanks a lot for your help Thomas! Really appreciated! Sorry!. I meant to say Mark... or Mr.Thomas.... as you prefer :) I answer to most names - as long as they are polite ;) I did some local testing before committing this patch with the simple test case and didn't see any issues for over 15 hours so I'm reasonably confident that this issue is fixed. If your testing shows otherwise just re-open this issue. |