Created attachment 32916 [details] Servlet which starts asynchronous processing and intentionally waits for a timeout before dispatching to a static HTML file. Overview: When repeatedly polling an asynchronous servlet which (intentionally) times out, I occasionally see the following exception in the console: Jul 20, 2015 8:58:45 AM org.apache.catalina.connector.CoyoteAdapter checkRecycled INFO: Encountered a non-recycled request and recycled it forcedly. org.apache.catalina.connector.CoyoteAdapter$RecycleRequiredException at org.apache.catalina.connector.CoyoteAdapter.checkRecycled(CoyoteAdapter.java:590) at org.apache.coyote.http11.AbstractHttp11Processor.recycle(AbstractHttp11Processor.java:1809) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.release(Http11NioProtocol.java:199) at org.apache.tomcat.util.net.NioEndpoint$Poller.cancelledKey(NioEndpoint.java:1100) at org.apache.tomcat.util.net.NioEndpoint$Poller.timeout(NioEndpoint.java:1466) at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1234) at java.lang.Thread.run(Thread.java:745) This coincides with a request not receiving any response. The majority of times the servlet is called, the correct response is returned as expected once the request times out, however on occasion (sometimes within 1 min, sometimes not until 30 mins of successful requests have passed) I see this exception and AsyncListener.onTimeout() is never called to produce the response. Steps to reproduce: 0) Configure Tomcat to use the NIO connector 1) Deploy the attached TimeoutServlet (note: add a static HTML page 'timeout.html' to the root of the web application to serve as the timeout response). 2) Using a tool of your choice (I used JMeter), repeatedly send a request to the servlet and wait for the timeout response before sending the next request. Actual Results: Eventually, one of the requests will receive no response and the above exception will be written to the console. Expected Results: Every request should receive the timeout response (timeout.html) after 1 second and no exception should appear in the console. Additional Information: Appears to be very similar to https://bz.apache.org/bugzilla/show_bug.cgi?id=57011.
Just to add: have also seen this issue in 7.0.63.
Something that may take 30 minutes to reproduce is going to be pretty much impossible to debug. It looks like a rare timing problem. We can try and find it by code inspection but anything you can do to provide an easier to reproduce test case would be appreciated.
I think what you are seeing is a symptom of bug 57943 where the same socket ends up in the Poller twice. That would explain the stack trace you see. Are you able to build Tomcat 7.0.x from svn and test it? If not, you'll need to wait for the next 7.0.x release. I'm leaving this as NEEDINFO for now as we need to know if this issue is still reproducible with the current 7.0.x trunk.
Tried the test servlet on the latest Tomcat 7.0.x trunk build and still seeing the checkRecycled exceptions in the Tomcat console.
Created attachment 32978 [details] JMeter script that continuously does a GET on the test servlet and verifies the response is 200.
For info - the checkRecycled exception was, on average, being generated within 1 minute of the JMeter script being run.
Hi. Could you give 7.0.x trunk another try please. I've fixed an issue that might be the cause of this bug. I don't think it is but it is worth checking and I haven't been able to reproduce this yet.
Hi, built the latest 7.0.x trunk and tried it today. Still seeing the checkRecycled exception - first one 4 minutes after starting the JMeter script. I will upload the Tomcat logs if they help at all - I ran the JMeter script against the test servlet for an hour.
Created attachment 32988 [details] Zip containing Tomcat 7.0.x logs after running JMeter script for an hour.
Created attachment 32989 [details] WAR file for servlet used to recreate this bug. Attaching the WAR file I have been running the JMeter script against.
Another thing to note is that it takes much longer to reproduce this exception using a JavaScript client versus using JMeter (where it can be seen within a few minutes). Thanks!
Thanks for testing. I'll be taking another look at this over the next few days.
Good news. I think I have managed to track down the root cause of this. It is a race condition between the Poller timing out the socket and new data arriving. I haven't - yet - figured out how to fix it. The test case only triggered one failure in a hour for me but this is going to be *very* sensitive to timing so that isn't a surprise. I think I can trigger the correct execution sequence in a debugger so - assuming that is the case - I'll be able to confirm any fix fairly easily.
I've found and fixed a couple of concurrency issues that could have contributed to this but I'm still seeing the occassional failure. I haven't yet managed to reproduce the same exception with a debugger. Investigations continue.
Hi Mark, just thought I'd try out your latest update - still seeing the exception (as you have stated above). Thanks.
A quick update before the weekend. I have better grip on what is going wrong. The issue is that async dispatches are being processed multiple times. Depending on the timing, this can go wrong. The fix (for 7.0.x at least) is non-trivial. I've tried various small patches but all introduce a different problem. At this point, I think a reafctoring of the async dispatch handling will be required. I haven't looked at 9.0.x or 8.0.x as yet. The refactoring that has already taken place in those versions may have already addressed this. Or not. I'll continue to look at this next week.
I've spent some more time looking at this and - having found the 'right' place to fix this - the patch was actually fairly small. I have fixed the issue in 9.0.x, 8.0.x (for 8.0.25 onwards) and 7.0.x (for 7.0.64 onwards).
We have built the trunk and we can confirm that we haven't seen the bug appearing during our tests with the minimal servlet, and also with our full application. We'll continue testing for more time but it's looking really good. Thanks a lot for your help Thomas! Really appreciated!
Sorry!. I meant to say Mark... or Mr.Thomas.... as you prefer :)
I answer to most names - as long as they are polite ;) I did some local testing before committing this patch with the simple test case and didn't see any issues for over 15 hours so I'm reasonably confident that this issue is fixed. If your testing shows otherwise just re-open this issue.