If a servlet has finished it's init(ServletConfig), and is now ready to handle requests, if you tell the tomcat manager to reload the context containing that servlet (EG: w3m http://localhost:8080/manager/reload?path=/ServletRestartTest), and you access that servlet via GET or POST (EG: w3m http://localhost:8080/ServletRestartTest/ServletRestartTest) while the destroy() method on the servlet is still executing, the GET or POST request will fail with a 404 error after the destroy() method has been completed. This is always reproducible with my attached test case. I believe this may be the root cause of the issue described in #42518 which is described as an "Intermittent 404". With this test case, it is always reproducible, and not intermittent.
Attached is a context file and a test servlet. This servlet has alot of sleeps, and prints a bunch of debug which will make it clear what is going on when the servlet is being reloaded. This problem is reproducible even without any sleep statements in the servlet. To setup the test do this: extract the apache-tomcat 6.0.14 binary distro cd conf cd Catalina mkdir localhost cd localhost cp ../../../webapps/host-manager/manager.xml . (This sets up the manager servlet for you) Drop the ServletRestartTest.xml into this directory cd ../.. edit the tomcat-users.xml and paste in: (this lets you access the manager servlet with username tomcat password tomcat) <role rolename="manager"/> <user username="tomcat" password="tomcat" roles="manager"/> cd ../webapps extract the war file cd ../bin ./catalina.sh run (this runs tomcat printing the stdio to the console) To reproduce with this test case: 1. Visit this URL: http://localhost:8080/ServletRestartTest/ServletRestartTest You should get: ServletRestartTest - Servlet is in the RUNNING state 2. Visit this URL: http://localhost:8080/manager/reload?path=/ServletRestartTest 3. Then immediately reload the original url (http://localhost:8080/ServletRestartTest/ServletRestartTest) 4. You will then receive the 404 error approximately 5 seconds later: HTTP Status 404 - Servlet ServletRestartTest is not available type Status report message Servlet ServletRestartTest is not available description The requested resource (Servlet ServletRestartTest is not available) is not available. Apache Tomcat/6.0.14 Optionally you may use the apachebench program to see that you will get many 404s, depending on how many requests come in: Replace step 3 with: ab -c 10 -n 10 http://localhost:8080/ServletRestartTest/ServletRestartTest You will see that all 10 requests fail with non-200 responses: This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Copyright 2006 The Apache Software Foundation, http://www.apache.org/ Benchmarking localhost (be patient).....done Server Software: Apache-Coyote/1.1 Server Hostname: localhost Server Port: 8080 Document Path: /ServletRestartTest/ServletRestartTest Document Length: 1081 bytes Concurrency Level: 10 Time taken for tests: 5.49371 seconds Complete requests: 10 Failed requests: 0 Write errors: 0 Non-2xx responses: 10 Total transferred: 12850 bytes HTML transferred: 10810 bytes Requests per second: 1.98 [#/sec] (mean) Time per request: 5049.371 [ms] (mean) Time per request: 504.937 [ms] (mean, across all concurrent requests) Transfer rate: 2.38 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.3 0 1 Processing: 5026 5043 6.4 5045 5048 Waiting: 5026 5038 8.0 5038 5048 Total: 5027 5043 6.1 5045 5048 Percentage of the requests served within a certain time (ms) 50% 5045 66% 5045 75% 5046 80% 5048 90% 5048 95% 5048 98% 5048 99% 5048 100% 5048 (longest request)
Created attachment 21025 [details] The ServletRestartTest file you need to extract
This is how I envision the reload function working: 1) The servlet is happily handling GET/POST requests 2) reload is requested from the manager servlet 3) All new requests should be put into a queue, and wait to be serviced 4) If there are any 'currently active requests', tomcat should wait up to StandardContext.unloadDelay seconds waiting for these requests to finish 5) Any requests that finish should be sent to the browser and finished up 6) When unloadDelay has been exceeded, or all currently active requests are completed, destroy() should be called on the servlet 7) Wait until destroy() has finished 8) Clean up the context, and switch the classloader to the new classloader 9) If there are any requests in the queue, start the servlet, call init(ServletConfig) on the servlet 10) wait for init(ServletConfig) to complete 11) All new requests should nolonger be put into queue, and should be serviced by the servlet 12) All requests which are in queue, should now be released and processed against the new servlet Looking at the code in StandardContext, I can see this is probably what it supposed to happen (it tries to pause incoming requests)... it just doesn't quite work. What we see is that if a request comes in while the servlet is being destroyed, it will get a 404 from the servlet when the destroy() method completes (EG: not right away!). I looked at the code very reasonably thoroughly in tomcat 5.5 (more thoroughly than I am looking at it right now) and I seem to remember discovering that it was the start() context method that was actually destroying the waiting request. But I can't vouch for that, it was probably a year ago :) It is worth noting that if you send in a SECOND request while the servlet is being destroyed, that request *IS* handled properly... However... you can't send that second request in TOO quickly after the first. If they all come in at the same time (see the apache bench testcase), they will *all* fail. However even if you're doing it reasonably quickly from firefox, the second request will come through correctly. I should also note that we had a patch for tomcat 4.1 which made the reload() function work the way described above (it was a rather minor change), however the patch is useless against tomcat 5 and greater, since the start/stop code has been rewritten.
Created attachment 21026 [details] The Context file for the ServletRestartTest
Thanks for the test case. The problem was that the wrapper originally allocated to the request is stopped as part of the reload. I have committed a fix to trunk and proposed it for 6.0.x.
This has been fixed for 6.0.x and will be in 6.0.17 onwards.
Sorry for the delay testing the fix... It's been a busy couple of weeks. So I checked out the code from svn trunk today (05-22-08) and tested it. With the test case I provided, everything was good... However when I tested it with a more thorough test case (EG: a real web application :) ), i ran into some crashes implying I was using an the OLD classloader. It seems like the Thread that is calling the init() function has the *wrong* classloader attached to it (EG: the old servlet's class loader, instead of the new one). I've narrowed down the issue (this... took some time :) ) and have a new test case for you. Just replace the ServletRestartTest.war file with this new one (and re-extract it). There are now two classes, the ServletRestartTest servlet itself, and another class (it happens to be blank) that I will try to load using the Classloader. Do the same test as before... run: ab -c 10 -n 100 http://localhost:8080/ServletRestartTest/ServletRestartTest then query the reload by going to this url at the same time: http://localhost:8080/manager/reload?path=/ServletRestartTest You'll see this: [...] servicing completed servicing completed destroy starting destroy completed init starting May 22, 2008 7:02:13 PM org.apache.catalina.loader.WebappClassLoader loadClass INFO: Illegal access: this web application instance has been stopped already. Could not load OtherClass. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact. java.lang.IllegalStateException at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1273) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233) at ServletRestartTest.init(ServletRestartTest.java:24) at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1161) at org.apache.catalina.core.StandardWrapper.allocate(StandardWrapper.java:806) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:129) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:183) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:595) init completed starting servicing [...] Let me know if I can be of any further assistance
Created attachment 21990 [details] Testcase 2 - ServletRestartTest.war
Thanks for your efforts on the test case. It made things so much easier to investigate. I have committed a fix to trunk that addresses this and proposed it for 6.0.x http://svn.apache.org/viewvc?rev=661074&view=rev If you are able to test my fix that would be very helpful.
Well thanks for the compliment on the test cases... These issues are very important to us, so I'm *extremely* grateful you're spending time looking into them. But I'm sorry to report that my test case for this next one isn't as good :) So first, your fix with the classloader issues seems to have fixed up that problem. However, if I use the 2nd testcase, and do: ab -c 10 -n 60 http://localhost:8080/ServletRestartTest2/ServletRestartTest then reload the context: http://localhost:8080/ServletRestartTest2/ServletRestartTest I am still getting some non 200 requests: Complete requests: 60 Failed requests: 11 (Connect: 0, Length: 11, Exceptions: 0) Write errors: 0 Non-2xx responses: 11 I swear I didn't this before... So maybe I'm crazy... But maybe I was distracted last time by trying to track down the classloader testcase. If I try to do the tests from my browser... they *usually* work just fine. But sometimes, I'll get the 404 in my browser too. I would just say it's a race condition as to when I reload the context, but can't quite put my finger on it. It seems like once it gets one 404... every time I try it after that (until I restart tomcat) it'll get the 404 on reload. *sigh* Sorry I can't be more helpful. Can you tell me if you are still seeing non-200s with the apache bench test? It seems like maybe you fixed one path where you can get a 404, but maybe there's another. It looks like requests that come in right after the destroy finishes, but before the init starts that are getting the 404s now. So I did find a *second* issue, which is also intermittent, and may have been introduced by your first fix (although I don't know). I've spent the past two days (well, a couple hours yesterday and a few hours today) trying to get a narrow testcase for you. I've got one, and I'll attach it as ServletRestartTest3. In your browser, you need to open Three tabs 1:http://localhost:8080/manager/reload?path=/ServletRestartTest3 2:http://localhost:8080/ServletRestartTest3/ServletRestartTest 3:http://localhost:8080/ServletRestartTest3/ServletRestartTest You need to start tomcat, then reload tab 2, then 1, then 3 This is what happens (sometimes!) the servlet will init() then the servlet will destroy() but while the destroy is running, it will service the request for tab 2 on the same servlet the destroy() will finish the servlet will init() the servlet will service tab 3. Unfortunately, what it's doing is serving a request (tab #2) *while* it's in the middle of destroying the context. Bad bad bad :( In the testcase I have rigged up it will cause a nullpointer exception in tab #2 when that happens: java.lang.NullPointerException ServletRestartTest.doGet(ServletRestartTest.java:43) javax.servlet.http.HttpServlet.service(HttpServlet.java:617) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) You'll see this on your tomcat console: init starting [GC [DefNew: 960K->64K(960K), 0.0029550 secs] 4176K->3426K(5056K), 0.0029980 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] Jun 5, 2008 10:13:07 PM org.apache.catalina.core.StandardContext reload INFO: Reloading this Context has started init completed destroy starting starting servicing destroy completed [GC [DefNew: 960K->64K(960K), 0.0034250 secs] 4322K->3539K(5056K), 0.0034690 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] [GC [DefNew: 960K->64K(960K), 0.0019920 secs] 4435K->3623K(5056K), 0.0020690 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] init starting init completed starting servicing servicing completed If you can't get it to happen first try, reload tab #1, which will shutdown the context... Then do tab #2, #1, #3. It may take you a few times to get it to happen. It's intermittent :( You'll see that Testcase #3 has a very tight loop at the init(). I tried a sleep(5000), and it was *significantly* more difficult to get the problem to happen. With the 5 second tight loop, it's much easier (although still not consistent). Let me know if there's anything I can do to help
Created attachment 22084 [details] Testcase 3 - ServletRestartTest.war
The classloader fix has been applied to 6.0.x and will be in 6.0.17. I'll take a look at the remaining issues.
I do have some progress to report. The issues you are seeing with test case 2 are due to the context.reload() calling context.stop() followed by context.start() context.start() unregisters and re-registers the context with JMX. Whilst the context is unregistered it does not appear in the mapper so the request is mapped to the ROOT webapp and a 404 results. I am currently looking for the least ugly hack around this issue. At first glance test case 3 is a separate issue which I will look at once I have test case 2 working properly.
I have committed another patch to trunk and proposed it for 6.0.x that fixes another source of 404s on reload. The patch is: http://svn.apache.org/viewvc?rev=667644&view=rev With this patch I can't get any 404s with test case 2. On to test case 3...
I can't reproduce the problem you had with case 3. Have you set the unloadDelay on the context? I have set mine to 20000 (20 seconds) for this bug to make sure that the servlet is given a chance to destroy() cleanly.
The Reloading looks like it works well now. Thanks! As for case #3... Yes I can still get that to happen. I thought maybe it was easier for me to reproduce because my laptop had dual core. I booted nosmp and I was still able to get it to happen (infact, I think it may have been easier). Yes, there is a setting for the 20 second timeout... but it's not hitting that. It's pretty much immediately blazing into servicing the request even though the destroy is in progress. The Destroy starting you see in my previous message and the starting servicing come basically at the same time. I think the key is to get the request on window 3 in before window 2 is finished causing the servlet to start. It doesn't happen every time... And it may be easier to recreate on a slower machine... it certainly appears easier to recreate when I replaced the init() sleep(5000) with a CPU bound loop Let me know if you can get it to happen... and if not I'll see what I can do for a better testcase. Thanks again
Sorry, I still can't repeat the issue with test case 3. I have tried on a couple of different machines.
Well that's frustrating :) I've done some work on it today to try to narrow down when it happens. I've written a java program that will make the required calls to the URLs in parallel to cause the issue. I wrapped it in a loop, and it happened the first time I ran it! Then I ran it again, and let it run for 2 hours without it happening once. Then I restarted tomcat, and it happened first time! So it seems to happen a lot *more* when tomcat has first started... but still not every time. I'm using Sun's 1.5.0_15 jvm if that somehow makes a difference. So I am attaching a unix shell script that will start+stop tomcat between each test run, and my java program that does the URL calling. You may need to tweak a couple variables to make this work on your machine: TestServiceDuringDestroy.sh: TOMCAT_BIN_DIR - Set this to where the catalina.sh script is TestServiceDuringDestroy.java: TESTCASE3_URL - Put the url to testcase3 (the one that spins cpu during init) RESTART_TOMCAT_URL - Put the url to restart that servlet RESTART_TOMCAT_BASIC_AUTH - Put in the basic authentication credentials for the manager servlet in format username:password With those changes... I think that will get it going for you. The theory here is: while (true) { start tomcat Request the servlet Restart the servlet Request the servlet stop tomcat } When the testcase fails you'll see something like this: Using CATALINA_BASE: /var/tmp/tomcat/apache-tomcat-6.0.16 Using CATALINA_HOME: /var/tmp/tomcat/apache-tomcat-6.0.16 Using CATALINA_TMPDIR: /var/tmp/tomcat/apache-tomcat-6.0.16/temp Using JRE_HOME: /usr/local/java Sleeping for tomcat to start Testcase Call 1: Requesting URL Restart call: Requesting URL Testcase Call 2: Requesting URL Testcase Call 1: IOException in URL request java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost:8080/ServletRestartTest3/ServletRestartTest at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1174) at WebRequestThread.run(WebRequestThread.java:28) Using CATALINA_BASE: /var/tmp/tomcat/apache-tomcat-6.0.16 Using CATALINA_HOME: /var/tmp/tomcat/apache-tomcat-6.0.16 Using CATALINA_TMPDIR: /var/tmp/tomcat/apache-tomcat-6.0.16/temp Using JRE_HOME: /usr/local/java Sleeping for tomcat to stop --- In the cases where it tomcat succeeds, I see this on the server side: INFO: Waiting for 1 instance(s) to be deallocated I do not see that when my testcase fails. FWIW, I am using a apache-tomcat-6.0.16 binary installation, and I dropped in lib/* and bin/*.jar from a newly compiled trunk svn checkout from yesterday (06-24-08) Good luck!
Created attachment 22176 [details] Test Harness #4
The patch for test case 2 has been applied to 6.0.x and will be in 6.0.17 onwards. I'm still looking at test case 3.
The good news is I can now repeat test case 3 and have found the root cause. There is a synchronisation issue with StandardWrapper. In short, one thread can enter allocate() and before countAllocated is incremented, another thread can enter unload(). The bad news is that I suspect the fix might not be pleasant. allocate() is on the critical path so just synchronising allocate() would likely be bad from a performance point of view. I haven't really thought about this much yet, so I am hoping there is a better fix to be found. I'll have a think about a solution for this over the next few days. Hopefully I'll have something next week. In the meantime, any suggestions are welcome.
I have committed a fix for test case 3 to trunk, It isn't perfect but it should be enough to fix the issue you are seeing. For the record, I could only get test case 3 to fail by using a debugger to control the execution sequence of the threads. The patch is http://svn.apache.org/viewvc?rev=672397&view=rev Any testing feedback would be appreciated. The patch has been proposed for 6.0.x
The patch for test case #3 has been applied to trunk. Whilst the fix isn't perfect is should be extremely difficult (verging on impossible) to reproduce this issue under normal usage. If you have a test case that still demonstrates this issue with the latest patches, please feel free to re-open this bug.
*** Bug 42518 has been marked as a duplicate of this bug. ***