|Summary:||memory leak protection : stopping TimeThreads should be optional and disabled by default|
|Product:||Tomcat 6||Reporter:||Sylvain Laurent <slaurent>|
|Component:||Catalina||Assignee:||Tomcat Developers Mailing List <dev>|
patch for tomcat 7
bug48971_DBCP_reproducer.zip - IllegalStateException caused by stopped timer
Description Sylvain Laurent 2010-03-23 22:56:08 UTC
While investigating to write this wiki page http://wiki.apache.org/tomcat/MemoryLeakProtection , I realized that WebAppClassLoader.clearReferencesStopTimerThread() can have dangerous side effects. Indeed, if the TimerThread was spawned by some library deployed in the common (or even JRE) classpath, then stopping it can impact other webapps still running. It would probably be a bug of that library to have such a thread with such a context classloader, but we have to be realistic, we cannot fix every third-party library (and the JRE). In my opinion, tomcat should better play it safe in such a case, and should not attempt by default to stop TimerThreads : it's safer to have a leak for a stopped webapp than break running apps ! WebAppClassLoader.clearReferencesStopTimerThread() should be conditioned with the existing boolean clearReferencesStopThreads. see attached proposed patch.
Comment 1 Sylvain Laurent 2010-03-23 22:58:39 UTC
For a better description of the side effect, see http://wiki.apache.org/tomcat/MemoryLeakProtection#cclThreadSpawnedByCommonClassLoader
Comment 2 Mark Thomas 2010-03-24 00:32:42 UTC
There is no patch attached to this bug report. I'd rather keep the protection by default and provide a way to disable it if necessary. That said, your expendable class-loader idea is probably a better way of dealing with this. Is there a complete patch for that anywhere? I don't recall seeing one.
Comment 3 Sylvain Laurent 2010-03-24 05:32:39 UTC
Created attachment 25170 [details] patch for tomcat 7
Comment 4 Sylvain Laurent 2010-03-24 20:08:44 UTC
you're correct, I did not submit any patch for the expendable classloader, I'd like to discuss another approach on the dev list in the next days. Regarding this present issue, I think that it would be more coherent/consistent to either stop all "leaking" threads or none. Currently, the default behavior depends on the actual class of the Thread, which is not self-evident for users... And my opinion is still that such an option should be "opt-in" rather than "opt-out".
Comment 5 Konstantin Kolinko 2010-03-28 21:30:07 UTC
Created attachment 25203 [details] bug48971_DBCP_reproducer.zip - IllegalStateException caused by stopped timer Any DBCP pool that has timeBetweenEvictionRunsMillis attribute set to a positive value will suffer from this issue. The class that creates the timer is org.apache.commons.pool.impl.EvictionTimer and is (after being renamed) in tomcat-dbcp.jar, which is in the Common class loader. It looks that the TimerThread created by the Timer will belong to the first web application that happens to access the pool. Forcibly stopping the timer creates an invalid state in the DBCP and the next web applications may fail to start. I am attaching configuration that reproduces this issue. To reproduce: 1. Unpack the zip archive, and place the provided files on top of the default configuration of Tomcat 6.0.26. It contains: 1) /data - Two sample HSQLDB databases (http://hsqldb.org/), "database" and "database2". These are identical and contain a single table with two rows of data - see the *.script files. 2) /lib - HSQLDB 188.8.131.52 jar. 3) /conf - server.xml that defines a database pool in GlobalResources. - tomcat-users.xml that contains a username and password for the manager application 4) /webapps - Two web applications, test1 and test2. These are nearly identical, with the following only difference between them (see META-INF/context.xml): - test1 uses connection pool defined in server.xml - test2 uses its own connection pool 5) /logs - Logs from my test run. 2. Copy jstl.jar and standard.jar from /webapps/examples/WEB-INF/lib/ of Tomcat into the /lib folder of Tomcat. 3. Start Tomcat 4. Go to http://localhost:8080/test1/ A test page should load and print two database rows. 5. Go to http://localhost:8080/manager/html/ and stop the test1 application. Note, that Tomcat stops TimerThread created by DBCP. 6. Go to http://localhost:8080/test2/ Expected result: The same output as for test1 before. Actual result: The page fails to load, displaying DataSource invalid: "java.lang.IllegalStateException: Timer already cancelled." It is not seen from the stack trace, but my understanding here is that that occurs because in Commons Pools, when a new pool is initialized, the GenericObjectPool#startEvictor(..) method creates a new Evictor and calls Timer.schedule(..) to schedule it, but the timer is already canceled. Other issues: 1) I suspect that when the pool defined in server.xml is initialized, the TCCL occurs to be equal to the classloader of the web application. I have not yet confirmed that, but that will be a separate bug (causing its own leaks) if it is the case. 2) HSQLDB creates its own "HSQLDB Timer" thread. That is not important, because DBCP issue I am talking about is independent of what database is used. If anyone is curious: that thread is created by the org.hsqldb.lib.HsqlTimer class. Regarding the feature of stopping TimerThreads: 1) I agree with Sylvain that it looks like that it would be better to have it disabled by default, but I would like to have a separate option to control enabling this feature. 2) I think that another strategy is possible here: to call Thread.setContextClassLoader(null) on the affected threads. That can have its consequences, but might be better than stopping the threads. Workaround: Do not set "timeBetweenEvictionRunsMillis" attribute on a pool.
Comment 6 Sylvain Laurent 2010-05-06 17:18:59 UTC
Mark, could you consider fixing this issue for 6.0.27 ? there's my (now old) patch for trunk attached, do you need a more recent one and one for tomcat 6 as well ?
Comment 7 Mark Thomas 2010-05-24 10:22:30 UTC
The approach suggested by Sylvain on the dev list was to set the Thread's context class loader to null. I'm not 100% about the side-effects of that either. However, that did trigger another idea - setting the context class loader to the parent of the webapp class loader, normally the common class loader. In that case, we still need to log an error - it is just the corrective action that changes. I'll test this idea out over the next day or so.
Comment 8 Sylvain Laurent 2010-05-24 17:43:33 UTC
Hello Mark, I had tried this too, but I still had the leak because I often had some ProtectionDomain instances that referenced the webapp classloader. The urgent thing to do is to fix the current issue by making the stopping of TimerThread optional, and open a new BZ issue for an enhancement in this matter.
Comment 9 Mark Thomas 2010-06-01 13:01:45 UTC
This has been fixed in trunk and proposed for 6.0.x.
Comment 10 Mark Thomas 2010-06-01 15:15:00 UTC
Note I have fixed the DBCP issue in commons pool. The fix will be in pool 1.5.5 onwards.
Comment 11 Sylvain Laurent 2010-06-01 17:05:31 UTC
reference to the commons-pool issue : https://issues.apache.org/jira/browse/POOL-161