Bug 59138 - checkThreadLocalMapForLeaks has false positives
Summary: checkThreadLocalMapForLeaks has false positives
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina (show other bugs)
Version: unspecified
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-07 17:48 UTC by Brett Kail
Modified: 2016-03-08 20:18 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Brett Kail 2016-03-07 17:48:07 UTC
ThreadLocal$ThreadLocalMap weakly references keys but strongly references values.  However, it appears the checkThreadLocalMapForLeaks checking reports false positives if the key is a ThreadLocal subclass (e.g., anonymous class) but the value does not strongly reference the class loader (e.g., Integer, int[], List<SimpleDateFormatter>, etc.).  Example output:

07-Mar-2016 11:27:08.258 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [servlettest-0.1] created a ThreadLocal with key of type [servlettest.TestServlet$1] (value [servlettest.TestServlet$1@40d92399]) and a value of type [java.lang.Integer] (value [1]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

For large web applications with many such false positives, this output makes tracking down (or even noticing new) real issues more difficult.  Third party libraries refuse to adjust their use of ThreadLocal because they believe (IMO rightly) their code is not causing leaks.  I have read bug 50175 comment 6, but given that the current heuristic has false positives, can some compromise be reached?  Perhaps some configuration for stifling the warning on a per key class name basis could be added?  It would even be acceptable for us if that configuration hid the per-instance message but issued a single overall "suppressing N ThreadLocal warnings based on config" info/warning message.
Comment 1 Mark Thomas 2016-03-07 20:19:06 UTC
The situation described is a memory leak.

The problem is not the value but the anonymous class. It is loaded by the web application class loader. That pins the web application class loader in memory creating the memory leak.

The Tomcat code has been carefully written to avoid false positives. Tomcat checks the implementing class of the key and the value and if either of them have been loaded by the web application class loader then there is a memory leak and it will be reported. 

The current code errs on the side of false negatives. It is possible to construct a memory leak that this code won't spot but, given how the code is written, a false positive could only occur as a result of a bug. There has been no such bug reported in the previous 5+ years that this code has been in place in this form (I could have gone back further but 5 years seem long enough to make my point).

If you see a warning, it is extremely likely that you have a memory leak.

If a 3rd-party library believes Tomcat is falsely reporting a false positive for a ThreadLocal that that library creates and it has been checked with a profiler that the implementing classes for both the key and the value have not been loaded by the web application class loader (or a child class loader) then I'll happily take a look and fix any bug. But experience tells me that it is far more likely that the library does have a memory leak.
Comment 2 Mark Thomas 2016-03-07 20:20:10 UTC
And for the record while I was 99% sure just from reading the description what the problem was I did go to the trouble of building a simple test case and confirming the memory leak with a profiler.
Comment 3 Brett Kail 2016-03-07 21:37:18 UTC
Entries in ThreadLocalMap weakly reference the key, which is the ThreadLocal subclass that is loaded by the application class loader.  Assuming there are no other retained references to the application class loader, the JVM will clear this weak reference at the next GC.  This is basically the same as a WeakHashMap, and the javadoc there describes the same semantics: in order for the entry to be cleared, you need to ensure the value does not refer to the key.  Assuming the ThreadLocal subclass is held in a static variable (common case), that means the value must not reference anything that directly/indirectly references the application class loader.  In this case, the value refers to the bootstrap class loader only, so the ThreadLocal entry does not pin the application class loader.

I cannot reproduce a memory leak, but I can reproduce the false positive:

1.  Start an empty Tomcat instance (no webapps)
2.  Use jvisualvm, collect a heapdump, run the "select x from org.apache.catalina.loader.WebappClassLoader x" OQL, and see "The query returned no results" (baseline)
3.  Deploy a WAR that contains a servlet with a static ThreadLocal subclass, and have the doGet method set the ThreadLocal to an integer.
5.  Use jvisualvm again, and see there is one class loader
6.  Ping the servlet so that the ThreadLocal is loaded
7.  Undeploy the WAR, and see the checkThreadLocalMapForLeaks message
8.  Use jvisualvm again, and see there are no class loaders

In this case, the message is a false positive.  The presence of the ThreadLocal does not prevent the class loader from being garbage collected.  However, if I change step #3 to set the ThreadLocal to the servlet instance itself, then I see the class loaders accumulate as I repeat this process.

Are you able to reproduce these steps?  Can you provide more details on what you did to create the memory leak?
Comment 4 Mark Thomas 2016-03-07 22:53:03 UTC
Hmm. The profiler was showing the key as strongly reachable. That explains why it wasn't collected but not why it was strongly reachable. Let me take another look at the results.
Comment 5 Mark Thomas 2016-03-07 23:51:17 UTC
I still had the memory dump so I was able to confirm that the key was strongly reachable.

There were a number of GC roots listed of which one was the ThreadLocal map. Digging into the GC roots found the problem. Somehow (and I haven't figured out how to reproduce this yet) one of the profiler's classes was loaded by the WebappClassLoader. That was pinning the class loader in memory hence the ThreadLocal key was strongly held.

I'm as sure as I can be that this is what I was seeing previously. If I run the test all the way through then attach the profiler (rather than attaching at the start so I can monitor what is going on) then the key is not strongly held, the garbage collector does it's job after which the class loader is released.

I am therefore re-opening this issue.

At first glance, the fix appears to be as simple as removing the class loader check for the ThreadLocal key.

However, we have had issues reported where objects being retained after web application stop until the next GC has been sufficient to cause problems. If memory serves me correctly, those were all JDBC related. In this case, I don't think that will be an issue. Currently, ThreadLocal issues are cleaned up by renewing the thread pool and that won't change with the proposed change. All that changes is that keys won't trigger the warning messages.

We could log issues with keys at debug level on the grounds that users might find it useful. That will complicate the code though. I'm not sure that is worth doing. I'll think about it over night and come back to this tomorrow.
Comment 6 Christopher Schultz 2016-03-08 14:24:49 UTC
(In reply to Mark Thomas from comment #5)
>
> However, we have had issues reported where objects being retained after web
> application stop until the next GC has been sufficient to cause problems. If
> memory serves me correctly, those were all JDBC related. In this case, I
> don't think that will be an issue. Currently, ThreadLocal issues are cleaned
> up by renewing the thread pool and that won't change with the proposed
> change. All that changes is that keys won't trigger the warning messages.
> 
> We could log issues with keys at debug level on the grounds that users might
> find it useful. That will complicate the code though. I'm not sure that is
> worth doing. I'll think about it over night and come back to this tomorrow.

I think a downgrade from SEVERE to ... something else is appropriate. Maybe not as low as DEBUG; perhaps INFO? AFAIK, there's no way for Java code to ask the JVM if a reference is only strongly reachable from a particular root (e.g. the ThreadLocalMap itself), so there's no way for Tomcat to prove that the application-loaded ThreadLocal can be collected after the ClassLoader is (otherwise) free. So there is certainly ample opportunity for a memory leak, here, and users should be notified about it.
Comment 7 Mark Thomas 2016-03-08 20:18:29 UTC
This has been fixed in 9.0.x for 9.0.0.M4 onwards, 8.0.x for 8.0.33 onwards, 7.0.x for 7.0.69 onwards and 6.0.x for 6.0.46 onwards.

I have down-graded this to DEBUG. After the 'fun' with the profiler I've double checked this and also confirmed that the JRE implementation is fundamentally the same. There is no chance of a memory leak just because the key for a ThreadLocal was loaded by the web application.

The logging is only useful and will only serve its intended purpose (fixing memory leaks) if it is accurate. Logging non-issues at anything above debug is just going to annoy people.