Bug 48895

Summary: WebAppClassLoader.clearThreadLocalMap() concurrency issues
Product: Tomcat 6 Reporter: Sylvain Laurent <slaurent>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Severity: normal    
Priority: P2    
Version: 6.0.24   
Target Milestone: default   
Hardware: All   
OS: All   

Description Sylvain Laurent 2010-03-11 22:20:01 UTC
I think that the memory leak protection of WebAppClassLoader.clearReferencesThreadLocals() which detects and clears ThreadLocals that would prevent GC the WebAppClassLoader instance has issues regarding concurrency :

- It enumerates Threads and looks into internal structures of the Thread class, but there are no "memory barrier" that would ensure a consistent state of the ThreadLocalMap being examined.
So, it is theoretically possible that a ThreadLocal in Thread A was properly cleaned up by the application, but the current thread B (that is undeploying the application) does not see the up to date state because there's no synchronization between those threads.

- Much more severe : after detecting such a leak, it invokes java.lang.ThreadLocal.ThreadLocalMap.remove(ThreadLocal) on Thread A's ThreadLocalMap instance but the invocation is done by Thread B (the thread that undeploys the app). The remove() method is not thread safe at all, and nor is the expungeStaleEntries() method which may also be invoked in clearThreadLocalMap().
So, if a webapp is being undeployed while other applications continue to receive a heavy load of requests, this could corrupt the internal structures of the ThreadLocalMap instance !

I propose to keep the detection of leaks as it is, but to make the actual clearing optional (and disabled by default) to avoid encountering big problems in production.

Idea to improve the clearing in a safe way : if the thread that is "provoking" the leak is one of tomcat's worker threads, we could mark it as "dirty", and then have a background task that would end such threads (renew those threads in the pool).
Comment 1 Mark Thomas 2010-03-29 15:04:58 UTC
This has been fixed in trunk and proposed for 6.0.x
Comment 2 Sylvain Laurent 2010-03-29 17:09:43 UTC
Thanks for the change.
So, I guess now we can safely clean thread locals only in threads controlled by tomcat (i.e. worker threads) and by one of these 2 methods :
- either renew threads in the pool
- cleanup all threadlocals after each request when the thread is returned to the pool (during the call to ThreadPoolExecutor.afterExecute() )
Comment 3 Mark Thomas 2010-04-20 13:31:23 UTC
This has been made optional in 6.0.x and will be included in 6.0.27 onwards.

I'll create a new BZ enhancement for Tomcat 7 to track the better ways of doing this.
Comment 4 quartz 2010-10-27 15:32:18 UTC
If the only threads accessing the instances of a context are the connector's threads, and if you kill those threads when you remove a context, then you cannot have a leak.

Sadly, tomcat is ass backwards and has a pool of thread per connectors owned by the service instead of owned by the context. Thus you cannot destroy threads without destroying all context under all hosts under all engines under the service.

Solution: destroying a context must mark a thread that visited it for termination, asap.

That means the thread would not return in the pool, and the pool would create a replacement thread.

That is subpar for other ctx/host/engine that would loose the threadlocal-cached values, but it is not a requirement for j2ee to be efficient under such operation, and the app is still supposed to work as it should expect a threadlocal to be null at any time.