Bug 57172

Summary: NullPointerException in findResources of WebappClassLoader
Product: Tomcat 8 Reporter: Joern Huxhorn <jhuxhorn>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Severity: enhancement CC: jhuxhorn
Priority: P2    
Version: 8.0.14   
Target Milestone: ----   
Hardware: PC   
OS: All   

Description Joern Huxhorn 2014-10-31 12:24:40 UTC
Enumeration<URL> findResources(String name) of class org.apache.catalina.loader.WebappClassLoaderBase (for Tomcat 8.0.14, the code was located in org.apache.catalina.loader.WebappClassLoader for Tomcat 8.0.12) throws a NullPointerException if the "resources" attribute is null.

I'm unsure why it is null in some cases (restarting the server, instead of simply redeploying one of our apps, fixes the issue for us) but this should be fixed in the classloader implementation anyway.

The code looks like this

        WebResource[] webResources = resources.getClassLoaderResources(path);
        for (WebResource webResource : webResources) {
            if (webResource.exists()) {

and should be replaced by

        if (resources != null) {
            WebResource[] webResources = resources.getClassLoaderResources(path);
            for (WebResource webResource : webResources) {
                if (webResource.exists()) {

Searching for this problem returns results as old as 2006.

Other code that's using "resources" without null check:
    public boolean modified()
    public void start()
    protected ResourceEntry findResourceInternal(final String name, final String path)

Code that already acknowledges that "resources" may be null:
    public String getContextName() {
        if (resources == null) {
            return "Unknown";
        } else {
            return resources.getContext().getName();

Full stacktrace of our specific issue (running on Tomcat 8.0.12):
	at org.apache.catalina.loader.WebappClassLoader.findResources(WebappClassLoader.java:998)
	at java.lang.ClassLoader.getResources(ClassLoader.java:1139)
	at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
	at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
	at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
	at javax.xml.stream.FactoryFinder$1.run(FactoryFinder.java:352)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.xml.stream.FactoryFinder.findServiceProvider(FactoryFinder.java:341)
	at javax.xml.stream.FactoryFinder.find(FactoryFinder.java:313)
	at javax.xml.stream.FactoryFinder.find(FactoryFinder.java:227)
	at javax.xml.stream.XMLInputFactory.newFactory(XMLInputFactory.java:205)

The XMLInputFactory.newFactory() is executed in initialValue() of a ThreadLocal<XMLInputFactory>.
Comment 1 Mark Thomas 2014-11-01 16:46:40 UTC
resources should only be null when the web application class loader is not running. Methods that may be called when the class loader is stopped (generally management methods to get the name etc.) handle the case where resources is null. Methods that are not expected to be called while the class loader is stopped (such as findResources) do not handle the null.

The NPE you are getting is indicative of trying to use the web application class loader after it has been stopped.

I suspect you have a memory leak and something is retaining a reference the class loader that shouldn't be.
Comment 2 Joern Huxhorn 2014-11-02 00:00:16 UTC
How about indicating that the web application is trying to use the web application class loader after it has been stopped instead of throwing a non-descriptive NPE?

The webapp in question was simply using a Java 8 forkjoin.

	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1540)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
	at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
	at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
	at java.util.stream.AbstractTask.compute(AbstractTask.java:316)
	at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:902)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1689)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Are you seriously suggesting that leaving the current behavior is preferable to, say, a proper error message?

Beside that: the webapp behaved like that without being stopped. This is related to undeploy magic performed by Tomcat regarding ThreadLocal and this problem shows up after RE-deploying an application without restarting Tomcat.

As I mentioned in my original report: if you search for this on the web then you'll find mentions of similar problems from 2006. I can't be sure since they just refer to generic NPE's in the findResources method spread over various Tomcat versions.

If you consider the usage of a ThreadLocal a memory leak in our app then, yes, we have a memory leak. I, on the other hand, would argue that it's the responsibility of the application server to shield web-applications from leaks like that by using separate worker threads for each webapp.

I'm fully aware how this is supposed to work. But it isn't. I just observed this NPE in a live web application.

Alright. Let's settle for a compromise. Just add

if(resources == null) {
    throw new IllegalStateException("OMG! SNAFU! This can't happen! findResources has been called on a (probably) stopped web context! /o\\");

This would be easier to search for than a simple NPE and would probably prove my point in the future.
Comment 3 Mark Thomas 2014-11-02 01:04:40 UTC
The INVALID response was more aimed at the suggestion that the correct way to handle this would be a check for null. Whatever the root cause, that would simply hide the symptom rather than fix the problem which is never a good idea.

The ISE is a much better idea although I'd change the wording since it is the state of the web application class loader rather than the application that is the primary concern. Better still, would be to include a check of that state as well just in case someone manages to trigger this via some other route.

First you say this happened without the web application being stopped. Then you say it happened after a redploy (which includes a stop followed by a start). I guess you mean the problem happened after the web app was redeployed but while that redployed web application was running. That would be consistent with the reported symptoms.

Yes I do consider the use of a ThreadLocal that exists outside the scope of a single web application without being cleaned up when that application stops a memory leak. The Servlet spec is (currently) silent on the use of ThreadLocals and on thread pools being shared across multiple applications. Feel free to share you views with the Servlet EG in this issue:

Personally I am in favour of a single thread pool. Partly for efficiency, partly for performance (per application thread pools would still require a common thread pool to handle a request until the correct web app was identified and then hand off the request to the correct thread pool). If you really want the isolation of per application thread pools then it is probably simpler to just deploy each application to a separate instance.

There are alternative solutions to using ThreadLocals that may be appropriate depending on the circumstances. For example, the SecureRandom instances Tomcat uses to generate session IDs:

I'm re-opening this as an enhancement request for a better error message in this case.
Comment 4 Mark Thomas 2014-11-17 07:49:19 UTC
Fixed in 9.0.x and 8.0.x for 8.0.16 onwards.