Bug 64506 - NullPointerException when loading webapp class
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina
Version: 9.0.27
Hardware: All Linux
Importance: P2 major
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2020-06-09 06:50 UTC by Arvind Talari
Modified: 2020-06-16 04:58 UTC

Description Arvind Talari 2020-06-09 06:50:23 UTC
We upgraded Tomcat from 8.0.42 to 9.0.27. Since then we have frequently run into the NullPointerException below when our application classes are loaded. We believe this is due to a concurrency issue (more details below).
        at org.apache.catalina.webresources.CachedResource.getURL(CachedResource.java:317)
        at org.apache.catalina.webresources.FileResource.getCodeBase(FileResource.java:277)
        at org.apache.catalina.loader.WebappClassLoaderBase.findClassInternal(WebappClassLoaderBase.java:2350)
        at org.apache.catalina.loader.WebappClassLoaderBase.findClass(WebappClassLoaderBase.java:865)
        at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1334)
        at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1188)

Our Environment: 
Tomcat Version: 9.0.27. Our application classes are exploded into WEB-INF/classes.

Our Investigation:
To troubleshoot the issue, we have looked at code in https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java and  https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/CachedResource.java#L70, and it appears that we would run into this error if CachedResource is used when org.apache.catalina.webresources.CachedResource#webResource isn't initialized (i.e. stays null).

Upon further debugging and viewing code in https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L59, it seems like it is possible that CachedResource could be used when its #webResource isn't initialized when two threads concurrently ask for the same resource but each with a different value for the boolean useClassLoaderResources.

Consider, for example, two threads calling Cache#getResource(String path, boolean useClassLoaderResources) for the same resource, each with a different value for useClassLoaderResources, while the resource is not yet in the cache. Both threads end up at line https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L82, then:

Thread 1: 
A new CachedResource is created at line https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L77
and put into cache at line https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L82
but CachedResource#validateResource (where CachedResource#webResource is initialized) has not been called yet; that would happen at line https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L87.

Thread 2:
A new CachedResource is created at line  https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L77  
but finds the CachedResource in the cache (put in by the above thread) at line https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L82
and calls CachedResource#validateResource at line https://github.com/apache/tomcat/blob/9.0.27/java/org/apache/catalina/webresources/Cache.java#L112, 
but #validateResource doesn't do anything and returns because useClassLoaderResources is false, and so CachedResource#webResource remains uninitialized.  
Assuming that Thread 1 hasn't initialized #webResource yet, when this thread (Thread 2) calls into CachedResource#getURL we would run into this error.
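
The interleaving above can be replayed in a small standalone model. This is only a sketch under stated assumptions, not the Tomcat code: Entry is a hypothetical stand-in for CachedResource, the putIfAbsent-style publish is assumed from the "put into cache" and "finds the CachedResource in the cache" steps, and the early return in validate() follows the description above rather than the actual source. The two threads' steps are executed sequentially in the problematic order so the outcome is deterministic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CacheRaceSketch {

    /** Hypothetical stand-in for CachedResource; not the Tomcat class. */
    static final class Entry {
        private final String path;
        // Stays null until validate() initialises it.
        private volatile String webResource;

        Entry(String path) {
            this.path = path;
        }

        /** Simplified model of the validation step described in the report. */
        void validate(boolean useClassLoaderResources) {
            if (!useClassLoaderResources) {
                // Assumed early return: nothing is initialised for this caller.
                return;
            }
            if (webResource == null) {
                webResource = "file:" + path;
            }
        }

        String getURL() {
            // Dereferences webResource, so this throws if it is still null.
            return webResource.toString();
        }
    }

    /** Replays the Thread 1 / Thread 2 steps in the problematic order. */
    static String demonstrate() {
        ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<>();
        String path = "/WEB-INF/classes/";

        // Thread 1: creates an entry and publishes it, but has not validated it yet.
        Entry fromThread1 = new Entry(path);
        cache.putIfAbsent(path, fromThread1);

        // Thread 2: creates its own entry, finds Thread 1's entry already cached.
        Entry fromThread2 = new Entry(path);
        Entry existing = cache.putIfAbsent(path, fromThread2);
        Entry used = (existing != null) ? existing : fromThread2;

        // Thread 2 validates with useClassLoaderResources == false: no initialisation.
        used.validate(false);

        try {
            return used.getURL();
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(demonstrate());
    }
}
```

In this model the entry is visible to other callers before it is initialised, which is the window the analysis describes. If Thread 1's validate(true) ran before Thread 2's getURL(), the call would succeed.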

It looks like the changes in revision https://svn.apache.org/viewvc?view=revision&revision=1831828 are related. This seems like a concurrency issue, and we haven't seen it addressed in newer versions (judging from the changelog at https://ci.apache.org/projects/tomcat/tomcat9/docs/changelog.html).
Comment 1 Mark Thomas 2020-06-09 21:09:27 UTC
Thanks for the report and the careful analysis. I think the analysis has identified the root cause. I'll look into a fix.

Out of curiosity, what resources are being looked up as class loader resources and non-class loader resources in parallel?
Comment 2 Arvind Talari 2020-06-09 21:33:57 UTC
Thank you for your prompt reply.

The resource we found that was being looked up concurrently is "/WEB-INF/classes/".

Thread 1 could be doing something like Thread#getContextClassLoader()#getResource("") for a class loader resource.

And Thread 2 could be doing org.apache.catalina.WebResourceRoot#getResource("/WEB-INF/classes/") (from org.apache.catalina.webresources.FileResource#getCodeBase) for a non-class loader resource. This corresponds to the NPE stack trace shown in the description.
Comment 3 Mark Thomas 2020-06-10 16:19:51 UTC
Fixed in:
- master for 10.0.0-M7 onwards
- 9.0.x for 9.0.37 onwards
- 8.5.x for 8.5.57 onwards

7.0.x is not affected.

It would be great if you were able to test this. Are you able to build Tomcat from source? If not, I can provide a snapshot build for you to test.
Comment 4 Arvind Talari 2020-06-10 18:16:10 UTC
Thank you, Mark, for the quick turnaround.
Yes, we are able to build from source. Being a race condition, it isn't reproducible at will, so we will need to simulate the race condition manually.
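
One generic way to force a specific interleaving is to gate the two threads with latches. The sketch below is a hypothetical harness, not how the fix was actually tested: Entry, the "key" string and the latch names are all made up for illustration, and against a real Tomcat build the pause points would have to be placed in the code under test (for example with debugger breakpoints), which is an assumption on my part.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class ForcedInterleaving {

    /** Hypothetical lazily initialised entry; stands in for a cached resource. */
    static final class Entry {
        volatile String value; // null until init() runs

        void init() {
            value = "ready";
        }
    }

    /** Runs two threads in a forced order and returns what the second one observed. */
    static String run() {
        ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<>();
        CountDownLatch published = new CountDownLatch(1); // entry is in the cache
        CountDownLatch observed = new CountDownLatch(1);  // reader has looked at it
        AtomicReference<String> seen = new AtomicReference<>();

        Thread first = new Thread(() -> {
            Entry entry = new Entry();
            cache.put("key", entry);
            published.countDown();
            try {
                observed.await(); // keep the window open until the reader is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            entry.init();
        });

        Thread second = new Thread(() -> {
            try {
                published.await(); // wait until the uninitialised entry is visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // String.valueOf turns a null field into the text "null".
            seen.set(String.valueOf(cache.get("key").value));
            observed.countDown();
        });

        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the first thread cannot initialise the entry until the second has read it, the second thread always sees the uninitialised state, so the race no longer depends on timing.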

Although we haven't run into it, I am also wondering whether a similar fix needs to be made in org.apache.catalina.webresources.Cache#getResources.
Comment 5 Arvind Talari 2020-06-10 18:25:12 UTC
Hmm, actually maybe it is not required, since org.apache.catalina.webresources.CachedResource#validateResources doesn't validate conditionally.
Comment 6 Arvind Talari 2020-06-16 04:58:26 UTC
Tested the fix with a simulated race condition, and it worked. Thanks again for the quick turnaround.