Bug 51276

Summary: Startup time is too high if there are a few JARs in "lib/" and a few webapps.
Product: Tomcat 7
Reporter: Alex Dupre <ale>
Component: Catalina
Assignee: Tomcat Developers Mailing List <dev>
Severity: regression
CC: lapo
Priority: P2
Version: 7.0.14
Target Milestone: ---
Hardware: PC
OS: FreeBSD
Attachments: Test case showing the issue.

Description Alex Dupre 2011-05-27 11:09:03 UTC
Startup time in Tomcat 7.0.14 is 10x higher than in 7.0.12 if there are a few shared JARs in tomcat/lib and a few webapps.
The issue is very simple to reproduce: add 10 empty directories to tomcat/webapps and (for example) the Metro web services JARs to tomcat/lib. On my machine the startup time increases from 0.5 s (default installation) to 35.2 s.
With Tomcat 7.0.12 the startup time in this test case is about 3 s on my machine.
Comment 1 Alex Dupre 2011-05-27 14:49:54 UTC
I've debugged the problem and found the cause. This is the offending commit:

While scanning JARs for TLDs and fragments, avoid using JarFile and use JarInputStream as in most circumstances where JARs are scanned, JarFile will create a temporary copy of the JAR rather than using the resource directly. This change significantly improves startup performance for applications with lots of JARs to be scanned. (markt)

The last sentence is clearly false, since JarFile is optimized for random access, while JarInputStream has to read the entire file.
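To make the random-access vs. sequential-access point concrete, here is a minimal sketch of looking up a single entry both ways. It is not Tomcat's code, and the class and method names are hypothetical. `JarFile.getEntry` consults the ZIP central directory, while `JarInputStream` has to walk every entry that precedes the one it is looking for. One caveat: `JarInputStream` reads a leading `META-INF/MANIFEST.MF` itself, so the stream-based method will typically not report that entry.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;

// Hypothetical helper class, for illustration only.
class JarLookup {

    // Random access: JarFile reads the ZIP central directory (the index stored
    // at the end of the archive), so getEntry() can locate an entry without
    // reading the data of any other entry.
    static boolean hasEntryViaJarFile(File jar, String name) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(name) != null;
        }
    }

    // Sequential access: JarInputStream starts at the beginning of the stream;
    // advancing to the next entry reads through the previous entry's data
    // (inflating it if compressed), so a miss means reading the whole file.
    static boolean hasEntryViaJarInputStream(File jar, String name) throws IOException {
        try (JarInputStream jis = new JarInputStream(new FileInputStream(jar))) {
            JarEntry entry;
            while ((entry = jis.getNextJarEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Usage: java JarLookup some.jar META-INF/some.tld
    public static void main(String[] args) throws IOException {
        File jar = new File(args[0]);
        System.out.println("JarFile:        " + hasEntryViaJarFile(jar, args[1]));
        System.out.println("JarInputStream: " + hasEntryViaJarInputStream(jar, args[1]));
    }
}
```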

I'm attaching a very simple test case that shows the issue. Sample output:

% java TomcatSlowTest webservices-rt.jar
Time elapsed with JarFile (Tomcat 7.0.12): 1 ms
Time elapsed with JarInputStream (Tomcat 7.0.14): 458 ms
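The attached test case itself is not reproduced in this report. As a rough, hypothetical stand-in that is closer to what the changelog entry describes (scanning a JAR for TLDs), the sketch below times a full scan of entry names with both APIs. The class name and the `META-INF/*.tld` filter are assumptions made for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;

// Hypothetical timing harness; not the attached TomcatSlowTest.
class TldScanTiming {

    // Assumed filter: any entry under META-INF/ whose name ends in ".tld".
    static boolean isTld(String name) {
        return name.startsWith("META-INF/") && name.endsWith(".tld");
    }

    // Enumerates names from the central directory; entry data is never read.
    static List<String> scanWithJarFile(File jar) throws IOException {
        List<String> tlds = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isTld(name)) {
                    tlds.add(name);
                }
            }
        }
        return tlds;
    }

    // Walks the stream from the start; each step reads through the
    // previous entry's data before the next header can be reached.
    static List<String> scanWithJarInputStream(File jar) throws IOException {
        List<String> tlds = new ArrayList<>();
        try (JarInputStream jis = new JarInputStream(new FileInputStream(jar))) {
            JarEntry entry;
            while ((entry = jis.getNextJarEntry()) != null) {
                if (isTld(entry.getName())) {
                    tlds.add(entry.getName());
                }
            }
        }
        return tlds;
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args[0]);
        long t0 = System.nanoTime();
        List<String> viaJarFile = scanWithJarFile(jar);
        long t1 = System.nanoTime();
        List<String> viaStream = scanWithJarInputStream(jar);
        long t2 = System.nanoTime();
        System.out.printf("JarFile:        %d TLDs in %d ms%n", viaJarFile.size(), (t1 - t0) / 1_000_000);
        System.out.printf("JarInputStream: %d TLDs in %d ms%n", viaStream.size(), (t2 - t1) / 1_000_000);
    }
}
```

Run it as `java TldScanTiming webservices-rt.jar`. A single run like this is only indicative: JIT warm-up and the OS file cache (which the first scan warms for the second) both skew the numbers, so a fair comparison would repeat each scan several times and alternate the order.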
Comment 2 Alex Dupre 2011-05-27 14:51:01 UTC
Created attachment 27076 [details]
Test case showing the issue.
Comment 3 Alex Dupre 2011-05-27 15:14:18 UTC
Other notes:
- the issue is reproducible on every platform I tested (OpenJDK 6 on FreeBSD and Oracle JDK 1.6 on Windows)
- even when the test case is modified to look for an entry that exists (and sits in the first bytes of the JAR), JarInputStream is still slower than JarFile
- JarFile has most parts in native code, while JarInputStream is 100% Java code
Comment 4 Mark Thomas 2011-05-27 15:43:20 UTC
The attached test case isn't representative of how Tomcat handles JARs. Tomcat receives references to JARs as URLs, and the temporary copy that JarFile creates in that case has a much larger impact on performance than using JarInputStream.

That said, the increase in start time isn't good and is worth investigating further. If (as is likely) it is related to the switch to JarInputStream, then it may be possible to determine if the URL points to a file and switch to JarFile in that case.
Comment 5 Chuck Caldarale 2011-05-27 16:06:59 UTC
(In reply to comment #3)
> - JarFile has most parts in native code, while JarInputStream is 100% Java code

Sorry, that's completely untrue. All I/O is done by native code in the JRE, and all ZIP-format handling is done by native code in the standard zlib library.
Comment 6 Alex Dupre 2011-05-27 16:58:33 UTC
I wanted to create the simplest possible test case to show the *enormous and unjustified* performance difference between JarFile and JarInputStream on local files. Passing a URL that points to a local file doesn't change anything (yes, I tried); you may see a speedup if the URL points to a very slow location (when does that happen?). So I agree with your suggestion: use JarFile if the file is local and JarInputStream in other cases.
Comment 7 Mark Thomas 2011-05-30 08:55:57 UTC
Your test cases are not representative of the typical jar handling within Tomcat.

As explained in the code comments in the patch, JARs in web applications are referenced via JNDI URLs. Using JarFile with these URLs (or any non-file URL) triggers the creation of a full copy of the JAR file in the temp directory. This is significantly slower than accessing the JAR with JarInputStream.

The issue you are seeing is a result of using shared JARs. While supported, this approach is not recommended because of the complications it can create, both when upgrading and with memory leaks on reload with some libraries. Shared JARs are referenced via file URLs, so the switch to JarInputStream results in a slowdown in that case.

As per comment #4, I'll see if there is an easy way to switch to JarFile for JARs referenced via file URLs rather than JNDI URLs.
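The approach proposed in comments 4 and 7 (choose the API based on the URL's protocol) can be sketched as follows. This is an illustration under stated assumptions, not the change that went into 7.0.15: the class and method names are hypothetical, and every non-`file` protocol is simply treated as something to stream.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;

// Hypothetical helper illustrating "JarFile for file URLs, JarInputStream otherwise".
class JarEntryLister {

    static List<String> listEntries(URL jarUrl) throws IOException {
        List<String> names = new ArrayList<>();
        if ("file".equals(jarUrl.getProtocol())) {
            File file;
            try {
                file = new File(jarUrl.toURI());
            } catch (URISyntaxException e) {
                throw new IOException("Bad file URL: " + jarUrl, e);
            }
            // Local file: random access via the central directory, no temp copy.
            try (JarFile jf = new JarFile(file)) {
                Enumeration<JarEntry> entries = jf.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
            }
        } else {
            // Non-file URL (e.g. a JNDI-backed one): stream the bytes directly
            // instead of opening a JarFile, which per comment 7 would first
            // copy the whole JAR to the temp directory.
            try (InputStream in = jarUrl.openStream();
                 JarInputStream jis = new JarInputStream(in)) {
                JarEntry entry;
                while ((entry = jis.getNextJarEntry()) != null) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    // Usage: java JarEntryLister file:/path/to/some.jar
    public static void main(String[] args) throws IOException {
        for (String name : listEntries(URI.create(args[0]).toURL())) {
            System.out.println(name);
        }
    }
}
```

The two branches are not perfectly equivalent: `JarFile.entries()` includes `META-INF/MANIFEST.MF`, while `JarInputStream` typically consumes a leading manifest and does not return it as an entry, so a real scanner would need to account for that if it cares about the manifest.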
Comment 8 Mark Thomas 2011-06-02 11:12:32 UTC
This was doable and has been added to 7.0.x. It will be included in 7.0.15 and later.
Comment 9 Alex Dupre 2011-06-06 07:04:32 UTC
Is there an ETA for tomcat 7.0.15?