Bug 59226 - StandardJarScanner ignores jars in manifest Class-path header
Summary: StandardJarScanner ignores jars in manifest Class-path header
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Util (show other bugs)
Version: 8.0.32
Hardware: PC All
: P2 normal (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-23 23:33 UTC by Ignacio del Valle Alles
Modified: 2016-09-16 14:13 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ignacio del Valle Alles 2016-03-23 23:33:44 UTC
Running an embedded instance as 'java -jar <thin-jar-with-dependencies-in-manifest>' StandardJarScanner ignores jars in manifest Class-path header.

The reason is system class loader does not expose these jars via getURLs() as explained in the following SO question.
 
http://stackoverflow.com/questions/35922072/scanning-manifest-classpath-jars-in-embeeded-tomcat/36185408#36185408
Comment 1 Ignacio del Valle Alles 2016-03-23 23:38:34 UTC
Added PR in GitHub

https://github.com/apache/tomcat80/pull/5
Comment 2 Mark Thomas 2016-04-27 19:22:48 UTC
Notes:
- The web application class loader will ignore any Class-Path entries in a manifest.
- Any class loader above that (common, server, shared, etc.) will process Class-Path entries for any JAR on the class path.
- The proposed patch requires JARs referenced by the Class-Path header must contain a manifest in order to be scanned. A solution without this limitation is preferred.
Comment 3 Ignacio del Valle Alles 2016-04-28 10:28:11 UTC
Hi Mark, this  problem comes for system (application) class loader, that is responsible of handling class-path entries, not returning manifest jar urls when executed as java -jar.

If there is a portable way of knowing the executable jar at runtime, the solution would be trivial: just parse its manifest class path entries. But there is not (AFAIK)

The suggested change does not get jar entries from the class-path header of the manifest, instead it infers jar entries from the enumeration of manifest that it sees (and its parent don't)

It is true that it requires jars referenced in the manifest of the executable jar to have a manifest in order to be scanned, but this improves the current behavior of not scanning them at all

For code simplicity I added the same processing for every class loader in Tomcat, but it is true, that I adds a little overhead, so I have just created a new pull request that applies the hack only to the system class loader.
Comment 4 Mark Thomas 2016-04-28 10:39:30 UTC
(In reply to Ignacio del Valle Alles from comment #3)
> Hi Mark, this  problem comes for system (application) class loader, that is
> responsible of handling class-path entries, not returning manifest jar urls
> when executed as java -jar.

No, the problem is wider than that. See the second of my notes in comment #2 above.

> If there is a portable way of knowing the executable jar at runtime, the
> solution would be trivial: just parse its manifest class path entries. But
> there is not (AFAIK).

Not relevant. See above.

> The suggested change does not get jar entries from the class-path header of
> the manifest, instead it infers jar entries from the enumeration of manifest
> that it sees (and its parent don't).
>
> It is true that it requires jars referenced in the manifest of the
> executable jar to have a manifest in order to be scanned, but this improves
> the current behavior of not scanning them at all

It is arguable whether a flawed solution is better than no solution. My preference is to fix it properly.

> For code simplicity I added the same processing for every class loader in
> Tomcat, but it is true, that I adds a little overhead, so I have just
> created a new pull request that applies the hack only to the system class
> loader.

Limiting the scan to the system class loader is as flawed as including the web application class loader.
Comment 5 Mark Thomas 2016-04-28 10:47:24 UTC
Hint: Rather than inferring the JARs added to the class path via one or more manifests, process the manifests and determine the JARs that have been added.
Comment 6 Ignacio del Valle Alles 2016-04-28 11:53:13 UTC
(In reply to Mark Thomas from comment #4)
> (In reply to Ignacio del Valle Alles from comment #3)
> > Hi Mark, this  problem comes for system (application) class loader, that is
> > responsible of handling class-path entries, not returning manifest jar urls
> > when executed as java -jar.
> 
> No, the problem is wider than that. See the second of my notes in comment #2
> above.
> 
> > If there is a portable way of knowing the executable jar at runtime, the
> > solution would be trivial: just parse its manifest class path entries. But
> > there is not (AFAIK).
> 
> Not relevant. See above.
> 
> > The suggested change does not get jar entries from the class-path header of
> > the manifest, instead it infers jar entries from the enumeration of manifest
> > that it sees (and its parent don't).
> >
> > It is true that it requires jars referenced in the manifest of the
> > executable jar to have a manifest in order to be scanned, but this improves
> > the current behavior of not scanning them at all
> 
> It is arguable whether a flawed solution is better than no solution. My
> preference is to fix it properly.
> 
> > For code simplicity I added the same processing for every class loader in
> > Tomcat, but it is true, that I adds a little overhead, so I have just
> > created a new pull request that applies the hack only to the system class
> > loader.
> 
> Limiting the scan to the system class loader is as flawed as including the
> web application class loader.

My reasoning was based on the wrong assumption that these entries were only read by the System class loader and only when executed as "java -jar" (don't ask me why). So yes, the solution is simpler and can be implemented just by scanning the manifest. I'll work on it and send a new PR
Comment 7 Ignacio del Valle Alles 2016-04-29 08:51:05 UTC
Just added a new PR on GitHub.
Basically what it does is from the jar url returned by a url class loader start a DFS traversal over the graph of related jars (the relation is transitive)
Comment 8 Mark Thomas 2016-05-03 08:44:00 UTC
Update.

I've spent a fair amount of time looking at the proposed patch to see if there is a way to avoid processing a JAR once to look at the class path and again to scan it. I think it is possible but it is going to need some refactoring.

Regarding the proposed patch, it doesn't handle all of the cases that the Jar Scanner does (packed WARs, exploded JARs, etc.) which is why I'd prefer to do this in a single scan.

Reviewing the code has also uncovered some minor bugs that need to be fixed.
Comment 9 Ignacio del Valle Alles 2016-05-03 10:00:22 UTC
I see. I'll think about it to too.
Can you give more details on these minor bugs?
Comment 10 Mark Thomas 2016-05-04 13:05:47 UTC
Thanks for the report and the proposed patches. While the proposed patches weren't used, it was very helpful to have an alternative point of view when exploring options for resolving this issue.

The fix has been applied to:
9.0.x for 9.0.0.M5 onwards
8.5.x for 8.5.1 onwards
8.0.x for 8.0.34 onwards
Comment 11 Ignacio del Valle Alles 2016-05-04 13:22:06 UTC
Thank you
Comment 12 saschakarcher 2016-08-05 15:24:45 UTC
Just found an interesting problem:
We're trying to migrate a bunch of webapps from tomcat 7 to 8.5.4 with the goal in mind not to change too much.
Our Tomcat has got a whole bunch of 3rd party libraries in a separate folder which is loaded by the common (or shared) class loader (i.e. not in WEB-INF/lib).

Since current version of StandardJarScanner now also follows the class-path in header in the JARs manifest.mf, it tries and fails to load all those additional JARS mentioned in the manifest. But we never hat these additional JARS in classpath and collecting them all is a bit of not so funny work although feasible.

Don't get me wrong. I understand that following the class-path mentioned in the manifest.mf files is technically totally correct. But it somehow complicates migration of already existing apps.

Wouldn't it be useful to have sort of "don't resolve class-path header" switch (with default FALSE) but giving the the option to do so?

What do you guys thing?

Or have I just missed an already existing workaround?
Comment 13 Mark Thomas 2016-08-05 18:38:53 UTC
I haven't checked the code but you might be able to use jarsToSkip or a JarScanFilter to not scan them.

Failing that, a new enhancement request to make the manifest processing option sounds reasonable. I'd avoid the double negative though and have processManifest (or similar) with a default of true.
Comment 14 saschakarcher 2016-08-08 15:39:07 UTC
I totally agree that this double negation would be confusing.

Unfortunatelly jarsToSkip and <JarScanFilter> does not help in that case, because at least one of the JARs contains TLDs (Which I do want to be scanned) and additionally the unwanted class-path entry (which I don't want the StandardJarScanner to follow) in it's manifest.

I can't judge how specific this problem is and if other tomcat users experience similar problems but once I open an enhancement request the experienced developers hopefully judge according to their experience.
Comment 15 Mauro Molinari 2016-09-16 14:09:42 UTC
I think I also encountered this problem after deploying in Tomcat 8.5.5 a webapp previously run in Tomcat 8.0.14.
I have a lot of WARNING: file not found for JARs mentioned in useless Class-Path entries in JARs deployed in WEB-INF/lib, like jaxb-impl-2.1.13.jar (which mentions "jaxb-api.jar activation.jar jsr173_1.0_api.jar jaxb1-impl.jar") or 
xalan-2.7.2.jar (which mentions "xercesImpl.jar xml-apis.jar serializer.jar").

Fact is that some of those JARs (at least those I really need) are deployed with different names (with a version specifier, for instance), as resolved by tools like Gradle or Maven).

So my Tomcat log on startup is now full of these warnings that look like errors a lot (they're complete with stack traces).

Maybe lowering the log output to debug level and/or remove stack traces could also be much less intrusive than the current bevahiour?
Comment 16 Mark Thomas 2016-09-16 14:13:08 UTC
See bug 59961