Bug 59001

Summary: Unable to load jar files when they have exclamation in the path
Product: Tomcat 7 Reporter: Jayashankar Karnam <prashu16k>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 7.0.42   
Target Milestone: ---   
Hardware: PC   
OS: All   

Description Jayashankar Karnam 2016-02-13 20:00:05 UTC
My workspace contains an exclamation in the path, unfortunately when Tomcat tries to load the jar files, entire path is broken into pieces when the second exclamation mark is encountered.

Let me know if you need more information.

This is the stacktrace -

Feb 12, 2016 11:45:27 PM org.apache.catalina.startup.TldConfig tldScanJar
WARNING: Failed to process JAR [jar:file:/G:/TEST!Maven!/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/test/WEB-INF/lib/asm-3.3.1.jar!/] for TLD files
java.io.FileNotFoundException: G:\TEST!Maven (The system cannot find the file specified)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:219)
	at java.util.zip.ZipFile.<init>(ZipFile.java:149)
	at java.util.jar.JarFile.<init>(JarFile.java:166)
	at java.util.jar.JarFile.<init>(JarFile.java:103)
	at sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:93)
	at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)
	at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:109)
	at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
	at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
	at org.apache.tomcat.util.scan.FileUrlJar.<init>(FileUrlJar.java:41)
	at org.apache.tomcat.util.scan.JarFactory.newInstance(JarFactory.java:34)
	at org.apache.catalina.startup.TldConfig.tldScanJar(TldConfig.java:489)
	at org.apache.catalina.startup.TldConfig.access$100(TldConfig.java:59)
	at org.apache.catalina.startup.TldConfig$TldJarScannerCallback.scan(TldConfig.java:305)
	at org.apache.tomcat.util.scan.StandardJarScanner.process(StandardJarScanner.java:259)
	 at org.apache.tomcat.util.scan.StandardJarScanner.scan(StandardJarScanner.java:178)
	at org.apache.catalina.startup.TldConfig.execute(TldConfig.java:278)
	at org.apache.catalina.startup.TldConfig.lifecycleEvent(TldConfig.java:569)
	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
	at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
	at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5322)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
	at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Feb 12, 2016 11:45:27 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error getConfigured
Feb 12, 2016 11:45:27 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/test] startup failed due to previous errors
Comment 1 Christopher Schultz 2016-02-16 15:40:12 UTC
Typically, the ! character means that the sought resource is actually inside of a JAR file. The URL you have has a ! after an on-disk path, but not one that (looks like it) is a JAR file (G:\TEST).

That looks like an invalid JAR URL to me.

Or has Tomcat built a bad URL out of some other path?

Or, are you saying that your on-disk path is actually "G:\TEST!Maven!"?

This may be a pathological use case, but neither NTFS nor any of the *NIX filesystems I checked have any prohibition against ! characters, which are special for JAR URLs. (None of those filesystems prohibit # marks, either, which could potentially be a problem.)
Comment 2 Jayashankar Karnam 2016-02-16 19:51:28 UTC
My disk path is "G:\TEST!Maven!\..."

Indeed, it is a corner use case. 
Is there any reason why tomcat is solely relying on "!" to identify end of the string? When we know that is the path to find specified resource, why "!" in the end?
Comment 3 Christopher Schultz 2016-02-16 22:09:19 UTC
Honestly, I'm surprised that it's failing where it is: I would have expected it to fail saying that "G:\TEST" didn't exist.

This isn't Tomcat doing this; it's the combination of a large number of components all of which are using URLs for certain purposes. There are many many corner cases where the JRE itself will fall-over even if Tomcat wasn't involved.

I don't believe Tomcat massages any of the URLs it's processing, so there may be an opportunity for Tomcat to escape the ! characters in an on-disk filename (i.e. "!" -> "%21"). But like I said, there's always more and more edge-cases and encoding once means possibly encoding multiple times (and sometimes having to decode a few times, too). It's just a giant mess.
Comment 4 Jayashankar Karnam 2016-02-16 22:59:52 UTC
JarURLConnection is responsible for the mishap. As per java doc "!/" is a terminator for the jar file. That's the reason why it is failing at G:/TEST!Maven not G:/TEST

And whatever comes after the "!/" becomes the context within the jar.

Java doc link - https://docs.oracle.com/javase/7/docs/api/java/net/JarURLConnection.html

Sorry for the confusion.

-Jay
Comment 5 Christopher Schultz 2016-02-17 20:09:44 UTC
No problem. The only question is whether or not Tomcat knows at the time that the URL it's building is a physical on-disk resource. If Tomcat does know this, it can escape special characters like "!".
Comment 6 Mark Thomas 2016-02-22 22:28:28 UTC
I really wanted to fix this but I'm not sure that supporting this use case is worth the cost.

There are two places I have found (so far) where changes would be required. The first is during start-up to ensure that the paths used to construct the URLs for the class loaders escape "!/" to "%21/".

The second is in the web resources implementation where FileResource.getURL() needs to escape "!/" to "%21/".

The problem stems from the fact that the only way to do this escaping (that I have been able to find) is URL -> toString() -> replaceAll() -> new URL(). And that is relatively expensive.

I'm not concerned about startup. That is a one-off cost. What concerns me is the performance impact of adding this to FileResource.getURL(). That gets called a lot. I'm concerned that the impact of adding this escaping is going to be measurable for end users.

The other option is to take the position that anytime code constructs a jar URL, that code is responsible for ensuring that any !/ sequences in the path it uses to construct that URL are escaped. While we could do this in Tomcat (there are ~20 places we'd need to fix this), I suspect a whole bunch of third-party code won't handle this correctly. And this is before we get into the mess that is JARs in WARs.

Given that most users don't need this (I don't recall seeing this issue reported previously and that's going back to Tomcat 4.1.x) I'm leaning heavily towards WONTFIX. There is going to need to be a really good reason to fix this to change my mind.
Comment 7 Martti 2016-02-23 16:22:16 UTC
I'm having similar problem but with + sign. Our build system (gradle) creates executable war with Mercurial revision hash. Sometimes this hash can contain "+" as a last character ala "kalkulaator-f83780e3571e+.war"


now when we try to run this war (embedded tomcat 8.0.30) jps will not work. If I rename war to "kalkulaator-f83780e3571e.war" everything is ok.

"java -jar build\libs\kalkulaator-f83780e3571e+.war"

...

"
...
2016-02-23 18:00:00,466 WARN  UUID [o.a.t.u.s.StandardJarScanner] - Failed to scan JAR [jar:file:/C:/Users/desin/code/kalk/build/libs/kalkulaator-f83780e3571e+.war!/WEB-INF/lib/jstl-1.2.jar] from /WEB-INF/lib
java.io.FileNotFoundException: JAR entry WEB-INF/lib-provided/tomcat-embed-core-8.0.30.jar!/javax/servlet/resources/web-jsptaglibrary_1_2.dtd not found in C:\Users\desin\code\kalk\build\libs\kalkulaator-f83780e3571e+.war
        at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:142) ~[na:1.8.0_73]
        at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:150) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:623) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1305) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1271) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:263) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1167) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1050) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:964) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) ~[na:1.8.0_73]
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643) ~[na:1.8.0_73]
        at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1451) ~[tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at org.apache.tomcat.util.descriptor.tld.TldParser.parse(TldParser.java:76) ~[tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at org.apache.jasper.servlet.TldScanner.parseTld(TldScanner.java:279) [tomcat-embed-jasper-8.0.30.jar!/:8.0.30]
        at org.apache.jasper.servlet.TldScanner$TldScannerCallback.scan(TldScanner.java:315) ~[tomcat-embed-jasper-8.0.30.jar!/:8.0.30]
        at org.apache.tomcat.util.scan.StandardJarScanner.process(StandardJarScanner.java:306) ~[tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at org.apache.tomcat.util.scan.StandardJarScanner.scan(StandardJarScanner.java:162) ~[tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at org.apache.jasper.servlet.TldScanner.scanJars(TldScanner.java:262) [tomcat-embed-jasper-8.0.30.jar!/:8.0.30]
        at org.apache.jasper.servlet.TldScanner.scan(TldScanner.java:106) [tomcat-embed-jasper-8.0.30.jar!/:8.0.30]
        at org.apache.jasper.servlet.JasperInitializer.onStartup(JasperInitializer.java:103) [tomcat-embed-jasper-8.0.30.jar!/:8.0.30]
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5244) [tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) [tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1408) [tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1398) [tomcat-embed-core-8.0.30.jar!/:8.0.30]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_73]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_73]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_73]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_73]
"
Comment 8 Christopher Schultz 2016-02-24 21:59:25 UTC
(In reply to Mark Thomas from comment #6)
> I really wanted to fix this but I'm not sure that supporting this use case
> is worth the cost.
> 
> There are two places I have found (so far) where changes would be required.
> The first is during start-up to ensure that the paths used to construct the
> URLs for the class loaders escape "!/" to "%21/".
> 
> The second is in the web resources implementation where
> FileResource.getURL() needs to escape "!/" to "%21/".
> 
> The problem stems from the fact that the only way to do this escaping (that
> I have been able to find) is URL -> toString() -> replaceAll() -> new URL().
> And that is relatively expensive.
> 
> I'm not concerned about startup. That is a one-off cost. What concerns me is
> the performance impact of adding this to FileResource.getURL(). That gets
> called a lot. I'm concerned that the impact of adding this escaping is going
> to be measurable for end users.

What about mutating the "classpath" used by a ClassLoader when it's constructed? That way, we could take the hit of URL -> String -> URL maybe one time for a context. I don't understand the implications of the way the ClassLoader works, so I may be making an insane proposal ;)

> The other option is to take the position that anytime code constructs a jar
> URL, that code is responsible for ensuring that any !/ sequences in the path
> it uses to construct that URL are escaped. While we could do this in Tomcat
> (there are ~20 places we'd need to fix this), I suspect a whole bunch of
> third-party code won't handle this correctly. And this is before we get into
> the mess that is JARs in WARs.
> 
> Given that most users don't need this (I don't recall seeing this issue
> reported previously and that's going back to Tomcat 4.1.x) I'm leaning
> heavily towards WONTFIX. There is going to need to be a really good reason
> to fix this to change my mind.

If we expect external code to do its own URL-escaping, it doesn't really change the current behavior. I don't think that would be a horrible change, since it would make common cases work (where only Tomcat is involved), and it wouldn't break any of the other cases because they would already be broken (right?).
Comment 9 Mark Thomas 2016-02-24 22:58:49 UTC
(In reply to Christopher Schultz from comment #8)
> (In reply to Mark Thomas from comment #6)

> > I'm not concerned about startup. That is a one-off cost. What concerns me is
> > the performance impact of adding this to FileResource.getURL(). That gets
> > called a lot. I'm concerned that the impact of adding this escaping is going
> > to be measurable for end users.
> 
> What about mutating the "classpath" used by a ClassLoader when it's
> constructed? That way, we could take the hit of URL -> String -> URL maybe
> one time for a context. I don't understand the implications of the way the
> ClassLoader works, so I may be making an insane proposal ;)

That is exactly what I am proposing for start-up. It is a one-off cost so no big deal.

> > The other option is to take the position that anytime code constructs a jar
> > URL, that code is responsible for ensuring that any !/ sequences in the path
> > it uses to construct that URL are escaped. While we could do this in Tomcat
> > (there are ~20 places we'd need to fix this), I suspect a whole bunch of
> > third-party code won't handle this correctly. And this is before we get into
> > the mess that is JARs in WARs.
> > 
> > Given that most users don't need this (I don't recall seeing this issue
> > reported previously and that's going back to Tomcat 4.1.x) I'm leaning
> > heavily towards WONTFIX. There is going to need to be a really good reason
> > to fix this to change my mind.
> 
> If we expect external code to do its own URL-escaping, it doesn't really
> change the current behavior. I don't think that would be a horrible change,
> since it would make common cases work (where only Tomcat is involved), and
> it wouldn't break any of the other cases because they would already be
> broken (right?).

I'm still on the fence about this (and that is without looking at the JAR in WAR issue).

I've thought of a few ways we could make FileResource.getURL() handle this without being horribly slow unless you have a path that includes a !/ sequence (in which case being slow is the price you pay for it working) but I'm becoming less convinced that this is the way to go.

The more I think about it, the more I am leaning towards the view that if you take a string and use it to construct a JAR URL then you are responsible for making sure any "!/" sequences are escaped. Before heading down that route I'd want to check how often the ~20 places we'd need to do this are called.
Comment 10 Jayashankar Karnam 2016-03-01 03:10:50 UTC
Either it get's fixed in the next releases or not, I strongly feel there has to be some exception handling which doesn't trick developers to invest more energy on debugging possible root cause within their application.
Comment 11 Mark Thomas 2016-03-01 14:26:35 UTC
Having looked at where we construct Jar URLs (and URLs that may be used to later build Jar URLs) there were a handful of places where we needed to add appropriate escaping. I've implemented the escaping for 9.0.x (it will be in 9.0.0.M4 onwards) and I'm looking at back-porting it. Given the various refactorings I'm not sure how far back this fix will be back-ported.

Regarding the use of '+' in file names (comment #7) that looks like a separate issue. Please open a separate BZ issue for that and provide s set of steps to reproduce the issue.
Comment 12 Mark Thomas 2016-03-01 23:05:18 UTC
The fix has been back-ported to 8.0.x for 8.0.33 onwards and to 7.0.x for 7.0.69 onwards.