Bug 55166 - schemaLocation references between servlet and jsp XSDs are invalid
Summary: schemaLocation references between servlet and jsp XSDs are invalid
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 8.0.x-trunk
Hardware: PC Mac OS X 10.4
: P2 enhancement (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
: 55313 (view as bug list)
Depends on:
Reported: 2013-06-30 20:03 UTC by Jeremy Boynes
Modified: 2014-02-17 13:46 UTC (History)
1 user (show)

Patch including XSDs downloaded from Sun/JCP website, modified to include CDDL election by ASF (52.88 KB, patch)
2013-06-30 20:09 UTC, Jeremy Boynes
Details | Diff
full output from test run (13.16 KB, text/plain)
2013-07-03 03:29 UTC, Jeremy Boynes
Patch to add testcase to trunk (2.65 KB, text/plain)
2013-07-03 03:34 UTC, Jeremy Boynes
Same as 30527 but with ASF header and link to this issue (3.54 KB, patch)
2013-07-03 03:46 UTC, Jeremy Boynes
Details | Diff
Start to refactor SchemaResolver not to key only on basename (25.15 KB, patch)
2013-07-07 00:22 UTC, Jeremy Boynes
Details | Diff
Updated patch (143.68 KB, patch)
2013-07-08 01:07 UTC, Jeremy Boynes
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Boynes 2013-06-30 20:03:24 UTC
The javax/servlet/resources package does not contain the JSP specification XSDs. These are referenced from the web-app or web-common xsds and define elements like <jsp-config>.
Comment 1 Jeremy Boynes 2013-06-30 20:09:30 UTC
Created attachment 30508 [details]
Patch including XSDs downloaded from Sun/JCP website, modified to include CDDL election by ASF

Modification only adds the text:

      The Apache Software Foundation elects to include this software under the
      CDDL license.

quoted from web-app_3_1.xsd and added per instructions in original files.
Comment 2 Konstantin Kolinko 2013-06-30 23:27:37 UTC
What is your issue?

Those DTDs do not belong to the Servlet spec, they belong to the JSP one.
You'll find them in jsp-api.jar

Closing as INVALID
Comment 3 Jeremy Boynes 2013-07-01 03:47:29 UTC
(In reply to Konstantin Kolinko from comment #2)

> You'll find them in jsp-api.jar
I missed that they were there, sorry for that. 

> What is your issue?
I ran into a problem trying to use xjc because web-common_3_1.xsd contains

  <xsd:include schemaLocation="jsp_2_3.xsd"/>

and which did not resolve. The same applies to jsp_2_3.xsd when it is in the jsp jar because it in turn contains

  <xsd:include schemaLocation="javaee_7.xsd"/>

which is back in the servlet jar.

I updated the issue summary to reflect this.
Comment 4 Mark Thomas 2013-07-02 09:52:18 UTC
Do you still see the issue if both the servlet-api.jar and jsp-api.jar are available on the class path?

I'd expect the answer to be no since the TCK tests run with validation enabled for the XML parser and there are no issues resolving the various schemas.
Comment 5 Jeremy Boynes 2013-07-03 03:29:45 UTC
Created attachment 30526 [details]
full output from test run

I was seeing the problem with xjc running against the resources directory. That may not be a typical manifestation.

However, I put together this testcase for validation

    public void testValidation() throws Exception {
        System.out.println("ServletContext = " + getClass().getResource("/javax/servlet/ServletContext.class"));
        System.out.println("JspFactory = " + getClass().getResource("/javax/servlet/jsp/JspFactory.class"));
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        schemaFactory.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException exception) throws SAXException {

            public void error(SAXParseException exception) throws SAXException {

            public void fatalError(SAXParseException exception) throws SAXException {
        URL url = ServletContext.class.getResource("/javax/servlet/resources/web-app_3_0.xsd");
        System.out.println("url = " + url);
        Schema schema = schemaFactory.newSchema(url);

which fails with 

org.xml.sax.SAXParseException; systemId: jar:file:/Users/jeremy/apache/apache-tomcat-7.0.41/lib/servlet-api.jar!/javax/servlet/resources/web-common_3_0.xsd; lineNumber: 119; columnNumber: 46; schema_reference.4: Failed to read schema document 'jsp_2_2.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.

Caused by: java.io.FileNotFoundException: JAR entry javax/servlet/resources/jsp_2_2.xsd not found in /Users/jeremy/apache/apache-tomcat-7.0.41/lib/servlet-api.jar

The full output is attached. As you can see, it is looking for jsp_2_2.xsd in same location as the web-common_3_0.xsd resource that is including it. I ran against the lib directory from a download of 7.0.41.

I don't know how the TCK is validating. It may have a custom resolver for its own copies of the XSDs.
Comment 6 Jeremy Boynes 2013-07-03 03:34:27 UTC
Created attachment 30527 [details]
Patch to add testcase to trunk

Patch to add the prior testcase to trunk. It passes if I svn copy the jsp xsds to javax/servlet/resources but I didn't do that in the patch in order to keep file history.

It fails to validate on test/webapp/WEB-INF/web.xml so I also includes changes to make that file conform to the XSD (basically reordering a couple of elements).
Comment 7 Jeremy Boynes 2013-07-03 03:46:49 UTC
Created attachment 30528 [details]
Same as 30527 but with ASF header and link to this issue
Comment 8 Konstantin Kolinko 2013-07-05 11:50:02 UTC
You are doing it wrong.
a) You have to configure a resolver to be able to locate schemas.

Tomcat uses org.apache.catalina.util.SchemaResolver as configured by org.apache.catalina.startup.DigesterFactory.
See DigesterFactory#registerLocalSchema().

b) You can use SchemaFactory.newSchema(Source[]). I do not know whether it helps in your case (maybe it does not help with includes), but it is a way to pass several schema files to a SchemaFactory.

c) You can bundle and redistribute schema files on your own, without relying on ones used internally by Tomcat.
Comment 9 Jeremy Boynes 2013-07-05 20:21:52 UTC
(In reply to Konstantin Kolinko from comment #8)
> a) You have to configure a resolver to be able to locate schemas.

I should not need to use a resolver for this case. The base location of the schema file is known in the call to newSchema(url) and all the references from that file (its includes) are relative URIs. You can see that if you provide a LSResourceResolver to the schemaFactory where it gets called with
  type = http://www.w3.org/2001/XMLSchema
  namespaceURI = http://java.sun.com/xml/ns/javaee
  publicId = null
  systemId = jsp_2_2.xsd
  baseURI = jar:file:/Users/jeremy/apache/apache-tomcat-7.0.41/lib/servlet-api.jar!/javax/servlet/resources/web-common_3_0.xsd

Resolving the relative URI of the systemId against the base gives an absolute local URI for the resource being included. This works fine for all the XSDs used by the Servlet specification (for example, the include from web-app to web-common) but fails for the JSPs XSD as it is not in the same relative location as the spec assumes. 

> Tomcat uses org.apache.catalina.util.SchemaResolver as configured by
> org.apache.catalina.startup.DigesterFactory.
> See DigesterFactory#registerLocalSchema().

In the Digester code registerLocalSchema(), the local resources are being registered with the filename as the publicId. I don't believe that is correct for the XmlSchema resources - the publicIds are there DTD-based resources with a <!DOCTYPE>. For XSD-based documents, xsi:schemaLocation is used to provide the *systemId* for the schema i.e. "http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd" and the resolver should be using that to map to the local cached copy. It works because SchemaResolver uses a single lookup table for both publicId and systemId and because (on line 112) it strips the systemId down to just the basename. This stripping down is not actually needed if you see what systemId the EntityResolver is being called with:
    (returns jar:file:/Users/jeremy/apache/tomcat/trunk/output/build/lib/servlet-api.jar!/javax/servlet/resources/web-app_3_1.xsd)

As you can see, the parser is resolving the <include schemaLocation> as a URI relative to the containing document resulting in a systemId for the "jsp_2_3.xsd" relative URI that is in the same place as the containing ".../web-common_3_1.xsd" document. SchemaResolver's implementation works by stripping that down to "jsp_2_3.xsd" and mapping that to the JSP jar. This works but would not be needed if the XSDs were in the same location relative to each other as they are at http://xmlns.jcp.org/xml/ns/javaee/

This could simplify the Digester as its SchemaResolver would only need to be configured with the systemIds that are documented by the specifications (e.g. http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd) for use in deployment descriptors, or with other external schemas (e.g. http://www.w3.org/2001/xml.xsd). It would not need to be configured with the internal references that are not defined by the specifications (e.g. web-common_3_1.xsd, jsp_2_3.xsd and so on).

I'll put together a patch to see how much of a difference it makes.

> b) You can use SchemaFactory.newSchema(Source[]). I do not know whether it
> helps in your case (maybe it does not help with includes), but it is a way
> to pass several schema files to a SchemaFactory.

It does not help with the includes as they are using the relative URI references between schema documents.
Comment 10 Jeremy Boynes 2013-07-07 00:22:07 UTC
Created attachment 30556 [details]
Start to refactor SchemaResolver not to key only on basename

I started on the patch I mentioned above. This changes SchemaResolver to use the full systemId when resolving rather than just matching on the file name. This led to discovery of 55204 which previously worked because an incorrect path was being ignored.

As part of this I also made SchemaResolver a SAX2 EntityResolver2 so that is it potentially more useful.

The previous implementation also had a dependency on a Digester in order to set the publicId. This is not normally an EntityResolver's responsibility. I removed this call and updated Digester to use the publicId reported by a LexicalHandler, and did the same for the Digester's internal EntityResolver implementation. This also allows SchemaResolver to be a singleton.

Verified with test-bio (I'm assuming these changes are independent of connector).

I'm attaching this checkpoint as there are a few more changes I would like to make which would be broader reaching. I noticed that the entries in Constants do not reflect the distinction between public and system ids. For example, WebDtdPublicId_23 is a public id for the 2.3 DTD but WebSchemaPublicId_24 is a truncated version of the 2.4 system id. In the spirit of my previous comment I'm going to propose a patch to clean up that naming, make the constant values the fully qualified URIs (I'm going to assume for this that the values in o.a.c.startup.Constants are for internal use only and can be changed), and assume that the relative URIs can be made to work by moving the XSDs from jsp to servlet. This will also allow a bit of cleanup in WebXml as setPublicId() would only need to handle DOCTYPE-based versions and setVersion() only the XSD-based versions. Finally, as Digester is already SAX2 aware, I will update it to extend DefaultHandler2 and the corresponding EntityResolver2 and LexicalHandler methods.
Comment 11 Jeremy Boynes 2013-07-08 01:07:00 UTC
Created attachment 30567 [details]
Updated patch

This version assumes that the JSP XSDs can be moved to the same location as the Servlet XSDs they have relative references too.

Updated the resolver to only require the systemIds specified in the spec docs. These are used to resolve the initial schema and from there all relative references are resolved as relative URIs.

Moved the XML identifiers from o.a.c.startup to o.a.tomcat to be part of the API allowing them to be used from Jasper (which already depends on tomcat-api). Moved the resolver to o.a.t.u.res so that it too could be shared (there may be a better package for this but it seemed resource related).

Updated Digester to extend DefaultHandler2 with its built in implementations of EntityResolver2 and Lexical handler. On WebXml, setPublicId() is now used only for files with a <DOCTYPE> (2.2 and 2.3) whereas setVersion() is used for all files using XmlSchema (2.4 and later) which means it can be simplified a little.

Added test/webapp-2.2 to confirm 2.2 version was still correctly detected and added it to the other test cases using webapp-*.

This change is now starting to collide with my patch for 53737 and I'll ping the dev list about how that could be combined.
Comment 12 Mark Thomas 2013-07-08 08:35:53 UTC
I've been looking at the first issues raised - namely where the Servlet and JSP schemas are located.

Currently, the schemas are split between [1] and [2].

This split creates the a problem that can easily been seen when viewing the schemas in Eclipse. A number of these schemas include entries along the lines of:
<xsd:include schemaLocation="xxx.xsd"/>

I've had a look at the W3C specs and there is a requirement that these locations are relative so that means the included files all need to be in the same location. With the files split between [1] and [2] there are numerous references that can't be resolved.

I see two approaches to solving this:

a) Move files from [2] to [1]
b) Duplicate files between [1] and [2]

Option a)
Pros: Only a single copy of the files will exist
Cons: The taglib files are not required by the Servlet API (the JSP
      ones are)

Option b)
Pros: servlet-api JAR won't contain unnecessary tablib files
Cons: Duplication of a most of the files between the JARs

On balance I think I prefer option a). I'd rather have 4 files in the servlet-api JAR that are only required when processing JSPs than duplicate nearly all of the files in the jsp-api JAR especially given that the jsp-api.jar depends on the servlet-api.jar anyway.

In Tomcat 7 folks might be using these JARs from there current locations so I intend to fix this using option a) for Tomcat 8 only.

The issue with the schema resolver belong in a new Bugzilla issue (or possibly issues - I haven't looked too hard at whether there is a single issue there or multiple issues).

[1] http://svn.apache.org/viewvc/tomcat/trunk/java/javax/servlet/resources/
[2] http://svn.apache.org/viewvc/tomcat/trunk/java/javax/servlet/jsp/resources/
Comment 13 Mark Thomas 2013-07-08 09:35:51 UTC
The originally reported issue has been fixed in trunk for 8.0.x. Please open a new Bugzilla issue for any additional issues (one Bugzilla issues for each additional issue) you identified while working on this issue.
Comment 14 Mark Thomas 2013-07-30 09:54:22 UTC
*** Bug 55313 has been marked as a duplicate of this bug. ***
Comment 15 Konstantin Kolinko 2014-01-28 16:15:00 UTC
This eventually have been fixed in Tomcat 6 (r1561625) and Tomcat 7 (r1562103) and will be in 6.0.39, 7.0.51.