Bug 61875

Summary: Investigate whether Xalan can be removed
Product: Taglibs Reporter: Matthew Broadhead <nbmlaw>
Component: Standard TaglibAssignee: Tomcat Developers Mailing List <dev>
Status: NEW ---    
Severity: normal    
Priority: P2    
Version: 1.2.5   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X 10.1   
Bug Depends on: 27717    
Bug Blocks:    
Attachments: patch switching back to JAXP
JAXP with perf test

Description Matthew Broadhead 2017-12-08 16:07:13 UTC
Created attachment 35593 [details]
patch switching back to JAXP

I am currently evaluating whether the Xalan dependency can be dropped from taglibs and replaced with javax.xml.*. 

this is the code that is blowing up for me:
Reader xsl = new InputStreamReader(filepath.openStream());
TransformerFactory transformerfactory = TransformerFactory.newInstance();
StreamSource ssXsl = new StreamSource(xsl);
ssXsl.setSystemId(filepath.toExternalForm());
Templates templates = transformerfactory.newTemplates(ssXsl);
Transformer transformer = templates.newTransformer();

last line causes:
java.lang.ClassCastException: org.apache.xml.dtm.ref.DTMManagerDefault cannot be cast to org.apache.xml.dtm.DTMManager
    at org.apache.xml.dtm.DTMManager.newInstance(DTMManager.java:137)
    at org.apache.xpath.XPathContext.<init>(XPathContext.java:102)
    at org.apache.xpath.XPathContext.<init>(XPathContext.java:349)
    at org.apache.xpath.XPathContext.<init>(XPathContext.java:337)
    at org.apache.xalan.transformer.TransformerImpl.<init>(TransformerImpl.java:397)
    at org.apache.xalan.templates.StylesheetRoot.newTransformer(StylesheetRoot.java:200)
Comment 1 Jeremy Boynes 2017-12-08 16:49:25 UTC
Original thread on taglibs-user:
http://mail-archives.apache.org/mod_mbox/tomcat-taglibs-user/201711.mbox/%3C66442bd7-e0db-31ec-00d3-76dd2f660b48%40nbmlaw.co.uk%3E

This problem has been seen on TomEE which does not include the Xalan dependency needed by the XML tags.

Previous attempts to replace Xalan with JAXP did not solve the performance issue reported in #27717. There were also problems with XML types that were returned. See that issue for details.
Comment 2 Jeremy Boynes 2017-12-09 00:07:48 UTC
I applied the patch and made the ForEachTag tests work and am seeing the same performance problem we had before.

With the current implementation the testIterationPerformance test completes 200K iterations in 1250ms. 

With the patched version, this test did not complete within a reasonable time. When I lower the iteration count from 200K to 20K it completed in 46000ms. This looks like the same n^2 behaviour we saw before the fix for 27717.

This is the test I ran with the modifications to work with JAXP:

    @Ignore
    @Test
    public void testIterationPerformance() throws Exception {
        // create a large document
        test = newBenchmarkDocument(20000);

        XPathExpression dot = XPathFactory.newInstance().newXPath().compile(".");
        expect(pageContext.findAttribute("doc")).andStubReturn(test);
        tag.setSelect("$doc/root/a");
        replay(pageContext);
        long time = -System.nanoTime();
        String result = null;
        if (tag.doStartTag() == IterationTag.EVAL_BODY_INCLUDE) {
            do {
                result = dot.evaluate(tag.getCurrent());
            } while (tag.doAfterBody() == IterationTag.EVAL_BODY_AGAIN);
            tag.doFinally();
        }
        time += System.nanoTime();
        System.err.println("time = " + time / 1000000 + "ms.");
        assertEquals("199999", result);
    }

time = 46673ms.
Comment 3 Matthew Broadhead 2017-12-09 10:07:54 UTC
i tried running that test and i got 
javax.servlet.jsp.JspTagException: javax.xml.xpath.XPathExpressionException: javax.xml.transform.TransformerException: Unable to evaluate expression using this context
	at org.apache.taglibs.standard.xpath.jaxp.JAXPXPathExpression.evaluate(JAXPXPathExpression.java:67)
	at org.apache.taglibs.standard.xpath.jaxp.JAXPXPathExpression.iterate(JAXPXPathExpression.java:56)
	at org.apache.taglibs.standard.tag.common.xml.ForEachTag.prepare(ForEachTag.java:57)
	at javax.servlet.jsp.jstl.core.LoopTagSupport.doStartTag(LoopTagSupport.java:241)
	at org.apache.taglibs.standard.tag.common.xml.ForEachTagTest.testIterationPerformance(ForEachTagTest.java:261)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:539)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:761)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:461)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:207)
Caused by: javax.xml.xpath.XPathExpressionException: javax.xml.transform.TransformerException: Unable to evaluate expression using this context
	at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:204)
	at org.apache.taglibs.standard.xpath.jaxp.JAXPXPathExpression.evaluate(JAXPXPathExpression.java:65)
	... 28 more
Caused by: javax.xml.transform.TransformerException: Unable to evaluate expression using this context
	at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
	at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:133)
	at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
	at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:191)
	... 29 more
Caused by: java.lang.RuntimeException: Unable to evaluate expression using this context
	at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:266)
	at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
	at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
	... 32 more
can you share your ForEachTagTest?
Comment 4 Matthew Broadhead 2017-12-09 10:21:28 UTC
JSTLXPathContext needs to be set in ForEachTagTest.setup() somehow?
Comment 5 Jeremy Boynes 2017-12-10 21:42:05 UTC
Created attachment 35598 [details]
JAXP with perf test

$ mvn -Dtest=ForEachTagTest#testIterationPerformance -DfailIfNoTests=false test
...

Running org.apache.taglibs.standard.tag.common.xml.ForEachTagTest
time = 39511ms.
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.729 sec
Comment 6 romain.manni-bucau 2017-12-11 07:29:28 UTC
A compromise can surely be to make xalan optional with a static check if it is here to select the impl. Also copying the needed classes should be doable if it doesnt import the whole xalan.
Comment 7 Matthew Broadhead 2017-12-11 10:57:04 UTC
i stepped through ForEachTagTest.java

line 140 if (tag.doStartTag() == IterationTag.EVAL_BODY_INCLUDE) {
seems to take a long time

line 142 result = dot.evaluate(tag.getCurrent());
takes virtually no time but the iterations seem to add up to a lot of time in total

i don't really understand the ins and outs but it seems like every element has been split up and stored separately instead of being stored as an xml structure.  so the xml search tools don't work very well it is like loading a separate document for each search.

is there any way to access the low level dtm stuff through the javax.xml.*?
Comment 8 Jeremy Boynes 2017-12-12 20:34:19 UTC
(In reply to Matthew Broadhead from comment #7)
> i stepped through ForEachTagTest.java
> 
> line 140 if (tag.doStartTag() == IterationTag.EVAL_BODY_INCLUDE) {
> seems to take a long time
> 
> line 142 result = dot.evaluate(tag.getCurrent());
> takes virtually no time but the iterations seem to add up to a lot of time
> in total
> 
> i don't really understand the ins and outs but it seems like every element
> has been split up and stored separately instead of being stored as an xml
> structure.  so the xml search tools don't work very well it is like loading
> a separate document for each search.
> 
> is there any way to access the low level dtm stuff through the javax.xml.*?

The xpath engine in Xalan was designed to allow repeated fast evaluation of xpath expressions during XSLT processing. It does that by taking an upfront hit to build a DTM model of the document being processed, a model that is then reused each time an xpath expression needs to be evaluated.

The current implementation mirrors that in the context of JSTL. It constructs the model once in ForEachTag#doStartTag then reuses it when evaluating the xpath expressions inside the loop just like the XSLT engine does. It takes advantage of a portion of the JSTL spec that allows other implementations of XML context than a DOM.

When using JAXP it is limited by the standard API to using DOM objects as context. That means every time an XPath expression is evaluated, the engine has to recreate the DTM model before it can evaluate the expression. It does that starting from the root had has to work over the N-1 preceding nodes, hence the O(N^2) behaviour.

In theory, the JAXP XPath API allows different "object models" to be selected. See
  https://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/xpath/XPathFactory.html#newInstance(java.lang.String)
so there may be a way to use DTM with the default JAXP implementation (which in Oracle's and OpenJDK is a shaded version of Xalan which would support DTM). The downside of this would be if there were other JDKs that did not support DTM.

I think Glassfish ended up simply shading Xalan and bundling it with their implementation. We could do that as well, perhaps adding a "bundle" jar that includes all the tags along with a shaded version of Xalan that was intended to used by containers rather than applications.

So perhaps multiple packagings:
- pure JAXP, smaller but with a known performance issue with XML processing
- Xalan bundled, larger but with no performance issue
- unbundled, so application users can choose what tags and dependencies they include
Comment 11 Matthew Broadhead 2018-02-19 11:30:14 UTC
i have just tried TomEE 8.0.0-SNAPSHOT and it is working better than 7.0.4.  i can transform xml on loadup.  i haven't tried reloading a webapp though, as that is where it used to fail on 7.0.3