Bug 25767

Summary: Performance issues in JAXPUtils' use of SAXParser & SAXParserFactory
Product: Ant Reporter: Jesse Glick <jglick>
Component: CoreAssignee: Ant Notifications List <notifications>
Status: NEW ---    
Severity: enhancement    
Priority: P3    
Version: 1.6.0   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Jesse Glick 2003-12-26 17:51:06 UTC
Comments on when JAXPUtils creates new factory / parser / document builder
objects and when it caches them.

1. Correctness:

private static SAXParserFactory parserFactory = null;
public static synchronized SAXParserFactory getParserFactory() {
    if (parserFactory == null) {
        parserFactory = newParserFactory();
    }
    return parserFactory;
}

(and similar methods) is likely not correct. An instance of SAXParserFactory is
not thread-safe and might not be reentrant, according to its Javadoc. If more
than one thread in the JVM is calling this method and using the same parser,
random corruption might result - this might happen to someone using <parallel>,
for example; or someone running several builds at once using an IDE integration.
Or if this method is called in the middle of a parse for some reason - to parse
some referenced file? - there might be corruption; at least SAX1's Parser says
it is not reentrant.

2. Performance: casual examination of the performance of running a tiny build
script w/ several antlib.xml's over and over using a profiler shows some 20-30%
of the CPU time being spent in JAXPUtils.getNamespaceXMLReader(), specifically
calling Xerces' SAXParserImpl.<init>(). Not an especially cheap call since it
sets up a lot of objects of various kinds; you are supposed to cache the result
from call to call if you want good performance and might be parsing a lot of
files. (E.g. a lot of small antlib.xml's, or many little build.xml's with
<subant>, etc.) Note that this time is not spent actually parsing the XML or
even opening the input stream to it - just getting a parser object.

Suggested fix to both problems:

1. Rework the helper methods in JAXPUtils to cache not just the factory objects
but the parsers & document builders themselves. I don't know how expensive
getting the factory is - JAXPUtils already caches them, and my profiling only
took place on a "warmed-up" JVM - but invoking the factories' new*() methods is
too expensive to do every time an XML file is parsed.

2. Leave uncached impls as e.g. SAXParser createSAXParser() and so on - these
would always create new ones. Or for compatibility, keep the current method
names, but deprecate them in favor of properly cached versions (see #3).

3. For cached usage, use e.g. SAXParser obtainSAXParser() and void
releaseSAXParser(SAXParser) methods. These should use java.lang.ThreadLocal:

private static final ThreadLocal/*<SAXParser>*/ saxParser = new ThreadLocal();
public static SAXParser obtainSAXParser() {
    SAXParser p = (SAXParser)saxParser.get();
    if (p != null) {
        saxParser.set(null);
    } else {
        p = createSAXParser();
    }
    return p;
}
public static void releaseSAXParser(SAXParser p) {
    saxParser.set(p);
}

// Usage:
SAXParser p = obtainSAXParser();
try {
   // parse...
} finally {
    releaseSAXParser(p);
}

Forgetting to release a parser is not a big deal, it will just need to be
recreated on the next obtain*() call. Using ThreadLocal ensures that different
threads never share the same object. Setting the ref to null in obtain*()
ensures that reentrant calls (i.e. calling obtain*() while the last-obtained
parser is still in use) will be safe.

One issue is that several utility methods do not return the original cacheable
object, but rather something derived from it, e.g.

return newSAXParser(getNSParserFactory()).getXMLReader();

For this usage you should probably have obtainNamespaceXMLReader +
releaseNamespaceXMLReader, etc. - i.e. keep a separate cache for each kind of
object - since you can't get back to the SAXParser from the XMLReader.

Could try to provide a patch if there is interest - at least modifications to
JAXPUtils, and usage of the new methods in e.g. ProjectHelper2.parse().
Comment 1 Jesse Glick 2003-12-27 03:05:03 UTC
Correction from the JAXP 1.2 spec - for factories it is only the configuration
methods which are not thread-safe; the new*() constructor methods are
thread-safe. So there is probably no correctness error in this regard in
JAXPUtils since the only cached objects are factories. Performance note still holds.