Bug 57341 - [PATCH] JUnitReport task causes OutOfMemoryError/StackOverflowError at junit_frames.br$dash$replace()
Summary: [PATCH] JUnitReport task causes OutOfMemoryError/StackOverflowError at junit_...
Status: RESOLVED FIXED
Alias: None
Product: Ant
Classification: Unclassified
Component: Optional Tasks (show other bugs)
Version: 1.9.4
Hardware: PC All
: P2 critical (vote)
Target Milestone: 1.9.5
Assignee: Ant Notifications List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-11 12:44 UTC by Ryan Bennitt
Modified: 2014-12-26 12:01 UTC (History)
0 users



Attachments
junit-frames.xsl fixed br-replace template as described (2.30 KB, patch)
2014-12-11 13:06 UTC, Ryan Bennitt
Details | Diff
junit-noframes.xsl fixed br-replace template as described (2.30 KB, patch)
2014-12-11 13:07 UTC, Ryan Bennitt
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryan Bennitt 2014-12-11 12:44:26 UTC
While performing a JUnitReport task, our build server started consistently reporting Stack Overflow errors:

build.xml:773: The following error occurred while executing this line:
build.xml:923: java.lang.StackOverflowError
	at com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1571)
	at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1489)
	at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLStream.java:1529)
	at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1614)
	at com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.characters(AbstractTranslet.java:621)
	at junit_frames.br$dash$replace()
	at junit_frames.br$dash$replace()
	at junit_frames.br$dash$replace()
	...
        at junit_frames.br$dash$replace()
	x1000

I tested locally and got a slightly different result:

build.xml:923: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:2694)
        at java.lang.String.<init>(String.java:203)
        at java.lang.String.substring(String.java:1877)
        at com.sun.org.apache.xalan.internal.xsltc.runtime.BasisLibrary.substring_afterF(BasisLibrary.java:329)
        at junit_frames.br$dash$replace()
        at junit_frames.br$dash$replace()
        at junit_frames.br$dash$replace()
	...
        at junit_frames.br$dash$replace()
	x1000

Monitoring the build on my workstation, Java Mission Control showed the memory spiking to over 1.47GB before I got the out of memory error.

I had a look at the br-replace template in the $ANT_HOME/etc/junit-frames.xsl and $ANT_HOME/etc/junit-noframes.xsl files and it is performing a recursive replace of line returns one by one, so as soon as you perform a br-replace on a file with more line returns than your stack limit, you're going to get this error, unless you run out of memory first.

I managed to reimplement the br-replace template to be less stack/heap intensive. After trying unsuccessfully to use the java String.replace/replaceAll and StringUtils.replace functions that are used elsewhere (which seem to have issues, possibly JVM dependent) I went for a plain XSLT 1.0 implementation using a binary-subdivision approach that splits the string approximately evenly on the nearest line break on large strings:

<xsl:template name="br-replace">
  <xsl:param name="word"/>
  <xsl:param name="splitlimit">32</xsl:param>
  <xsl:variable name="secondhalflen" select="(string-length($word)+(string-length($word) mod 2)) div 2"/>
  <xsl:variable name="secondhalfword" select="substring($word, $secondhalflen)"/>
  <!-- When word is very big, a recursive replace is very heap/stack expensive, so subdivide on line break after middle of string -->
  <xsl:choose>
    <xsl:when test="(string-length($word) > $splitlimit) and (contains($secondhalfword, '&#xa;'))">
      <xsl:variable name="secondhalfend" select="substring-after($secondhalfword, '&#xa;')"/>
      <xsl:variable name="firsthalflen" select="string-length($word) - $secondhalflen"/>
      <xsl:variable name="firsthalfword" select="substring($word, 1, $firsthalflen)"/>
      <xsl:variable name="firsthalfend" select="substring-before($secondhalfword, '&#xa;')"/>
      <xsl:call-template name="br-replace">
        <xsl:with-param name="word" select="concat($firsthalfword,$firsthalfend)"/>
      </xsl:call-template>
      <br/>
      <xsl:call-template name="br-replace">
        <xsl:with-param name="word" select="$secondhalfend"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="contains($word, '&#xa;')">
      <xsl:value-of select="substring-before($word, '&#xa;')"/>
      <br/>
      <xsl:call-template name="br-replace">
        <xsl:with-param name="word" select="substring-after($word, '&#xa;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
  <xsl:value-of select="$word"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

This implementation is much more heap/stack friendly. JMC only reported a peak of 621MB, compared to the 1.47GB it hit before overflowing.
Comment 1 Ryan Bennitt 2014-12-11 13:06:14 UTC
Created attachment 32282 [details]
junit-frames.xsl fixed br-replace template as described
Comment 2 Ryan Bennitt 2014-12-11 13:07:16 UTC
Created attachment 32283 [details]
junit-noframes.xsl fixed br-replace template as described
Comment 3 Ryan Bennitt 2014-12-11 15:30:41 UTC
Just realised these files are in the main ant repository. This probably belongs in core tasks component.
Comment 4 Stefan Bodewig 2014-12-24 13:19:55 UTC
Ryan, please don't close the issue until the patch has made its way into the official repository. :-)
Comment 5 Ryan Bennitt 2014-12-24 13:21:17 UTC
Oops, cheers.
Comment 6 Stefan Bodewig 2014-12-24 14:40:14 UTC
I can see how your approach limits the amount of stack being used in certain cases, but am unsure about splitlimit's default.  In a degenerated case where I have a long text with a line break every 72 columns (roughly) I'd get as many recursive calls as before, wouldn't I?

What does the big amount of text that causes the stack overflow or OOM in your case look like?  A very long stacktrace?  In my experience individual lines of a stack trace tend to be longer than 32 characters.
Comment 7 Ryan Bennitt 2014-12-24 15:11:36 UTC
(In reply to Stefan Bodewig from comment #6)
> I can see how your approach limits the amount of stack being used in certain
> cases, but am unsure about splitlimit's default.  In a degenerated case
> where I have a long text with a line break every 72 columns (roughly) I'd
> get as many recursive calls as before, wouldn't I?
> 
> What does the big amount of text that causes the stack overflow or OOM in
> your case look like?  A very long stacktrace?  In my experience individual
> lines of a stack trace tend to be longer than 32 characters.

It turned out to be a large XML file in the end. Average line length 55, but with significant variance, very long lines, and some runs of line returns.

The original implementation would parse the first line and give the rest to the recursive call, which resulted in N-1 + N-2 + N-3... lines being copied and passed down each time, amount of memory required of the order N^2 until you get to the last line and can return all the way back down the N-length stack.

With this implementation the stack never exceeds log2(N) and memory of the order 2N, until you get down to small chunks of text (of length splitlimit/2) at which point we shouldn't be in any danger of running out of memory. The default splitlimit is a bit arbitrary, you don't want it too large that the remaining text can overflow your stack in the worst case. It just seemed a lot of work to keep splitting text in two at some point, rather than parse the rest in the original recursive manner, especially given expected and best/worst case line lengths. You've also got the added factor that it gives up splitting when it doesn't find a line return in the second half of the chunk of text being processed, at which point it checks for a line return (in the first half) and does the normal recursive replace until there are no line returns left.

By all means tweak the splitlimit default.
Comment 8 Stefan Bodewig 2014-12-26 11:31:16 UTC
My mistake, I simply did my calculations on recursion depth wrong.
Comment 9 Stefan Bodewig 2014-12-26 12:01:53 UTC
I've just fiddled with whitespace to make the diff smaller:

http://git-wip-us.apache.org/repos/asf/ant/commit/f7f5327d

and since the same applies to the stylesheets shipping with AntUnit:

http://git-wip-us.apache.org/repos/asf/ant-antlibs-antunit/commit/7396c8b6