Bug 54128

Summary: Ant 1.8 exec task changes have slowed exec to a crawl
Product: Ant Reporter: David F. Elliott <dfe>
Component: Core tasksAssignee: Ant Notifications List <notifications>
Status: CLOSED FIXED    
Severity: regression CC: dopey, jglick, mica, msunde, paltzern, simon.lieschke, timw, xtim
Priority: P2 Keywords: PatchAvailable
Version: 1.9.1   
Target Milestone: 1.9.3   
Hardware: PC   
OS: All   
Bug Depends on: 5003    
Bug Blocks:    
Attachments: Restores the exec task speed to ant 1.7 times
Alternative patch to speed up exec
Alternative patch to speed up exec

Description David F. Elliott 2012-11-09 20:26:11 UTC
Created attachment 29578 [details]
Restores the exec task speed to ant 1.7 times

Changes made between ant 1.7 and ant 1.8 slowed the exec task to a crawl.

The reason is that it changed blocking I/O to to polled I/O with a period of 100 ms.  In effect any task that completes inside of the 100 ms window will have to wait for the sleep(100) call to finish at which point the non-blocking read call realizes it has exhausted all data and actually exits.

The correct approach is to go back to blocking reads but upon detection of process termination follow a few simple steps:

1. Use non-blocking available() call on the input ends of the subprocess's stdout,stderr streams and spin w/ 100 ms sleep until no bytes are available.  You can be certain this will exhaust all output because it runs after it is known that the subprocess has already exited.
2. Call destroy() on the process object.  This closes the input streams which has the effect of unblocking the blocked read calls.
3. Continue with the rest of the original code which is to stop the stream handler which joins the reading threads.  Because the read calls have been unblocked and exhausted all input the join will complete immediately in all cases.

This has been tested on OS X 10.8 Mountain Lion and it restores the exec task speed back to ant 1.7 series levels (i.e. 20 seconds to run our exec-heavy build task instead of 2-3 minutes).

It will obviously require testing on the Windows platform where the original Bug 5003 was reported to verify it didn't regress that bug fix.
Comment 1 Chee Yang Chau 2013-02-02 00:52:46 UTC
I have an ANT build.xml that perform <exec> task for 400++ times.  The job use about 84 seconds in Ant 1.7.1.

I perform the same task in Ant 1.8.4 and it use 227 seconds.

I then patch my ANT 1.8.4 with your diff patch and now it takes 99 seconds to finish the job.  There is still a significant 17% increase of time in seconds to finish the job compare to ANT 1.7.1.  Is that possible to further optimize?

Thank you.
Comment 2 Chee Yang Chau 2013-03-11 04:33:47 UTC
This problem doesn't fix in version 1.9.0
Comment 3 Chee Yang Chau 2013-03-14 08:26:42 UTC
The patch doesn't seems to fix all the speed issue in ANT 1.8.

An exec task with redirector and outputfilterchain perform badly in ANT 1.8 even the patch has applied:

<exec executable="${p.tdump.bat}" failonerror="false">
  <redirector output="${p.file.bpl}" append="true">
    <outputfilterchain>
      <headfilter lines="-1" skip="2" />
      <linecontainsregexp>
        <regexp pattern="^IMPORT:"/>
      </linecontainsregexp>

      <replacestring from="IMPORT:" />
      <tokenfilter delimoutput=" ">
        <StringTokenizer />
      </tokenfilter>
    </outputfilterchain>
  </redirector>
</exec>
Comment 4 Tim Whittington 2013-08-06 02:45:32 UTC
This also appears to affect the symlink task.

I have builds using repeated symlink invocations to link reasonably sized (10s-100s) collections of files - with Ant 1.7 this is almost instantaneous, but with Ant 1.9.2 there is a large pause (~300 ms per invocation, which adds up quickly).
Comment 5 Tim Whittington 2013-08-11 21:45:03 UTC
Created attachment 30720 [details]
Alternative patch to speed up exec

I've attached an alternative fix, which addresses the core problem of the PumpStreamHandler blindly joining the StreamPumper thread for 200ms without signalling it to exit.
StreamPumper is also changed to drain available output on finish, which is required when it is in 'use available' mode as fast executing tasks will most often exit before the output is read.

A fixed tree is also available at https://github.com/orionhealth/ant/tree/features/fast-exec.
Comment 6 Tim Whittington 2013-08-12 09:49:13 UTC
Created attachment 30721 [details]
Alternative patch to speed up exec

I've updated my fix to correct a problem with dropping data blocks if the stream pumper was stopped between reading and writing a block.

Also fixed a race condition observed in testing where the stream pumper is stopped before the thread running it enters the run() method, which can cause hanging/require interruption.
Comment 7 Laurent Goujon 2013-12-13 20:18:42 UTC
Any chance that this patch is reviewed and merged? I tested it on some builds which use apply tasks a lot, and reduced build times from several minutes to several seconds...
Comment 8 Antoine Levy-Lambert 2013-12-16 01:43:04 UTC
this looks interesting.
Comment 9 Antoine Levy-Lambert 2013-12-16 04:23:24 UTC
I have applied the patch supplied by Tim Whittington and yes, it has a dramatic effect. On my Mac I have been able to run 100 exec invocations of "ls" in 1 second with your patch as opposed to  41 seconds before. Well done ! The code is checked in in ant trunk and will be part of the release when someone creates Ant 1.9.3
Comment 10 paltzern 2013-12-16 08:29:54 UTC
Awesome! Thank you Tim and Antoine! I'm looking forward to be fast again! :-)
Comment 11 Stefan Bodewig 2013-12-21 19:53:33 UTC
*** Bug 50229 has been marked as a duplicate of this bug. ***
Comment 12 Stefan Bodewig 2013-12-21 19:53:38 UTC
*** Bug 51387 has been marked as a duplicate of this bug. ***
Comment 13 Stefan Bodewig 2013-12-21 19:53:47 UTC
*** Bug 48734 has been marked as a duplicate of this bug. ***
Comment 14 Stefan Bodewig 2013-12-21 19:53:56 UTC
*** Bug 49384 has been marked as a duplicate of this bug. ***
Comment 15 paltzern 2014-01-09 10:56:52 UTC
I tested exec in the release 1.9.3 and unfortunately there is no speed up for me!

I could reduce the problem to one single (sic!) exec-call that produces a lot of logging-output (about 10'000'000 chars in about 24'000 lines).

This exec-call takes 10 seconds with 1.7.1 and 210 seconds with 1.9.3.
Comment 16 Antoine Levy-Lambert 2014-01-10 00:15:31 UTC
On which operating system, and with which JDK are you observing that slow behavior ?
Comment 17 paltzern 2014-01-10 07:22:40 UTC
Win 7 Prof SP 1 and JDK 1.7.0_40-b43.
Comment 18 Antoine Levy-Lambert 2014-01-12 00:22:50 UTC
I created this build file to test :
<project name="doit">
<echo>ant.version=${ant.version}</echo>
 <script language="javascript"> <![CDATA[

      for (i=1; i<=1000; i++) {
        exec = doit.createTask("exec");
        exec.setExecutable("attrib");
        exec.createArg().setValue("-r");
        exec.createArg().setValue("doit.xml");
        exec.perform();
      }

    ]]> </script>

</project>

---
I ran the build file on Windows 7 with JDK 1.7 using this script :
export ANT_HOME=c:/opt/apache-ant-1.7.1
ant -f doit.xml
export ANT_HOME=c:/opt/apache-ant-1.9.2
ant -f doit.xml
export ANT_HOME=c:/opt/apache-ant-1.9.3
ant -f doit.xml
----
here is the output :

$ ./doit.sh
Buildfile: doit.xml
     [echo] ant.version=Apache Ant version 1.7.1 compiled on June 27 2008

BUILD SUCCESSFUL
Total time: 8 minutes 6 seconds
Buildfile: C:\dev\54128\doit.xml
     [echo] ant.version=Apache Ant(TM) version 1.9.2 compiled on July 8 2013

BUILD SUCCESSFUL
Total time: 14 minutes 50 seconds
Buildfile: C:\dev\54128\doit.xml
     [echo] ant.version=Apache Ant(TM) version 1.9.3 compiled on December 23 2013

BUILD SUCCESSFUL
Total time: 8 minutes 5 seconds

The test build file as you can see runs 1000 times attrib on a single file.
The performance of Ant 1.7.1 and Ant 1.9.3 in this test are the same, Ant 1.9.2 is much slower.

Is there something specific to the command that you are running which is such that it is slow under Ant 1.9.3 ? Did you set ANT_HOME to point to the installation of Ant 1.9.3 when you tested it ?
Comment 19 paltzern 2014-01-13 09:48:58 UTC
Thank you for looking into this again.

I can confirm the results of your test.

I modified it to the following:

<project name="doit" >
  <echo>ant.version=${ant.version}</echo>

  <exec executable="cmd" dir="c:\">
    <arg value="/c" />
    <arg value="test.bat" />
  </exec>
</project>

test.bat just produces a lot of output (10'000 lines with 1'000 chars each) and containts the following:
@for /L %%i IN (1 1 10000) do @echo 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890



The results:

     [echo] ant.version=Apache Ant version 1.7.1 compiled on June 27 2008
[...]
BUILD SUCCESSFUL
Total time: 1 second


     [echo] ant.version=Apache Ant(TM) version 1.9.2 compiled on July 8 2013
[...]
BUILD SUCCESSFUL
Total time: 38 seconds


     [echo] ant.version=Apache Ant(TM) version 1.9.3 compiled on December 23 2013
[...]
BUILD SUCCESSFUL
Total time: 37 seconds


Here, the performance fo 1.9.2 and 1.9.3 are the same and 1.7.1 is much faster!
Comment 20 msunde 2014-01-13 17:25:44 UTC
See Bug 50229, which is the single exec with a lot of output use case.
Comment 21 paltzern 2014-03-04 16:24:40 UTC
I'm still stuck with Ant 1.7.1 because of this issue.

So should I reopen Bug 50229?
Comment 22 Tim Whittington 2014-03-04 21:11:47 UTC
I've tested all 3 of the test projects on this issue and Bug 50229:
1. 'doit' with repeated calls to attrib
2. 'doit' calling a batch file that produces large amounts of output
3. 'test' that uses cat to pipe a large (40MB) file from stdin to stdout

All 3 test projects were run on:
- Mac OS X Mavericks with Java 1.7.0_51
- Windows 7 Enterprise SP1 in a VMware Fusion VM with Java 1.7.0_51

(There were obviously local adaptations for the execs).

All 3 test projects were run with the following versions of Ant:
- 1.7.1
- 1.9.2 with my patch applied
- 1.9.3
- 1.9.4beta (built from trunk)

All of the tested Ant versions had comparable (to within a second or two) execution times on all of the test projects.
Comment 23 Tim Whittington 2014-03-04 21:19:56 UTC
(In reply to paltzern from comment #19)

> 
> The results:
> 
>      [echo] ant.version=Apache Ant version 1.7.1 compiled on June 27 2008
> [...]
> BUILD SUCCESSFUL
> Total time: 1 second
> 
> 
>      [echo] ant.version=Apache Ant(TM) version 1.9.2 compiled on July 8 2013
> [...]
> BUILD SUCCESSFUL
> Total time: 38 seconds
> 
> 
>      [echo] ant.version=Apache Ant(TM) version 1.9.3 compiled on December 23
> 2013
> [...]
> BUILD SUCCESSFUL
> Total time: 37 seconds
> 
> 
> Here, the performance fo 1.9.2 and 1.9.3 are the same and 1.7.1 is much
> faster!


The Ant 1.7.1 result here looks anomalous - in my OS X and Windows 7 environment this test takes about 20 seconds on 1.7.1 and 1.9.3.

If the result is piped to a file this goes down to about 3 seconds (limited mainly by I/O).

Writing that output file to the console in Windows (type fu.txt) takes about as long as the execution with Ant, indicating that the execution time is dominated by console output time. 

It seems implausible for the entire test to execute in 1 second, bettering the execution time of the 'type' command.
Comment 24 Tim Whittington 2014-03-04 21:22:21 UTC
(In reply to Antoine Levy-Lambert from comment #18)
> here is the output :
> 
> $ ./doit.sh
> Buildfile: doit.xml
>      [echo] ant.version=Apache Ant version 1.7.1 compiled on June 27 2008
> 
> BUILD SUCCESSFUL
> Total time: 8 minutes 6 seconds
> Buildfile: C:\dev\54128\doit.xml
>      [echo] ant.version=Apache Ant(TM) version 1.9.2 compiled on July 8 2013
> 
> BUILD SUCCESSFUL
> Total time: 14 minutes 50 seconds
> Buildfile: C:\dev\54128\doit.xml
>      [echo] ant.version=Apache Ant(TM) version 1.9.3 compiled on December 23
> 2013
> 
> BUILD SUCCESSFUL
> Total time: 8 minutes 5 seconds

All of these times look anomalous - I get times of about 20 seconds on all versions in my Windows 7 environment, and these times indicate about 480ms per attrib execution on 1.7.1 which seems incredible.

Is it possible the loop count of the script was much higher when these timings were taken?
Comment 25 Tim Whittington 2014-03-04 21:58:29 UTC
> It seems implausible for the entire test to execute in 1 second, bettering
> the execution time of the 'type' command.

Having said that, I can get the execution time down to about 2 seconds with Ant 1.7.1 by piping the output to a file or the NUL device.

With that approach Ant 1.7.1 is clearly faster (about 2 seconds) than 1.9.3+ (about 4 seconds) which I imagine will be the 100ms pause penalty applied in StreamPumper.waitForInput(InputStream) each time there is 0 bytes available (which is often the case after reading a line from STDIN).

Some experimentation reveals that reducing that pause time does speed up the 1.9.x execution, but if it's put too low the execution times ramp up a lot - it appears that there's a significant penalty to calling available() too much. The penalty isn't in calling available() (that takes about 10-20 microseconds on average), so it might some kind of contention.
Comment 26 paltzern 2014-03-05 07:59:41 UTC
Hi Tim,

thank you for doing more investigation on this issue!

I figured that the difference seems to be whether you run the script via Eclipse or Windows console!

I can observe the huge performance difference between ant 1.7.1 and the others only (sic!) when run with Eclipse (4.3.0).

With the Win7 console it takes as long as you state.

Any idea what could cause this strange behaviour and how to fix it?
Comment 27 Tim Whittington 2014-03-05 17:23:47 UTC
(In reply to paltzern from comment #26)
> 
> I can observe the huge performance difference between ant 1.7.1 and the
> others only (sic!) when run with Eclipse (4.3.0).
> 
> With the Win7 console it takes as long as you state.
> 
> Any idea what could cause this strange behaviour and how to fix it?

The Eclipse console will slow down a lot if console output isn't limited and you try to output too much (which your test does).

I wonder if this issue can be re-resolved now.
Comment 28 paltzern 2014-03-06 06:58:26 UTC
In this case, the eclipse console speeds things up for 1.7.1!

Sorry, that my last comment didn't make that clear.

Here are the times for my doit-example on my machine:

        win console    eclipse console
1.7.1    42 seconds          1 second
1.9.2    43 seconds         38 seconds
1.9.3    43 seconds         37 seconds

And console output in eclipse is not limited!
Comment 29 paltzern 2014-05-02 10:25:26 UTC
Are there any new insights on this one?

A possible workaround?

Anything I can test to help you find the problem?
Comment 30 David F. Elliott 2014-05-02 20:44:29 UTC
Tim's fix is confirmed working here.

If I had to guess I'd say paltzern's outlier time for 1.7.1 outputting to the Eclipse console (1 second compared to around 40 for every other case) is due to the fact that Tim Whittington's patch also fixed the problem of output getting dropped in a race when the process on the other side of the pipe does not consume it fast enough.

As the originator of the bug I am marking it as resolved in ant 1.9.3. Paltzern can open up another bug if he believes there is another problem.
Comment 31 paltzern 2014-05-05 07:56:37 UTC
As suggested I filed the new Bug 56487.
Comment 32 Chee Yang Chau 2018-03-21 02:52:06 UTC
Any news on this issue with ANT 1.10.2 release?  Is is fix?  Do we need to apply the patch too?
Comment 33 David F. Elliott 2018-03-21 02:58:42 UTC
Marking verified as of ant 1.9.3, several years ago now.  The fix worked.
Comment 34 Chee Yang Chau 2018-03-21 03:07:23 UTC
I am trying the 1.10.2 release without applying any fix.  It happen again, very slow in exec task.
Comment 35 David F. Elliott 2018-03-21 05:03:12 UTC
Please file a new bug then instead of spamming my email inbox with your comments on a bug that I consider to have been resolved over 4 years ago.

The issue I reported was fixed.  If you are still having problems that is a different issue.