Bug 43144 - tar task is very slow when a <zipfileset> is used as the source resource
Status: NEW
Alias: None
Product: Ant
Classification: Unclassified
Component: Core tasks
Version: 1.8.2
Hardware: Other other
Importance: P2 normal with 3 votes
Target Milestone: ---
Assignee: Ant Notifications List
Reported: 2007-08-16 08:08 UTC by Steve Loughran
Modified: 2020-09-09 09:50 UTC
Description Steve Loughran 2007-08-16 08:08:17 UTC
I don't know where the blame lies, but I'm trying to create a zip and a tar. So I
<zip> everything up (a few seconds; it's a big source tree) and then jump to <tar>:
    <tar destFile="${full.tar}" longfile="gnu">
      <zipfileset src="${full.zip}"/>
    </tar>

This takes a few minutes, with the CPU at 100% and memory consumption at the
limit of what a JVM can handle. Much, much slower than using a file resource. Is it the
fact that the zip files are compressed?
Comment 1 Stefan Bodewig 2008-07-18 01:39:55 UTC
No, it is an implementation artifact. We open and close the zip archive for each entry (because right now we wouldn't know when to close it otherwise).
Comment 2 Noel Grandin 2010-03-15 14:06:55 UTC
Having the same problem when using

<zip destfile="proguard_input.zip" >
  <archives>
    <zips>
      <filelist refid="peralex.libs" />
    </zips>
  </archives>
</zip>

This makes it pretty much unusable when operating from a network drive.
Comment 3 Mathieu Champlon 2013-03-28 10:55:01 UTC
This is probably also why using <copy> with a nested <zipfileset> is awfully slow for any decent-sized archive, making it rather useless.
Comment 4 Michael Vorburger 2013-05-28 09:56:28 UTC
+1. Something's really wrong here: on Ant v1.9.0 we're seeing that a <zip> of our 666 MB product takes 1 minute, while a 930 MB <tar> (using two <zipfileset>) takes 30 minutes!
Comment 5 Ramapriya 2013-06-13 10:52:57 UTC
I had a similar problem: using <zipfileset> takes more than 30 minutes to create a tar. This is how I solved it; tar creation now takes approximately 40 s.

    <unzip src="product.zip" dest="extractedProduct"/>
    <tar tarfile="product.tar.gz" longfile="gnu" compression="gzip">
      <tarfileset dir="extractedProduct" includes="**/PS" filemode="755"/>
      <tarfileset dir="extractedProduct" excludes="PS"/>
      <fileset dir="${rootdir}" includes="resources/"/>
    </tar>
    <delete dir="extractedProduct" quiet="true"/>
Comment 6 Vincent Privat 2018-10-29 13:08:37 UTC
This is a major limitation of Ant. With other build systems (Maven, Gradle...) making an uber-jar that contains all dependencies is very fast. With Ant it is painfully slow and we have to manually unzip/zip them to get decent performance.
Comment 7 Jaikiran Pai 2018-10-31 14:49:24 UTC
I have opened a pull request[1] with a potential way to solve this.

[1] https://github.com/apache/ant/pull/76
Comment 8 Jaikiran Pai 2018-10-31 14:56:06 UTC
> With other build systems (Maven, Gradle...) making an uber-jar that contains all dependencies is very fast.

Vincent, you mention an uber-jar, whereas this issue started off with the tar task, and the proposed patch is currently only applied to the tar task. Are you using the jar task with a zipfileset, and is that one also showing slow performance? I haven't checked the code for that yet, but I won't be surprised if that task is impacted too. Do you have a sample build file showing your usage?
Comment 9 Stefan Bodewig 2018-10-31 18:59:54 UTC
Jaikiran, see Noel's example here which probably tries to build an ueberjar. Likely background: https://stackoverflow.com/q/35577351/4524982
Comment 10 Vincent Privat 2018-10-31 22:02:25 UTC
Hi,
Thanks for the replies! Yes, I was building an uber-jar using jar > restrict > archives > zips > fileset, resulting in an 85 MB jar file with ~2000 files. The build took more than 30 minutes.

I changed it using unzip + jar and the build time is now about 1 minute, see this commit for details:
https://trac.openstreetmap.org/changeset/34703/subversion/applications/editors/josm/plugins/javafx/build.xml

I debugged Ant quickly with VisualVM and found that processing all the different files resulted in hundreds of thousands of calls to ZipResource.getInputStream(). The symptom looks the same as in Stefan's first comment ("We open and close the zip archive for each entry").

I will try your PR (thanks for creating it so fast!)
Comment 11 Jaikiran Pai 2018-11-01 13:13:26 UTC
Thanks, Stefan and Vincent, for those examples. I had a look at them: the PR that I have open only addresses the tar task and won't cover this use case. These other use cases will need a similar fix, and I'll look into how to address them.
Comment 12 KC Wong 2020-09-09 09:50:09 UTC
I encountered the same problem today.

Trying to pack the following into a JAR using <archives> took 5 minutes:
<path id="runtime_classpath">
  <pathelement path="${javax:javaee-api:jar}"/>
  <pathelement path="${com.solacesystems:sol-jms:jar}"/>
  <pathelement path="${com.fasterxml.jackson.core:jackson-databind:jar}"/>
  <pathelement path="${com.fasterxml.jackson.core:jackson-core:jar}"/>
  <pathelement path="${com.fasterxml.jackson.core:jackson-annotations:jar}"/>
  <pathelement path="${commons-logging:commons-logging:jar}"/>
  <pathelement path="${commons-lang:commons-lang:jar}"/>
  <pathelement path="${log4j:log4j:jar}"/>
  <pathelement path="${org.slf4j:slf4j-api:jar}"/>
  <pathelement path="${org.slf4j:slf4j-log4j12:jar}"/>
</path>
<jar destfile="${out.dir}/${env}/${project.build.finalName}.jar">
  <fileset
    dir="${project.build.directory}\${project.build.finalName}\WEB-INF\classes"
    includes="**/*.class, **/*.json, **/*.properties"
  />
  <archives>
    <zips>
      <path refid="runtime_classpath" />
    </zips>
  </archives>
  <manifest>
    <attribute name="Main-Class" value="${mainClass}"/>
  </manifest>
</jar>

I changed it to <unzip> the archives to a temp folder and add the temp folder as a fileset instead. That took 1 minute.
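
The workaround described above might look roughly like the following. This is only a sketch under assumptions, not the exact build file in use: the target name `uberjar` and the property `tmp.unpack.dir` are hypothetical, it reuses the `runtime_classpath` path defined earlier, and it assumes `<unzip>` accepts that `<path>` as a nested resource collection.

```xml
<!-- Sketch only: the target name and tmp.unpack.dir are hypothetical. -->
<target name="uberjar">
  <property name="tmp.unpack.dir" location="${java.io.tmpdir}/uberjar-unpack"/>

  <!-- Start from an empty temp folder. -->
  <delete dir="${tmp.unpack.dir}" quiet="true"/>
  <mkdir dir="${tmp.unpack.dir}"/>

  <!-- Extract each dependency jar in a single pass, instead of letting
       <archives> read the entries one at a time. -->
  <unzip dest="${tmp.unpack.dir}">
    <path refid="runtime_classpath"/>
  </unzip>

  <jar destfile="${out.dir}/${env}/${project.build.finalName}.jar">
    <fileset dir="${project.build.directory}/${project.build.finalName}/WEB-INF/classes"
             includes="**/*.class, **/*.json, **/*.properties"/>
    <!-- The extracted dependencies are now plain files on disk. -->
    <fileset dir="${tmp.unpack.dir}"/>
    <manifest>
      <attribute name="Main-Class" value="${mainClass}"/>
    </manifest>
  </jar>

  <!-- Clean up the temp folder afterwards. -->
  <delete dir="${tmp.unpack.dir}" quiet="true"/>
</target>
```

One difference from the `<archives>` version: if several dependencies contain an entry with the same name, only one copy survives in the temp folder, so the result may not be byte-for-byte identical.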

The <archives> element is doing something really inefficient...