Bug 47373

Summary: Zip.java patch to support setting last modified date
Product: Ant Reporter: Mark Farnsworth <farnsworth2008>
Component: Core tasksAssignee: Ant Notifications List <notifications>
Status: NEW ---    
Severity: enhancement CC: jglick
Priority: P3 Keywords: PatchAvailable
Version: 1.7.1   
Target Milestone: 1.8.0   
Hardware: All   
OS: All   
Attachments: This is a patch for ANT to add an option to Zip allowing the user to set last modified date.
Patch ready for code review

Description Mark Farnsworth 2009-06-16 08:17:59 UTC
Currently with ANT Tasks it is not possible to fully control the way Zip, Jar, Ear, or War files are produced. The issue is that the default behavior of the task includes the last modified date information from the file system when in creates an archive. The result is that with the exact same source code the archive produced from any given build will vary based on where and when the build was run. In an environment like HUDSON where you wish to build on different servers, from different branches, and at different times this is problematic since the MD5 hash value of the artifact the build produces will not reflect the content of the code but rather reflect a combination of the content of the code and the current time of the server where the build was run.

ANT should support an option for the task to define the last modified date of the archive. If support for this option is added to the Zip task then it will be automatically available in tasks that extend it.

The patch file attached to this issue provides an implementation of the last modified attribute.  I had some trouble with the ANT build system and am not 100% sure the patch is correct.  If someone with commit access could look over the patch and let me know if it works for them I would very much appreciate.

BTW, the ANT build.xml file will not open in Eclipse Ganymede (likely an unrelated Ganymede bug).
Comment 1 Mark Farnsworth 2009-06-16 08:18:32 UTC
This patch solves a problem I have with the current ANT zip, jar, and war tasks producing files with different content each time the build is run. While some may see this as a feature; I find the behavior to be annoying because now a MD5 fingerprint for the zip does not reflect the content and without the patch my build system can not reproduce a build (down to every last bit). This means that with the exact same source from SVN two builds that should produce the same result will actually have different MD5 hash values because without the patch the build system does not reflect what is defined in the SVN environment.

I have workarounds but none of them are as clean since it means that my build system can't use the standard tasks. As a result I really hope that ANT is willing to include this patch in the next release.

The way the task is defined the default behavior of ANT is not changed. The patch simply allows for an option where the script can define what last modification date the zip file should use. In addition to solving my MD5 fingerprint issue this option would also provide a way for users to control the dates that are applied to content and in doing so solve problems with app servers that use date stamps in relation to resource caching. Having the option puts the script in control if it wants to define how the zip, war, jar, or ear is produced.
Comment 2 Mark Farnsworth 2009-06-16 08:21:05 UTC
Created attachment 23810 [details]
This is a patch for ANT to add an option to Zip allowing the user to set last modified date.
Comment 3 Mark Farnsworth 2009-06-16 08:25:40 UTC
This issue was first entered in JIRA:
https://issues.apache.org/jira/browse/IVY-1095

I moved it to here after I realized that core ANT issues are not tracked in JIRA.
Comment 4 Jesse Glick 2009-06-17 10:12:43 UTC
Background:

http://www.nabble.com/Should-Hudson-have-an-option-for-a-content-fingerprint--td24022683.html

Do you really need modifications to <zip>? Seems like you could just use <touch> on build/classes (or whatever it is you are packing) before running <zip>. Slightly more work but not much; downgrading priority therefore. From the Ant manual:

  <touch datetime="09/10/1974 4:30 pm">
    <fileset dir="src_dir"/>
  </touch>

changes the modification time to Oct, 09 1974 4:30 pm of all files and directories found in src_dir.
Comment 5 Mark Farnsworth 2009-06-17 16:27:05 UTC
Touch will work with zip task but it is not as clean a way of controlling the content of the archive.  Also the jar, war, and ear tasks all share the problem and in those tasks their is code that pulls the system current time as it creates manifest.  My patch to zip is clean in that it fixes the problem for all types of archives and does not involve changing the time stamps.  

I am not 100% sure about the effect touch technique would have on javac but my gut tells me that it would wreak havic and cause problems the next time the build runs since the compiler would have trouble deciding if it should re-compile the .class files from the .java files.  (actually I am fairly sure that would happen).  The result would be slow builds and would defeat much of the value of ANT.
Comment 6 Jesse Glick 2009-06-18 09:23:35 UTC
Both issues you mention could be solved, again at the expense of a more cumbersome script (but doable using Ant 1.7.1), by running the final <zip> from a dir reserved for this purpose:

1. For <jar> etc., create the JAR as usual, then <unzip> to a new dir, <touch>, <zip>.

2. If only using <zip> but the archive contains classes from <javac> or other tasks which do up-to-date checks based on timestamp, then compile as usual, then <copy> to a new dir, touch, <zip>.
Comment 7 Mark Farnsworth 2009-06-18 11:23:49 UTC
I agree that there are workarounds but they are not very clean and with MD5 becoming more important I think the ability to control zip content directly via ANT would be a good feature.  In our build system we have a slow build (clean get from SVN and totally clean target).  The slow build is 50 minutes.   The fast build is two minutes with a SVN update to an existing workspace in Hudson using an ant script that does not do a clean prior to the build.   With my tuning of the system I can use MD5 hashes to PROVE beyond a shadow of a doubt that the clean build is just as good as the slow build.   

We now do clean builds after every commit and once a day do a slow build to check and verify that the normal fast build is working correctly.

In addition MD5 fingerprinting in Hudson lets us find the source build associated with a given binary.

Having the ability in ANT to control the archive process without resorting to copy and touch would be a good thing especially for larger projects.

BTW, my current workaround is the a custom task that I developed to extend Zip and provide the feature via the normal Ant custom task feature.  This works quite well but it means we can never use the standard zip, war, ear, or jar tasks plus we need to explain it to everyone who comes along and needs to review the ant script.

Sure there are work arounds but having a 1st class feature in ANT for this might be a good thing and most of the work is already done in my patch.  (i don't think the unit test in the patch is working but if ANT is willing to accept the feature I would be willing to complete the work and ensure that the unit test for the new feature and all docs are updated).
Comment 8 Mark Farnsworth 2009-06-18 11:27:47 UTC
I had a typo in my last comment.

I intended to say that MD5 allows me to prove that the fast build is just as good as the slow build.   This is because with MD5 I can compare the result of the fast build once a day to the daily slow build and report that the fast build is accurate.  Without a way to prove the accuracy of the build we would not be confident deploying builds produced by the fast build process and as such we would be forced to use the slow build process all the time.
Comment 9 Stefan Bodewig 2009-06-19 05:33:09 UTC
I agree that having a task level attribute would be cleaner (tar would need it as well, though).
Comment 10 Jesse Glick 2009-06-19 08:26:53 UTC
Using (unpatched) trunk sources I am getting a test failure in an area possibly related to the patch:

Build File: .../src/tests/antunit/types/resources/archives-test.xml
Tests run: 5, Failures: 3, Errors: 0, Time elapsed: 0.087 sec
Target: testCircularReference took 0.001 sec
Target: testMixingZipsAndTars  FAILED
	at line 57, column 69
	Message: Expected file '/tmp/testoutput/org/apache/tools/ant/launch/Launcher.class' to exist
	took 0.007 sec
Target: testReference  FAILED
	at line 74, column 69
	Message: Expected file '/tmp/testoutput/org/apache/tools/ant/launch/Launcher.class' to exist
	took 0.003 sec
Target: testExtractJars  FAILED
	at line 33, column 69
	Message: Expected file '/tmp/testoutput/org/apache/tools/ant/launch/Launcher.class' to exist
	took 0.002 sec
Target: testZipManualExample took 0.042 sec

I guess this should be resolved first.
Comment 11 Mark Farnsworth 2009-06-19 10:48:44 UTC
I will look at TAR and add the changes to my patch if we are going forward with this.  I will also make sure that the code changes are covered by unit tests and that the docs are appropriately updated.

I am also open to ideas on syntax.

I was thinking of calling the attribute "lastmodified" and allowing the user to set it with any value that can be parsed via the standard java date format instance.  If the property is set it would control the output.  Ommiting the property, setting to empty string, or using "auto" would produce the current behavior. 

<zip 
lastmodified="2009-06-19 01:36 PM"
destfile="out.zip"
basedir="input"
/>

The following is pseudo code for the prop include the options for supporting "auto".  I have not done ANT coding in the past so bear with me a bit.  I will format and document all the code in the next patch update as per the exiting ANT standards page on the wiki.

public void setLastModified(final String dateString) {
  lastModifedSet = false;
  if(dateString.equals("")) return;
  if(dateString.equals("default")) return;
  try {
    lastModifed = DateFormat.getInstance().parse(dateString).getTime();
    lastModifedSet = true;
  } catch (final ParseException e) {
    throw new RuntimeException(e);
  }
}


We could change the terms if someone has a better idea then the using "lastmodified".  I notice that ANT often uses shorter names (i.e. destfile).  We could use "modified" or we could use "contentdate".   

The parsing of the date string respects the local where ANT is running.  I think that is a good thing but others might want it to force support for a given style of dates.  I am neutral on the issue.

Taking the patch from where it is to completion will be a little bit of work.  I am happy to do the grunt if someone assures me that they will review the feature and that it has at least a reasonable chance of making the cut for the next release.
Comment 12 Jesse Glick 2009-06-19 12:57:49 UTC
I will do my best to review proposed patches, and it looks like Stefan will as well.


I think "contentdate" (or "entrydate"?) is more likely to be understood immediately. "lastmodified" to me might sound like it has something to do with the timestamp of the ZIP file itself - perhaps affecting the up-to-date check.


For consistency I would recommend using the same date syntax as <touch> uses: a string parsed according to

DateFormat.getDateTimeInstance(DateFormat.SHORT, DateFormat.MEDIUM, Locale.US)

with the option to select a different "pattern" a la SimpleDateFormat.

Relying on the current locale for anything seems like a bad idea to me; build scripts should run the same on anyone's computer.


It occurs to me that putting this new attribute on the task is wrong. Rather, it should be on the <zipfileset>, i.e. ArchiveFileSet generally. (Compare existing idiom for setting file mode.) If using the 'src' attr, ArchiveScanner could be given a 'long timestamp' property, which would be passed along to each ArchiveResource. If using the 'dir' attr, I presume this would also work somehow, though I have not yet figured out how since AFS seems to just call super.getDirectoryScanner and I don't see where mode & prefix are used when src == null.
Comment 13 Mark Farnsworth 2009-06-19 14:14:16 UTC
thanks!

I will look at the improving the patch over the next week and look at getting something together for review i about two weeks from now.

since touch uses "datetime" to set the datetime I think I will use the same attribute name.  I agree about dates especially since builds need to be portable and will follow the pattern used by touch in that area as well.  zip file set is also a good idea.

I will check back with you guys in about two weeks.
Comment 14 Mark Farnsworth 2009-06-21 15:37:36 UTC
Created attachment 23841 [details]
Patch ready for code review

This patch can be applied to SVN revision 787061; trunk or ANT_17_BRANCH.

The patch adds support for Touch style datetime and pattern attributes to the Zip task.  The task includes the changes to Zip.java and changes to documentation files zip.html, ear.html, jar.html, and war.html.  

The scope of the patch is limited to zip, jar, war, and ear.  This patch does not address Tar because while conceptually similar the Tar task does not have a common base and as such I think it would best to create a different task item for Tar.

The patch also does not implement anything with ZipFileSet because once again I view that this would be better addressed as a separate issue/patch.  Even if we decide to support this touch style syntax at the ZipFileSet level we would still need to have the work of this patch because many users of Zip,Jar,War,Ear tasks do not use ZipFileSet and as such core support for the feature needs to be at the Zip.java level.  

We can later add support for a similar feature at ZipFileSet and in Tar but to keep a handle of patch scope I would view these as separate issues.

When reviewing the patch, please provide comments here and/or send me an email (farnsworth2008@gmail.com).
Comment 15 Jesse Glick 2009-06-22 09:49:07 UTC
I will repeat my advice that the new feature be added to ArchiveFileSet, not any Task impl such as Zip. Ant already includes the ability to configure various things on an ArchiveFileSet which would override values otherwise inferred from File inputs, and the existing 'filemode' (or 'dirmode') attr most closely resembles the proposed timestamp.

As to the objection that existing users of <zip> etc. do not always use <zipfileset> - well if they wish to change the behavior of the task to specify a timestamp, then they need to change their build script somehow. Replacing <fileset> with <zipfileset> (in addition to specifying the timestamp attr) does not seem to me to be a burden.

A complete patch along the lines I recommend would involve changes to:

- ArchiveFileSet: XML-configured setter for timestamp, something along the lines of your current patch, plus a getter (for a Long?)

- Zip: addResources should when checking "if (fileset instanceof ArchiveFileSet)" also check for the timestamp, and subsequent calls to zipFile and addParentDirs methods should be modified to use that timestamp if specified

- Tar: tarResource should, if tarFileSet != null, call te.setModTime with the specified time if set (rather than using r.getLastModified())

(Note: it would be much cleaner if ArchiveFileSet would _always_ create custom resources (even when src==null), so that these ArchiveResource's could specify their own mode, last mod time, etc., and tasks like Zip and Tar would not need hacks like doing runtime type checks on the fileset - so would not need to be patched at all to support a feature like this. But probably too late for such a change.)
Comment 16 Mark Farnsworth 2009-06-22 10:59:46 UTC
The use-case I have for setting the value is to make the zip file reflect build sources and provide a stable MD5 fingerprint.   For my usage setting datetime at the Zip level is better then the ZipFileSet level because I can control the entire zip file in one pace as opposed to setting the attribute on each of the ZipFileSet tags within the parent.  

In the psuedo ant below you can see that having support for datetime at the zip level allows the user to control the entire zip file in one place and this is the best solution for my use-case.   Having support for the feature at zfs level  might be nice for some usage but given the use case presented in this issue it would be sub optimal since it would require setting the attribute five on five tags to achieve the same result.

<zip>
   <zipfileset/>
   <zipfileset/>
   <zipfileset/>
   <zipfileset/>
   <zipfileset/>
</zip>

I don't see anything wrong with ALSO having the feature at Tar, ZipFileSet, and TarFileSet but really I think it is important to support the feature in both places.  Conceptually the feature is a bit similar to the Roundup feature already supported at the Zip.java level and for most ANT users this is likely to be a place where they would expect to see such a feature.

Adding support at the ZipFileSet is a good thing and I would be happy to do the work but view it as an independent patch and really an independent feature plus I think it would be lower priority vs. support at the Zip level.
Comment 17 Stefan Bodewig 2009-06-24 06:53:21 UTC
if we want to implement this on the resource rather than the task side (and I like the idea) than the improved resource infrastructure in svn trunk will help.

It should be straight forward to write a "fixedlastmodfied" resource collection that wrapped any other resource collection returning (org.apache.tools.ant.types.resources.) ResourceDecorators around the originals while overriding the getLastModified method.

Using such a data type would turn Mark's example into

<zip>
  <fixlastmodified datetime=...>
    <resources>
      <zipfileset/>
      <zipfileset/>
      <zipfileset/>
      <zipfileset/>
      <zipfileset/>
    </resources>
  </fixlastmodified>
</zip>

This means two additional elements compared to implementing it at the task level but OTOH it will be available to all tasks (like future archive formats, or copy) and thus be much more powerful.
Comment 18 Mark Farnsworth 2009-06-24 07:30:22 UTC
I like the idea of <fixlastmodified/> but given the historical use of <zip/>, et al it may make sense to support multiple techniques.  The current roundup option at the <zip/> layer is actually conceptually similar and as such IMHO it makes sense to provide the user the option to set datetime attribute at <zip/>.

We could support datetime and pattern attributes on <zip/>, <ear/>, <jar/>, <war/>, <tar/>, <touch/>, <zipfileset/> and <fixlastmodified/>.  Providing the user syntactical options in this area seems like a good thing.

That said support for <fixlastmodified/> is the cleanest technique and if we decide to support only one syntax I think this is the best option although it would mean reworking my patch :(

If we decide to support multiple options I think my patch as is implements the support for <zip/>.  Viewing the other tags as independent issues and patches makes thing simple and then when the support is in place syntactically we could make a follow on patch to unify the implementation to provide a common class similar to (org.apache.tools.ant.taskdefs.Touch.DateFormatFactory).   Making the DateFormatFactory a package scope helper class seems like a good idea because then all the tags could use consistent implementation.  (my patch currently uses an equivalent parsing technique for dates).

Still given the nature of the problem the simple solution is to simply include the zip patch as is and provide support for now.  The patch provides the function needed for most usage using the <zip/> syntax.  Given the niche nature of the core issue it might be reasonable to simply leave the support only at <zip/>  (i.e. I don't think we have universal support for the roundup attribute)
Comment 19 Jesse Glick 2009-06-24 12:16:11 UTC
While I like Stefan's idea of a resource decorator in principle, and would support this being the one and only way to implement this feature (-1 on multiple syntactic options for accomplishing the same task), I fear that it won't work if some of the nested <zipfileset>s use filemode="..." (or prefix="...", etc.): the Zip task does a checkcast on ZipFileSet to use these extended features not available in a general Resource.
Comment 20 Stefan Bodewig 2009-06-25 02:21:03 UTC
In general the cast to ArchiveFileSet should be fixable by introducing an interface for resources that have a prefix/permissions/... (or extend ArchiveResource) and using the new as() method - but it would involve a major rewrite of the addResources methods which might be pretty hard to do without breaking backwards compatibility of the API.

In addition we take a shortcut for ZipFileSets that actually read zip files to avoid repeatedly opening and closing the archive - this would be impossible to do using the decorator approach so anybody using the decorator would take a (major) performance hit when repackaging zips.

I'll need to look around the code a bit more and maybe take some experiments to finalize my thought on this.
Comment 21 Jesse Glick 2009-06-25 15:11:57 UTC
To your first concern - I guess Resource could use the adapter pattern:

Resource r = ...;
ZipEntryResource r2 = r.as(ZipEntryResource.class);
if (r2 != null) ...;

(This is better than instanceof/casts because you can still create a generic proxy Resource impl without static knowledge of all possible extensions.)

To your second concern - could perhaps be solved using an external "session" object which can keep track of open ZIPs and close them when the task is done.

Anyway, I agree that going this route would involve some nontrivial changes and the risk of API incompatibility.