Bug 41924

Summary: Untar target does not handle long filenames in POSIX tar files
Product: Ant    Reporter: Peter Liljenberg <pliljenberg>
Component: Core    Assignee: Ant Notifications List <notifications>
Status: RESOLVED FIXED    
Severity: normal CC: bob, sagi.benakiva, vsizikov
Priority: P2 Keywords: PatchAvailable
Version: 1.7.0   
Target Milestone: 1.9.0   
Hardware: All   
OS: All   
Attachments: Testtar file
Proposed patch for this issue
Patch to fix Posix prefix handling

Description Peter Liljenberg 2007-03-21 15:46:00 UTC
When running untar on a tar file created in POSIX format, long filenames (more than 100
characters) are not handled correctly. As a result, the file is extracted into
the root folder instead of the correct subfolder.
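For context, here is a minimal sketch of how a ustar (POSIX.1-1988) writer can fit a path longer than 100 characters into a header. It splits the path at a '/' so that the tail goes into the 100-byte name field and the head goes into the 155-byte prefix field. The class and method names are hypothetical, and the sketch assumes ASCII paths. A reader that ignores the prefix field sees only the tail, which matches the misplaced files described above.

```java
// Hypothetical helper illustrating the ustar path split; not Ant code.
public class UstarPathSplit {
    static final int NAME_LEN = 100;   // size of the ustar "name" field
    static final int PREFIX_LEN = 155; // size of the ustar "prefix" field

    /**
     * Splits a path into {prefix, name} so that name fits in 100 bytes and
     * prefix in 155 bytes, cutting at a '/'. Returns null if no split works.
     * Assumes ASCII paths, so String length equals byte length.
     */
    static String[] split(String path) {
        if (path.length() <= NAME_LEN) {
            return new String[] {"", path}; // fits in the name field alone
        }
        // Scan '/' positions left to right; the first one whose tail fits in
        // the name field gives the shortest possible prefix.
        for (int i = path.indexOf('/'); i >= 0; i = path.indexOf('/', i + 1)) {
            String prefix = path.substring(0, i);
            String name = path.substring(i + 1);
            if (!name.isEmpty() && name.length() <= NAME_LEN) {
                return prefix.length() <= PREFIX_LEN ? new String[] {prefix, name} : null;
            }
        }
        return null; // no '/' leaves a tail short enough
    }

    public static void main(String[] args) {
        String path = "x".repeat(60) + "/" + "y".repeat(60) + "/file.txt"; // 130 chars
        String[] parts = split(path);
        // A reader that honours the prefix rebuilds the original path:
        System.out.println(parts[0] + "/" + parts[1]);
        // A reader that ignores the prefix sees only the tail, so the entry
        // ends up in the wrong directory:
        System.out.println(parts[1]);
    }
}
```

The leftmost usable '/' is chosen here so that the prefix stays as short as possible. Real tar implementations may pick the split point differently.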
Comment 1 Peter Liljenberg 2007-03-21 15:46:51 UTC
Created attachment 19769
Testtar file

Test tar file that will break the untar target
Comment 2 Peter Liljenberg 2007-03-21 15:47:47 UTC
The problem is recreated by using the supplied tar file (test.tar) with the
untar target.
Comment 3 Peter Liljenberg 2007-03-21 15:49:46 UTC
Created attachment 19770
Proposed patch for this issue

Comment 4 Peter Liljenberg 2007-03-21 15:50:34 UTC
I've provided a proposed patch to resolve the issue. Could someone verify
that the patch doesn't introduce other bugs?
Comment 5 J.M. (Martijn) Kruithof 2007-04-15 11:00:11 UTC
Currently untar only supports GNU tar long file names. To also support
the POSIX 2001 format for long file names, an additional check on the
header is needed to verify whether the entry is in that format.
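As a rough illustration of that extra check, the sketch below classifies a 512-byte header by the bytes at offset 257. Old GNU tar writes "ustar  \0" across the magic and version fields, while POSIX ustar writes "ustar\0" followed by version "00". The names TarFormatSniffer and detect are hypothetical and are not Ant API. Only ustar headers have a prefix field at offset 345. POSIX.1-2001 (pax) archives additionally use 'x' typeflag entries carrying path= records, which this sketch does not handle.

```java
import java.util.Arrays;

// Hypothetical sketch: tell old GNU tar headers from POSIX ustar headers.
public class TarFormatSniffer {
    enum Format { OTHER, OLD_GNU, USTAR }

    static final int MAGIC_OFFSET = 257; // magic (6 bytes) + version (2 bytes)
    // Old GNU tar: "ustar  \0" spanning magic and version.
    static final byte[] OLD_GNU_MAGIC = {'u', 's', 't', 'a', 'r', ' ', ' ', 0};
    // POSIX ustar: "ustar\0" (the version "00" that follows is not checked here).
    static final byte[] USTAR_MAGIC = {'u', 's', 't', 'a', 'r', 0};

    static Format detect(byte[] header) {
        if (header.length < 512) {
            throw new IllegalArgumentException("a tar header is 512 bytes");
        }
        byte[] eight = Arrays.copyOfRange(header, MAGIC_OFFSET, MAGIC_OFFSET + 8);
        if (Arrays.equals(eight, OLD_GNU_MAGIC)) {
            return Format.OLD_GNU; // long names arrive via 'L' pseudo-entries
        }
        byte[] six = Arrays.copyOfRange(header, MAGIC_OFFSET, MAGIC_OFFSET + 6);
        if (Arrays.equals(six, USTAR_MAGIC)) {
            return Format.USTAR; // the prefix field at offset 345 is meaningful
        }
        return Format.OTHER; // e.g. a pre-POSIX v7 header with no magic
    }

    public static void main(String[] args) {
        byte[] header = new byte[512];
        System.arraycopy(USTAR_MAGIC, 0, header, MAGIC_OFFSET, USTAR_MAGIC.length);
        System.out.println(detect(header));
    }
}
```

The order of the checks does not matter here, because byte 262 is NUL only in the POSIX variant. The check still matters in practice: in old GNU headers, the bytes from offset 345 onward hold other fields, so they must not be read as a prefix.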
Comment 6 Vladimir Sizikov 2008-05-18 08:38:15 UTC
This is a pretty serious issue for us. Git now uses the POSIX tar format for its tarballs, which makes it impossible to extract such tarballs properly with Ant.
Comment 7 Peter Liljenberg 2008-05-18 10:57:17 UTC
You could try using the supplied patch; it worked for me. It isn't 100% tested or verified, but it solved my problems with long filenames. I'm not using it anymore, though, since we migrated to Maven.
Comment 8 Bob Toxen 2008-07-09 08:48:34 UTC
Documenting a bug is not fixing it.  I did not see the note that subtly pointed out that extracting ant with standard tar WILL FAIL.  This caused me unnecessary wasted time.  I have NEVER seen a GNU or other Free Software project tolerate such sloppyness, only Microsoft.

Recommended fix #1:
  Shorten pathnames to 100 characters.

Recommended fix #2:
  Use nested tar files with the "inside" tar archives having files relative
  to higher directories.  In other words, for the file long1/long2/long3.java,
  have the main "top level" tar file have the file long1/long2/short.tar with
  short.tar having files relative to long1/long2, such as just long3.java.
  Then, as part of the build procedure do "cd long1/long2;tar -xf short.tar".

Recommended fix #3 (and least desirable):
  Have the ./configure test for the existence of one of the long file names.
  If it does not exist (and maybe even test for the existence of the name
  truncated to 100 characters).  If the long file name does not exist then
  the configure should fail with an explanation.  This should be trivial to
  add.
Comment 9 Matt Benson 2008-07-09 10:32:58 UTC
(In reply to comment #8)
> Documenting a bug is not fixing it.  I did not see the note that subtly pointed
> out that extracting ant with standard tar WILL FAIL.  This caused me
> unnecessary wasted time.  I have NEVER seen a GNU or other Free Software
> project tolerate such sloppyness, only Microsoft.

Your problem is with extracting Ant itself?  That actually isn't related to this issue.

For what it's worth, though, poor spelling might itself be taken as a sign of "sloppiness," as might failure to read directions.  Further, your invocation of the holy name of GNU leads me to point out that the issue in question being with GNU tar formats, it's obvious "plain" tar didn't satisfy that organization either.

> 
> Recommended fix #1:
>   Shorten pathnames to 100 characters.

That's like saying "Some cars are small.  I'll cut off my head so I can fit into one of these."  You wouldn't do that; you'd just use a car into which you can fit.

> 
> Recommended fix #2:
>   Use nested tar files with the "inside" tar archives having files relative
>   to higher directories.  In other words, for the file long1/long2/long3.java,
>   have the main "top level" tar file have the file long1/long2/short.tar with
>   short.tar having files relative to long1/long2, such as just long3.java.
>   Then, as part of the build procedure do "cd long1/long2;tar -xf short.tar".

There really isn't a build procedure, per se.  Extract and go.

> 
> Recommended fix #3 (and least desirable):
>   Have the ./configure test for the existence of one of the long file names.
>   If it does not exist (and maybe even test for the existence of the name
>   truncated to 100 characters).  If the long file name does not exist then
>   the configure should fail with an explanation.  This should be trivial to
>   add.
> 

Once again, there is no configure script shipped with Ant, nor is there a makefile.  You DO know what project this is, right?
Comment 10 Richard Gussmann 2009-08-10 11:18:15 UTC
On Solaris 10, tar creates a POSIX-compliant (ustar) archive for entries whose names are 100 characters or longer (a Solaris peculiarity). When Ant processes such a file, the entry is extracted into the root directory, which is definitely wrong: the prefix part of the name is ignored. So I strongly recommend a fix for this issue.

To make sure the prefix is only used with POSIX-compliant tar archives, you need to check for the "ustar" magic followed by a zero byte. With that check in place, the check supplied with the patch should be sufficient.

I modified TarEntry.java as follows.

At the end of the function, I added:

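    public void parseTarHeader(byte[] header) {

        ... original code here

        // Recognize the POSIX ustar header format: magic "ustar" followed by
        // a NUL byte at offsets 257-262. Old GNU tar writes "ustar  " (with
        // spaces) instead and stores other fields after devMinor.
        boolean ustarFormat = header[257] == 'u'
                && header[258] == 's'
                && header[259] == 't'
                && header[260] == 'a'
                && header[261] == 'r'
                && header[262] == 0;

        // Assumes the original code leaves offset at the start of the devMinor
        // field; skip it so offset points at the 155-byte prefix (byte 345)
        // before testing whether a prefix is present.
        offset += DEVLEN;

        if (ustarFormat && header[offset] != 0) {
            // The full entry name is prefix + "/" + name.
            StringBuffer buf = TarUtils.parseName(header, offset, 155);
            buf.append('/');
            buf.append(name);
            name = buf;
        }
    }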
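For experimenting outside Ant, here is a self-contained version of the same idea. It reads the name and prefix fields from a raw 512-byte header and joins them only when the POSIX "ustar\0" magic is present. The class UstarName and its helpers are hypothetical. Offsets follow the ustar header layout, and names are decoded as ISO-8859-1 as a simplifying assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: rebuild an entry's full path from a ustar header.
// Layout used: name 0..99, magic 257..262, prefix 345..499.
public class UstarName {
    static final int NAME_OFFSET = 0, NAME_LEN = 100;
    static final int MAGIC_OFFSET = 257;
    static final int PREFIX_OFFSET = 345, PREFIX_LEN = 155;
    private static final byte[] POSIX_MAGIC = {'u', 's', 't', 'a', 'r', 0};

    /** Reads a field up to its first NUL (or its full length if there is none). */
    static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) {
            end++;
        }
        // Assumption: names are ASCII/ISO-8859-1; real archives may differ.
        return new String(h, off, end - off, StandardCharsets.ISO_8859_1);
    }

    static boolean isPosixUstar(byte[] h) {
        for (int i = 0; i < POSIX_MAGIC.length; i++) {
            if (h[MAGIC_OFFSET + i] != POSIX_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns prefix + "/" + name for ustar headers with a prefix, else name. */
    static String fullName(byte[] h) {
        String name = field(h, NAME_OFFSET, NAME_LEN);
        if (isPosixUstar(h) && h[PREFIX_OFFSET] != 0) {
            return field(h, PREFIX_OFFSET, PREFIX_LEN) + "/" + name;
        }
        return name;
    }

    /** Demo helper: writes an ASCII string into the header at the given offset. */
    static void put(byte[] h, int off, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(b, 0, h, off, b.length);
    }

    public static void main(String[] args) {
        byte[] h = new byte[512];
        put(h, NAME_OFFSET, "file.txt");
        put(h, PREFIX_OFFSET, "some/deep/dir");
        System.out.println(fullName(h)); // prints file.txt (no magic, prefix ignored)
        put(h, MAGIC_OFFSET, "ustar");   // byte 262 is already 0
        System.out.println(fullName(h)); // prints some/deep/dir/file.txt
    }
}
```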
Comment 11 Sebb 2011-08-20 12:22:46 UTC
This was fixed in Commons Compress some while ago - see https://issues.apache.org/jira/browse/COMPRESS-110

[Note that WinZip 9.0 also has the same issue; 7-Zip does not]
Comment 12 Sebb 2011-08-20 19:17:36 UTC
Created attachment 27419
Patch to fix Posix prefix handling
Comment 13 Stefan Bodewig 2011-08-21 04:22:12 UTC
Are you sure POSIX longfile support in Commons Compress is complete?

If it is, then using the Compress Antlib with Commons Compress 1.2 will work.
Comment 14 aditsu 2012-06-16 15:43:30 UTC
Is that why ant fails to extract this file correctly? http://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/snapshot/jetty-8.1.4.v20120524.tar.bz2
Comment 15 Stefan Bodewig 2012-06-17 05:03:52 UTC
(In reply to comment #14)
> Is that why ant fails to extract this file correctly?
> http://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/snapshot/jetty-8.1.4.v20120524.tar.bz2

yes
Comment 16 Stefan Bodewig 2012-06-17 05:05:23 UTC
Fixed in svn revision 1350857 by merging the code from Commons Compress 1.4.1 into Ant.
Comment 17 Sagi 2012-07-22 11:09:05 UTC
Hi,

I'm suffering from a similar problem, but in my case the entry with the long name is not a regular file but a symbolic link.
In my case it is the link name (the link target) that is very long, not the entry's own full path (as in the test tar file).

I looked at the tar source code and I think the solution for this issue is not complete.
Besides the definition of GNUTYPE_LONGNAME,
there's also a definition for GNUTYPE_LONGLINK (from tar.h):
  /* Identifies the *next* file on the tape as having a long linkname.  */
  #define GNUTYPE_LONGLINK 'K'

In my test case, TarEntry::isGNULongNameEntry returns false
because linkFlag != LF_GNUTYPE_LONGNAME (linkFlag == (byte)'K').

I also noticed that there's no definition for LF_GNUTYPE_LONGLINK in TarConstants.java.
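To illustrate the GNU mechanism described above, here is a sketch of a reader loop. It treats an 'L' entry's data as the name of the next entry and a 'K' entry's data as the link name of the next entry. Everything here (GnuLongNames, list, header, demoArchive) is hypothetical and simplified: checksums are neither written nor verified, and the end of the archive is detected by a header whose first byte is NUL.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of honouring GNU 'L' (long name) and 'K' (long link
// name) pseudo-entries; not Ant's TarInputStream.
public class GnuLongNames {
    static final int BLOCK = 512;

    static final class Entry {
        final String name, linkName;
        final char type;
        Entry(String name, String linkName, char type) {
            this.name = name; this.linkName = linkName; this.type = type;
        }
    }

    static String field(byte[] a, int off, int len) {
        int end = off;
        while (end < off + len && a[end] != 0) end++;
        return new String(a, off, end - off, StandardCharsets.ISO_8859_1);
    }

    /** Parses an octal field, skipping leading spaces, stopping at the first non-octal byte. */
    static long octal(byte[] a, int off, int len) {
        int i = off, end = off + len;
        while (i < end && a[i] == ' ') i++;
        long v = 0;
        for (; i < end && a[i] >= '0' && a[i] <= '7'; i++) v = v * 8 + (a[i] - '0');
        return v;
    }

    static List<Entry> list(byte[] archive) {
        List<Entry> out = new ArrayList<>();
        String longName = null, longLink = null; // apply to the *next* real entry
        int pos = 0;
        while (pos + BLOCK <= archive.length && archive[pos] != 0) {
            char type = (char) archive[pos + 156];
            int size = (int) octal(archive, pos + 124, 12);
            int data = pos + BLOCK;
            if (type == 'L') {
                longName = field(archive, data, size);
            } else if (type == 'K') {
                longLink = field(archive, data, size);
            } else {
                String name = longName != null ? longName : field(archive, pos, 100);
                String link = longLink != null ? longLink : field(archive, pos + 157, 100);
                out.add(new Entry(name, link, type));
                longName = null;
                longLink = null;
            }
            pos = data + (size + BLOCK - 1) / BLOCK * BLOCK; // data padded to whole blocks
        }
        return out;
    }

    /** Demo helper: header with name, type, octal size and link name (no checksum). */
    static byte[] header(String name, char type, int size, String link) {
        byte[] h = new byte[BLOCK];
        byte[] n = name.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(n, 0, h, 0, Math.min(n.length, 100));
        byte[] s = String.format("%011o", size).getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(s, 0, h, 124, s.length);
        h[156] = (byte) type;
        byte[] l = link.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(l, 0, h, 157, Math.min(l.length, 100)); // truncated to 100 bytes
        return h;
    }

    /** Demo archive: a 'K' pseudo-entry carrying the target, then a symlink ('2'). */
    static byte[] demoArchive(String target) {
        byte[] t = target.getBytes(StandardCharsets.ISO_8859_1);
        int size = t.length + 1; // this demo stores a trailing NUL
        int padded = (size + BLOCK - 1) / BLOCK * BLOCK;
        byte[] a = new byte[BLOCK + padded + BLOCK + 2 * BLOCK]; // ends with two zero blocks
        System.arraycopy(header("././@LongLink", 'K', size, ""), 0, a, 0, BLOCK);
        System.arraycopy(t, 0, a, BLOCK, t.length);
        System.arraycopy(header("link", '2', 0, target), 0, a, BLOCK + padded, BLOCK);
        return a;
    }

    public static void main(String[] args) {
        String target = "d/".repeat(80) + "target.txt"; // 170 chars, too long for linkname
        for (Entry e : list(demoArchive(target))) {
            System.out.println(e.type + " " + e.name + " -> " + e.linkName);
        }
    }
}
```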

Thank you,
  Sagi.