Bug 57822 - Untar task untars into rubbish
Status: RESOLVED FIXED
Alias: None
Product: Ant
Classification: Unclassified
Component: Core tasks
Version: 1.9.4
Hardware: Other other
Importance: P2 major
Target Milestone: 1.9.5
Assignee: Ant Notifications List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-16 14:41 UTC by Jerel
Modified: 2015-04-21 19:39 UTC



Attachments
Test .tar file (10.00 KB, application/x-tar)
2015-04-20 13:57 UTC, Jerel

Description Jerel 2015-04-16 14:41:34 UTC
With Ant 1.7 and 1.8 there are no problems. With Ant 1.9 at runtime, this happens when trying to untar a file:

untar:
    [untar] Expanding: /MV5C/colony/worker/cics.ds.v200/PI37397/com.ibm.cics.php/natives/DSObjects.tar into /MV5C/colony/worker/cics.ds.v200/PI37397/dist

    [untar] expanding øÇø%ÑÂ%ÑÂ%?/ÀÁÊË? to /MV5C/colony/worker/cics.ds.v200/PI37397/dist/øÇø%ÑÂ%ÑÂ%?/ÀÁÊË?

    [untar] expanding øÇø%ÑÂ%ÑÂÌ/øÑÄË? to /MV5C/colony/worker/cics.ds.v200/PI37397/dist/øÇø%ÑÂ%ÑÂÌ/øÑÄË?

DSObjects.tar should untar into php/lib/** and works perfectly on previous versions of Ant.

Something seems to have gone wrong with the encoding, but the untar task in Ant 1.9 offers no way to set the encoding.
Comment 1 Stefan Bodewig 2015-04-18 20:11:20 UTC
Is there any chance you could provide us with an example tar that gets garbled with Ant 1.9.x but works with 1.8.x?
Comment 2 Jerel 2015-04-20 12:50:47 UTC
Re comment 1:

One vital piece of information: this is on z/OS and the local code page is EBCDIC IBM-1047. Nonetheless, an untar of the same file works fine before Ant 1.9.

The Ant task has the job of converting UTF-8 file names to the local code page, and that conversion appears to have regressed. Any tar can be used to test this, e.g. ..
Comment 3 Jerel 2015-04-20 13:57:58 UTC
Created attachment 32665
Test .tar file

I archived some files encoded in UTF-8 (on my Linux machine) and FTP'ed the tar to the mainframe. There, I ran the untar task on it, and again with Ant 1.9.4 the output file names are completely messed up:

Expanding: /MV54/colony/worker/cics.ds.v200/JAM/untar.test.tar into /MV54/colony/worker/cics.ds.v200/JAM/test
[untar] expanding ÈÇÑË to /MV54/ÈÇÑË
[untar] expanding ÈÇ/È to /MV54/ÈÇ/È

This tar contains two sample text files that I created before archiving them.
Comment 4 Stefan Bodewig 2015-04-21 18:32:24 UTC
(In reply to Jerel from comment #2)

> Any tar can be used to test this

as long as you've got access to a z/OS machine :-)

The task works fine for your archive on other platforms.

Ant 1.9 has ported a few improvements from Commons Compress, for example support for PAX extensions (special entries created by POSIX conforming tar implementations used - for example - for Unicode file names).  I wanted to see an archive causing problems to know where to look.  Your example is not using POSIX extensions at all, so something must have changed in the way Ant parses "normal" file names in tar archives.

I'll look into it.
Comment 5 Stefan Bodewig 2015-04-21 18:49:49 UTC
Yes, this is a regression.

Ant used to read the file name by a crude algorithm that effectively read the file name as ASCII <https://git-wip-us.apache.org/repos/asf?p=ant.git;a=blob;f=src/main/org/apache/tools/tar/TarUtils.java;h=1c4d960feb47021c3db819e0fd30f76217d2c4eb;hb=7105ec785cdcc0faa9afcb9b8384d4864f08a5d6#l81>

With 1.9.0 it uses the platform's default encoding by default.  Unfortunately, reverting that now would break backwards compatibility for the 1.9.x series.  I'm going to add an encoding attribute to <untar>, which you can then set to ASCII or UTF-8 when expanding the archive.
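
To illustrate the effect described above, here is a minimal Java sketch. It is not Ant's actual TarUtils code. It decodes the same name bytes with different charsets, which is roughly what changes when the reader switches from an ASCII-style decode to the platform default. The class name TarNameDecoding, the decode helper, and the sample name "this" are assumptions made for illustration, and the IBM1047 charset is only present in JVMs that ship the extended charsets, so the sketch checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Ant.
class TarNameDecoding {
    // Decode the raw name bytes of a tar header with the given charset.
    static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        // Name bytes as a tar created on Linux would store them (assumed sample name).
        byte[] rawName = "this".getBytes(StandardCharsets.US_ASCII);

        // Decoding with the charset the archive was written in recovers the name.
        System.out.println("ASCII:   " + decode(rawName, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:   " + decode(rawName, StandardCharsets.UTF_8));

        // Decoding the same bytes as EBCDIC, as a z/OS platform default would,
        // maps each byte to a different character.
        String ebcdic = "IBM1047";
        if (Charset.isSupported(ebcdic)) {
            System.out.println("IBM1047: " + decode(rawName, Charset.forName(ebcdic)));
        } else {
            System.out.println("IBM1047 is not available in this JVM");
        }
    }
}
```

On a JVM where IBM1047 is available, the last line should show accented characters of the kind seen in the [untar] output in the earlier comments, rather than the original name.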
Comment 6 Stefan Bodewig 2015-04-21 19:39:19 UTC
added encoding attributes as a workaround with git commit 1a58420
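
For anyone hitting this on z/OS, a build-file fragment using the workaround might look like the sketch below. It assumes an Ant release that contains the commit above (the target milestone for this bug is 1.9.5). The project and target names are made up for illustration, and the src and dest values are the relative names from comment 3.

```xml
<project name="untar-encoding-sketch" default="untar">
  <target name="untar">
    <!-- encoding tells <untar> how to decode entry names instead of
         falling back to the platform default (IBM-1047 on z/OS) -->
    <untar src="untar.test.tar" dest="test" encoding="UTF-8"/>
  </target>
</project>
```

Per comment 5, either ASCII or UTF-8 should work for archives like the attached one; the value to pick is whatever encoding the archive's entry names were written in.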