Bug 62849

Summary: DirectoryScanner does not efficiently handle filesystem loops
Product: Ant Reporter: Michael Barker <mikeb01>
Component: CoreAssignee: Ant Notifications List <notifications>
Status: NEW ---    
Severity: normal    
Priority: P2    
Version: 1.9.13   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Patch to handle file system loop in directory scanner.
Patch to handle file system loop in directory scanner v2
Patch to handle file system loop in directory scanner v3
Patch to handle file system loop in directory scanner v4

Description Michael Barker 2018-10-23 19:35:05 UTC
If you have a large file structure being scanned and the file system contains a loop caused by a symbolic link then it is possible for the directory scanner to run out of memory before the 'maxLevelsOfSymlinks' protection kicks in.

This affects us in Jenkins which uses the DirectoryScanner to pick up artefacts for publishing/archiving.
Comment 1 Michael Barker 2018-10-23 19:37:51 UTC
Created attachment 36206 [details]
Patch to handle file system loop in directory scanner.

This patch will skip over directories that create a file system loop.  It does this by checking if the canonical path of a child directory is a prefix of the canonical path of the parent.

There is a test included.

The test also highlighted that the delete task will follow symlinks, which can also fall afoul of file system loops, so I've changed the delete task to only follow directories that are not symlinks and just delete the symlink itself instead.
Comment 2 Michael Barker 2018-10-23 19:38:59 UTC
I've attached a proposed patch for this issue.
Comment 3 Michael Barker 2018-10-24 07:35:27 UTC
Created attachment 36210 [details]
Patch to handle file system loop in directory scanner v2

Updated patch that contains more useful log message.
Comment 4 Jaikiran Pai 2018-10-24 08:12:19 UTC
Hello Michael,

I haven't thoroughly reviewed the patch or checked the implication of the change. However, this change:

--- src/main/org/apache/tools/ant/DirectoryScanner.java	(date 1531197234000)
+++ src/main/org/apache/tools/ant/DirectoryScanner.java	(date 1540366425000)
@@ -1258,6 +1258,18 @@ 
             final String name = vpath + newFile;
             final TokenizedPath newPath = new TokenizedPath(path, newFile);
             final File file = new File(dir, newFile);
+
+            try {
+                String fileCanonicalPath = file.getCanonicalPath();
+                String dirCanonicalPath = dir.getCanonicalPath();
+                if (dirCanonicalPath.startsWith(fileCanonicalPath)) {
+                    continue;
+                }
+            } catch (IOException e) {
+                System.err.println("Failed to get canoncial paths to determine " +
+                    "if filesyste loop exists, continuning");
+            }
+


Could you instead try something like this (just pseudo-code, haven't checked for compile errors):

            try {
                if (FileUtils.getFileUtils().isLeadingPath(file.getAbsoluteFile(), dir.getAbsoluteFile(), true)) {
                    continue;
                }
            } catch (IOException e) {
                // log it
            }

The FileUtils#isLeadingPath with resolveSymlink = true would do the same as what you have attempted in the patch.
Comment 5 Michael Barker 2018-10-24 08:18:21 UTC
Created attachment 36211 [details]
Patch to handle file system loop in directory scanner v3

Use FileSystemUtils.
Comment 6 Jaikiran Pai 2018-11-01 13:22:09 UTC
Hello Michael,

The v3 patch mostly looks fine to me. However, I'm not too sure about the changes you have in the Delete task where you no longer delete a directory if it is a symbolic link. I'll need inputs from Stefan and/or others on this one before merging it.
Comment 7 Michael Barker 2018-11-01 19:41:23 UTC
The reason to change the delete task is that that unit test would get stuck in a loop when I added the cases where there was a file system loop.  There are a couple of options that I could take with the delete task:

1) Use the same loop detection as in the Directory scanner an not follow links that create a file system loop.

2) Change the delete task to take a parameter of something like followSymLinks (defaulted to true) so that existing behaviour remains the default and the new behaviour is optional.

My preference is for #1 as it requires less input from a user to prevent it from breaking.  I'll have a look at code some of this up this weekend.
Comment 8 Michael Barker 2018-11-02 01:58:45 UTC
Created attachment 36231 [details]
Patch to handle file system loop in directory scanner v4

This changes the delete task to similarly handle file system loops.  It won't follow a symlink on delete if that symlink results in a file system loop.  It will follow it in other cases.
Comment 9 Jaikiran Pai 2018-11-03 12:26:32 UTC
Michael, I have now merged this patch (with minor changes that include change to the error messages and a comment to the changed code) to our 1.9.x and master branches.

It seems to have introduced a failure in an existing test[1]. I'll look into it to see what needs to be done. I'll let this issue stay open until that test failure is sorted out.

[1] https://builds.apache.org/job/Ant-Build-Matrix-1.9.x-Linux/OS=xenial,jdk=JDK%201.5%20(latest)/286/
Comment 10 Michael Barker 2018-11-04 08:06:31 UTC
The names from those tests suggest that the functionality I've added has changed the behaviour such that those tests are violated.  We may need to remove those tests if we want to us my patch.
Comment 11 Jaikiran Pai 2018-12-08 04:13:38 UTC
Hello Michael,

I have been away a bit, so couldn't respond earlier. Coming back to this, you mention OOM issue that initiated this patch. Would it be possible to attach a build file that reproduces this and also the exception stacktrace that you see (without the patches)? That way, we can see if this can be solved in a different manner.
Comment 12 Michael Barker 2018-12-09 19:51:14 UTC
The current scenario we are running into:

- We have a build with roughly the following structure.  This is an artefact of the build tool we happen to be using (BUCK).

<project root>
  <source directory>
  ...
  <source directory>
  <build output>
     <symlink -> project root> A
     <output directory>
        <symlink -> A>
     ...
     <output directory>
        <symlink -> A>

The build has a total of around 1M files, of which ~800K are in the output directory.  Note that we don't use Ant to build any of this.  Where this hits us is that we use Jenkins for CI  and the Ant DirectoryScanner is used find artefacts from the build to archive.

When Jenkins runs the scanner is continues to recurse the project through the project root symlink.  Because the DirectoryScanner holds onto all of the files it visits (not just the ones that match) it rapidly builds a very large list of files.  Some heap debugging shows ~50M entries in the directory scanner before the Jenkins archive task starts to fail with OOM errors.

I'll have a look later this week about trying to build a test case for this.
Comment 13 Michael Barker 2019-04-08 22:34:15 UTC
I've fixed the problems caused by this issue elsewhere (Jenkins).  Feel free to close.