Bug 5288 - [review] ArchiveIterator fails on filenames with leading spaces
Summary: [review] ArchiveIterator fails on filenames with leading spaces
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: 3.1.7
Hardware: All All
: P5 normal
Target Milestone: 3.1.8
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard: ready to commit
Keywords:
Depends on:
Blocks:
 
Reported: 2007-01-09 21:30 UTC by Daniel A. Steffen
Modified: 2007-02-04 11:40 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
patch to lib/Mail/SpamAssassin/ArchiveIterator.pm patch None Daniel A. Steffen [NoCLA]
suggested patch patch None Theo Van Dinter [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel A. Steffen 2007-01-09 21:30:36 UTC
While trying to run sa-learn on some mailboxes with leading spaces in their
names, I discovered that ArchiveIterator.pm fails to open such mailboxes because
it uses the 2 argument version of open() which strips leading whitespace from
filenames.
The attached patch fixes this fixes by prepending ./ to such filenames (as per
the  Camel book)
Comment 1 Daniel A. Steffen 2007-01-09 21:31:33 UTC
Created attachment 3821 [details]
patch to lib/Mail/SpamAssassin/ArchiveIterator.pm
Comment 2 Theo Van Dinter 2007-01-09 21:40:21 UTC
Hrm, interesting!  I didn't know about this "special" behavior of open()... 
Unfortunately, both leading and trailing whitespace is removed, and redirection
characters are honored, which is bad.  So we'll have to deal with this a little
more directly.

If anyone's interested, here's a snippet from "perldoc -f open":

               The filename passed to 2-argument (or 1-argument) form of
               open() will have leading and trailing whitespace deleted, and
               the normal redirection characters honored.  This property,
               known as "magic open", can often be used to good effect.  A
               user could specify a filename of "rsh cat file |", or you could
               change certain filenames as needed:

                   $filename =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
                   open(FH, $filename) or die "Can't open $filename: $!";

               Use 3-argument form to open a file with arbitrary weird charac-
               ters in it,

                   open(FOO, '<', $file);

               otherwise it's necessary to protect any leading and trailing
               whitespace:

                   $file =~ s#^(\s)#./$1#;
                   open(FOO, "< $file\0");
Comment 3 Daniel A. Steffen 2007-01-09 21:46:34 UTC
indeed, it would probably be better to use the 3 argument version of open() in
mail_open (execpt in the .gz/.bz2 cases)
Comment 4 Theo Van Dinter 2007-01-09 22:12:19 UTC
(In reply to comment #3)
> indeed, it would probably be better to use the 3 argument version of open() in
> mail_open (execpt in the .gz/.bz2 cases)

Actually we need to do it there as well to deal with the whitespace issue.  I
have a patch that I'm almost done with which does this. :)
Comment 5 Theo Van Dinter 2007-01-09 22:26:40 UTC
ok, patch applied to 3.2:

Sending        lib/Mail/SpamAssassin/ArchiveIterator.pm
Transmitting file data .
Committed revision 494733.

will put up a 3.1 version shortly
Comment 6 Theo Van Dinter 2007-01-09 22:27:47 UTC
Created attachment 3823 [details]
suggested patch

This patch backports 3.2's mail_open function, so it includes the fix for this
bug as well as bug 5249 (use binmode after opening the file).
Comment 7 Daniel A. Steffen 2007-01-09 23:48:49 UTC
there is one thing your patch does not take into account: opening of STDIN via a
filename of '-', which the 2 arg version of open() does. This breaks e.g. piping
mail into spamassassin...
Comment 8 Theo Van Dinter 2007-01-09 23:59:02 UTC
(In reply to comment #7)
> there is one thing your patch does not take into account: opening of STDIN via a
> filename of '-', which the 2 arg version of open() does. This breaks e.g. piping
> mail into spamassassin...

Actually, ArchiveIterator doesn't support "-", so that's not an issue. 
spamassassin and sa-learn handle reading from STDIN and creating a temp file,
which is what's actually used.  (bug 5145 dealt with this for 3.1.8 -- you're
probably right that things would break if you only apply this patch to 3.1.7)

:)
Comment 9 Daniel A. Steffen 2007-01-10 00:02:20 UTC
yes, the patch breaks 3.1.7, which is how I found out about it...
the following extra condition at the top of mail_open fixes it:

  if ($file eq "-") {
    $mode = '<&';
    @expr = ( "STDIN" );
  }
Comment 10 Michael Parker 2007-01-31 20:29:04 UTC
+1
Comment 11 Justin Mason 2007-02-02 02:56:25 UTC
+1
Comment 12 Theo Van Dinter 2007-02-04 11:40:11 UTC
Sending        lib/Mail/SpamAssassin/ArchiveIterator.pm
Transmitting file data .
Committed revision 503461.