Bug 144 - mass-check should support maildir
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Score Generation
Version: 2.30CVS
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: Craig Hughes
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-03-27 17:11 UTC by Duncan Findlay
Modified: 2002-06-15 03:58 UTC
Description Duncan Findlay 2002-03-27 17:11:39 UTC
The mass-check script really should support maildirs -- they're becoming quite
common, and it really shouldn't be that hard to implement.

Given a maildir directory, the messages live in its cur and new subdirs
(I don't think tmp should be checked), under funny file names like
1015916112.18902_0.red.daf.2y.net:2,S. I don't think the naming scheme is
common to different MTAs, though, so all files in those dirs should be checked.
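
A minimal sketch of that lookup, assuming the usual cur/new/tmp layout. The
name maildir_message_files is a hypothetical helper for illustration only; it
is not part of mass-check. It takes every regular file in cur and new without
looking at the file name:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mass-check): return the paths of all
# message files in a maildir, looking only at cur/ and new/ and never tmp/.
sub maildir_message_files {
    my ($folder) = @_;
    my @files;
    foreach my $subdir (qw(cur new)) {
        my $dir = "$folder/$subdir";
        opendir(my $dh, $dir) or next;    # skip a missing subdirectory
        foreach my $name (sort readdir($dh)) {
            my $path = "$dir/$name";
            # Accept any regular file; the name is not parsed, since the
            # naming scheme is assumed to differ between MTAs.
            push @files, $path if -f $path;
        }
        closedir($dh);
    }
    return @files;
}

# Usage: list the message files of the maildir given on the command line.
if (@ARGV) {
    print "$_\n" for maildir_message_files($ARGV[0]);
}
```

Run with a maildir path as the only argument, this should print one message
path per line, cur entries first and then new.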

I'll try to look at this later, if I get around to it.
Comment 1 Duncan Findlay 2002-03-27 20:51:08 UTC
Curse of lynx.

Here's the patch:

--- mass-check  Wed Mar 27 19:56:37 2002
+++ mass-check.new      Wed Mar 27 22:48:09 2002
@@ -73,23 +73,28 @@
   foreach my $folder (@_) {
     if ($folder =~ /.tar$/)
     {
-       # it's an MH or Cyrus folder in a tar file
+       # it's an MH or Cyrus folder or Maildir in a tar file
        use Archive::Tar;
-       mass_check_mh_tar_file($sub, $folder);
+       mass_check_tar_file($sub, $folder);
     }
     elsif (-d $folder &&
          (-f "$folder/1" || -f "$folder/1.gz" || -f "$folder/cyrus.index"))
     {
       # it's an MH folder or a Cyrus mailbox
       mass_check_mh_folder($sub, $folder);
-
+    }
+    elsif (-d $folder && -d "$folder/cur" && -d "$folder/new" )
+    {
+      # Maildir!
+      mass_check_maildir($sub, $folder);
+
     } elsif (-f $folder) {
       mass_check_mailbox($sub, $folder);
     }
   }
 }

-sub mass_check_mh_tar_file {
+sub mass_check_tar_file {
   my $sub = shift;
   my $filename = shift;
   my $tar = Archive::Tar->new();
@@ -98,7 +103,7 @@
   foreach my $mail (@files) {
       next if $mail =~ m#/$# or $mail =~ /cyrus\.(index|header|cache)/;
       my $msg_data = $tar->get_content($mail);
-      my @msg = split("\r\n",$tar->get_content($mail));
+      my @msg = split("\n",$tar->get_content($mail));
       $mail =~ s/\s/_/g;
       &$sub ($mail, \@msg);
   }
@@ -132,6 +137,36 @@
     &$sub ($mail, \@msg);
   }
 }
+
+sub mass_check_maildir {
+  my $sub = shift;
+  my $folder = shift;
+  my @files = <$folder/{cur,new}/*>;
+  foreach my $mail (@files)
+  {
+    # jm: commented size checks here; I can't see how they could be working,
+    # as "250_000" is not an int.  New size check implemented below
+
+    if ($mail =~ /\.gz$/) {
+      # next if `gunzip -c $mail|wc -c` > 250_000; #skip messages bigger than 250k
+      open (STDIN, "gunzip -cd $mail |") or warn "gunzip $mail failed: $!";
+    } elsif ($mail =~ /\.bz2$/) {
+      # next if `bzip2 -dc $mail|wc -c` > 250_000; #skip messages bigger than 250k
+      open (STDIN, "bzip2 -cd $mail |") or warn "bunzip2 $mail failed: $!";
+    } else {
+      # next if `wc -c $mail` > 250_000; #skip messages bigger than 250k
+      open (STDIN, "<$mail") or warn "open $mail failed: $!";
+    }
+
+    # skip too-big mails
+    if (-s STDIN > 250*1024) { close STDIN; next; }
+    my @msg = (<STDIN>);
+    close STDIN;
+
+    &$sub ($mail, \@msg);
+  }
+}
+

 sub mass_check_mailbox {
   my $sub = shift;
Comment 2 Craig Hughes 2002-04-01 10:57:06 UTC
Duncan, can you attach the patch file?  I'm having some weird copy/paste issue
or something where I can't get the patch file to be valid.
Comment 3 Duncan Findlay 2002-04-01 11:11:20 UTC
Stupid links won't attach the file... and the e-mail interface didn't seem to either.

I hate this. I'll be using mozilla by the end of the week hopefully (upgrade time!)

Anyways, I'll e-mail you the patch.
Comment 4 Craig Hughes 2002-04-05 10:30:09 UTC
Oops, forgot to mark this fixed.