Bug 5444 - Non-text temp files not removed
Summary: Non-text temp files not removed
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin (show other bugs)
Version: 3.2.0
Hardware: Other Linux
: P1 normal
Target Milestone: 3.2.2
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-02 13:39 UTC by Jason Bertoch
Modified: 2007-07-10 10:04 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Jason Bertoch 2007-05-02 13:39:22 UTC
Using version 3.2.0 .spamassassinXXXXXXXXXXtmp files containing non-text are
being left behind.  This does not occur for all messages, only a small subset of
those actually being scanned.  Mail is still being processed as expected.
Comment 1 Simon Matter 2007-05-05 02:45:07 UTC
I see the same behaviour. The tmpfiles are not really left behind, they just
stay there for quite a long time, in my case many hours.
From what I found the the issue comes from the following lines in Message.pm:

  # If the part type is not one that we're likely to want to use, go
  # ahead and write the part data out to a temp file -- why keep sucking
  # up RAM with something we're not going to use?
  #
  if ($msg->{'type'} !~ m@^(?:text/(?:plain|html)$|message\b)@) {
    my $filepath;
    ($filepath, $msg->{'raw'}) = Mail::SpamAssassin::Util::secure_tmpfile();

    if ($filepath) {
      # The temp file was created, add it to the list of pending deletions
      # we cannot just delete immediately in the POSIX idiom, as this is
      # unportable (to win32 at least)
      push @{$self->{tmpfiles}}, $filepath;
      $msg->{'raw'}->print(@{$body});
    }
  }

  # if the part didn't get a temp file, go ahead and store the data in memory
  if (!exists $msg->{'raw'}) {
    $msg->{'raw'} = $body;
  }

Simply commenting out the first section of the code makes those tempfiles not
appear, so I know the problem is here. From what I understand the tmpfile here
is created but only removed when the process terminates, to overcome the
inability of win32 to unlink an open file. Maybe there is still a more elegant
solution than keeping those file on disk for a long time.
Comment 2 Loren Wilton 2007-05-05 03:00:03 UTC
You would think they could be removed after SA has completed processing the 
message that they were created from.  By that point anything profitably looking 
at them should have terminated, as the score has already been generated.
Comment 3 Justin Mason 2007-05-05 03:51:52 UTC
(In reply to comment #2)
> You would think they could be removed after SA has completed processing the 
> message that they were created from.  By that point anything profitably looking 
> at them should have terminated, as the score has already been generated.

that's what should be happening -- iirc, the Mail::SA::Message->finish() method
deletes them?
Comment 4 Simon Matter 2007-05-05 09:46:35 UTC
At least in my case using spampd from
http://www.worlddesign.com/index.cfm/rd/mta/spampd.htm the problem seems to be
that it does a $status->finish(); but no $mail->finish(); for whatever reason.
My guess, whithout really understanding the whole thing, is that
$mail->finish(); would cleanup those tmpfiles. Spampd uses
Net::Server::PreForkSimple for it's worker childs and per default run with
max_requests = 20 before the child dies. So I guess only after the 20 requests,
the child dies and therefore the tmpfiles are cleaned up. And because many
childs also run in parallel one can imagine how many tmpfiles are always lying
around before they get removed.
Maybe a kind person who understands more of the code could have a look at the
spampd code and tell me whether adding $mail->finish(); is the solution without
breaking anything. Thanks in advance.
Comment 5 Justin Mason 2007-05-05 10:47:57 UTC
> Maybe a kind person who understands more of the code could have a look at the
> spampd code and tell me whether adding $mail->finish(); is the solution without
> breaking anything. Thanks in advance.

yep, spampd (and all users of the Mail::SpamAssassin classes) should always call 
$mail->finish() on every mail object, otherwise it'll leak memory, leave
temporary files around, etc.
Comment 6 Simon Matter 2007-05-05 14:37:09 UTC
Justin, thanks for the clarification. I'll suggest the change to the spampd
maintainer.
Comment 7 Dianne Skoll 2007-05-08 10:44:24 UTC
IMO, having to call finish() is a bit of a hack.  When the object goes out
of scope, its destructor should do the equivalent of finish().
Comment 8 Justin Mason 2007-05-08 10:54:14 UTC
David -- the problem is that we have to maintain circular dependencies
internally, so using a DESTROY hook won't work, unless we switch to use weak
references, which aren't supported in perl 5.6.x, one of our supported
platforms.  The end result is that we need an explicit API to break those weak
refs so that GC'ing will work.
Comment 9 Dianne Skoll 2007-05-08 11:13:26 UTC
(In reply to comment #7)
 > ... destructor ...

Never mind... I forgot about the circular references strewn all throughout
SpamAssassin. :-(

Comment 10 Dianne Skoll 2007-05-08 11:14:41 UTC
OK... circular references are a problem.  Should that be opened as a separate
bug?  I'm sure there's a way to architect it so you don't need circular
references.  (I realize that would be a *huge* change to SpamAssassin, but I
think it's a worthy long-term goal.)

Regards,

David.
Comment 11 Justin Mason 2007-05-08 11:26:05 UTC
sure, open away.  it may be possible to redo without them...
Comment 12 Theo Van Dinter 2007-05-08 13:42:19 UTC
FWIW, I can't think of any circular references in the Message/Message::Node
code.  The objects are trees without parent references, so it's all
unidirectional.  However, the last time we tried to let Perl's garbage
collection "do the right thing", it barfed horribly (ie: didn't work), so we had
to go back to the finish() method.

FWIW, calling Message::finish() has been a documented/required part of the API
for a long time (since the release of 3.0.0), so regardless of SA's code getting
improved, it's a bug in MD imo.

Also FWIW, this is one of the reasons I like having a decently long RC cycle
before a new major release so that people can test with third party tools to
shake loose issues in both tools.  Apparently no one with MD tested 3.2 until it
was released. :(


(In reply to comment #10)
> OK... circular references are a problem.  Should that be opened as a separate
> bug?  I'm sure there's a way to architect it so you don't need circular
> references.  (I realize that would be a *huge* change to SpamAssassin, but I
> think it's a worthy long-term goal.)
> 
> Regards,
> 
> David.
> 

Comment 13 Justin Mason 2007-05-08 15:41:54 UTC
(In reply to comment #12)
> FWIW, I can't think of any circular references in the Message/Message::Node
> code.  The objects are trees without parent references, so it's all
> unidirectional.  However, the last time we tried to let Perl's garbage
> collection "do the right thing", it barfed horribly (ie: didn't work), so we had
> to go back to the finish() method.

There's probably a few cases we can clean up a little though...

> Also FWIW, this is one of the reasons I like having a decently long RC cycle
> before a new major release so that people can test with third party tools to
> shake loose issues in both tools.  Apparently no one with MD tested 3.2 until it
> was released. :(

I was pretty sure this was going to happen anyway ;)  We had a pretty long set
of rc's, but it was clear most people were avoiding them until we made a GA release.
Comment 14 Kevin A. McGrail 2007-05-09 06:51:33 UTC
As reported by others, this bug also causes tmp files to be orphaned by
MIMEDefang using SA 3.2.0.

The following patch from David F. Skoll against the latest MIMEDefang should fix
it for MIMEDefang.  (This patch is against SVN trunk, but it applies cleanly
to 2.62.)

Testing shows that this resolves the issue for MIMEDefang / SpamAssassin 3.2.0
users.

Index: mimedefang.pl.in
===================================================================
--- mimedefang.pl.in (revision 14638)
+++ mimedefang.pl.in (working copy)
@@ -6317,6 +6317,7 @@
     my $status;
     push_status_tag("Running SpamAssassin");
     $status = $object->check($mail);
+    $mail->finish();
     pop_status_tag();
     return $status;
 }
Comment 15 Simon Matter 2007-05-09 08:53:53 UTC
The maintainer of spampd will address the issue in a new upcoming release. As a
quick solution I implemented the patch below in my spampd-2.30 rpms and things
look good after 24h in production.

--- spampd-2.30/spampd.orig     2005-10-31 20:45:53.000000000 +0100
+++ spampd-2.30/spampd  2007-05-08 10:34:57.000000000 +0200
@@ -582,6 +582,7 @@
                                $self->log(2, "%s", "rules hit for $msgid: " .
$status->get_names_of_tests_hit); }
 
                    $status->finish();
+                   $mail->finish();
      
                    # set the timeout alarm back to wherever it was at
                    alarm($previous_alarm);
Comment 16 Justin Mason 2007-06-15 05:35:08 UTC
hmm, I'm looking at this, and I don't see any SA bug left; it's all in the
calling apps needing to call ->finish().  reopen if there's an SA bug to fix ;)
Comment 17 Thomas Eisenbarth 2007-07-10 09:55:56 UTC
(In reply to comment #16)
> hmm, I'm looking at this, and I don't see any SA bug left; it's all in the
> calling apps needing to call ->finish().  reopen if there's an SA bug to fix
 ;)

I am sorely tempted to reopen, because it's time to mention Windows.

The calling app here is SpamAssassin.exe, built with perl 5.8.7, and one should 
believe that it does call ->finish().

While most of the time SpamAssassin does seem to clean up OK after itself, 
there are cases where it doesn't. Some 3.2.1 users were reporting thousands of 
leftover tmp files. And no, SpamAssassin.exe was not forcibly killed with 
TerminateProcess() or similar.

The exe built from SpamAssassin 3.1.7 still works fine, so something happened 
along the way.
Comment 18 Justin Mason 2007-07-10 10:04:40 UTC
if there's a separate issue affecting only Windows, please open a separate bug
for that.