SA Bugzilla – Bug 5444
Non-text temp files not removed
Last modified: 2007-07-10 10:04:40 UTC
Using version 3.2.0 .spamassassinXXXXXXXXXXtmp files containing non-text are being left behind. This does not occur for all messages, only a small subset of those actually being scanned. Mail is still being processed as expected.
I see the same behaviour. The tmpfiles are not really left behind, they just stay there for quite a long time, in my case many hours. From what I found the the issue comes from the following lines in Message.pm: # If the part type is not one that we're likely to want to use, go # ahead and write the part data out to a temp file -- why keep sucking # up RAM with something we're not going to use? # if ($msg->{'type'} !~ m@^(?:text/(?:plain|html)$|message\b)@) { my $filepath; ($filepath, $msg->{'raw'}) = Mail::SpamAssassin::Util::secure_tmpfile(); if ($filepath) { # The temp file was created, add it to the list of pending deletions # we cannot just delete immediately in the POSIX idiom, as this is # unportable (to win32 at least) push @{$self->{tmpfiles}}, $filepath; $msg->{'raw'}->print(@{$body}); } } # if the part didn't get a temp file, go ahead and store the data in memory if (!exists $msg->{'raw'}) { $msg->{'raw'} = $body; } Simply commenting out the first section of the code makes those tempfiles not appear, so I know the problem is here. From what I understand the tmpfile here is created but only removed when the process terminates, to overcome the inability of win32 to unlink an open file. Maybe there is still a more elegant solution than keeping those file on disk for a long time.
You would think they could be removed after SA has completed processing the message that they were created from. By that point anything profitably looking at them should have terminated, as the score has already been generated.
(In reply to comment #2) > You would think they could be removed after SA has completed processing the > message that they were created from. By that point anything profitably looking > at them should have terminated, as the score has already been generated. that's what should be happening -- iirc, the Mail::SA::Message->finish() method deletes them?
At least in my case using spampd from http://www.worlddesign.com/index.cfm/rd/mta/spampd.htm the problem seems to be that it does a $status->finish(); but no $mail->finish(); for whatever reason. My guess, whithout really understanding the whole thing, is that $mail->finish(); would cleanup those tmpfiles. Spampd uses Net::Server::PreForkSimple for it's worker childs and per default run with max_requests = 20 before the child dies. So I guess only after the 20 requests, the child dies and therefore the tmpfiles are cleaned up. And because many childs also run in parallel one can imagine how many tmpfiles are always lying around before they get removed. Maybe a kind person who understands more of the code could have a look at the spampd code and tell me whether adding $mail->finish(); is the solution without breaking anything. Thanks in advance.
> Maybe a kind person who understands more of the code could have a look at the > spampd code and tell me whether adding $mail->finish(); is the solution without > breaking anything. Thanks in advance. yep, spampd (and all users of the Mail::SpamAssassin classes) should always call $mail->finish() on every mail object, otherwise it'll leak memory, leave temporary files around, etc.
Justin, thanks for the clarification. I'll suggest the change to the spampd maintainer.
IMO, having to call finish() is a bit of a hack. When the object goes out of scope, its destructor should do the equivalent of finish().
David -- the problem is that we have to maintain circular dependencies internally, so using a DESTROY hook won't work, unless we switch to use weak references, which aren't supported in perl 5.6.x, one of our supported platforms. The end result is that we need an explicit API to break those weak refs so that GC'ing will work.
(In reply to comment #7) > ... destructor ... Never mind... I forgot about the circular references strewn all throughout SpamAssassin. :-(
OK... circular references are a problem. Should that be opened as a separate bug? I'm sure there's a way to architect it so you don't need circular references. (I realize that would be a *huge* change to SpamAssassin, but I think it's a worthy long-term goal.) Regards, David.
sure, open away. it may be possible to redo without them...
FWIW, I can't think of any circular references in the Message/Message::Node code. The objects are trees without parent references, so it's all unidirectional. However, the last time we tried to let Perl's garbage collection "do the right thing", it barfed horribly (ie: didn't work), so we had to go back to the finish() method. FWIW, calling Message::finish() has been a documented/required part of the API for a long time (since the release of 3.0.0), so regardless of SA's code getting improved, it's a bug in MD imo. Also FWIW, this is one of the reasons I like having a decently long RC cycle before a new major release so that people can test with third party tools to shake loose issues in both tools. Apparently no one with MD tested 3.2 until it was released. :( (In reply to comment #10) > OK... circular references are a problem. Should that be opened as a separate > bug? I'm sure there's a way to architect it so you don't need circular > references. (I realize that would be a *huge* change to SpamAssassin, but I > think it's a worthy long-term goal.) > > Regards, > > David. >
(In reply to comment #12) > FWIW, I can't think of any circular references in the Message/Message::Node > code. The objects are trees without parent references, so it's all > unidirectional. However, the last time we tried to let Perl's garbage > collection "do the right thing", it barfed horribly (ie: didn't work), so we had > to go back to the finish() method. There's probably a few cases we can clean up a little though... > Also FWIW, this is one of the reasons I like having a decently long RC cycle > before a new major release so that people can test with third party tools to > shake loose issues in both tools. Apparently no one with MD tested 3.2 until it > was released. :( I was pretty sure this was going to happen anyway ;) We had a pretty long set of rc's, but it was clear most people were avoiding them until we made a GA release.
As reported by others, this bug also causes tmp files to be orphaned by MIMEDefang using SA 3.2.0. The following patch from David F. Skoll against the latest MIMEDefang should fix it for MIMEDefang. (This patch is against SVN trunk, but it applies cleanly to 2.62.) Testing shows that this resolves the issue for MIMEDefang / SpamAssassin 3.2.0 users. Index: mimedefang.pl.in =================================================================== --- mimedefang.pl.in (revision 14638) +++ mimedefang.pl.in (working copy) @@ -6317,6 +6317,7 @@ my $status; push_status_tag("Running SpamAssassin"); $status = $object->check($mail); + $mail->finish(); pop_status_tag(); return $status; }
The maintainer of spampd will address the issue in a new upcoming release. As a quick solution I implemented the patch below in my spampd-2.30 rpms and things look good after 24h in production. --- spampd-2.30/spampd.orig 2005-10-31 20:45:53.000000000 +0100 +++ spampd-2.30/spampd 2007-05-08 10:34:57.000000000 +0200 @@ -582,6 +582,7 @@ $self->log(2, "%s", "rules hit for $msgid: " . $status->get_names_of_tests_hit); } $status->finish(); + $mail->finish(); # set the timeout alarm back to wherever it was at alarm($previous_alarm);
hmm, I'm looking at this, and I don't see any SA bug left; it's all in the calling apps needing to call ->finish(). reopen if there's an SA bug to fix ;)
(In reply to comment #16) > hmm, I'm looking at this, and I don't see any SA bug left; it's all in the > calling apps needing to call ->finish(). reopen if there's an SA bug to fix ;) I am sorely tempted to reopen, because it's time to mention Windows. The calling app here is SpamAssassin.exe, built with perl 5.8.7, and one should believe that it does call ->finish(). While most of the time SpamAssassin does seem to clean up OK after itself, there are cases where it doesn't. Some 3.2.1 users were reporting thousands of leftover tmp files. And no, SpamAssassin.exe was not forcibly killed with TerminateProcess() or similar. The exe built from SpamAssassin 3.1.7 still works fine, so something happened along the way.
if there's a separate issue affecting only Windows, please open a separate bug for that.