SA Bugzilla – Bug 5145
[review] M::SA::ArchiveIterator::scan_file() eats headers, rendering them lost if INPUT is STDIN
Last modified: 2006-12-11 11:37:56 UTC
If $self->{determine_receive_date} is true (which there is four ways it can be), scan_file() will read the headers to feed them to M::SA::Util::receive_date(). Unfortunately, this causes them to be lost if a target is '-' (stdin). Easiest fix would probably be always to write standard input to a temporary file in sa-learn (not only if mb[o]x format is specified). (By the way, what if you run $ sa-learn --spam file1 :mbox:- < mbox1 ? The code in sa-learn only checks $targets[0].)
I'd just like to put in a big vote for this bug. It bites me time after time. It would be better for sa-learn to refuse to work from stdin than have the current, broken, behaviour.
(In reply to comment #0) > If $self->{determine_receive_date} is true (which there is four ways it can > be), scan_file() will read the headers to feed them to > M::SA::Util::receive_date(). Unfortunately, this causes them to be lost if a > target is '-' (stdin). Hrm. Yeah, AI doesn't handle STDIN so well. It really expects to get a list of files and directories, STDIN/"-" is this extra hackish thing mostly because perl will accept "-" for open(). > Easiest fix would probably be always to write standard input to a temporary > file in sa-learn (not only if mb[o]x format is specified). I can live with that. > $ sa-learn --spam file1 :mbox:- < mbox1 > ? The code in sa-learn only checks $targets[0].) Part of me would want to tell you to just run "sa-learn --spam file1 :mbox:mbox1" which will work fine. So IMO what should happen is: a) go through all the targets and look for "-" b) the first time "-" is found, generate a temp file and shove STDIN into it c) if "-" is found more than once, remove it from the target list I'm tempted to remove the current hackish STDIN/"-" support from ArchiveIterator and just make it only deal with proper files and directories, perhaps even punting when it gets "-". I'll see if I can get a patch done up this week.
Created attachment 3760 [details] suggested patch Ok, this puts a more comprehensive STDIN handler in spamassassin and sa-learn as per my previous comment. It also modified ArchiveIterator slightly to specifically skip "-" and not do any special handling. (I thought about having AI modify the target listing, but since a random new temp is generated, the caller needs to know about it, so the caller really has to deal with this, hence AI now explicitly not dealing with it.)
Looks good, tested. +1
+1
[dos@FC5-VPC 3.1]$ svn ci -m "bug 5145: better deal with STDIN in spamassassin and sa-learn, since ArchiveIterator doesn't deal with it so well" Sending lib/Mail/SpamAssassin/ArchiveIterator.pm Sending sa-learn.raw Sending spamassassin.raw Transmitting file data ... Committed revision 485836.