Bug 5145 - [review] M::SA::ArchiveIterator::scan_file() eats headers, rendering them lost if INPUT is STDIN
Summary: [review] M::SA::ArchiveIterator::scan_file() eats headers, rendering them los...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner (show other bugs)
Version: 3.1.4
Hardware: All All
: P5 normal
Target Milestone: 3.1.8
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard: can be commited
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-24 07:42 UTC by Magnus Holmgren
Modified: 2006-12-11 11:37 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
suggested patch patch None Theo Van Dinter [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Magnus Holmgren 2006-10-24 07:42:13 UTC
If $self->{determine_receive_date} is true (which there is four ways it can 
be), scan_file() will read the headers to feed them to 
M::SA::Util::receive_date(). Unfortunately, this causes them to be lost if a 
target is '-' (stdin).

Easiest fix would probably be always to write standard input to a temporary 
file in sa-learn (not only if mb[o]x format is specified).

(By the way, what if you run

$ sa-learn --spam file1 :mbox:- < mbox1

? The code in sa-learn only checks $targets[0].)
Comment 1 Richard van der Hoff 2006-11-13 00:51:24 UTC
I'd just like to put in a big vote for this bug. It bites me time after time. It
would be better for sa-learn to refuse to work from stdin than have the current,
broken, behaviour.
Comment 2 Theo Van Dinter 2006-12-02 20:54:09 UTC
(In reply to comment #0)
> If $self->{determine_receive_date} is true (which there is four ways it can 
> be), scan_file() will read the headers to feed them to 
> M::SA::Util::receive_date(). Unfortunately, this causes them to be lost if a 
> target is '-' (stdin).

Hrm.  Yeah, AI doesn't handle STDIN so well.  It really expects to get a list of
files and directories, STDIN/"-" is this extra hackish thing mostly because perl
will accept "-" for open().

> Easiest fix would probably be always to write standard input to a temporary 
> file in sa-learn (not only if mb[o]x format is specified).

I can live with that.

> $ sa-learn --spam file1 :mbox:- < mbox1
> ? The code in sa-learn only checks $targets[0].)

Part of me would want to tell you to just run "sa-learn --spam file1
:mbox:mbox1" which will work fine.

So IMO what should happen is:

a) go through all the targets and look for "-"
b) the first time "-" is found, generate a temp file and shove STDIN into it
c) if "-" is found more than once, remove it from the target list

I'm tempted to remove the current hackish STDIN/"-" support from ArchiveIterator
and just make it only deal with proper files and directories, perhaps even
punting when it gets "-".

I'll see if I can get a patch done up this week.
Comment 3 Theo Van Dinter 2006-12-03 12:50:08 UTC
Created attachment 3760 [details]
suggested patch

Ok, this puts a more comprehensive STDIN handler in spamassassin and sa-learn
as per my previous comment.  It also modified ArchiveIterator slightly to
specifically skip "-" and not do any special handling.

(I thought about having AI modify the target listing, but since a random new
temp is generated, the caller needs to know about it, so the caller really has
to deal with this, hence AI now explicitly not dealing with it.)
Comment 4 Daryl C. W. O'Shea 2006-12-08 16:27:12 UTC
Looks good, tested.

+1
Comment 5 Doc Schneider 2006-12-08 16:50:11 UTC
+1
Comment 6 Daryl C. W. O'Shea 2006-12-11 11:37:56 UTC
[dos@FC5-VPC 3.1]$ svn ci -m "bug 5145: better deal with STDIN in spamassassin
and sa-learn, since ArchiveIterator doesn't deal with it so well"
Sending        lib/Mail/SpamAssassin/ArchiveIterator.pm
Sending        sa-learn.raw
Sending        spamassassin.raw
Transmitting file data ...
Committed revision 485836.