Bug 5127 - mimeheader rules cannot match newlines
Summary: mimeheader rules cannot match newlines
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
: P5 major
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-13 05:27 UTC by Justin Mason
Modified: 2006-10-18 03:56 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
patch to add tests to mimeheader.t patch None Justin Mason [HasCLA]
fix draft patch None Justin Mason [HasCLA]
fix r2 patch None Justin Mason [HasCLA]
fix r3 patch None Justin Mason [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Mason 2006-10-13 05:27:38 UTC
try adding these tests to t/mimeheader.t:


  mimeheader MATCH_NL       Content-Type =~ /msword;\n\tname/
  mimeheader MATCH_NL_RAW   Content-Type:raw =~ /msword;\n\tname/

they'll never match, because the Message/Node.pm code modifies the header
strings -- even for ":raw" -- to sanitise whitespace.
Comment 1 Justin Mason 2006-10-13 05:29:00 UTC
Created attachment 3714 [details]
patch to add tests to mimeheader.t

here's a patch that adds those 2 failing test cases to the 't' script,
in TDD style ;)
Comment 2 Justin Mason 2006-10-13 05:40:27 UTC
Created attachment 3715 [details]
fix draft

well, that seemed easy!  Theo, what have I broken? ;)
Comment 3 Theo Van Dinter 2006-10-13 08:13:58 UTC
(In reply to comment #2)
> Created an attachment (id=3715) [edit]
> fix draft
> 
> well, that seemed easy!  Theo, what have I broken? ;)

:)   Well, I would have to think and prod about in the code for a bit to answer
that.  Just remember that the header and raw_header functions are for *all*
headers, both the message headers and the MIME part headers.

In a quick run through, I don't like the Message.pm changes -- the leading and
trailing whitespace is generally irrelevant (the type and amount is ignorable,
according to the RFC).  As for raw() including all whitespace...  I'm not
necessarily opposed, though header vs raw_header was modeled after body vs
rawbody, and it's all about encoding.

Also, your Message::Node change breaks the current 3.x described API:

"Will have a newline at the end and will be unfolded."
Comment 4 Justin Mason 2006-10-13 08:56:19 UTC
'As for raw() including all whitespace...  I'm not necessarily opposed, though
header vs raw_header was modeled after body vs rawbody, and it's all about
encoding.'

Well, in the body-vs-rawbody case, there's also 'full', which offers the
pristine string.  I suppose I could use that here, but it really would be quite
messy, and in my opinion this behaviour (ie. not allowing matching against
the folding whitespace) is counter-intuitive and a bug anyway!

(by the way, this is cropping up because a certain newline-whitespace sequence
in a folded MIME part header is a spam sign, and I want to match it.  I think
this is likely to be quite a common situation.)


'Also, your Message::Node change breaks the current 3.x described API:
"Will have a newline at the end and will be unfolded."'

Argh.  OK, I've got a new patch that fixes that, by going direct to the
raw_header() API instead from MIMEHeader.pm.

I've also decided that you're generally right ;)  It only preserves
this whitespace for the raw_header hash; the header(), get_header() and
{header} data is all back to what it was in SVN trunk, with folding
newline/whitespace sanitized.
Comment 5 Justin Mason 2006-10-13 08:57:51 UTC
Created attachment 3716 [details]
fix r2

a new rev.

BTW, note that the s/// statements have simply moved from Message.pm to
Node.pm; that's the only place that data is used, since the Message.pm code
immediately calls the Node.pm code.
Comment 6 Justin Mason 2006-10-13 09:33:51 UTC
Created attachment 3717 [details]
fix r3

oops!  found a bug in that last one; the decoding algorithm wasn't coming out
as trunk-compatible. this one is...
Comment 7 Daryl C. W. O'Shea 2006-10-13 11:30:48 UTC
Related to the leading whitespace in :raw headers, I opened bug 5116 after Alex
Broens asked jm and I about matching whitespace in Date headers as often seen in
spam.
Comment 8 Justin Mason 2006-10-18 03:56:09 UTC
ok, taking silence as assent, I've checked this in (with a fix in the test
script and a line or two of POD documentation).

: jm 292...; svn commit -m "bug 5127: allow mimeheader :raw rules to match
newlines and folded-header whitespace  in MIME header strings"
lib/Mail/SpamAssassin/Message.pm lib/Mail/SpamAssassin/Message/Node.pm
lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm t/mimeheader.t
Sending        lib/Mail/SpamAssassin/Message/Node.pm
Sending        lib/Mail/SpamAssassin/Message.pm
Sending        lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
Sending        t/mimeheader.t
Transmitting file data ....
Committed revision 465206.