317 – message/rfc822 Spam encapsulation patch

Bug 317 - message/rfc822 Spam encapsulation patch

Summary: message/rfc822 Spam encapsulation patch

Status:	RESOLVED FIXED

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Libraries (show other bugs)
Version:	2.11
Hardware:	Other other

Importance:	P2 enhancement
Target Milestone:	2.50
Assignee:	Daniel Quinlan

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	478 885 1039 1264 1265
	Show dependency tree

Reported:	2002-05-14 15:05 UTC by Ryan Finnie
Modified:	2002-12-27 15:11 UTC (History)
CC List:	1 user (show)

Attachment	Type	Actions	Submitter/CLA Status
rfc822_message patch	patch	None	Ryan Finnie
Example of viewing a spam in Outlook	image/gif	None	Ryan Finnie
spamassassin.against-cvs20021015-1145PST.patch	patch	None	Ryan Finnie
spamassassin.against-cvs20021015-1145PST.unified.patch	patch	None	Ryan Finnie
spamassassin.against-cvs20021015-1145PST.nomime.unified.patch	patch	None	Ryan Finnie
updated patch	patch	None	Daniel Quinlan
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Ryan Finnie 2002-05-14 15:05:51 UTC

A while ago we talked about the possibility of having the original spam 
encapsulated into a message/rfc822 MIME part, with the spam report in its own 
text/plain part (thereby completely preserving the original spam and making it 
easy to take out if it was a false positive).  I have hacked together an 
implementation of this to use where I work, and it has worked out rather well 
(employees no longer bug me about re-assembling messages if a mail with an 
attachment was incorrectly tagged as spam).

Unfortunately the patch is against the 2.11 release, as I don't have time to 
install 2.20 and redo the fixes.  This path introduces a new config variable 
called rfc822_message (which by default is set to 1, setting to 0 uses 
the "old" mode of rewriting an existing message), and also extends 
NoMailAudit.pm to provide a way to re-inject a completely new message into an 
existing NoMailAudit object.  This patch does require the Mail::Tools package, 
but since you plan on migrating to using that anyways...

I will attach the patch and a screenshot of the sample spam to show how it 
works in Outlook.

Comment 1 Ryan Finnie 2002-05-14 15:06:26 UTC

Created attachment 112 [details]
rfc822_message patch

Comment 2 Ryan Finnie 2002-05-14 15:06:58 UTC

Created attachment 113 [details]
Example of viewing a spam in Outlook

Comment 3 Craig Hughes 2002-05-14 16:09:24 UTC

Thanks, I'll take a look at this hopefully Friday, when I plan on doing the MIME::Tools stuff.

Comment 4 Daniel Quinlan 2002-06-05 20:52:52 UTC

assigning to Craig

Comment 5 Craig Hughes 2002-06-06 19:15:12 UTC

Ack.  I hate it when I forget about promises I've made.  Thanks for the reminder Dan

Comment 6 Daniel Quinlan 2002-08-17 22:16:28 UTC

assigning to list

Comment 7 Daniel Quinlan 2002-09-20 01:25:51 UTC

Ryan,

Can you update your patch for SpamAssassin current CVS?  I'd like to get this
into 2.50.  Really, I mean it.

Ultimately, I want to have two separate modules.  One for reading the message
and another for outputing the new message.  Completely separated.

Output will be completely MIME based.  Input will just make data fast and easy
to read from tests and trim SA tags -if- a flag is set.  That way, spammers
won't be able to bypass SA by disguishing headers.

In other words, if you want to start down that path or at least keep it in
mind as you update for the current CVS tree, I would not object.  :-)

(reassigning to myself)

Comment 8 Daniel Quinlan 2002-10-15 04:22:00 UTC

Ryan,

Could you get back to me?  I'm tempted to wade through the diff and port it
to current CVS, but it's always easier to get the original author (who knows
the code) to do it.  ;-)

I haven't mentioned it on the list yet, but I suspect that we will end up
making make this the default and the only supported markup method.  (See
bug 1039.)

Thanks.

Comment 9 Ryan Finnie 2002-10-15 09:42:19 UTC

Sorry, I've been kinda busy lately.  I will try to look at the new CVS code in 
the next few days and get my patch in.

Comment 10 Ryan Finnie 2002-10-15 10:14:23 UTC

Question...  do we want to make rfc822_message default?  If we do, the 
requirements must be changed to use MIME::Tools, whereas if it's in the code, 
but not on be default, MIME::Tools will only be required if the sysadmin 
chooses to turn it on.  I personally think it'll be worth it, not messing with 
the original message at all...

Comment 11 Ryan Finnie 2002-10-15 10:58:33 UTC

doh, probably should have read your full messages...  I'll flush out the 
Conf.pm file to document what will and will not be used when in rfc822_message 
mode (IE, things like defang_mime, obviously)

Comment 12 Ryan Finnie 2002-10-15 14:37:28 UTC

Created attachment 407 [details]
spamassassin.against-cvs20021015-1145PST.patch

Comment 13 Ryan Finnie 2002-10-15 14:43:12 UTC

Okay, that was quick :)

As it turns out, doing the patch correctly was more difficult than I thought.  
I removed "old" spam reporting support (IE, rewriting the original message) 
because it was getting too difficult to manage, and replaced it with my code, 
chopping the size of the sub in more than half.

Here's what was done:
Removed defang_mime and report_header configuration options.
Removed report_header test.
Changed use_terse_report default from 1 to 0
Changed README to reflect new changes
Added replace_original_message sub to NoMailAudit.pm
Added MIME::Entity requirement to Makefile.PL
Changed rewrite_as_spam sub in PerMsgStatus.pm to encapsulate spam
Changed wording of verbose report in rules file to reflect new format

This patch was taken against a CVS checkout at 11:45AM PST today; I don't think 
anything changed since then.

Please take a look at the patch, hopefully we can get this in in time for 
2.50.  If you want, I have a sourceforge account (ryanf), so I could do the CVS 
checkin, with your approval.

Have fun!

Comment 14 Daniel Quinlan 2002-10-15 16:04:33 UTC

I'll take a look at it tonight.  I don't suppose you could attach any future
revisions as unified diffs ("diff -u") to make them easier to read?

Also, should we try to do this without the external requirement on MIME::Tools?
We've been trying to avoid dependency requirements, if constructing a MIME
message is too complicated

Comment 15 Ryan Finnie 2002-10-15 16:11:15 UTC

Created attachment 408 [details]
spamassassin.against-cvs20021015-1145PST.unified.patch

Comment 16 Ryan Finnie 2002-10-15 16:13:54 UTC

Here we go.

As for using MIME::Tools...  Yes, at this moment, we could assemble the 
encapsulation by hand, but in the future it would be very benefitial to do 
things like examination using MIME::Tools.  Would you prefer not using it in 
the 2.x series, but introducing it in the 3 rewrite?

Comment 17 Daniel Quinlan 2002-10-15 16:29:23 UTC

Subject: Re:  message/rfc822 Spam encapsulation patch

ryan@finnie.org writes:

> Here we go.

Thanks.

> As for using MIME::Tools...  Yes, at this moment, we could assemble the 
> encapsulation by hand, but in the future it would be very benefitial to do 
> things like examination using MIME::Tools.  Would you prefer not using it in 
> the 2.x series, but introducing it in the 3 rewrite?

I'd probably ask the SAdev mailing list.  If it would be hard or ugly
to do without MIME::Tools, I'm fine with requiring MIME::Tools for the
2.x series.

Comment 18 Bryan Higgins 2002-10-15 16:42:48 UTC

Subject: Re:  message/rfc822 Spam encapsulation patch

> I'd probably ask the SAdev mailing list.  If it would be hard or ugly
> to do without MIME::Tools, I'm fine with requiring MIME::Tools for the
> 2.x series.

Actually, it's relatively easy to do without MIME::Tools for what I used 
it for; I'm working right now to do it manually, and will post an updated 
patch in a few minutes.  However, I'd recommend using MIME::Tools as a 
basis for reading/writing mail in the 3 series.

Comment 19 Ryan Finnie 2002-10-15 16:45:10 UTC

Odd, I replied via email to that last one, and bugzilla thought I was Bryan 
Higgins.  Very odd...

Comment 20 Ryan Finnie 2002-10-15 16:50:38 UTC

Created attachment 411 [details]
spamassassin.against-cvs20021015-1145PST.nomime.unified.patch

Comment 21 Ryan Finnie 2002-10-15 16:51:03 UTC

Okay, latest patch does not use MIME::Tools.

Comment 22 Daniel Quinlan 2002-10-18 20:40:26 UTC

Okay, I started working on the latest patch.  Thanks.

Some questions (for Ryan and anyone else watching this bug).

1. Why does the patch remove the "From " line if one is present?

2. Should we attach the SA report as "7bit" instead of "binary"?
   (Also, "8bit" seems to be the correct name, not "binary".)

3. Should we encode the spam as quoted-printable instead of "binary" (or "8bit",
   whatever).  base64 encoding does not seem like the way to go.  My only
   concern about quoted-printable is that re-encoding quoted-printable looks
   pretty nasty.

In addition, it looks like get_full_message_as_text() does not return the
pristine original message due to some \r munging in NoMailAudit.pm.  I added
a variable to keep the original headers and a method to get at it.  Feel free
to comment on this.  I think my biggest worry here is Razor and other exact
checksum databases.  They must ignore some headers, although, I'm not sure if
they ignore ALL headers, which would imply that we may not be getting all of
the bang out of Razor and friends if some of those \r bytes are significant
in the checksums.

Comment 23 Daniel Quinlan 2002-10-18 21:00:06 UTC

How about this as a solution for #3.

A. Add a new configuration setting: "spam_encoding", allowable values
   (these are working names subject to change)

  1. "quoted-printable" - use quoted-printable encoding
  2. "base64" - use base64 encoding
  3. "8bit" - send the message as is.
  4. "minimal" - encode with minimal necessary encoding to be safe;
      this would be the default and here is how it would work:
      if (7-bit, all text, no control characters, no \r, no long lines, etc.)
        then do not encode and use 7bit
      elsif (mostly garbage and quoted-printable would waste a lot of space)
        then encode as base64
      else
        then encode as quoted-printable

Thoughts?

Whatever the solution to #3, I'm not going to wait on that particular
functionality before I check this into SA.  It can be a separate bug report
once this is done.

Comment 24 Ryan Finnie 2002-10-18 21:09:43 UTC

1. The opening "From " is not actually part of the mime standard.  Leaving it 
in will breaks about 20% of the mail clients out there.

2. The Content-Transfer-Encoding was added by MIME::Entity, which I just copied 
without thought for doing it manually.  I don't see why 7bit wouldn't work in 
the report.

3. Actually, that Content-Transfer-Encoding entry for the spam part shouldn't 
even be there; message/rfc822 parts should not have altered content in them, 
they should be verbatim from the attached message.  I'm not sure if something 
like base64 encoding the part would work, never tested that.  But what you 
really need to do is just blindly insert the original message in that part.

Actually, you probably should search for ----------=_$timestamp-$pid-0 in the 
original message before blindly inserting, to check for mime part separator 
collision.  But the chances of that happening are nearly infitecimal.

Comment 25 Daniel Quinlan 2002-10-19 03:43:25 UTC

New version of patch follows.  Here are the major changes:

- added back t/reportheader.t and fix it so it passes
- README word-smithing
- change rewrite_subject to 1 (we can put anything we want in the header)
- add $self->{headers_pristine} and &get_pristine to NoMailAudit.pm to
  allow original message to be attached
- remove "SPAM: " prefix and start/end lines from report
- also reuse "Cc" and "Date" headers from spam if present
- if content-type is unsafe, then Content-Disposition: of the
  message/rfc822 MIME part is "attachment" and we add a warning,
  otherwise it is "inline"
- minor tweaks to MIME formatting

> Actually, that Content-Transfer-Encoding entry for the spam part shouldn't 
> even be there; message/rfc822 parts should not have altered content in them, 
> they should be verbatim from the attached message.  I'm not sure if something 
> like base64 encoding the part would work, never tested that.  But what you 
> really need to do is just blindly insert the original message in that part.

Most of my non-spam corpus had more complete MIME headers.  I reviewed the
relevant RFCs, looked at what's usually done, made a few changes, and ended
up with this:

+------------=_$timestamp-$$-0
+Content-Type: text/plain
+Content-Disposition: inline
+Content-Transfer-Encoding: 7bit
+
+$report
+------------=_$timestamp-$$-0
+Content-Type: message/rfc822
+Content-Description: original message
+Content-Disposition: $disposition
+Content-Transfer-Encoding: 8bit
+
+$messagestring
+------------=_$timestamp-$$-0--

> Actually, you probably should search for ----------=_$timestamp-$pid-0 in the 
> original message before blindly inserting, to check for mime part separator 
> collision.  But the chances of that happening are nearly infitecimal.

Maybe later...

Comment 26 Daniel Quinlan 2002-10-19 03:46:24 UTC

Created attachment 412 [details]
updated patch

Comment 27 Theo Van Dinter 2002-12-21 21:42:45 UTC

Dan, what's the status on this ticket?

Comment 28 Theo Van Dinter 2002-12-21 21:44:36 UTC

bug 478 talks about adding to rewrites, but the goal is to just get 
encapsulation working for spam, so I'm blocking 478 with 317.

Comment 29 Daniel Quinlan 2002-12-22 04:19:31 UTC

> Dan, what's the status on this ticket?

Okay, it has been applied to the CVS tree.  I'll post a note to SAdev
with more information.

Comment 30 Daniel Quinlan 2002-12-28 00:11:06 UTC

this is done
thanks for the contribution, Ryan