SA Bugzilla – Bug 317
message/rfc822 Spam encapsulation patch
Last modified: 2002-12-27 15:11:06 UTC
A while ago we talked about the possibility of having the original spam encapsulated into a message/rfc822 MIME part, with the spam report in its own text/plain part (thereby completely preserving the original spam and making it easy to take out if it was a false positive). I have hacked together an implementation of this to use where I work, and it has worked out rather well (employees no longer bug me about re-assembling messages if a mail with an attachment was incorrectly tagged as spam). Unfortunately the patch is against the 2.11 release, as I don't have time to install 2.20 and redo the fixes. This path introduces a new config variable called rfc822_message (which by default is set to 1, setting to 0 uses the "old" mode of rewriting an existing message), and also extends NoMailAudit.pm to provide a way to re-inject a completely new message into an existing NoMailAudit object. This patch does require the Mail::Tools package, but since you plan on migrating to using that anyways... I will attach the patch and a screenshot of the sample spam to show how it works in Outlook.
Created attachment 112 [details] rfc822_message patch
Created attachment 113 [details] Example of viewing a spam in Outlook
Thanks, I'll take a look at this hopefully Friday, when I plan on doing the MIME::Tools stuff.
assigning to Craig
Ack. I hate it when I forget about promises I've made. Thanks for the reminder Dan
assigning to list
Ryan, Can you update your patch for SpamAssassin current CVS? I'd like to get this into 2.50. Really, I mean it. Ultimately, I want to have two separate modules. One for reading the message and another for outputing the new message. Completely separated. Output will be completely MIME based. Input will just make data fast and easy to read from tests and trim SA tags -if- a flag is set. That way, spammers won't be able to bypass SA by disguishing headers. In other words, if you want to start down that path or at least keep it in mind as you update for the current CVS tree, I would not object. :-) (reassigning to myself)
Ryan, Could you get back to me? I'm tempted to wade through the diff and port it to current CVS, but it's always easier to get the original author (who knows the code) to do it. ;-) I haven't mentioned it on the list yet, but I suspect that we will end up making make this the default and the only supported markup method. (See bug 1039.) Thanks.
Sorry, I've been kinda busy lately. I will try to look at the new CVS code in the next few days and get my patch in.
Question... do we want to make rfc822_message default? If we do, the requirements must be changed to use MIME::Tools, whereas if it's in the code, but not on be default, MIME::Tools will only be required if the sysadmin chooses to turn it on. I personally think it'll be worth it, not messing with the original message at all...
doh, probably should have read your full messages... I'll flush out the Conf.pm file to document what will and will not be used when in rfc822_message mode (IE, things like defang_mime, obviously)
Created attachment 407 [details] spamassassin.against-cvs20021015-1145PST.patch
Okay, that was quick :) As it turns out, doing the patch correctly was more difficult than I thought. I removed "old" spam reporting support (IE, rewriting the original message) because it was getting too difficult to manage, and replaced it with my code, chopping the size of the sub in more than half. Here's what was done: Removed defang_mime and report_header configuration options. Removed report_header test. Changed use_terse_report default from 1 to 0 Changed README to reflect new changes Added replace_original_message sub to NoMailAudit.pm Added MIME::Entity requirement to Makefile.PL Changed rewrite_as_spam sub in PerMsgStatus.pm to encapsulate spam Changed wording of verbose report in rules file to reflect new format This patch was taken against a CVS checkout at 11:45AM PST today; I don't think anything changed since then. Please take a look at the patch, hopefully we can get this in in time for 2.50. If you want, I have a sourceforge account (ryanf), so I could do the CVS checkin, with your approval. Have fun!
I'll take a look at it tonight. I don't suppose you could attach any future revisions as unified diffs ("diff -u") to make them easier to read? Also, should we try to do this without the external requirement on MIME::Tools? We've been trying to avoid dependency requirements, if constructing a MIME message is too complicated
Created attachment 408 [details] spamassassin.against-cvs20021015-1145PST.unified.patch
Here we go. As for using MIME::Tools... Yes, at this moment, we could assemble the encapsulation by hand, but in the future it would be very benefitial to do things like examination using MIME::Tools. Would you prefer not using it in the 2.x series, but introducing it in the 3 rewrite?
Subject: Re: message/rfc822 Spam encapsulation patch ryan@finnie.org writes: > Here we go. Thanks. > As for using MIME::Tools... Yes, at this moment, we could assemble the > encapsulation by hand, but in the future it would be very benefitial to do > things like examination using MIME::Tools. Would you prefer not using it in > the 2.x series, but introducing it in the 3 rewrite? I'd probably ask the SAdev mailing list. If it would be hard or ugly to do without MIME::Tools, I'm fine with requiring MIME::Tools for the 2.x series.
Subject: Re: message/rfc822 Spam encapsulation patch > I'd probably ask the SAdev mailing list. If it would be hard or ugly > to do without MIME::Tools, I'm fine with requiring MIME::Tools for the > 2.x series. Actually, it's relatively easy to do without MIME::Tools for what I used it for; I'm working right now to do it manually, and will post an updated patch in a few minutes. However, I'd recommend using MIME::Tools as a basis for reading/writing mail in the 3 series.
Odd, I replied via email to that last one, and bugzilla thought I was Bryan Higgins. Very odd...
Created attachment 411 [details] spamassassin.against-cvs20021015-1145PST.nomime.unified.patch
Okay, latest patch does not use MIME::Tools.
Okay, I started working on the latest patch. Thanks. Some questions (for Ryan and anyone else watching this bug). 1. Why does the patch remove the "From " line if one is present? 2. Should we attach the SA report as "7bit" instead of "binary"? (Also, "8bit" seems to be the correct name, not "binary".) 3. Should we encode the spam as quoted-printable instead of "binary" (or "8bit", whatever). base64 encoding does not seem like the way to go. My only concern about quoted-printable is that re-encoding quoted-printable looks pretty nasty. In addition, it looks like get_full_message_as_text() does not return the pristine original message due to some \r munging in NoMailAudit.pm. I added a variable to keep the original headers and a method to get at it. Feel free to comment on this. I think my biggest worry here is Razor and other exact checksum databases. They must ignore some headers, although, I'm not sure if they ignore ALL headers, which would imply that we may not be getting all of the bang out of Razor and friends if some of those \r bytes are significant in the checksums.
How about this as a solution for #3. A. Add a new configuration setting: "spam_encoding", allowable values (these are working names subject to change) 1. "quoted-printable" - use quoted-printable encoding 2. "base64" - use base64 encoding 3. "8bit" - send the message as is. 4. "minimal" - encode with minimal necessary encoding to be safe; this would be the default and here is how it would work: if (7-bit, all text, no control characters, no \r, no long lines, etc.) then do not encode and use 7bit elsif (mostly garbage and quoted-printable would waste a lot of space) then encode as base64 else then encode as quoted-printable Thoughts? Whatever the solution to #3, I'm not going to wait on that particular functionality before I check this into SA. It can be a separate bug report once this is done.
1. The opening "From " is not actually part of the mime standard. Leaving it in will breaks about 20% of the mail clients out there. 2. The Content-Transfer-Encoding was added by MIME::Entity, which I just copied without thought for doing it manually. I don't see why 7bit wouldn't work in the report. 3. Actually, that Content-Transfer-Encoding entry for the spam part shouldn't even be there; message/rfc822 parts should not have altered content in them, they should be verbatim from the attached message. I'm not sure if something like base64 encoding the part would work, never tested that. But what you really need to do is just blindly insert the original message in that part. Actually, you probably should search for ----------=_$timestamp-$pid-0 in the original message before blindly inserting, to check for mime part separator collision. But the chances of that happening are nearly infitecimal.
New version of patch follows. Here are the major changes: - added back t/reportheader.t and fix it so it passes - README word-smithing - change rewrite_subject to 1 (we can put anything we want in the header) - add $self->{headers_pristine} and &get_pristine to NoMailAudit.pm to allow original message to be attached - remove "SPAM: " prefix and start/end lines from report - also reuse "Cc" and "Date" headers from spam if present - if content-type is unsafe, then Content-Disposition: of the message/rfc822 MIME part is "attachment" and we add a warning, otherwise it is "inline" - minor tweaks to MIME formatting > Actually, that Content-Transfer-Encoding entry for the spam part shouldn't > even be there; message/rfc822 parts should not have altered content in them, > they should be verbatim from the attached message. I'm not sure if something > like base64 encoding the part would work, never tested that. But what you > really need to do is just blindly insert the original message in that part. Most of my non-spam corpus had more complete MIME headers. I reviewed the relevant RFCs, looked at what's usually done, made a few changes, and ended up with this: +------------=_$timestamp-$$-0 +Content-Type: text/plain +Content-Disposition: inline +Content-Transfer-Encoding: 7bit + +$report +------------=_$timestamp-$$-0 +Content-Type: message/rfc822 +Content-Description: original message +Content-Disposition: $disposition +Content-Transfer-Encoding: 8bit + +$messagestring +------------=_$timestamp-$$-0-- > Actually, you probably should search for ----------=_$timestamp-$pid-0 in the > original message before blindly inserting, to check for mime part separator > collision. But the chances of that happening are nearly infitecimal. Maybe later...
Created attachment 412 [details] updated patch
Dan, what's the status on this ticket?
bug 478 talks about adding to rewrites, but the goal is to just get encapsulation working for spam, so I'm blocking 478 with 317.
> Dan, what's the status on this ticket? Okay, it has been applied to the CVS tree. I'll post a note to SAdev with more information.
this is done thanks for the contribution, Ryan