Bug 6928 - Add sa-learn option to learn from RFC822 attachment to message rather than full message
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: unspecified
Hardware: PC Linux
Importance: P2 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2013-04-20 21:47 UTC by John Hardin
Modified: 2013-04-20 21:47 UTC

Description John Hardin 2013-04-20 21:47:33 UTC
It's a fairly common practice to have users forward misclassified emails to a training mailbox address as RFC-822 attachments.

If the site admin doesn't know to (or how to) extract these attachments and instead learns from the raw training mailbox, the training won't be correct: it will include the local forward headers, and a spam addressed to multiple recipients will be learned once for each forwarded copy.

It would be much easier in this situation to have a command-line option (perhaps --attachment) to tell sa-learn to extract and learn from an RFC-822 attachment to the message being provided (if present) rather than from the whole message.
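To make the requested behaviour concrete, here is a minimal Perl sketch of the extraction step done outside sa-learn. It assumes a single-level multipart forward whose attachment is not base64 or quoted-printable encoded; `extract_rfc822_attachment` is a hypothetical helper name, not something SpamAssassin provides.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first message/rfc822 part of a forwarded
# message as a string, or undef if there is none. Simplified on purpose:
# one level of multipart only, no transfer-encoding handling.
sub extract_rfc822_attachment {
    my ($raw) = @_;

    # Split the outer headers from the body at the first blank line.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return undef unless defined $body;

    # Unfold continuation lines so the boundary parameter sits on one line.
    $head =~ s/\r?\n[ \t]+/ /g;

    # Pull the multipart boundary out of the outer Content-Type header.
    my ($boundary) =
        $head =~ /^Content-Type:\s*multipart\/[^;]+;.*?\bboundary="?([^";\s]+)"?/mi;
    return undef unless defined $boundary;

    # Walk the parts that sit between the boundary delimiter lines.
    for my $part (split /^--\Q$boundary\E(?:--)?[ \t]*\r?\n?/m, $body) {
        my ($phead, $pbody) = split /\r?\n\r?\n/, $part, 2;
        next unless defined $pbody;
        $phead =~ s/\r?\n[ \t]+/ /g;

        # Accept only a bare message/rfc822 type, optionally with parameters.
        if ($phead =~ /^Content-Type:\s*message\/rfc822\s*(?:;|$)/mi) {
            $pbody =~ s/\r?\n\z//;    # this line break belongs to the delimiter
            return $pbody;
        }
    }
    return undef;
}
```

The returned string could be written to a file and passed to `sa-learn --spam`; a message with no such attachment yields undef, so the caller can fall back to learning from the whole message.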

sa-learn already unwraps attachments that are present due to SA markup with report_safe = 1. At first glance the new option looks fairly easy to implement: add another clause here in remove_spamassassin_markup that matches a Content-Type of message/rfc822 with no other qualifiers, but only when the command-line option is given:

        # Ok, we found the encapsulated piece ...
        if ($ct =~ m@^(?:message/rfc822|text/plain);\s+x-spam-type=original@ ||
            ($ct eq "message/rfc822" &&
             $cd eq $self->{conf}->{'encapsulated_content_description'}))

...maybe something like:

   || ($ct eq "message/rfc822" && defined($self->{conf}->{'opt_extract_attachment'}))

This solution wouldn't work for a forwarded message having SA markup using report_safe = 1, though. That would require two unwraps.
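One way to cover the double-wrapped case would be to repeat the unwrap step until nothing more comes off, with a small cap on the number of passes. A rough sketch, assuming a hypothetical `$unwrap_once` callback that returns the inner message or undef when there is nothing left to unwrap; neither `unwrap_fully` nor the callback exists in SpamAssassin today.

```perl
use strict;
use warnings;

# Hypothetical driver: peel wrappers off a message until the callback finds
# nothing more to unwrap, or until $max_depth passes have been made.
# The default of 2 covers one user forward plus one report_safe wrapper.
sub unwrap_fully {
    my ($msg, $unwrap_once, $max_depth) = @_;
    $max_depth = 2 unless defined $max_depth;

    for (1 .. $max_depth) {
        my $inner = $unwrap_once->($msg);
        last unless defined $inner;    # no wrapper left, keep what we have
        $msg = $inner;
    }
    return $msg;
}
```

The cap keeps a legitimate message that merely contains an attached message from being unwrapped further than the forward and the SA markup account for.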