Bug 6439 - Extend the meaning of "textual parts" like MUAs handle it
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: 3.2.5
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2010-05-28 18:30 UTC by Karsten Bräckelmann
Modified: 2012-09-27 18:46 UTC

Attachment Type Modified Status Actions Submitter/CLA Status
testcase application/octet-stream None Karsten Bräckelmann [HasCLA]

Description Karsten Bräckelmann 2010-05-28 18:30:11 UTC
"[...] is the textual parts of the message body; any non-text MIME
  parts are stripped [...]"  -- M::SA::Conf

The MUA will happily show the attached text based on the file name extension, but the bloody Content-Type prevents SA from treating it as a textual part of the message.

  Content-Type: application/octet-stream; name="foo.txt"

SA should treat the attached text like any other textual part that has a correct MIME Content-Type set: render it and use it for rules and Bayes, just as an MUA would display it.
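
To illustrate the requested behaviour, here is a minimal sketch in Python using only the standard library's email package. It is not SpamAssassin code. The extension list, the function name is_textual_part, and the sample message are assumptions made up for this example. The idea is to accept text/* parts as before, and additionally accept application/octet-stream parts whose file name ends in an extension an MUA would likely render as text.

```python
import os
from email import message_from_string

# Assumption: file name extensions that an MUA would typically render as text.
TEXTUAL_EXTENSIONS = {".txt", ".htm", ".html"}


def is_textual_part(part):
    """Return True for text/* parts and for octet-stream parts named like text."""
    if part.get_content_maintype() == "text":
        return True
    if part.get_content_type() == "application/octet-stream":
        # get_filename() reads the filename parameter of Content-Disposition
        # and falls back to the name parameter of Content-Type.
        name = part.get_filename() or ""
        return os.path.splitext(name)[1].lower() in TEXTUAL_EXTENSIONS
    return False


# Hypothetical message, loosely modelled on the header shown above.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b"

--b
Content-Type: text/plain

See attachment.
--b
Content-Type: application/octet-stream; name="foo.txt"

Plain text payload
--b
Content-Type: application/octet-stream; name="foo.bin"

opaque data
--b--
"""

msg = message_from_string(RAW)
for part in msg.walk():
    if not part.is_multipart():
        print(part.get_content_type(), part.get_filename(), is_textual_part(part))
```

A real implementation would probably also want content sniffing, since a file name extension alone is trivially forged in either direction.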
Comment 1 Kevin A. McGrail 2011-10-29 04:52:16 UTC
Can you add an example email in mbox format so I can test this in various MUAs?
Comment 2 Karsten Bräckelmann 2011-10-29 23:38:44 UTC
Created attachment 5001
Comment 3 Karsten Bräckelmann 2011-10-29 23:41:38 UTC
(In reply to comment #1)
> Can you add an example email in mbox format so I can test this in various MUAs?

Sure, see attachment 5001.

The testcase consists of two trivial multipart/mixed MIME messages, with a text/plain and an application/octet-stream attachment, respectively. Other than the second MIME part's Content-Type (and the Subject), both messages are identical.

This masquerading technique is used by 419 scammers to get the actual text past a content scanner. Body rules as well as Bayes should be affected. The topic has been discussed on the users list a few times.

It appears that MUAs in widespread use (and most likely web-mail interfaces, too) will happily show the content of the attachment based on the file extension or content sniffing, even when the Content-Type indicates binary data.

Trivial test with an ad-hoc body rule:

  spamassassin --cf="body BUG_6439 /^Plain .*/" -D < MSG 2>&1 | grep BUG_6439

Only the text/plain variant will hit the body rule, and the debug output of that greedy regex match will print the actual payload line in full. The application/octet-stream variant will not hit that rule.
Comment 4 Rob Janssen 2012-09-27 18:46:43 UTC
Possibly a new case (I don't know whether it is covered by what is written above):

A MIME part like this:

Content-Type: application/octet-stream; name="Vordering.html"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="Vordering.html"

The message also has a textual part that says "please read the attachment" or similar.
The spam is in the HTML part, which shows up as an attachment in a mail program and, when opened, is displayed in a browser.
But the content of this base64-encoded part cannot be examined with rawbody or body patterns.
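
For illustration only, a minimal Python sketch (standard library email package, not SpamAssassin code) of what examining such a part would involve: undo the base64 transfer encoding, then run a pattern over the decoded text. The helper name decoded_text, the sample HTML payload, and the pattern are hypothetical. The actual spam content is not part of this report.

```python
import base64
import re
from email import message_from_string


def decoded_text(part, fallback_charset="utf-8"):
    """Undo the transfer encoding of a leaf part and return its text."""
    payload = part.get_payload(decode=True)  # bytes, or None for multipart
    if payload is None:
        return ""
    charset = part.get_content_charset() or fallback_charset
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset label
        return payload.decode(fallback_charset, errors="replace")


# Hypothetical payload standing in for the real attachment.
HTML = b"<html><body>Pay the outstanding claim today</body></html>"
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: application/octet-stream; name="Vordering.html"\n'
    "Content-Transfer-Encoding: base64\n"
    'Content-Disposition: attachment; filename="Vordering.html"\n'
    "\n"
    + base64.b64encode(HTML).decode("ascii")
    + "\n"
)

part = message_from_string(RAW)
text = decoded_text(part)
# A body-style pattern can only match once the base64 layer is removed.
print(bool(re.search(r"outstanding claim", text)))
```

This sketch matches against the raw HTML source. Rendering the markup to visible text, as an MUA or browser would, is a separate step not shown here.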