Bug 3069 - non-text part inside of forwarded message included in "body"
Summary: non-text part inside of forwarded message included in "body"
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P3 normal
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 3271
Depends on:
Blocks: 3208

Reported: 2004-02-19 19:28 UTC by Daniel Quinlan
Modified: 2005-09-29 09:31 UTC (History)

Attachment                                     Type                      Status  Submitter/CLA Status
example message                                text/plain                None    Daniel Quinlan [HasCLA]
rendered body                                  text/plain                None    Daniel Quinlan [HasCLA]
apple mail window showing what gets displayed  application/octet-stream  None    Theo Van Dinter [HasCLA]
Thunderbird screenshot (0.5)                   image/png                 None    Jesse Houwing [HasCLA]

Description Daniel Quinlan 2004-02-19 19:28:23 UTC
When someone forwards a message as a message/rfc822 attachment and that
attachment includes a binary part such as application/*, the base64 text
of that part, plus the message headers, is rendered as part of the body
and returned by get_rendered_body_text_array().  This causes false
positives for body tests.

I'll attach an example message.  I replaced some confidential content, but the
rendering result is the same.  My MUA handles it as a message attachment
just fine and will even save off the Word document, which has the right file
magic, but that's about it, so don't try opening it in Word for real.
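
As a rough illustration of the behaviour being asked for here, the sketch below uses Python's standard `email` package rather than SpamAssassin's own Perl parser. The sample message and the helper name `rendered_text_parts` are made up for this example. It descends into the forwarded message but keeps only text/* leaf parts, so neither the base64 payload nor the inner headers reach the rendered body.

```python
from email import message_from_string

# Hypothetical sample: an outer message forwarding an inner message that
# carries a base64-encoded binary attachment.
RAW = """\
From: sender@example.com
Subject: Fwd: report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain

See the forwarded message.
--outer
Content-Type: message/rfc822

From: origin@example.com
Subject: report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="inner"

--inner
Content-Type: text/plain

Inner text.
--inner
Content-Type: application/msword
Content-Transfer-Encoding: base64

0M8R4KGxGuEAAAAA
--inner--
--outer--
"""


def rendered_text_parts(msg):
    """Return the decoded text of text/* leaf parts only."""
    texts = []
    # walk() visits every part and also descends into message/rfc822 payloads.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip containers and binary parts such as application/*
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        texts.append(payload.decode(charset, errors="replace"))
    return texts


body = rendered_text_parts(message_from_string(RAW))
for text in body:
    print(text)
```

With this approach the inner text part is still available to body tests, while the Word attachment contributes nothing.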
Comment 1 Daniel Quinlan 2004-02-19 19:28:49 UTC
Created attachment 1790
example message
Comment 2 Daniel Quinlan 2004-02-19 19:32:04 UTC
Created attachment 1791
rendered body

Note: the results are the same whether the forwarded message is inline
or an attachment.
Comment 3 Theo Van Dinter 2004-03-06 17:45:45 UTC
yeah, this is from the conscious decision to include "message/*" parts in the 
standard checks.

imo, we either need to accept this possibility, or stop including message/* 
parts.  since, as far as we've been able to tell, only apple mail seems to 
display message/* attachments inline with the actual message, I'd say we should 
stop including those parts.
Comment 4 Daniel Quinlan 2004-03-06 17:50:27 UTC
> since, as far as we've been able to tell, only apple mail seems to 
> display message/* attachments inline with the actual message, I'd say we should 
> stop including those parts.

Well, we don't really render message/* attachments at all like Apple Mail.
Apple Mail treats them like a message, not like a text attachment.

My concern with doing nothing with them is that spammers could EASILY start
sending mail that says "Subject: forwarded message", includes a forwarded
message that we entirely skipped, and then someone has to open the attachment
to see ... a spam.

I know we usually try to closely simulate the rendering behavior of common
MUAs, but I think we need to also think about easy exploits like this one.
So, while Apple Mail does the wrong thing, I think it is (well, would be)
the right thing for us to render message attachments like Apple Mail.
Comment 5 Jesse Houwing 2004-03-06 18:19:01 UTC
Thunderbird and Mozilla mail have the option to show attachments inline. I
always have it on.
Comment 6 Theo Van Dinter 2004-03-06 22:05:32 UTC
ok, committed code to parse message/* parts into a subtree.

r7037
Comment 7 Daniel Quinlan 2004-03-06 22:13:01 UTC
Jesse,

Can you do two things for us?

1. Figure out which headers from message/rfc822 Mozilla displays inline by
   default when the option is turned on (To, Cc, From, Subject, Date, ... ?)
2. paste one screen shot of such a message (perhaps one with some HTML in the
   inner message)

Likewise, if any Apple Mail users are watching, the same two things would be
helpful.  Thanks.
Comment 8 Theo Van Dinter 2004-03-06 22:25:27 UTC
Subject: Re:  non-text part inside of forwarded message included in "body"

On Sat, Mar 06, 2004 at 10:13:02PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> 1. Figure out which headers from message/rfc822 Mozilla displays inline by
>    default when the option is turned on (To, Cc, From, Subject, Date, ... ?)
> 2. paste one screen shot of such a message (perhaps one with some HTML in the
>    inner message)
> 
> Likewise, if any Apple Mail users are watching, the same two things would be
> helpful.  Thanks.

Since you asked... :)

Apple Mail shows the normal message, then "From", "Date", "To", and
"Subject" of the attached message, then strangely, the full attached
message as plain text.

Will attach a PDF of the basics shortly.

Comment 9 Theo Van Dinter 2004-03-06 22:27:20 UTC
Created attachment 1818
apple mail window showing what gets displayed
Comment 10 Jesse Houwing 2004-03-07 06:04:03 UTC
Created attachment 1819
Thunderbird screenshot (0.5)

Thunderbird shows the following inline:

Subject:   Re: Yahoogroups Spamassassin rules
From:	   removed <removed@utwente.nl>
Date:	   Tue, 02 Mar 2004 11:31:02 +0100
To:	   "Jesse Houwing" <removed@removed.utwente.nl>
Comment 11 Jesse Houwing 2004-03-07 06:07:14 UTC
Thunderbird will show parts that are renderable by plugins or images directly.
Other parts I'd still have to check, but I don't think they'll be included as
plain text.
Comment 12 Jesse Houwing 2004-03-07 06:17:00 UTC
It seems they have fixed this in Thunderbird.  I just tried the test message
and it works as expected, though I'm sure I've seen it happen before.
Comment 14 Justin Mason 2004-03-07 20:15:30 UTC
>imo, we either need to accept this possibility, or stop including
>message/* parts.  since, as far as we've been able to tell, only apple
>mail seems to display message/* attachments inline with the actual
>message, I'd say we should stop including those parts.

+1 -- agreed with you here.
Comment 15 Justin Mason 2004-03-07 20:19:33 UTC
>My concern with doing nothing with them is that spammers could EASILY start
>sending mail that says "Subject: forwarded message", includes a forwarded
>message that we entirely skipped, and then someone has to open the attachment
>to see ... a spam.

IMO, they could do the same with a HTML document in a password-protected
ZIP file.  So I'm -1 about this idea.

--j.
Comment 16 Daniel Quinlan 2004-03-07 20:39:32 UTC
> IMO, they could do the same with a HTML document in a password-protected
> ZIP file.  So I'm -1 about this idea.

There's a significant gap between ZIP files (especially
password-protected ones) and forwarded messages.  The latter looks much
more innocent, is well-supported everywhere, and is even displayed
inline (either by default or through an option) in some MUAs like Apple
Mail and Mozilla.

Bear in mind we've always been rendering message/rfc822 parts up through
2.64, just not especially gracefully.  Ignoring them would create a
gaping wide hole.

Daniel

Comment 17 Theo Van Dinter 2004-03-08 12:40:41 UTC
fyi for apple mail...  it will display the message/rfc822 inline -- but it
doesn't decode it.  so in my case I sent a multipart message to myself, with
the first part being message/rfc822 w/ base64 encoding (preserving headers as
it passes between MTAs).  Apple Mail happily shows me the attachment and the
base64 encoded strings.
Comment 18 Daniel Quinlan 2004-03-12 23:01:51 UTC
> base64 encoded strings

Hmmm... we definitely don't want to run tests on base64-encoded strings.
Recursively decoding the internals of message/rfc822 is probably the way
to go if we're not doing that already.

Comment 19 Theo Van Dinter 2004-03-13 07:35:40 UTC
yeah, I don't want to mimic the apple mail behavior.  the parser makes a subtree 
out of message/rfc822 parts, then they're handled in the same way as everything 
else, so no problems there.
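
To picture what "a subtree out of message/rfc822 parts" looks like, here is a minimal sketch using Python's standard `email` package, which happens to parse message/rfc822 the same way. It is not the SpamAssassin parser; the sample message and the `mime_tree` helper are invented for illustration.

```python
from email import message_from_string

# Hypothetical sample: a short note with one forwarded HTML message attached.
RAW = """\
From: sender@example.com
Subject: Fwd: offer
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain

FYI.
--outer
Content-Type: message/rfc822

From: origin@example.com
Subject: offer
Content-Type: text/html

<p>Hello</p>
--outer--
"""


def mime_tree(part, depth=0):
    """Yield (depth, content_type) for each node of the MIME tree."""
    yield depth, part.get_content_type()
    # is_multipart() is true for multipart/* and for a parsed message/rfc822,
    # whose payload is a one-element list holding the encapsulated message.
    if part.is_multipart():
        for child in part.get_payload():
            yield from mime_tree(child, depth + 1)


tree = list(mime_tree(message_from_string(RAW)))
for depth, ctype in tree:
    print("  " * depth + ctype)
```

The encapsulated message appears as a child node of the message/rfc822 part, so its HTML part can be rendered as HTML like any top-level part.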
Comment 20 Justin Mason 2004-04-14 13:00:54 UTC
*** Bug 3271 has been marked as a duplicate of this bug. ***
Comment 21 Justin Mason 2004-04-14 13:02:28 UTC
just noting that bug 3271 is a real-world case of wrong behaviour caused by
this... again, I'm -1 on scanning into message/rfc822 parts.
Comment 22 Daniel Quinlan 2004-04-14 13:28:07 UTC
> just noting that bug 3271 is a real-world case of wrong behaviour caused by
> this... again, I'm -1 on scanning into message/rfc822 parts.

If we don't scan message/rfc822 parts, then that's exactly what spammers
will start sending (and this is a similar problem to one of the major
flaws with challenge/response systems, spammers can fake C/R messages and
trick users into opening them).

Haven't we always scanned message/rfc822 parts anyway?  Maybe not very
effectively or consistently, but I thought we just muddled through them
in 2.6x.  Maybe I should re-read the old thread... :-)

Comment 23 Justin Mason 2004-04-14 13:50:56 UTC
>If we don't scan message/rfc822 parts, then that's exactly what spammers
>will start sending (and this is a similar problem to one of the major
>flaws with challenge/response systems, spammers can fake C/R messages and
>trick users into opening them).

I think I'd settle for making this behaviour optional, through a boolean
config parameter.  I will definitely be turning it off ;)

Consider also the effects on Bayes learning -- if a mailman admin
(or someone similarly receiving nonspam mails with spammy messages
encapsulated within them) wants the nonspam mails to get past Bayes,
they'll probably consider learning them as ham.  That'll wind up
with a load of spam tokens (from the encapsulated spam) getting learned
as ham.

>Haven't we always scanned message/rfc822 parts anyway?  Maybe not very
>effectively or consistently, but I thought we just muddled through them
>in 2.6x.  Maybe I should re-read the old thread... :-)

Possibly it's the new MIME-part comparison rules that are causing trouble
here: MIME_HTML_MOSTLY, MPART_ALT_DIFF.  Also the HTML rules are now
firing on the message/rfc822 text, whereas before I think the
message/rfc822 part would be treated as plain text, and its HTML
sub-parts would not be parsed as HTML correctly.

--j.
Comment 24 Nick Leverton 2004-04-14 14:32:47 UTC
On Wed, Apr 14, 2004 at 01:28:08PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3069
> ------- Additional Comments From quinlan@pathname.com  2004-04-14 13:28 -------
> If we don't scan message/rfc822 parts, then that's exactly what spammers
> will start sending (and this is a similar problem to one of the major
> flaws with challenge/response systems, spammers can fake C/R messages and
> trick users into opening them).
> 
> Haven't we always scanned message/rfc822 parts anyway?  Maybe not very
> effectively or consistently, but I thought we just muddled through them
> in 2.6x.  Maybe I should re-read the old thread... :-)

One place where it would be a really good idea to scan message/rfc822s
is in the case of DSNs.  Where you get either a complete message/rfc822
or a text/rfc822-headers within something that looks like a DSN
(multipart/report for example), it would be extremely worthwhile scoring
that inner message.  Any Received: lines in it can be continued backwards
from the final one in the outer message.  As traces of message delivery
they're exactly as trustworthy as they would be anyway.  The ones where
I've done this by hand usually end up on an RBLed IP address - all those
points going to waste :-~)
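
The idea of continuing the Received: chain into the bounced message could be sketched roughly as follows. This uses Python's standard `email` package, not SpamAssassin code; the sample DSN, the host names, and the `received_chain` helper are all hypothetical. It simply concatenates the headers and leaves open how far the inner ones should be trusted.

```python
from email import message_from_string

# Hypothetical bounce: a multipart/report whose last part carries the headers
# of the original (spam) message as text/rfc822-headers.
DSN = """\
From: MAILER-DAEMON@mx.example.net
Subject: Undelivered Mail Returned to Sender
Received: from mx.example.net by mail.example.org; Wed, 14 Apr 2004 14:00:00 +0000
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="dsn"

--dsn
Content-Type: text/plain

Delivery failed.
--dsn
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.net

Final-Recipient: rfc822; nobody@example.net
Action: failed
Status: 5.1.1
--dsn
Content-Type: text/rfc822-headers

Received: from relay.example.com by mx.example.net; Wed, 14 Apr 2004 13:59:00 +0000
Received: from [192.0.2.10] by relay.example.com; Wed, 14 Apr 2004 13:58:00 +0000
From: forged@example.org
Subject: spam
--dsn--
"""


def received_chain(dsn):
    """Return the outer Received headers followed by the bounced message's."""
    chain = list(dsn.get_all("Received", []))
    for part in dsn.walk():
        ctype = part.get_content_type()
        if ctype == "message/rfc822":
            inner = part.get_payload(0)  # the already-parsed encapsulated message
        elif ctype == "text/rfc822-headers":
            inner = message_from_string(part.get_payload())  # headers only
        else:
            continue
        chain.extend(inner.get_all("Received", []))
    return chain


chain = received_chain(message_from_string(DSN))
for hop in chain:
    print(hop)
```

The last entry of the chain is then the hop closest to the original sender, which is the one a DNS blocklist lookup would be interested in.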

In 2.63, there is a problem with bounces going straight through the
system - especially if it's newly installed and learning from scratch.
Many bounces trigger no rules at all on 2.63, and I've had to fiddle
with our rules to catch the spammy ones, so I'm really glad to hear that
3.0 will do that.  I'll have to try it out :)  I had started writing a
plugin to do it, not knowing it was already in the code !

Nick

Comment 25 Justin Mason 2004-04-14 15:14:26 UTC
> One place where it would be a really good idea to scan message/rfc822s
> is in the case of DSNs.  Where you get either a complete message/rfc822
> or a text/rfc822-headers within something that looks like a DSN
> (multipart/report for example), it would be extremely worthwhile scoring
> that inner message.  Any Received: lines in it can be continued backwards
> from the final one in the outer message.  As traces of message delivery
> they're exactly as trustworthy as they would be anyway.  The ones where
> I've done this by hand usually end up on an RBLed IP address - all those
> points going to waste :-~)

Well -- that's another question. Is a *legit* DSN from a misconfigured
host, containing a spam, a spam itself?   In other words, should
SpamAssassin be scoring virus/spam "blowback" as spam?

--j.
Comment 26 Nick Leverton 2004-04-15 01:37:12 UTC
Thanks for correctly divining I was talking about spam bounces :) Yes, I
think it should.  At the moment, spam bounces are extremely effective at
getting through SA and delivering the spam, because neither the original
headers nor the original body is scored.  The spam goes to the forged
originator rather than the intended destination, but I don't think the
spammers care how it gets delivered.  I will give a more recent 3.0 a try,
because overcoming this problem has taken days of my Copious Spare Time so far.

Nick

Comment 27 Matt Sergeant 2004-04-20 14:24:21 UTC
On Wed, 14 Apr 2004, bugzilla-daemon@bugzilla.spamassassin.org wrote:

> Well -- that's another question. Is a *legit* DSN from a misconfigured
> host, containing a spam, a spam itself?   In other words, should
> SpamAssassin be scoring virus/spam "blowback" as spam?

Our customers think so.

Comment 28 Theo Van Dinter 2004-04-28 23:45:33 UTC
the more I think about this, the more I think I'd rather leave the
message/rfc822 parts alone.  don't parse into a subtree, etc.

the majority of MUAs don't display automatically, so I don't think spammers
are going to start doing this en masse.  if this starts happening, and users
go and open attachments from people they don't know (don't they know not to
do this due to worms/etc?), there's really no difference between that and if
spammers start sending doc, pdf, etc spam attachments.  we don't scan those
either.

just because there's a spam attachment message, doesn't mean the whole message
should be considered spam anyway.  mailman notification and (imnsho) DSNs are
examples of this -- non-spam email messages that may contain spam messages as
an attachment, not as an advertisement.

so I'd say we should take the message/* parts, and ignore them entirely.
Comment 29 Justin Mason 2004-04-29 00:01:56 UTC
>the majority of MUAs don't display automatically, so I don't think
>spammers are going to start doing this en masse.  if this starts
>happening, and users go and open attachments from people they don't know
>(don't they know not to do this due to worms/etc?), there's really no
>difference between that and if spammers start sending doc, pdf, etc spam
>attachments.  we don't scan those either.

...or password-protected ZIP files, for that matter. ;)

>just because there's a spam attachment message, doesn't mean the whole
>message should be considered spam anyway.  mailman notification and
>(imnsho) DSNs are examples of this -- non-spam email messages that may
>contain spam messages as an attachment, not as an advertisement.

Yep.

>so I'd say we should take the message/* parts, and ignore them entirely.

Agreed.
Comment 30 Daniel Quinlan 2004-04-29 01:02:45 UTC
>so I'd say we should take the message/* parts, and ignore them entirely.

I have to disagree.

1. message/rfc822 attachments that are spammy are typically not wanted
2. opening a message/rfc822 attachment *is* done automatically by some
   MUAs and is easy in most other MUAs
3. it's much easier than opening viruses, password protected ZIP files, etc.
   and the analogy doesn't really work -- virus scanners (when present) will
   catch those, they don't catch spam and spammers don't need someone to
   run a message/rfc822 -- they just need someone to look at it, so it will
   be considered safe by MUAs.

It's 100% our job to scan ALL of the message, especially this which is so
easy for us to scan.  If spammers start attaching other formats, then we can
write rules for that.  We can't just mark message/rfc822-containing messages
as spam, but we can check the contents and we should continue to do so.

Not scanning them because we developers might get an occasional spam forwarded
to us is a disservice to 99.99% of SpamAssassin users who don't want this crap.
Comment 31 Theo Van Dinter 2004-05-10 21:07:46 UTC
ok folks... we're going to need to come to some form of conclusion to this.

Right now, we essentially have 2 +1s for not parsing, and 1 +1 for doing the 
parsing.  I don't think we're going to get consensus on this.

The code required to do the parsing is trivial.  It's an if statement and about 
4 lines of code, i.e. enabling or disabling it is trivial.
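
For a sense of how small such a switch can be, here is a sketch in Python's standard `email` package. It is not the actual SpamAssassin change; the option name `parse_message_parts` and the sample message are assumptions made for this example.

```python
from email import message_from_string

# Hypothetical sample: a short note with one forwarded message attached.
RAW = """\
From: sender@example.com
Subject: Fwd: offer
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain

FYI.
--outer
Content-Type: message/rfc822

From: origin@example.com
Subject: offer
Content-Type: text/plain

Buy now.
--outer--
"""


def text_parts(part, parse_message_parts=True):
    """Collect text/* payloads, optionally ignoring message/* parts."""
    # The single "if statement": leave encapsulated messages alone when the
    # (hypothetical) option is switched off.
    if part.get_content_maintype() == "message" and not parse_message_parts:
        return []
    if part.is_multipart():
        texts = []
        for child in part.get_payload():
            texts.extend(text_parts(child, parse_message_parts))
        return texts
    if part.get_content_maintype() == "text":
        return [part.get_payload()]
    return []


msg = message_from_string(RAW)
scanned = text_parts(msg, parse_message_parts=True)
skipped = text_parts(msg, parse_message_parts=False)
print(scanned)
print(skipped)
```

With the option on, the forwarded text is included in what gets scanned; with it off, only the outer note remains.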

Shall we take a vote?  Do we want more discussion?
Comment 32 Daniel Quinlan 2004-05-10 21:24:18 UTC
I'm satisfied that my original bug has been fixed, so this can be closed.
Comment 33 Justin Mason 2004-05-10 23:37:25 UTC
ok then! moved the discussion to bug 3367.
Comment 34 Fred T 2005-09-29 17:31:03 UTC
*** Bug 4606 has been marked as a duplicate of this bug. ***