Bug 3271 - new MIME parser FPs much more often on Mailman admin messages
Status: RESOLVED DUPLICATE of bug 3069
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 major
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2004-04-14 11:25 UTC by Justin Mason
Modified: 2004-04-14 05:00 UTC



Attachment Type Modified Status Actions Submitter/CLA Status
mailman admin notice FP text/plain None Justin Mason [HasCLA]

Description Justin Mason 2004-04-14 11:25:16 UTC
Mailman 2.1.x has a (nifty) new feature.  When a list is set to require admin
approval for non-members to post, it'll send the moderation-required message in
this format:

From: list-owner@example.com
Subject: blah post from hywworqlgwq@summitoh.net requires approval
Content-type: multipart/mixed ...

The multipart/mixed parts are:

   [text/plain]: a brief "please authorize this posting" msg
   [message/rfc822]: the original message
   [message/rfc822]: an approval message suitable for use as response
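For illustration, here is a minimal sketch of that layout. It assumes Python's standard email package rather than SpamAssassin's own Perl parser, and the addresses, subjects and body text are made up. It builds a notice with the three parts listed above and shows that a recursive walk also reaches the messages encapsulated in the message/rfc822 parts.

```python
from email.message import EmailMessage


def build_moderation_notice() -> EmailMessage:
    """Build a multipart/mixed notice shaped like the one described above.

    All addresses and text are hypothetical placeholders.
    """
    # The held post from a non-member.
    original = EmailMessage()
    original["From"] = "sender@example.net"
    original["Subject"] = "blah"
    original.set_content("Body of the held post.")

    # The canned message the moderator can send back to approve the post.
    approval = EmailMessage()
    approval["From"] = "list-request@example.com"
    approval["Subject"] = "confirm"
    approval.set_content("Reply to this message to approve the post.")

    notice = EmailMessage()
    notice["From"] = "list-owner@example.com"
    notice["Subject"] = "blah post from sender@example.net requires approval"
    notice.set_content("Please authorize this posting.")
    # Attaching a message object stores it as a message/rfc822 part and
    # converts the notice to multipart/mixed.
    notice.add_attachment(original)
    notice.add_attachment(approval)
    return notice


if __name__ == "__main__":
    notice = build_moderation_notice()
    # Top-level parts only.
    print([part.get_content_type() for part in notice.iter_parts()])
    # walk() recurses, so it also visits the encapsulated messages.
    print([part.get_content_type() for part in notice.walk()])
```

The second listing is longer than the first because the stdlib walk, like the behaviour reported here, descends into each message/rfc822 part.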

This is great for list moderation to fend off spam.

Now, the problem is -- in 2.63 this was fine, and got through no problem,
presumably because of limitations in the 2.6x MIME parser.  However, I've *just*
installed 3.0.0svn on my server for dogfooding, and it doesn't handle them at
all well; every single 'requires approval' message relating to a spam has
been caught as spam.

It looks like the new MIME parser is descending into the message/rfc822 part. 
Here's the rules hit from one msg:

X-spam-report: 
	*  0.2 NO_REAL_NAME From: does not include a real name
	*  1.0 HTML_OBFUSCATE_20_30 BODY: Message is 20% to 30% HTML obfuscation
	*  0.0 HTML_10_20 BODY: Message is 10% to 20% HTML
	*  1.2 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
	*  1.0 HTML_BADTAG_40_50 BODY: HTML message is 40% to 50% bad tags
	* -0.0 BAYES_44 BODY: Bayesian spam probability is 44 to 50%
	*      [score: 0.5000]
	*  3.0 MPART_ALT_DIFF BODY: HTML and text parts are different
	*  1.0 HTML_NONELEMENT_60_70 BODY: 60% to 70% of HTML elements are non-standard
	*  0.1 HTML_MESSAGE BODY: HTML included in message
	*  0.6 MIME_HTML_NO_CHARSET RAW: Message text in HTML without charset
	*  1.0 URIBL_SBL Contains a URL listed in the SBL blocklist
	*      [URIs: monnsid.com]
	*  1.0 LONGWORDS Long string of long words
	* -1.8 AWL AWL: From: address is in the auto white-list
X-spam-status: Yes, score=8.2 required=5.0 tests=AWL,BAYES_44,HTML_10_20,
	HTML_BADTAG_40_50,HTML_MESSAGE,HTML_NONELEMENT_60_70,
	HTML_OBFUSCATE_20_30,LONGWORDS,MIME_HTML_MOSTLY,MIME_HTML_NO_CHARSET,
	MPART_ALT_DIFF,NO_REAL_NAME,URIBL_SBL autolearn=no version=3.0.0-r9952

(msg attached)

I've manually whitelisted my list admin addresses to work around this, but I do
get a stack of spam directly to those addrs as well, so that's a suboptimal,
kludgy workaround that requires user configuration, and therefore not good.

IMO it'd be better to just not descend into message/rfc822 parts.  After all,
*WE* use message/rfc822 as a "safe" encapsulation format, ourselves!
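As a rough sketch of the "don't descend" idea: the following assumes Python's standard email package, not SpamAssassin's Perl code. The function name walk_without_rfc822 and the demo message are hypothetical. The walker yields each message/rfc822 part itself but never visits the message encapsulated inside it.

```python
from email.message import EmailMessage, Message
from typing import Iterator


def walk_without_rfc822(msg: Message) -> Iterator[Message]:
    """Yield msg and its MIME parts, treating message/rfc822 parts as opaque."""
    yield msg
    # Only recurse into multipart/* containers. A message/rfc822 part is
    # yielded itself, but the encapsulated message inside it is not visited.
    if msg.get_content_maintype() == "multipart":
        for part in msg.get_payload():
            yield from walk_without_rfc822(part)


def demo_notice() -> EmailMessage:
    """Hypothetical moderation notice wrapping an HTML-only held post."""
    held = EmailMessage()
    held["Subject"] = "held post"
    held.set_content("<b>buy now</b>", subtype="html")

    notice = EmailMessage()
    notice["Subject"] = "post requires approval"
    notice.set_content("Please authorize this posting.")
    notice.add_attachment(held)  # stored as a message/rfc822 part
    return notice


if __name__ == "__main__":
    notice = demo_notice()
    # The stdlib walk reaches the text/html body of the held post.
    print([p.get_content_type() for p in notice.walk()])
    # The restricted walk stops at the message/rfc822 wrapper.
    print([p.get_content_type() for p in walk_without_rfc822(notice)])
```

With a walker like this, body and HTML rules would only ever see the moderator-facing text/plain part, at the cost of not scanning forwarded spam at all.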
Comment 1 Justin Mason 2004-04-14 11:26:02 UTC
Created attachment 1900
mailman admin notice FP
Comment 2 Theo Van Dinter 2004-04-14 11:45:46 UTC
Subject: Re:  New: new MIME parser FPs much more often on Mailman admin messages

On Wed, Apr 14, 2004 at 11:25:17AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> It looks like the new MIME parser is descending into the message/rfc822 part. 
> Here's the rules hit from one msg:

Yep.  It does so explicitly, actually: Mail::SpamAssassin::Message, line 497.
See ticket 3069.

Comment 3 Justin Mason 2004-04-14 13:00:53 UTC

*** This bug has been marked as a duplicate of 3069 ***