Bug 1106 - rules to detect forged MUAs
Summary: rules to detect forged MUAs
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 2.42
Hardware: All All
Importance: P3 minor
Target Milestone: ---
Assignee: Daniel Quinlan
Reported: 2002-10-12 06:17 UTC by Martin Radford
Modified: 2002-12-18 21:27 UTC

Attachment Type Modified Status Actions Submitter/CLA Status
Message-ID format rules for Mutt and The Bat! text/plain None Martin Radford [HasCLA]

Description Martin Radford 2002-10-12 06:17:54 UTC
The rule identifying Outlook Express as a non-spam MUA has a positive score.
This is because the header is frequently forged.  OE (and Outlook) have a 
readily recognised Message-Id format, which many spams claiming to have been
sent with OE don't use.  I suggest a new meta rule to try to identify these.
__HAS_OUTLOOK_IN_MAILER already exists, I've added __MSGID_MS_FORMAT and 
FAKED_MS_MUA rules as shown below to my local.cf:

header __MSGID_MS_FORMAT        Message-Id =~ /^<[0-9a-f]{12,12}\$[0-9a-f]{8,8}\
$[0-9a-f]{8,8}\@.{1,50}>$/
describe __MSGID_MS_FORMAT      Message-Id is in standard Microsoft format

meta FAKED_MS_MUA       (__HAS_OUTLOOK_IN_MAILER && !__MSGID_MS_FORMAT)
describe FAKED_MS_MUA   Mailer claims to be Outlook/OE, but Message-Id is in wro
ng format
score FAKED_MS_MUA 1.0

Obviously a new run of the scoring system would be required.
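
For illustration only, the logic of the proposed meta rule can be sketched in Python. This is not how SpamAssassin evaluates it (the rules are Perl regexes in local.cf). The pattern used for __HAS_OUTLOOK_IN_MAILER is an assumption, since that rule's definition is not shown in this report, and the sample Message-Id values are made up.

```python
import re

# ASSUMPTION: the real __HAS_OUTLOOK_IN_MAILER pattern is not quoted in this
# report; a plain substring match on the X-Mailer value stands in for it.
HAS_OUTLOOK_IN_MAILER = re.compile(r"Microsoft Outlook")

# Python translation of the proposed __MSGID_MS_FORMAT rule:
# 12 hex digits, "$", 8 hex digits, "$", 8 hex digits, "@", then 1-50 chars.
MSGID_MS_FORMAT = re.compile(
    r"^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}@.{1,50}>$"
)


def faked_ms_mua(x_mailer: str, message_id: str) -> bool:
    """Mirror of: meta FAKED_MS_MUA (__HAS_OUTLOOK_IN_MAILER && !__MSGID_MS_FORMAT)."""
    claims_outlook = HAS_OUTLOOK_IN_MAILER.search(x_mailer) is not None
    ms_format = MSGID_MS_FORMAT.search(message_id) is not None
    return claims_outlook and not ms_format


# Made-up header values, for illustration only.
genuine_looking = faked_ms_mua(
    "Microsoft Outlook Express 6.00.2600.0000",
    "<000a01c27123$1a2b3c40$0100a8c0@example.com>",
)
forged_looking = faked_ms_mua(
    "Microsoft Outlook Express 6.00.2600.0000",
    "<abc123@example.com>",
)
```

The rule only fires when both conditions hold, so mail from other MUAs is never affected whatever its Message-Id looks like.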
Comment 1 Daniel Quinlan 2002-10-13 02:20:48 UTC
(assigning to me)

Thanks.  Seems like a good test to try.  I had to make a few changes so far
(also please don't put extra newlines into submissions, if you have trouble with
cut-and-paste due to your browser, attachments are a good idea).

Here's the revised version (it just exempts Outlook IMO which has a
different header format).

header __OUTLOOK_EXCEPT_IMO    X-Mailer =~ /Microsoft Outlook(?! IMO)/
header __OUTLOOK_MSGID         Message-Id =~ /^<[0-9a-f]{12,12}\$[0-9a-f]{8,8}\$[0-9a-f]{8,8}\@.{1,50}>$/
meta T_FORGED_OUTLOOK_MAILER   (__OUTLOOK_EXCEPT_IMO && !__OUTLOOK_MSGID)
describe T_FORGED_OUTLOOK_MAILER       Forged mail pretending to be from Outlook/OE
score T_FORGED_OUTLOOK_MAILER  1.0

It works well

OVERALL%   SPAM% NONSPAM%     S/O    RANK   SCORE  NAME
  12402     4708     7694    0.38    0.00    0.00  (all messages)
100.000   37.962   62.038    0.38    0.00    0.00  (all messages as %)
  3.822    9.919    0.091    0.99    0.62    1.00  T_FORGED_OUTLOOK_MAILER

except for the troublesome false positives:

X-Mailer: Microsoft Outlook Express 6.00.2600.0000
Message-ID: <OE55z9IPUS9O4Tsvthl00001e4d@hotmail.com>

Message-ID: <20285A942B45D5118B1400A0D2A4615502858D@NTSERVER>
X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0)

Message-ID: <20285A942B45D5118B1400A0D2A4615502859C@NTSERVER>
X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0)

Message-ID: <20285A942B45D5118B1400A0D2A461550285A3@NTSERVER>
X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0)

X-Mailer: Microsoft Outlook Express 6.00.2600.0000
Message-ID: <OE692nnNf0GjFbznl740001024a@hotmail.com>

X-Mailer: Microsoft Outlook Express 6.00.2600.0000
Message-ID: <OE46gLzhVOJwnzsLqIf00000495@hotmail.com>

X-Mailer: Microsoft Outlook Express 6.00.2600.0000
Message-ID: <OE2143oFLaqSOc7wo9500001163@hotmail.com>

If I exclude those X-Mailers, my SPAM% goes from 9.919% to 6.096% so I'd
rather not exclude them, but this test might work a lot worse for other
people.
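
A rough Python sketch of the revised rule, to show how the `(?! IMO)` negative lookahead exempts Outlook IMO and why the headers listed above still trigger it. The two false-positive header pairs are copied from this comment. The IMO X-Mailer string is an illustrative assumption, not taken from real mail.

```python
import re

# Python translations of the two sub-rules from this comment.
OUTLOOK_EXCEPT_IMO = re.compile(r"Microsoft Outlook(?! IMO)")
OUTLOOK_MSGID = re.compile(
    r"^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}@.{1,50}>$"
)


def forged_outlook_mailer(x_mailer: str, message_id: str) -> bool:
    """Mirror of: meta T_FORGED_OUTLOOK_MAILER (__OUTLOOK_EXCEPT_IMO && !__OUTLOOK_MSGID)."""
    return (
        OUTLOOK_EXCEPT_IMO.search(x_mailer) is not None
        and OUTLOOK_MSGID.search(message_id) is None
    )


# Two of the false positives reported above (real headers from this comment).
hotmail_fp = forged_outlook_mailer(
    "Microsoft Outlook Express 6.00.2600.0000",
    "<OE55z9IPUS9O4Tsvthl00001e4d@hotmail.com>",
)
cws_fp = forged_outlook_mailer(
    "Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0)",
    "<20285A942B45D5118B1400A0D2A4615502858D@NTSERVER>",
)

# ASSUMPTION: illustrative IMO X-Mailer value; the lookahead rejects it, so the
# rule cannot fire whatever the Message-ID looks like.
imo_exempt = forged_outlook_mailer(
    "Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)",
    "<anything@example.com>",
)
```

Both false positives use Message-ID layouts that do not follow the hex$hex$hex pattern, which is why excluding their X-Mailers would be the only way to silence them under this rule.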
Comment 2 Daniel Quinlan 2002-10-13 02:22:18 UTC
I forgot to note, it's now in CVS for testing.  So my cut-and-paste was
for information, not actual use.  ;-)
Comment 3 Daniel Quinlan 2002-10-15 00:42:58 UTC
Here are more rules to test!  The FPs for T_FORGED_MUA_IMS are all from the
same person and site.  Even though the MUA version is the same as in other
non-FP messages, the Message-ID format is very different, so I suspect that he
or his site is munging outgoing IDs, in which case the rule would still be usable.

OVERALL%   SPAM% NONSPAM%     S/O    RANK   SCORE  NAME
  12402     4708     7694    0.38    0.00    0.00  (all messages)
100.000   37.962   62.038    0.38    0.00    0.00  (all messages as %)
  1.193    3.144    0.000    1.00    0.86    1.00  T_FORGED_MUA_MOZILLA
  0.839    2.209    0.000    1.00    0.84    1.00  T_FORGED_MUA_OIMO
  0.782    2.060    0.000    1.00    0.83    1.00  T_FORGED_MUA_AOL
  0.718    1.890    0.000    1.00    0.83    1.00  T_FORGED_MUA_EUDORA
  3.895   10.068    0.117    0.99    0.60    1.00  T_FORGED_MUA_OUTLOOK
  0.750    1.784    0.117    0.94    0.49    1.00  T_FORGED_MUA_IMS

All in CVS now.
Comment 4 Martin Radford 2002-10-15 06:57:10 UTC
Your rule for Outlook IMO is too strict (or it would be if the . in [a-z.]
were quoted :-)

Would 
/^<[A-P]{28}\.[a-zA-Z_\.]+\@\S+>$/
be better (you can definitely have capitals and underscore here)?

Also, I note that you're using \S+ as the right-hand side of the message ID.
Someone else pointed out privately that my \@.{1,50} looked odd (for 
__OUTLOOK_MSGID) - should that be changed to \S+ too?
Comment 5 Martin Radford 2002-10-15 07:29:11 UTC
Created attachment 401
Message-ID format rules for Mutt and The Bat!
Comment 6 Martin Radford 2002-10-15 07:40:11 UTC
Above attachment contains a set of rules for The Bat! and Mutt.

I've seen a fair number of spams claiming to be from The Bat!  Mutt is for
completeness.  It possibly isn't worth doing other MUAs at the moment, since 
most of them don't seem to have any spam filed against them (at least, not 
according to STATISTICS.TXT).
Comment 7 Daniel Quinlan 2002-10-15 15:48:36 UTC
Subject: Re: [SAdev]  rules to detect forged MUAs

martin-sabz@zamenhof.demon.co.uk writes:

> Above attachment contains a set of rules for The Bat! and Mutt.
> 
> I've seen a fair number of spams claiming to be from The Bat!  Mutt is for
> completeness.  It possibly isn't worth doing other MUAs at the moment, since 
> most of them don't seem to have any spam filed against them (at least, not 
> according to STATISTICS.TXT).

Mutt is never (or almost never) forged (like my mailer, Emacs/VM) so I
highly doubt it will be worth running as a rule.  I think the risk of
Mutt changing their Message-ID format outweighs the potential benefit
so I didn't add it to CVS (all I got was FPs anyway).

A lot of the spam claiming to come from The Bat! is actually sent from
The Bat!  It's frequently used as spamware, even though it is used by
non-spammers.  However, it does look like a rule might work out well.
I found some older versions that use a slightly different Message-ID,
though, so I simplified the rule a bit and added it to CVS.

Also, for the date strings starting with "200", I think it's fine to
just use \d and a length (or length range) like the other rules.  If a
spammer manages to mimic that much, then they're going to notice it's
a date and mimic that as well.

Dan

Comment 8 Daniel Quinlan 2002-10-15 15:57:19 UTC
Subject: Re: [SAdev]  rules to detect forged MUAs

bugzilla-daemon@hughes-family.org writes:

> Your rule for Outlook IMO is too strict (or it would be if the . in
> [a-z.]  were quoted :-)

You don't need to quote . inside of a [] set.

> Would /^<[A-P]{28}\.[a-zA-Z_\.]+\@\S+>$/ be better (you can
> definitely have capitals and underscore here)?

I found underscore, but do capitals happen?  Do you have non-spam
examples you could attach?  I'll add capitals for now.
 
> Also, I note that you're using \S+ as the right-hand side of the message ID.
> Someone else pointed out privately that my \@.{1,50} looked odd (for 
> __OUTLOOK_MSGID) - should that be changed to \S+ too?

Changed in CVS.  I used \S+ to keep things simple for now.  We can
change some of them to be more specific if it would significantly
raise SPAM% without any FPs.

Dan
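
A small Python check of the two regex points discussed in this comment, offered as a sketch rather than as the CVS rules. Python's `re` treats a dot inside a character class the same way Perl does. All sample strings are made up, and the `<id@...>` patterns are simplified stand-ins for the full Message-ID rules.

```python
import re

# Point 1: inside [], "." is already literal, so quoting it changes nothing.
unquoted = re.compile(r"^[a-z.]+$")
quoted = re.compile(r"^[a-z\.]+$")
samples = ["a.b", "axb", "a-b", "..."]
same_result = all(
    bool(unquoted.search(s)) == bool(quoted.search(s)) for s in samples
)

# Point 2: ".{1,50}" versus "\S+" for the right-hand side of the Message-ID.
dot_rhs = re.compile(r"^<id@.{1,50}>$")   # any characters, at most 50 of them
nonspace_rhs = re.compile(r"^<id@\S+>$")  # no whitespace, no length limit

with_space = "<id@host name>"
long_host = "<id@" + "a" * 60 + ">"

dot_accepts_space = dot_rhs.search(with_space) is not None
nonspace_accepts_space = nonspace_rhs.search(with_space) is not None
dot_accepts_long = dot_rhs.search(long_host) is not None
nonspace_accepts_long = nonspace_rhs.search(long_host) is not None
```

So the switch to `\S+` is not purely cosmetic: it rejects embedded whitespace that `.{1,50}` would allow, and it accepts host parts longer than 50 characters.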

Comment 9 Justin Mason 2002-12-19 06:27:37 UTC
in CVS