Bug 2294 - The Bat! mailer is incorrectly treated forged
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 2.60
Hardware: All (OS: All)
Importance: P5 normal
Target Milestone: 2.61
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 2743
Depends on:
Blocks: 2344
Reported: 2003-08-05 01:26 UTC by Andrei Arkhipov
Modified: 2003-11-11 14:36 UTC (History)



Attachments:
  patch to remove FORGED_MUA_THEBAT (type: patch; status: None; submitter: Justin Mason [HasCLA])

Description Andrei Arkhipov 2003-08-05 01:26:43 UTC
Here is a message header of non-spam message:

Received: from localhost(127.0.0.1) by zuka via smap (V2.0)
        id xma012194; Mon, 4 Aug 03 17:00:26 +0400
Received: from ws-maxim.office.elvis.ru (localhost [127.0.0.1])
        by ra.elvis.ru (8.11.6+Sun/8.11.6) with ESMTP id h74Cxmp08054;
        Mon, 4 Aug 2003 16:59:48 +0400 (MSD)
Date: Mon, 4 Aug 2003 17:00:01 +0400
From: Filippov Maxim <maxim@elvis.ru>
X-Mailer: The Bat! (v1.53d)
Reply-To: Filippov Maxim <maxim@elvis.ru>
Organization: =?koi8-r?B?T0FPICL8zNfJ0ysi?=
X-Priority: 3 (Normal)
Message-ID: <178-1270863024.20030804170001@elvis.ru>
To: elvis+@elvis.ru
Subject: =?koi8-r?B?79TCz9LP3s7ZyiDU1dLOydIgzsEgQklUQ09NTV8yMDAzX0lJ?=
In-Reply-To: <DBEJLLFGEKGLAPCOFPLNGEOOCGAA.ap@elvis.ru>
References: <DBEJLLFGEKGLAPCOFPLNGEOOCGAA.ap@elvis.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=koi8-r

It was incorrectly flagged as FORGED_MUA_THEBAT by the following rules:

# The Bat! forgeries
header __THEBAT_MUA             X-Mailer =~ /The Bat!/
header __THEBAT_MSGID           MESSAGEID =~ /^<\d+\.\d+\@\S+>$/m
meta FORGED_MUA_THEBAT          (__THEBAT_MUA && !__THEBAT_MSGID)
describe FORGED_MUA_THEBAT      Forged mail pretending to be from The Bat!
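
To see why the rule fires on this message, the two sub-rules can be replayed outside SpamAssassin. The following is only a minimal Python sketch, not SpamAssassin code: the function name forged_mua_thebat is made up for illustration, and it assumes Python's re module treats these simple patterns the same way Perl does.

```python
import re

# Python translations of the two sub-rules quoted above (assumption: for
# patterns this simple, Python's `re` behaves like Perl's regex engine).
THEBAT_MUA = re.compile(r"The Bat!")
THEBAT_MSGID = re.compile(r"^<\d+\.\d+@\S+>$", re.MULTILINE)


def forged_mua_thebat(x_mailer: str, message_id: str) -> bool:
    """Mirror of: meta FORGED_MUA_THEBAT (__THEBAT_MUA && !__THEBAT_MSGID)."""
    is_bat = THEBAT_MUA.search(x_mailer) is not None
    msgid_ok = THEBAT_MSGID.search(message_id) is not None
    return is_bat and not msgid_ok


# Header values copied from the report above.
flagged = forged_mua_thebat(
    "The Bat! (v1.53d)",
    "<178-1270863024.20030804170001@elvis.ru>",
)
print(flagged)  # True: "178-" is followed by more digits, not by the "." that \d+\. requires
```

The hyphen after "178" is the only thing that stops the Message-ID pattern from matching, so the meta rule reports a forgery.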
Comment 1 Brendan Byrd/SineSwiper 2003-08-12 10:08:26 UTC
Honestly, I get a bunch of forged "The Bat!" mailers, so it -is- a helpful 
rule.  For example, the latest admin@whatever.com worm gets caught by the rule:

Received: from localhost ([80.88.129.107])
        by ResonatorSoft.org (8.11.6/8.11.6) with SMTP id h7BH9jE12877
        for <SineSwiper@ResonatorSoft.org>; Mon, 11 Aug 2003 13:09:49 -0400
Date: Mon, 11 Aug 2003 13:09:49 -0400
Message-Id: <200308111709.h7BH9jE12877@ResonatorSoft.org>
From: admin@ResonatorSoft.org
To: SineSwiper <SineSwiper@ResonatorSoft.org>
Reply-To: admin@ResonatorSoft.org
X-Mailer: The Bat! (v1.61)
X-Priority: 2 (High)
Subject: your account                         veevvodv
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------B8F5E4550077FCB"
Comment 2 Malte S. Stretz 2003-08-12 12:21:32 UTC
Hmm... AFAICR there were some fixes for the FORGED_THEBAT rules for 2.60; we 
now follow the official RITlabs way to identify The Bat!. Andrei, any chance 
you could try it out with 2.60-cvs? 
Comment 3 Brendan Byrd/SineSwiper 2003-09-07 15:16:24 UTC
Do you have a bug # for this fix?  I plugged his headers into my version 
(fairly recent CVS) of SpamAssassin and it caught it as FORGED_MUA_THEBAT.
Comment 4 Malte S. Stretz 2003-09-07 15:36:36 UTC
Yes, I see. It doesn't match the Message-Id pattern... 
Comment 5 Brendan Byrd/SineSwiper 2003-09-08 10:04:31 UTC
Ahhh...is it supposed to have that dash in there?  Manual patch edit:

- header __THEBAT_MSGID           MESSAGEID =~ /^<\d+\.\d+\@\S+>$/m
+ header __THEBAT_MSGID           MESSAGEID =~ /^<[\d+\-]\.\d+\@\S+>$/m

Anybody have a contact with RITlabs to verify that it's a legal Message-ID?
Comment 6 Brendan Byrd/SineSwiper 2003-09-09 09:51:46 UTC
Bah...I'm an idiot.  That should read:

- header __THEBAT_MSGID           MESSAGEID =~ /^<\d+\.\d+\@\S+>$/m
+ header __THEBAT_MSGID           MESSAGEID =~ /^<[\d\-]+\.\d+\@\S+>$/m
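
The difference between the two attempts is where the "+" sits relative to the character class. A small Python sketch (the patterns are translated from the rule lines above; the variable names are illustrative only) shows the effect on the reported Message-ID:

```python
import re

MSGID = "<178-1270863024.20030804170001@elvis.ru>"

# Comment 5 version: inside the brackets, "+" is a literal character, so
# [\d+\-] matches exactly ONE character (a digit, "+", or "-") before the dot.
first_try = re.compile(r"^<[\d+\-]\.\d+@\S+>$", re.MULTILINE)

# Comment 6 version: the quantifier is outside the class, so [\d\-]+ matches
# one or more digits or dashes before the dot.
second_try = re.compile(r"^<[\d\-]+\.\d+@\S+>$", re.MULTILINE)

first_matches = first_try.search(MSGID) is not None
second_matches = second_try.search(MSGID) is not None
print(first_matches, second_matches)  # False True
```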

Comment 7 Brendan Byrd/SineSwiper 2003-09-11 16:14:28 UTC
The actual line in CVS is:

header __BAT_MSGID      MESSAGEID =~ /^<\d{2,12}\.\d{14}\@\S+>$/m

I could include the dash as an OK character, like [\d\-]{2,12}.  However, the 
part of this Message-ID before the dot, dash included, is 14 characters, more than the 12 allowed:

178-1270863024.20030804170001@elvis.ru
^^^^^^^^^^^^^^
00000000011111
12345678901234

The part after the "178-" -is- 10 characters, within the range, so it's 
possible that "178-" is some extra characters that might appear on there.  Or 
it may be an old standard for pre-V2.0 The Bat! clients.

I (and probably the devels) don't really want to change this rule unless some 
official word from RITlabs says it's okay.
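
The length argument above can be checked mechanically. This is a rough Python sketch, assuming the CVS pattern translates directly to Python's re syntax; with_dash is the hypothetical variant mentioned above, not a rule that exists in CVS.

```python
import re

MSGID = "<178-1270863024.20030804170001@elvis.ru>"

# Rule as quoted from CVS, plus the hypothetical variant that only adds "-"
# to the characters allowed before the dot.
cvs_rule = re.compile(r"^<\d{2,12}\.\d{14}@\S+>$", re.MULTILINE)
with_dash = re.compile(r"^<[\d\-]{2,12}\.\d{14}@\S+>$", re.MULTILINE)

prefix = MSGID[1:MSGID.index(".")]  # text between "<" and the first "."
cvs_matches = cvs_rule.search(MSGID) is not None
dash_matches = with_dash.search(MSGID) is not None

print(prefix, len(prefix))  # 178-1270863024 14
print(cvs_matches)          # False
print(dash_matches)         # False: 14 characters exceed the {2,12} limit
```

Allowing the dash alone is therefore not enough; the upper bound would also have to grow to at least 14.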
Comment 8 Theo Van Dinter 2003-10-27 21:07:59 UTC
well, I looked through my corpus and found 4 message-ids that didn't match the standard version:

<15660.010102@thinkgeek.com>
<1671550353000.20030623152607@certiflexdimension.com>
<8607.990601@linepoint.com>
<17447.990603@linepoint.com>

The first, third, and fourth are all valid.  It looks like before the YYYYMMDDHHMMSS format, the 
timestamp was just YYMMDD, so those FPs can be fixed.  The second is also a valid message.  It went 
through a mailing list, but that list doesn't appear to change the Message-ID header at all.  It 
fails to match because the part before the "." is 13 chars long, one more than the rule's limit of 12.


how often does the 178-* style message-id show up?  is it fairly often, or is this perhaps just a 
one-off type issue?  I'd like to just say "yeah, this happens", and call it good.
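
As a rough illustration of the point about the older YYMMDD stamps, the four Message-IDs can be run against the CVS pattern and against a relaxed variant. The relaxed pattern below is only an assumption about what such a fix might look like, not the rule that was actually committed.

```python
import re

# Pattern as quoted from CVS in comment 7.
cvs_rule = re.compile(r"^<\d{2,12}\.\d{14}@\S+>$", re.MULTILINE)
# Hypothetical relaxation: also accept the older 6-digit YYMMDD stamp.
relaxed = re.compile(r"^<\d{2,12}\.(?:\d{6}|\d{14})@\S+>$", re.MULTILINE)

corpus_ids = [
    "<15660.010102@thinkgeek.com>",
    "<1671550353000.20030623152607@certiflexdimension.com>",
    "<8607.990601@linepoint.com>",
    "<17447.990603@linepoint.com>",
]

# Map each Message-ID to (matches CVS rule, matches relaxed rule).
results = {
    mid: (cvs_rule.search(mid) is not None, relaxed.search(mid) is not None)
    for mid in corpus_ids
}

for mid, (old, new) in results.items():
    print(f"{mid}  cvs={old}  relaxed={new}")
```

Under these assumptions the three YYMMDD-style IDs match the relaxed pattern, while the second ID still fails because its 13-digit prefix exceeds the {2,12} bound.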
Comment 9 Justin Mason 2003-11-01 16:37:28 UTC
hmm, I think we're doing this wrong -- instead of trying to fix the rule, let's
just drop it!

We currently have:

  1.681   2.5780   0.0000    1.000   0.95    4.30  FORGED_THEBAT_HTML
  1.625   2.4922   0.0000    1.000   0.95    4.30  FORGED_MUA_THEBAT_BOUN
  1.768   2.7107   0.0037    0.999   0.95    4.29  FORGED_MUA_THEBAT
  0.151   0.2309   0.0026    0.989   0.92    2.80  FORGED_MUA_THEBAT_CS

Note that FORGED_MUA_THEBAT_BOUN and FORGED_THEBAT_HTML catch very nearly the
same amount of stuff as FORGED_MUA_THEBAT, without any false positives; also
note that we *keep* running into gateways mangling the message-ids, and weird
bug reports like this.   So I suggest we just nuke FORGED_MUA_THEBAT in 2.61
(and possibly tweak scores a little to make up, if required -- but I doubt it
will be.)
Comment 10 Justin Mason 2003-11-01 16:37:52 UTC
Created attachment 1534 [details]
patch to remove FORGED_MUA_THEBAT
Comment 11 Daniel Quinlan 2003-11-04 17:17:44 UTC
I'm not sure.  What's the overlap like?
Comment 12 Justin Mason 2003-11-04 17:56:42 UTC
anyone got a copy of the 2.60 logs rsynced down already?  I don't, so running an
overlap would be a bit too much effort right now ;)

(translation: I'm lazy)
Comment 13 Justin Mason 2003-11-04 21:59:11 UTC
ok -- overlap data:

[jm@bugzilla masses]$ grep FORGED_THEBAT_HTML ov | grep FORGED_MUA_THEBAT
3164    0.901   0.884   FORGED_THEBAT_HTML,FORGED_MUA_THEBAT
3067    0.882   0.874   FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML
230     0.576   0.066   FORGED_MUA_THEBAT_CS,FORGED_THEBAT_HTML

so 90%/88% overlap between FORGED_THEBAT_HTML and FORGED_MUA_THEBAT.
I'd say that's good enough to drop the rule, given the FPs.
Comment 14 Justin Mason 2003-11-06 19:08:21 UTC
comments?   +1s?

I think it would be a good idea to drop the rule, allowing us to move away from
the message-id forgery-detection rules in favor of more reliable ones.
Comment 15 Justin Mason 2003-11-09 18:37:49 UTC
*** Bug 2743 has been marked as a duplicate of this bug. ***
Comment 16 Justin Mason 2003-11-09 18:38:38 UTC
OK, marked 2743 as a dup; it would be fixed by removing the rule.
Still waiting for comments...
Comment 17 Duncan Findlay 2003-11-11 20:03:45 UTC
+1

I guess you can't really change the other scores at all, so leave them as is.
Comment 18 Justin Mason 2003-11-11 23:36:22 UTC
ok, applied -- thanks Duncan ;)