SA Bugzilla – Bug 2294
The Bat! mailer is incorrectly treated forged
Last modified: 2003-11-11 14:36:22 UTC
Here is the header of a non-spam message:
    Received: from localhost(127.0.0.1) by zuka via smap (V2.0) id xma012194; Mon, 4 Aug 03 17:00:26 +0400
    Received: from ws-maxim.office.elvis.ru (localhost [127.0.0.1]) by ra.elvis.ru (8.11.6+Sun/8.11.6) with ESMTP id h74Cxmp08054; Mon, 4 Aug 2003 16:59:48 +0400 (MSD)
    Date: Mon, 4 Aug 2003 17:00:01 +0400
    From: Filippov Maxim <maxim@elvis.ru>
    X-Mailer: The Bat! (v1.53d)
    Reply-To: Filippov Maxim <maxim@elvis.ru>
    Organization: =?koi8-r?B?T0FPICL8zNfJ0ysi?=
    X-Priority: 3 (Normal)
    Message-ID: <178-1270863024.20030804170001@elvis.ru>
    To: elvis+@elvis.ru
    Subject: =?koi8-r?B?79TCz9LP3s7ZyiDU1dLOydIgzsEgQklUQ09NTV8yMDAzX0lJ?=
    In-Reply-To: <DBEJLLFGEKGLAPCOFPLNGEOOCGAA.ap@elvis.ru>
    References: <DBEJLLFGEKGLAPCOFPLNGEOOCGAA.ap@elvis.ru>
    MIME-Version: 1.0
    Content-Type: text/plain; charset=koi8-r
It was incorrectly treated as FORGED_MUA_THEBAT by this rule:
    # The Bat! forgeries
    header __THEBAT_MUA X-Mailer =~ /The Bat!/
    header __THEBAT_MSGID MESSAGEID =~ /^<\d+\.\d+\@\S+>$/m
    meta FORGED_MUA_THEBAT (__THEBAT_MUA && !__THEBAT_MSGID)
    describe FORGED_MUA_THEBAT Forged mail pretending to be from The Bat!
Honestly, I get a bunch of forged "The Bat!" mailers, so it -is- a helpful rule. For example, the latest admin@whatever.com worm gets caught by the rule:
    Received: from localhost ([80.88.129.107]) by ResonatorSoft.org (8.11.6/8.11.6) with SMTP id h7BH9jE12877 for <SineSwiper@ResonatorSoft.org>; Mon, 11 Aug 2003 13:09:49 -0400
    Date: Mon, 11 Aug 2003 13:09:49 -0400
    Message-Id: <200308111709.h7BH9jE12877@ResonatorSoft.org>
    From: admin@ResonatorSoft.org
    To: SineSwiper <SineSwiper@ResonatorSoft.org>
    Reply-To: admin@ResonatorSoft.org
    X-Mailer: The Bat! (v1.61)
    X-Priority: 2 (High)
    Subject: your account veevvodv
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="----------B8F5E4550077FCB"
Hmm... AFAICR there were some fixes for the FORGED_THEBAT rules for 2.60; we now follow the official RITlabs way to identify The Bat!. Andrei, any chance you could try it out with 2.60-cvs?
Do you have a bug # for this fix? I plugged his headers into my version (fairly recent CVS) of SpamAssassin and it caught it as FORGED_MUA_THEBAT.
Yes, I see. It doesn't match the Message-Id pattern...
Ahhh...is it supposed to have that dash in there? Manual patch edit:
    - header __THEBAT_MSGID MESSAGEID =~ /^<\d+\.\d+\@\S+>$/m
    + header __THEBAT_MSGID MESSAGEID =~ /^<[\d+\-]\.\d+\@\S+>$/m
Anybody have a contact with RITlabs to verify that it's a legal Message-ID?
Bah...I'm an idiot. That should read:
    - header __THEBAT_MSGID MESSAGEID =~ /^<\d+\.\d+\@\S+>$/m
    + header __THEBAT_MSGID MESSAGEID =~ /^<[\d\-]+\.\d+\@\S+>$/m
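For anyone following along, here is a minimal Python sketch of how the three patterns above behave on the two Message-IDs quoted in this bug. It assumes the Perl patterns carry over to Python's `re` module unchanged apart from the unescaped `@`. The names `original`, `first_try`, `corrected` and `results` are made up for illustration and are not part of SpamAssassin.

```python
import re

# Message-IDs quoted earlier in this bug report.
ham_msgid = "<178-1270863024.20030804170001@elvis.ru>"
worm_msgid = "<200308111709.h7BH9jE12877@ResonatorSoft.org>"

# Python translations of the Perl patterns from the patch (assumed equivalent).
original = re.compile(r"^<\d+\.\d+@\S+>$", re.M)
first_try = re.compile(r"^<[\d+\-]\.\d+@\S+>$", re.M)   # quantifier ended up inside the class
corrected = re.compile(r"^<[\d\-]+\.\d+@\S+>$", re.M)   # dash allowed before the dot


def results(msgid):
    """Report which pattern accepts the given Message-ID."""
    return {
        "original": bool(original.search(msgid)),
        "first_try": bool(first_try.search(msgid)),
        "corrected": bool(corrected.search(msgid)),
    }


print("ham: ", results(ham_msgid))
print("worm:", results(worm_msgid))
```

Under these assumptions only the corrected pattern accepts the dashed Message-ID, and none of the three accepts the worm's sendmail-style Message-Id, so the worm would still trip the meta rule.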
The actual line on the CVS is:
    header __BAT_MSGID MESSAGEID =~ /^<\d{2,12}\.\d{14}\@\S+>$/m
I could include the dash as an OK character, like [\d\-]{2,12}. However, the part of the Message-ID before the dot, dash included, is more than 12 characters:
    178-1270863024.20030804170001@elvis.ru
    ^^^^^^^^^^^^^^
    00000000011111
    12345678901234
The part after the "178-" -is- 10 characters, within the range, so it's possible that "178-" is some extra characters that might appear on there. Or it may be an old standard for pre-V2.0 The Bat! clients. I (and probably the devels) don't really want to change this rule unless some official word from RITlabs says it's okay.
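The length argument above can be checked with a short Python sketch. It assumes the CVS pattern translates directly to Python's `re`. `with_dash` is the hypothetical `[\d\-]{2,12}` variant mentioned in the comment, and `left_part` is a helper invented for this example.

```python
import re

msgid = "<178-1270863024.20030804170001@elvis.ru>"

# CVS pattern, plus the hypothetical variant that merely adds the dash to the class.
cvs = re.compile(r"^<\d{2,12}\.\d{14}@\S+>$", re.M)
with_dash = re.compile(r"^<[\d\-]{2,12}\.\d{14}@\S+>$", re.M)


def left_part(value):
    """Return the part of a Message-ID before the first dot (illustrative helper)."""
    return value.strip("<>").split("@", 1)[0].split(".", 1)[0]


left = left_part(msgid)
print(left, len(left))                    # 178-1270863024 14
print(len(left.split("-", 1)[1]))         # 10
print(bool(cvs.search(msgid)))            # False
print(bool(with_dash.search(msgid)))      # False: 14 characters exceed {2,12}
```

So adding the dash to the character class would not be enough by itself; the upper bound would have to change too.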
well, I looked through my corpus and found 4 message-ids that didn't match the standard version:
    <15660.010102@thinkgeek.com>
    <1671550353000.20030623152607@certiflexdimension.com>
    <8607.990601@linepoint.com>
    <17447.990603@linepoint.com>
the first, third, and fourth are all valid. it looks like before the YYYYMMDDHHMMSS format, it was just YYMMDD. so those FPs can be fixed. The second is also a valid message; it went through a mailing list, but that mailing list doesn't appear to change the Message-ID header at all. The reason it doesn't match is that the part before the . is 13 chars long. how often does the 178-* style message-id show up? is it fairly often, or is this perhaps just a one-off type issue? I'd like to just say "yeah, this happens", and call it good.
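To make the four cases above concrete, here is a small Python sketch that runs them against the CVS pattern and against a hypothetical relaxed pattern that also accepts a 6-digit YYMMDD stamp. The relaxed pattern and the names `CVS`, `RELAXED` and `classify` are assumptions for illustration only, not a proposed patch.

```python
import re

# CVS pattern as quoted earlier in this bug, translated to Python.
CVS = re.compile(r"^<\d{2,12}\.\d{14}@\S+>$")
# Hypothetical relaxation: also allow the older 6-digit YYMMDD stamp after the dot.
RELAXED = re.compile(r"^<\d{2,12}\.(?:\d{14}|\d{6})@\S+>$")

corpus_ids = [
    "<15660.010102@thinkgeek.com>",
    "<1671550353000.20030623152607@certiflexdimension.com>",
    "<8607.990601@linepoint.com>",
    "<17447.990603@linepoint.com>",
]


def classify(msgid):
    """Return (matches CVS pattern, matches relaxed pattern) for one Message-ID."""
    return bool(CVS.match(msgid)), bool(RELAXED.match(msgid))


for msgid in corpus_ids:
    print(msgid, classify(msgid))
```

With this relaxation the three YYMMDD-style IDs would pass, while the one with 13 digits before the dot would still fail on the `{2,12}` bound.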
hmm, I think we're doing this wrong -- instead of trying to fix the rule, let's just drop it! We currently have:
    1.681  2.5780  0.0000  1.000  0.95  4.30  FORGED_THEBAT_HTML
    1.625  2.4922  0.0000  1.000  0.95  4.30  FORGED_MUA_THEBAT_BOUN
    1.768  2.7107  0.0037  0.999  0.95  4.29  FORGED_MUA_THEBAT
    0.151  0.2309  0.0026  0.989  0.92  2.80  FORGED_MUA_THEBAT_CS
Note that FORGED_MUA_THEBAT_BOUN and FORGED_THEBAT_HTML catch very nearly the same amount of stuff as FORGED_MUA_THEBAT, without any false positives; also note that we *keep* running into gateways mangling the message-ids, and weird bug reports like this. So I suggest we just nuke FORGED_MUA_THEBAT in 2.61 (and possibly tweak scores a little to make up, if required -- but I doubt it will be.)
Created attachment 1534 [details] patch to remove FORGED_MUA_THEBAT
I'm not sure. What's the overlap like?
anyone got a copy of the 2.60 logs rsynced down already? I don't, so running an overlap would be a bit too much effort right now ;) (translation: I'm lazy)
ok -- overlap data:
    [jm@bugzilla masses]$ grep FORGED_THEBAT_HTML ov | grep FORGED_MUA_THEBAT
    3164 0.901 0.884 FORGED_THEBAT_HTML,FORGED_MUA_THEBAT
    3067 0.882 0.874 FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML
    230 0.576 0.066 FORGED_MUA_THEBAT_CS,FORGED_THEBAT_HTML
so 90%/88% overlap between FORGED_THEBAT_HTML and FORGED_MUA_THEBAT. I'd say that's good enough to drop the rule, given the FPs.
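As a rough illustration of what a pairwise overlap figure like this means, here is a minimal Python sketch. It assumes that the first number is the count of messages hit by both rules and that the two ratios are that count divided by each rule's own hit count. That reading of the columns is an assumption, the `overlap` function is not the actual masses tooling, and the message IDs below are toy data.

```python
def overlap(hits_a, hits_b):
    """Return (shared count, share of A's hits also hit by B, share of B's hits also hit by A)."""
    both = hits_a & hits_b
    frac_a = len(both) / len(hits_a) if hits_a else 0.0
    frac_b = len(both) / len(hits_b) if hits_b else 0.0
    return len(both), frac_a, frac_b


# Toy data: message numbers hit by two hypothetical rules.
rule_a_hits = set(range(0, 10))   # messages 0..9
rule_b_hits = set(range(1, 11))   # messages 1..10

count, frac_a, frac_b = overlap(rule_a_hits, rule_b_hits)
print(count, frac_a, frac_b)
```

Two rules with ratios near 0.9 in both directions are largely firing on the same mail, which is the argument for dropping one of them.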
comments? +1s? I think it would be a good idea to drop the rule, letting us move away from the message-id forgery-detection rules in favour of more reliable ones.
*** Bug 2743 has been marked as a duplicate of this bug. ***
OK, marked 2743 as a dup; it would be fixed by removing the rule. Still waiting for comments...
+1 I guess you can't really change the other scores at all, so leave them as is.
ok, applied -- thanks Duncan ;)