SA Bugzilla – Bug 5800
RFE: rule DATE_CONTAINS_TAB (rule included)
Last modified: 2008-02-21 15:12:55 UTC
Reminded and encouraged by the recent mailing list thread about DOS_OUTLOOK_TO_MX, I want to propose a new rule. I posted this rule to the mailing list a couple of weeks ago, and it has been hitting like crazy on my particular spam ever since. Maybe I'm just lucky to be tortured by a specific spammer... ;)

First, some interesting observations:

* DATE_CONTAINS_TAB almost exclusively comes with a faked The Bat! MUA header.
* The recent home.graffiti.net URI abuse flood *exclusively* uses both a faked The Bat! MUA header and a Date header containing a tab.

The basic rule is quite simple, and probably should apply to a lot of headers. I have not seen it for headers other than Date, though. Granted, I have not had a close look yet.

  header KB_DATE_CONTAINS_TAB Date:raw =~ /^ \t/
  describe KB_DATE_CONTAINS_TAB Header: Date header starts with Tab

Within the last 2 weeks, this hit on 25% of my spam scoring 05-10, and on 42% of spam scoring 10-15. It's about 15% for higher scoring spam. (Note: These results include some special, custom crafted rules which apply to my env only.) This definitely hits on the sneaky, low scorers for me.

I have additional rules to score extra points if the MUA is faked to be The Bat!, and I have been told by a user of that MUA that it never ever generated such headers for him. I have not contacted the authors to verify, though.

Also, I have an additional rule to flame spam with a graffiti.net URI and such headers. But that probably is just a temporary issue. I still do hope they will stop that abuse...
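For anyone who wants to check their own mail for this pattern outside SpamAssassin, the test in the rule above can be sketched in Python. This is only an illustration under assumptions: the helper names `raw_header_value` and `date_starts_with_space_tab` are hypothetical, the value is taken as everything after the first colon (so it keeps the delimiting space, as the `Date:raw` match in the rule appears to), folded continuation lines are ignored, and the sample messages are made up.

```python
import re

# Same pattern as the proposed rule: a space followed by a tab at the
# very start of the raw header value.
SPACE_TAB = re.compile(r"^ \t")


def raw_header_value(headers, name):
    """Return the text after the colon of the first `name` header line.

    Hypothetical helper: the value is returned untouched (leading
    whitespace kept), and folded continuation lines are not joined.
    Returns None if the header is absent from the header block.
    """
    prefix = name.lower() + ":"
    for line in headers.splitlines():
        if line == "":
            break  # a blank line ends the header block
        if line.lower().startswith(prefix):
            return line[len(prefix):]
    return None


def date_starts_with_space_tab(headers):
    """True if the raw Date value begins with a space and then a tab."""
    value = raw_header_value(headers, "Date")
    return value is not None and SPACE_TAB.search(value) is not None


# Made-up sample messages, for illustration only.
suspicious = "From: a@example.com\nDate: \tMon, 4 Feb 2008 10:00:00 +0100\n\nbody"
ordinary = "From: a@example.com\nDate: Mon, 4 Feb 2008 10:00:00 +0100\n\nbody"

print(date_starts_with_space_tab(suspicious))  # True
print(date_starts_with_space_tab(ordinary))    # False
```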
Adjusting subject. Yes, the rule is included. ;)
Adjusting Severity, since this actually seems to be commonly used for new rules.
*ping* Anyone interested? It's a simple rule that hits on a lot of spam for me, and actually never should trigger on any ham at all.
fyi, after removing the space (just /^\t/) I get:

  3.957 4.2692 0.0000 1.000 0.33 1.00 KB_DATE_CONTAINS_TAB
Thanks, Theo. So at least, it doesn't hit any ham. ;)

No, seriously, this is weird. Actually, that was my first approach too, as I figured SA would remove the single delimiting space before matching. It does for other headers.

I just checked some recent spam again with a few variants. Neither Date:raw =~ /^\t/ nor Date =~ /^\t/ matches for me. The Date:raw rule with the space, as mentioned in comment 0, does, though...

And yes, I just verified again by grepping through the headers. These headers indeed are exactly /^Date: \t/. Stumped.

(Oh, and it still hits an order of magnitude higher on my incoming stream. Maybe I'm just lucky, and some particular spammer loves me... *shrug*)
I believe there was a recent bugfix to remove the leading space that was showing up in header rules.
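That would be consistent with the two conflicting results. A small Python sketch, assuming the header really is `Date: \t...` as grepped above, shows which pattern matches depending on whether the matcher still sees the delimiting space. How SpamAssassin itself strips the value is not modelled here: `strip_one_space` is a hypothetical stand-in for the fix, and the sample value is made up.

```python
import re

WITH_SPACE = re.compile(r"^ \t")  # rule as originally posted
TAB_ONLY = re.compile(r"^\t")     # variant with the space removed


def strip_one_space(value):
    """Hypothetical stand-in for the fix: drop a single leading space."""
    return value[1:] if value.startswith(" ") else value


def matches(value):
    """Report which of the two patterns match the given header value."""
    return {
        "with_space": WITH_SPACE.search(value) is not None,
        "tab_only": TAB_ONLY.search(value) is not None,
    }


# Made-up sample: everything after "Date:" in the spam described above.
after_colon = " \tMon, 4 Feb 2008 10:00:00 +0100"

print(matches(after_colon))                   # delimiting space kept
print(matches(strip_one_space(after_colon)))  # single space stripped
print(matches(after_colon.lstrip()))          # all leading whitespace stripped
```

With the space kept, only the original pattern matches. With one space stripped, only the /^\t/ variant matches. If all leading whitespace were stripped, neither would match.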
I just noticed, Justin already added KB_DATE_CONTAINS_TAB and KB_FAKED_THE_BAT to jm/20_basic.cf based on my earlier post to the list. Nice. :)

However, the ruleqa results are quite unsatisfying. I believe the reason to be the recent change WRT leading whitespace, as mentioned by Loren and Theo. Still, it hits at least some spam -- due to a mass-check env somewhere that is not up to date?

With the whitespace fix in place (assuming it only strips the first space, rather than all leading whitespace), the adjusted rule Theo mentioned should hit way better. Justin, can you please tweak this?

Also, I believe __THEBAT_MUA in 20_ratware.cf should better be anchored at the beginning.

Oh, and is there any way to get ahold of the 2 *hams* with a tab in the Date header, but at least not sent by The Bat? This strikes me as odd...
(In reply to comment #7)
> I just noticed, Justin already added KB_DATE_CONTAINS_TAB and KB_FAKED_THE_BAT
> to jm/20_basic.cf based on my earlier post to the list. Nice. :)
>
> However, the ruleqa results are quite unsatisfying. I believe the reason to be
> the recent change WRT leading whitespace, as mentioned by Loren and Theo. Still,
> it hits at least some spam -- due to a not up-to-date mass-check env somewhere?

Daryl, you may want to check this, it appears to be your corpus.

> With the whitespace fix in place (assuming it only strips the first space,
> rather than all leading whitespace), the adjusted rule Theo mentioned should hit
> way better. Justin, can you please tweak this?

done:

: jm 163...; svn commit -m "bug 5800: fix bug in KB_DATE_CONTAINS_TAB rule" rulesrc/sandbox/jm/20_basic.cf
Sending rulesrc/sandbox/jm/20_basic.cf
Transmitting file data .
Committed revision 629833.

marking fixed, since this is now in.

> Also, I believe __THEBAT_MUA in 20_ratware.cf should better be anchored at the
> beginning.

yep, well spotted. r629836.

> Oh, and is there any way to get ahold of the 2 *hams* with a tab in the Date
> header, but at least not sent by The Bat? This strikes me as odd...

Daryl -- these are yours again:

. 1 /home/dos/SA-corpus/ham/dos/Inbox-2007/1175734785.M236771P31456V0000000000000302I001C078D_78.cyan.dostech.net,S=18894:2,S KB_DATE_CONTAINS_TAB,UPPERCASE_50_75,__CT,__CTYPE_HAS_BOUNDARY,__DOS_RCVD_MON,__DOS_RELAYED_EXT,__ENV_AND_HDR_FROM_MATCH,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_MAILER,__INR_AND_NO_REF,__LAST_UNTRUSTED_RELAY_NO_AUTH,__MIME_BASE64,__MIME_VERSION,__MIME_VERSION_APPLEMAIL,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MSGID_APPLEMAIL,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NAKED_TO,__NONEMPTY_BODY,__NUMBERS_IN_SUBJ,__PART_STOCK_CD_F,__RELAY_MUA_HELO_IP_OR_NONE,__SANE_MSGID,__TOCC_EXISTS,__TVD_BODY,__TVD_MIME_ATT,__TVD_MIME_ATT_AP,__TVD_MIME_ATT_TP,__TVD_MIME_CT_MM,__UPPERCASE_50_75,__USER_AGENT_APPLEMAIL,__XM_APPLEMAIL,__X_MAILER_APPLEMAIL time=1175545146,scantime=0,format=f,reuse=yes,set=1,host=injector.georgianbayplastics.com

. 1 /home/dos/SA-corpus/ham/dos/infra-list/1196034235.M76328P8771V0000000000000302I008D0ED5_3.cyan.dostech.net,S=2822:2,S KB_DATE_CONTAINS_TAB,T_RP_MATCHES_RCVD,T_SIDNEY__GATED_THROUGH_RCVD_REMOVER,T_SIDNEY__LYRIS_EZLM_REMAILER,T_SIDNEY__UNUSABLE_MSGID,__CD,__CT,__CTE,__CT_TEXT_PLAIN,__DOS_HAS_ANY_URI,__DOS_HAS_LIST_ID,__DOS_HAS_LIST_UNSUB,__DOS_HAS_MAILING_LIST,__DOS_RCVD_SUN,__DOS_RELAYED_EXT,__DOS_SINGLE_EXT_RELAY,__FH_HAS_XPRIORITY,__GATED_THROUGH_RCVD_REMOVER,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_MAILER,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LOCAL_PP_NONPPURL,__LYRIS_EZLM_REMAILER,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MSOE_MID_WRONG_CASE,__NAKED_TO,__NONEMPTY_BODY,__SANE_MSGID,__TOCC_EXISTS,__TVD_BODY,__TVD_MIME_ATT_TP,__UNUSABLE_MSGID time=1196026181,scantime=0,format=f,reuse=no,set=0,host=pe840-c2
(In reply to comment #8)
> Daryl, you may want to check this, it appears to be your corpus.

> /home/dos/SA-corpus/ham/dos/Inbox-2007/1175734785.M236771P31456V0000000000000302I001C078D_78.cyan.dostech.net,S=18894:2,S

Ham (likely auto generated):

Mime-Version: 1.0 (Apple Message framework v752.2)
X-Mailer: Apple Mail (2.752.2)

> /home/dos/SA-corpus/ham/dos/infra-list/1196034235.M76328P8771V0000000000000302I008D0ED5_3.cyan.dostech.net,S=2822:2,S

Ham (onet.pl home grown webmail, I think):

X-Mailer: onet.poczta