SA Bugzilla – Bug 5278
__UNUSABLE_MSGID is hiding FPs; let's remove it
Last modified: 2007-01-11 05:51:54 UTC
Quoting from bug 4960:

  '[a mail] no longer FPs (in 3.1.7 with sa-update), but only because it triggers __UNUSABLE_MSGID which prevents MSGID_DOLLARS triggering.'

This is a very good point -- __UNUSABLE_MSGID calls HeaderEval::check_messageid_not_usable(), which contains this code:

    # too old; older versions of clients used different formats
    return 1 if ($self->received_within_months($pms, '6','undef'));

In other words, no message over 6 months old will display a MSGID_* rule hit, because the hit is blocked by this rule.

I think it made sense when we put it in, but since then, Message-IDs have been pretty sane in ham. In the meantime, it's now hiding false positives in the ham corpora, which are generally older than spam corpora (e.g. my spam goes back 5 months or so, but my ham collection is up to 2 years old).

I propose removing those 2 lines ASAP so we can get a more accurate idea of the FP rates of the rules that use it:

  FORGED_MUA_MOZILLA
  FORGED_MUA_IMS
  __FORGED_OE
  __FORGED_OUTLOOK_DOLLARS
  FORGED_MUA_OIMO
  FORGED_MUA_EUDORA
  TVD_FW_GRAPHIC_ID3
  TVD_FW_GRAPHIC_ID3_2
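To make the masking concrete, here is a simplified Perl sketch. It is not the real HeaderEval.pm code: msgid_not_usable() and forged_mua_hits() are hypothetical names, a "month" is assumed to be 30 days, the age boundary is assumed to be inclusive, and each dependent rule is reduced to "Message-ID looks forged AND NOT __UNUSABLE_MSGID". Under those assumptions, an old ham message with a forged-looking Message-ID never shows up as a hit while the age check is in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified model of the age check in
# check_messageid_not_usable(); NOT the real HeaderEval.pm code.
# Assumption: one "month" = 30 days.
use constant SECS_PER_MONTH => 30 * 24 * 60 * 60;

# Returns 1 when the Message-ID should be treated as unusable.
# $age_secs:       how long ago the message was received
# $with_age_limit: 1 = behaviour before bug 5278, 0 = after removal
sub msgid_not_usable {
    my ($age_secs, $with_age_limit) = @_;
    # "too old; older versions of clients used different formats"
    return 1 if $with_age_limit && $age_secs >= 6 * SECS_PER_MONTH;
    return 0;    # the other unusability checks are omitted here
}

# Simplified dependent rule: the forged-Message-ID test only counts
# when __UNUSABLE_MSGID did not fire.
sub forged_mua_hits {
    my ($msgid_looks_forged, $age_secs, $with_age_limit) = @_;
    my $hit = $msgid_looks_forged
        && !msgid_not_usable($age_secs, $with_age_limit);
    return $hit ? 1 : 0;
}

# A year-old ham message whose Message-ID looks forged.
my $age = 12 * SECS_PER_MONTH;
printf "before: %d, after: %d\n",
    forged_mua_hits(1, $age, 1), forged_mua_hits(1, $age, 0);
```

Run as-is, this prints "before: 0, after: 1": the same message is invisible to the rule with the age check and becomes a (false-positive) hit once the check is removed, which is exactly why older ham corpora were under-reporting FPs.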
OK, here are the results for the rules that use it:

http://ruleqa.spamassassin.org/?daterev=20070110-r494768-n&rule=%2F%28FORGED%7CTVD_FW_GRAPHIC%7CUNUSABLE%29&srcpath=&g=Change

  MSECS     SPAM%     HAM%    S/O  RANK  SCORE  NAME
  0.00000  5.2828   0.0281  0.995  0.92   0.00  FORGED_MUA_OUTLOOK5278
  0.00000  5.2386   0.0179  0.997  0.93   3.36  FORGED_MUA_OUTLOOK

Definitely some hidden FPs there worth knowing about; spam% goes up a little.

  0.00000  4.7496   0.0281  0.994  0.91  (n/a)  __FORGED_OE5278
  0.00000  4.7079   0.0179  0.996  0.92  (n/a)  __FORGED_OE

Ditto (probably the same msgs).

  0.00000  0.5332   0.0000  1.000  0.81  (n/a)  __FORGED_OUTLOOK_DOLLARS5278
  0.00000  0.5307   0.0000  1.000  0.80  (n/a)  __FORGED_OUTLOOK_DOLLARS
  0.00000  0.3351   0.0000  1.000  0.75   0.00  FORGED_MUA_EUDORA5278
  0.00000  0.3318   0.0000  1.000  0.75   2.44  FORGED_MUA_EUDORA
  0.00000  0.3032   0.0000  1.000  0.73   0.00  FORGED_MUA_OIMO5278
  0.00000  0.3010   0.0000  1.000  0.73   1.21  FORGED_MUA_OIMO

All just good news.

  0.00000  0.2597   0.0013  0.995  0.71   0.00  FORGED_MUA_IMS5278
  0.00000  0.2568   0.0000  1.000  0.71   2.48  FORGED_MUA_IMS

A hidden FP; worth tracking.

  0.00000  0.1856   0.0281  0.868  0.64   0.00  T_TVD_FW_GRAPHIC_ID3_2_5278
  0.00000  0.1833   0.0013  0.993  0.67   1.00  TVD_FW_GRAPHIC_ID3_2
  0.00000  0.1850   0.0281  0.868  0.64   0.00  T_TVD_FW_GRAPHIC_ID3_5278
  0.00000  0.1827   0.0013  0.993  0.67   1.00  TVD_FW_GRAPHIC_ID3

Plenty of hidden FPs (again, probably the same messages), worth knowing about!

  0.00000  0.0594  19.0556  0.003  0.00  (n/a)  __UNUSABLE_MSGID5278
  0.00000  1.9085  30.7667  0.058  0.33  (n/a)  __UNUSABLE_MSGID

Looks about right... there's still a lot of ham we ignore, though; probably ezmlm lists or "gated_through_received_hdr_remover" hits. At some point we should investigate those too... Anyway, in the meantime, I've applied this.
  : jm 1061...; svn commit -m "bug 5278: remove 6-month limit imposed via the __UNUSABLE_MSGID rule on FORGED_MUA_* rules which use Message-ID header" rulesrc/sandbox/jm/20_basic.cf lib/Mail/SpamAssassin/Plugin
  Sending        rulesrc/sandbox/jm/20_basic.cf
  Sending        lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
  Transmitting file data ..
  Committed revision 495220.
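The S/O column in the ruleqa results can be recomputed from SPAM% and HAM%, which makes the before/after comparison easy to check. The Perl sketch below assumes S/O = SPAM% / (SPAM% + HAM%) (an assumption, though it reproduces the S/O values of these rows to within 0.001) and assumes the rows with the "5278" suffix are the variants without the 6-month check. so_ratio() is a hypothetical helper; the numbers are copied from the results table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed formula: S/O = SPAM% / (SPAM% + HAM%).
# Returns 0 when the rule hit nothing at all.
sub so_ratio {
    my ($spam_pct, $ham_pct) = @_;
    my $total = $spam_pct + $ham_pct;
    return $total > 0 ? $spam_pct / $total : 0;
}

# [name, SPAM% patched, HAM% patched, SPAM% original, HAM% original]
# Values taken from the ruleqa rows ("...5278" = patched variant).
my @pairs = (
    [ 'FORGED_MUA_OUTLOOK',   5.2828,  0.0281, 5.2386,  0.0179 ],
    [ 'FORGED_MUA_IMS',       0.2597,  0.0013, 0.2568,  0.0000 ],
    [ 'TVD_FW_GRAPHIC_ID3_2', 0.1856,  0.0281, 0.1833,  0.0013 ],
    [ '__UNUSABLE_MSGID',     0.0594, 19.0556, 1.9085, 30.7667 ],
);

# Print original -> patched S/O, plus the change in HAM%:
# a positive HAM% change is ham that the age check was hiding.
for my $p (@pairs) {
    my ($name, $s_new, $h_new, $s_old, $h_old) = @$p;
    printf "%-22s S/O %.3f -> %.3f  HAM%% %+.4f\n",
        $name, so_ratio($s_old, $h_old), so_ratio($s_new, $h_new),
        $h_new - $h_old;
}
```

For FORGED_MUA_OUTLOOK this gives S/O 0.997 -> 0.995 with HAM% up by 0.0102, matching the table; for __UNUSABLE_MSGID, HAM% drops by about 11.7 points, i.e. that much ham is no longer being excluded from the MSGID-based rules.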