SA Bugzilla – Bug 6010
RuleQA: default corpus for QA measurements should ignore "high scoring" spam
Last modified: 2008-11-04 03:23:10 UTC
Currently, our ruleqa measurements show how each rule performs against _all_ mail in the spam corpora, including the stuff that's hitting every single rule we have. This means that great rules like PQRTW_4 (http://ruleqa.spamassassin.org/20081103-r710024-n/PQRTW_4/detail) are hidden.

This is clearly demonstrated by my two corpora, "jm" and "bb-jm". "bb-jm" is my high-scoring spam; on this corpus, PQRTW_4 hits only 0.251% of spam. But on my low-scoring spam ("jm"), it hits 13.6118%. Overall it hits 0.7921% of spam. So although the aggregate figure looks weak, it's really good against the low-scoring stuff.

Now, that's where we _need_ good rules... so in my opinion we should fix the ruleqa app to highlight those rules by default. We don't need lots of rules that hit the spam we're already catching.

I suggest extending the ruleqa scripts to track a new subset of logs, alongside the current set: mass-check lines under some score threshold (10 points?). So something like this:

set 0, low-scoring spam
  MSECS    SPAM%    HAM%    S/O    RANK  SCORE  NAME
  0.00000  10.6118  0.0000  1.000  0.86  1.00   PQRTW_4

set 0, in aggregate
  MSECS    SPAM%    HAM%    S/O    RANK  SCORE  NAME
  0.00000  0.7921   0.0000  1.000  0.86  1.00   PQRTW_4

set 0, broken down by message age in weeks
  MSECS    SPAM%    HAM%    S/O    RANK  SCORE  NAME     WHO/AGE
  0.00000  0.3335   0.0000  1.000  0.66  1.00   PQRTW_4  0-1
  0.00000  0.2991   0.0000  1.000  0.63  1.00   PQRTW_4  1-2
  0.00000  0.0000   0.0000  0.500  0.45  1.00   PQRTW_4  2-3
  0.00000  0.7855   0.0000  1.000  0.80  1.00   PQRTW_4  3-6

set 0, broken down by contributor
  [etc.]

This should be easy enough to do. I don't think it needs to dictate the promotion criteria; rules like this would still be promoted, since their SPAM% is over the very low promotion threshold (what is it? 0.1%? can't recall).
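For illustration, here is a minimal Python sketch of the proposed low-scoring subset. It assumes a simplified log line of the form `<flag> <score> <path> <rule,rule,...>`, modeled loosely on mass-check output but not guaranteed to match its exact format. `LOW_SCORE_THRESHOLD`, `parse_line` and `spam_hit_rates` are hypothetical names, not part of the ruleqa scripts, and the threshold simply uses the "10 points?" floated above.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical cut-off: spam scoring below this counts as "low-scoring".
LOW_SCORE_THRESHOLD = 10.0


@dataclass(frozen=True)
class LogLine:
    score: float
    rules: frozenset


def parse_line(line: str) -> LogLine | None:
    """Parse '<flag> <score> <path> <rule,rule,...> ...'.

    This layout is an assumption for illustration, not the exact
    mass-check log format. Comments and short lines are skipped.
    """
    fields = line.split()
    if len(fields) < 4 or fields[0].startswith("#"):
        return None
    rules = frozenset(r for r in fields[3].split(",") if r)
    return LogLine(score=float(fields[1]), rules=rules)


def spam_hit_rates(lines, threshold=None):
    """Map each rule to the percentage of spam messages it hits.

    With a threshold, only messages scoring strictly below it are
    counted, giving the proposed "low-scoring spam" subset.
    """
    parsed = [p for p in map(parse_line, lines) if p is not None]
    if threshold is not None:
        parsed = [p for p in parsed if p.score < threshold]
    if not parsed:
        return {}
    hits = Counter(rule for p in parsed for rule in p.rules)
    return {rule: 100.0 * n / len(parsed) for rule, n in hits.items()}


# Toy spam log (made-up data): three high scorers and one low scorer.
spam_log = [
    "Y 25 /spam/1 RULE_X,RULE_Y",
    "Y 31 /spam/2 RULE_X,RULE_Y",
    "Y 18 /spam/3 RULE_X",
    "Y 4 /spam/4 PQRTW_4",
]

overall = spam_hit_rates(spam_log)
low = spam_hit_rates(spam_log, LOW_SCORE_THRESHOLD)
print(overall["PQRTW_4"])  # 25.0
print(low["PQRTW_4"])      # 100.0
```

Comparing `low` with `overall` mirrors the "jm" vs. aggregate contrast: a rule that looks weak across all spam can dominate the low-scoring subset. The same filtered list could then feed the existing age and contributor breakdowns unchanged.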