Bug 5722

Summary: hit-frequencies needs new ranking algorithm which likes fresh spam
Product: Spamassassin Reporter: Justin Mason <jm>
Component: Masses Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW ---    
Severity: minor    
Priority: P5    
Version: SVN Trunk (Latest Devel Version)   
Target Milestone: Undefined   
Hardware: Other   
OS: other   
Whiteboard:

Description Justin Mason 2007-11-16 03:37:02 UTC
Rules which hit "fresh" spam currently fare badly in the hit-frequencies ranking
report.  

Imagine a mass-check which contains 50k spam messages. 46k are between 1 week
and 3 months old, and the remaining 4k are fresher than 1 week old.  A rule that
hits 10% of that "fresh" spam (and none of the older spam) therefore hits only
0.8% of the overall corpus -- which doesn't look so impressive compared to other
rules.  But because it's hitting "fresh" spam, it's very useful for us.
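
For reference, the arithmetic in the example above works out as follows. This
is only a quick Python check, not part of hit-frequencies; it assumes the rule
hits none of the older spam, and the variable names are illustrative.

```python
# Figures taken from the example above.
total_spam = 50_000      # all spam in the mass-check
fresh_spam = 4_000       # spam fresher than 1 week
fresh_hit_rate = 0.10    # rule hits 10% of the fresh spam

# Assumption: the rule hits no spam older than 1 week.
hits = fresh_spam * fresh_hit_rate    # 400 messages
overall_rate = hits / total_spam      # fraction of the whole spam corpus

print(f"{overall_rate:.1%}")  # 0.8%
```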

We should try to come up with a new ranking algo which can take this into
account -- possibly by biasing against "old" spam, by treating a hit on old spam
as increasingly worth less than a hit on fresh spam.  

It needn't bias against "old" ham, however, since ham doesn't have this issue.
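
One possible shape for such a ranking input, as a rough Python sketch rather
than a proposal for the actual hit-frequencies code: weight each spam message
by its age so that a hit on old spam counts for less, and leave ham unweighted.
The exponential decay, the 7-day half-life, and all function names here are
assumptions made purely for illustration.

```python
def age_weight(age_days, half_life_days=7.0):
    """Weight of one spam message; halves every half_life_days (assumed value)."""
    return 0.5 ** (age_days / half_life_days)


def weighted_spam_rate(spam_ages, hit_flags, half_life_days=7.0):
    """Age-weighted fraction of spam hit by a rule.

    spam_ages: age in days of each spam message.
    hit_flags: True where the rule hit the corresponding message.
    """
    weights = [age_weight(a, half_life_days) for a in spam_ages]
    total = sum(weights)
    if total == 0:
        return 0.0
    hit = sum(w for w, h in zip(weights, hit_flags) if h)
    return hit / total


def ham_rate(hit_flags):
    """Plain, unweighted ham hit rate; ham age is deliberately ignored."""
    return sum(hit_flags) / len(hit_flags) if hit_flags else 0.0


# Toy corpus: 4 fresh spam (1 day old, rule hits 2), 46 old spam (60 days, no hits).
ages = [1] * 4 + [60] * 46
hits = [True, True, False, False] + [False] * 46

plain = sum(hits) / len(hits)            # unweighted spam hit rate
weighted = weighted_spam_rate(ages, hits)
```

With these toy numbers the weighted rate comes out far higher than the plain
rate, which is the effect wanted for rules that hit fresh spam.  The half-life
would need tuning against real corpora.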