SA Bugzilla – Bug 5722
hit-frequencies needs new ranking algorithm which likes fresh spam
Last modified: 2007-11-16 03:37:02 UTC
Rules which hit "fresh" spam currently fare badly in the hit-frequencies ranking report. Imagine a mass-check which contains 50k spam messages: 46k are between 1 week and 3 months old, and the remaining 4k are less than 1 week old. A rule that hits 10% of that "fresh" spam (400 messages) therefore hits only 0.8% of the overall spam corpus, which doesn't look so impressive compared to other rules. But because it's hitting "fresh" spam, it's very useful for us.

We should try to come up with a new ranking algorithm which takes this into account, possibly by biasing against "old" spam: a hit on old spam would be treated as worth progressively less than a hit on fresh spam as the message ages. It needn't bias against "old" ham, however, since ham doesn't have this issue.
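One possible way to express the "old spam counts for less" idea is an age-weighted hit rate. The sketch below is illustrative only and is not hit-frequencies code: the exponential decay, the 7-day half-life, and all names (SpamMessage, age_weight, weighted_hit_rate, example_corpus) are assumptions made for the example. It weights spam only; a ham hit rate would stay unweighted, as described above.

```python
from dataclasses import dataclass


@dataclass
class SpamMessage:
    age_days: float  # age of the message at mass-check time
    hit: bool        # whether the rule under test fired on it


def age_weight(age_days: float, half_life_days: float = 7.0) -> float:
    """Weight that halves every half_life_days (assumed decay shape)."""
    return 0.5 ** (age_days / half_life_days)


def plain_hit_rate(messages: list[SpamMessage]) -> float:
    """Unweighted fraction of spam messages hit by the rule."""
    return sum(m.hit for m in messages) / len(messages)


def weighted_hit_rate(messages: list[SpamMessage],
                      half_life_days: float = 7.0) -> float:
    """Fraction of spam hit, with each message counted by its age weight."""
    total = sum(age_weight(m.age_days, half_life_days) for m in messages)
    hits = sum(age_weight(m.age_days, half_life_days)
               for m in messages if m.hit)
    return hits / total


def example_corpus() -> list[SpamMessage]:
    """The corpus from the description, scaled down by 100:
    460 old spam (45 days, an arbitrary age in the old range) and
    40 fresh spam (3 days), with the rule hitting 10% of the fresh ones."""
    old = [SpamMessage(age_days=45.0, hit=False) for _ in range(460)]
    fresh = [SpamMessage(age_days=3.0, hit=(i < 4)) for i in range(40)]
    return old + fresh


if __name__ == "__main__":
    corpus = example_corpus()
    print(f"plain:    {plain_hit_rate(corpus):.4f}")
    print(f"weighted: {weighted_hit_rate(corpus):.4f}")
```

With these assumed parameters the plain rate is the 0.8% from the description, while the weighted rate moves much closer to the rule's 10% hit rate on fresh spam. A different decay shape or half-life would change how strongly old spam is discounted.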