Bug 5722 - hit-frequencies needs new ranking algorithm which likes fresh spam
Summary: hit-frequencies needs new ranking algorithm which likes fresh spam
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Masses
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 minor
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2007-11-16 03:37 UTC by Justin Mason
Modified: 2007-11-16 03:37 UTC

Description Justin Mason 2007-11-16 03:37:02 UTC
Rules which hit "fresh" spam currently fare badly in the hit-frequencies ranking.

Imagine a mass-check containing 50k spam messages: 46k are between 1 week and
3 months old, and the remaining 4k are less than 1 week old. A rule that hits
10% of that "fresh" spam (and none of the older spam) therefore hits only 400
messages, or 0.8% of the overall spam corpus, which doesn't look very impressive
compared to other rules. But because it hits fresh spam, it is very useful to us.
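The arithmetic in the example can be checked with a few lines of Python. This assumes, as the example implies, that the rule hits none of the 46k older spam messages.

```python
# Figures from the example: 50k spam, of which 4k is "fresh" (< 1 week old).
TOTAL_SPAM = 50_000
FRESH_SPAM = 4_000
OLD_SPAM = TOTAL_SPAM - FRESH_SPAM  # 46k messages between 1 week and 3 months old

# Assumed rule behaviour: hits 10% of fresh spam and none of the old spam.
fresh_hits = FRESH_SPAM // 10  # 400 messages
old_hits = 0

fresh_rate = fresh_hits / FRESH_SPAM
overall_rate = (fresh_hits + old_hits) / TOTAL_SPAM

print(f"fresh-spam hit rate: {fresh_rate:.1%}")    # 10.0%
print(f"overall hit rate:    {overall_rate:.1%}")  # 0.8%
```

A flat ranking only ever sees the 0.8% figure; the 10% rate on fresh spam is invisible to it.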

We should try to come up with a new ranking algorithm that takes this into
account, possibly by biasing against "old" spam: a hit on old spam would count
for progressively less than a hit on fresh spam as the message gets older.

It needn't bias against "old" ham, however, since ham doesn't have this issue.
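One possible shape for such a ranking is sketched below. It is a minimal illustration, not the hit-frequencies implementation. The decay function is an assumption: each spam message is weighted by an exponential decay `0.5 ** (age_days / half_life_days)`, where `half_life_days` is a hypothetical tuning parameter (a linear ramp would also fit the "increasingly worth less" idea). Ham is counted unweighted, as suggested above. The names `Message`, `age_weight` and `weighted_rates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    is_spam: bool
    age_days: float  # age of the message at mass-check time
    hit: bool        # did the rule under test fire on it?


def age_weight(age_days: float, half_life_days: float = 7.0) -> float:
    """Hypothetical exponential decay: a spam message's weight halves every
    half_life_days. Passing float('inf') gives every message weight 1.0."""
    return 0.5 ** (age_days / half_life_days)


def weighted_rates(messages, half_life_days: float = 7.0):
    """Return (age-weighted spam hit rate, unweighted ham hit rate) for one rule."""
    spam_w = spam_hit_w = 0.0
    ham_n = ham_hits = 0
    for m in messages:
        if m.is_spam:
            w = age_weight(m.age_days, half_life_days)
            spam_w += w
            if m.hit:
                spam_hit_w += w
        else:
            # Ham is not age-biased: every ham message counts equally.
            ham_n += 1
            ham_hits += int(m.hit)
    spam_rate = spam_hit_w / spam_w if spam_w else 0.0
    ham_rate = ham_hits / ham_n if ham_n else 0.0
    return spam_rate, ham_rate


# Scaled-down version of the example: 460 old spam, 40 fresh spam, the rule
# hits 10% of the fresh spam only; 100 ham, none hit.
corpus = (
    [Message(True, 30.0, False)] * 460   # "old" spam, rule misses
    + [Message(True, 1.0, True)] * 4     # fresh spam, rule hits
    + [Message(True, 1.0, False)] * 36   # fresh spam, rule misses
    + [Message(False, 30.0, False)] * 100  # ham, rule misses
)

flat_spam, _ = weighted_rates(corpus, half_life_days=float("inf"))
aged_spam, ham = weighted_rates(corpus, half_life_days=7.0)
print(f"unweighted spam hit rate:   {flat_spam:.1%}")  # 0.8%
print(f"age-weighted spam hit rate: {aged_spam:.1%}")
print(f"ham hit rate:               {ham:.1%}")
```

With an infinite half-life the sketch reduces to the current flat rate, so the half-life acts as a single knob between "all spam counts equally" and "only recent spam matters". Under these assumptions the fresh-spam rule's weighted rate rises well above its flat 0.8%, while its ham rate is unaffected.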