Bug 1282 - Quasi-Bayesian filter that increases/decreases rule scores
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: 1.5
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
Reported: 2002-12-10 20:20 UTC by Brendan Byrd/SineSwiper
Modified: 2002-12-21 12:58 UTC



Description Brendan Byrd/SineSwiper 2002-12-10 20:20:42 UTC
The Bayesian filter is based on words, but what about the rule scores?  If a
message turns up as a false negative, one or more of the scores should be
increased.  If it's a false positive, one or more should be decreased.  I know
you have something of this capacity on Razor, but how do you bring this
technology home to fix something right away?

Obviously, a big batch of messages, each marked as real spam or real e-mail,
would be fed in, and an improved config file with better scores would come out.
Given a threshold (say 5.0), it would cycle through all of the messages and, for
each false positive or false negative, raise or lower the scores of the rules
that hit (possibly by an equal percentage) until the message's total lands just
above or below the threshold.  If, after a few cycles, a problem e-mail is still
too extreme in one direction or the other, it would recommend a
blacklist/whitelist entry or a new rule.
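
A rough sketch of that tuning loop, to make the idea concrete.  This is Python
for illustration only, not SpamAssassin code, and it rests on assumptions that
are not part of the proposal itself: each message is reduced to the set of rule
names it hit plus a spam/ham label, "equal percentage" is taken to mean
multiplying every hit rule by the same factor, and the names tune_scores,
margin and max_cycles are made up for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (names of the rules the message hit, is_spam).
Message = Tuple[Set[str], bool]


def total(hits: Set[str], scores: Dict[str, float]) -> float:
    """Sum of the scores of the rules a message hit."""
    return sum(scores[rule] for rule in hits)


def tune_scores(
    messages: List[Message],
    scores: Dict[str, float],
    threshold: float = 5.0,
    margin: float = 0.1,      # assumed meaning of "just above/below"
    max_cycles: int = 10,     # assumed meaning of "a few cycles"
) -> Tuple[Dict[str, float], List[int]]:
    """Return adjusted scores and the indexes of messages still misclassified."""
    scores = dict(scores)  # work on a copy; the caller's config is untouched
    for _ in range(max_cycles):
        changed = False
        for hits, is_spam in messages:
            t = total(hits, scores)
            if (t >= threshold) == is_spam:
                continue  # already on the right side of the threshold
            if t <= 0:
                continue  # scaling cannot push a non-positive total upward
            # Scale every hit rule by the same factor so the total lands
            # just past the threshold on the correct side.
            target = threshold + margin if is_spam else threshold - margin
            factor = target / t
            for rule in hits:
                scores[rule] *= factor
            changed = True
        if not changed:
            break
    # Whatever is still wrong is a candidate for a blacklist/whitelist
    # entry or a new rule.
    stubborn = [
        i for i, (hits, is_spam) in enumerate(messages)
        if (total(hits, scores) >= threshold) != is_spam
    ]
    return scores, stubborn


original = {"RULE_A": 2.0, "RULE_B": 2.0, "RULE_C": 6.0}
corpus: List[Message] = [
    ({"RULE_A", "RULE_B"}, True),   # spam scoring 4.0: false negative
    ({"RULE_C"}, False),            # ham scoring 6.0: false positive
    (set(), True),                  # spam that hits no rule at all
]
tuned, stubborn = tune_scores(corpus, original)
```

In this toy corpus the third message hits no rules, so no amount of rescaling
can fix it, and it comes back in the stubborn list.  Note that messages sharing
a rule pull its score in opposite directions, so the loop is not guaranteed to
settle; max_cycles just cuts it off.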

This could work at the personal config level or the server level.  I posted
about this idea on the spamassassin-devel list a few months back, but I can't
find the message.  (It had a lot more detail; I can scan my Linux partition for
it later on.)
Comment 1 Theo Van Dinter 2002-12-21 21:58:18 UTC
I'm just going to close this bug as 'worksforme', since the Bayesian system in
2.50 allows for learning to fix fp/fn problems.  You don't want the scores to
be changing; you want the probability of spam to be changing.