SA Bugzilla – Bug 1282
Quasi-Bayesian filter that increases/decreases rule scores
Last modified: 2002-12-21 12:58:18 UTC
The Bayesian filter is based on words, but what about the rule scores? If a message turns up as a false negative, one or more of the scores for the rules it hit should be increased. If it's a false positive, one or more should be decreased. I know you have something along these lines with Razor, but how do you bring this technology home to fix a misclassification right away? The input would be a big batch of messages marked as real spam or real e-mail, and the output would be an improved config file with better scores. Given a threshold (say 5.0), it would cycle through all of the messages, increasing or decreasing the scores of the rules hit (possibly by an equal percentage) on false positives/negatives so that the message's total ends up just above/below the threshold. After a few cycles, if a problem e-mail is still too far off in one direction or the other, it would recommend a blacklist/whitelist entry or a new rule. This could work at the personal config level or the server level. I posted about this idea on the spamassassin-devel list a few months back, but I can't find the message. (It had a lot more detail; I can scan my Linux partition for it later on.)
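The cycle described above could be sketched roughly as follows. This is only an illustration of the proposal, not anything SpamAssassin does: the function name `tune_scores`, the `MARGIN` constant, the (rules hit, is_spam) message layout, and the assumption that every score is positive are all hypothetical choices made for the example.

```python
THRESHOLD = 5.0  # spam threshold, as in the example above
MARGIN = 0.1     # hypothetical: how far past the threshold to push a total


def total_score(hits, scores):
    """Sum the scores of the rules a message hit."""
    return sum(scores[rule] for rule in hits)


def tune_scores(messages, scores, threshold=THRESHOLD, margin=MARGIN, cycles=3):
    """Rescale rule scores on misclassified messages.

    messages: list of (set_of_rule_names, is_spam) pairs (assumed layout).
    scores:   dict of rule name -> score (assumed positive).
    Returns the adjusted scores and the messages still misclassified,
    which are candidates for a blacklist/whitelist entry or a new rule.
    """
    scores = dict(scores)  # work on a copy, leave the input untouched
    for _ in range(cycles):
        for hits, is_spam in messages:
            total = total_score(hits, scores)
            flagged = total >= threshold
            if flagged == is_spam or total <= 0:
                continue  # correctly classified, or nothing to scale
            # Aim just above the threshold for a false negative,
            # just below it for a false positive.
            target = threshold + margin if is_spam else threshold - margin
            factor = target / total
            for rule in hits:
                scores[rule] *= factor  # equal percentage for every hit rule
    problems = [
        (hits, is_spam)
        for hits, is_spam in messages
        if (total_score(hits, scores) >= threshold) != is_spam
    ]
    return scores, problems
```

Because each adjustment only looks at one message, two messages that hit the same rules but carry opposite labels will keep undoing each other's change; after the last cycle one of them is still misclassified and shows up in `problems`, which is the case where the proposal would recommend a blacklist/whitelist entry or a new rule.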
I'm just going to close this bug as 'worksforme', since the Bayesian system in 2.50 allows for learning to fix false positive/false negative problems. You don't want the scores to be changing; you want the probability of spam to be changing.