SA Bugzilla – Bug 2981
inoculation support?
Last modified: 2004-08-27 10:07:28 UTC
BTW, here's an interesting idea we ran into at the Spam Conf 2004.

http://lists.netsys.com/pipermail/full-disclosure/2003-November/013840.html
http://www.nuclearelephant.com/projects/dspam/draft-spamfilt-inoculation-01.txt

Basically, it's quite simple -- a standard MIME wrapper for training spam filters.

My issue with this proposal, however, is what happens when you have a trained db with these tokens:

  SPAMCOUNT  HAMCOUNT  TOKEN
  1          3         foo
  1          3         bar

Note, both are hammy tokens. If you have 8 friends who have you in their inoculation list, and they all get copies of *1* single spam message containing "bar" as a token, and they all inoculate you, that'll result in:

  SPAMCOUNT  HAMCOUNT  TOKEN
  1          3         foo
  9          3         bar

Hence, "bar" becomes a strongly spammy token, even though in reality that was the result of a single spam run.

In other words, inoculation does bad things for Bayes training; inoculated tokens, IMO, are likely to end up "stronger" than personally-trained tokens.

This could be avoided by using a hash of the message body somehow as a message identifier, so that once 1 person inoculates you for a given spam, you learn it once and ignore future inoculations of the same message. But then the issue is: what is a reliable message id for spam, given that spammers routinely evade body hashing, fake Message-Id headers, etc.?

Comments?
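To make the arithmetic above concrete, here is a minimal Python sketch of the scenario. It is not SpamAssassin's Bayes code or anything from the inoculation draft: `TokenDB`, `inoculate`, and the `dedupe` flag are hypothetical names, tokenizing is just whitespace splitting, and SHA-1 of the raw body stands in for "a hash of the message body somehow".

```python
import hashlib
from collections import defaultdict


class TokenDB:
    """Toy per-token spam/ham counters (hypothetical, not SA's Bayes store)."""

    def __init__(self):
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)
        self.seen = set()  # body digests already learned via inoculation

    def inoculate(self, body, dedupe=False):
        """Learn `body` as spam; return False if skipped as a duplicate."""
        digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
        if dedupe and digest in self.seen:
            return False
        self.seen.add(digest)
        # Count each distinct token once per message.
        for token in set(body.split()):
            self.spam[token] += 1
        return True


def make_db():
    """Start from the table in the comment: foo and bar both at 1 spam / 3 ham."""
    db = TokenDB()
    for token in ("foo", "bar"):
        db.spam[token] = 1
        db.ham[token] = 3
    return db


# Eight friends each forward the same single spam containing "bar".
naive = make_db()
for _ in range(8):
    naive.inoculate("bar")

deduped = make_db()
for _ in range(8):
    deduped.inoculate("bar", dedupe=True)

# The weakness raised above: a trivially mutated body gets a new digest,
# so it is learned again despite being the same spam run.
mutated_learned = deduped.inoculate("bar x9f2", dedupe=True)

print(naive.spam["bar"], naive.ham["bar"])      # 9 3
print(deduped.spam["bar"], deduped.ham["bar"])  # 3 3 (one copy plus one mutated copy)
print(mutated_learned)                          # True
```

Without deduplication "bar" ends up at 9 spam / 3 ham, matching the second table. With the exact-hash check the eight identical copies count once, but a single random suffix is enough to defeat it, which is the open question about a reliable message id.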
+1 on reassigning this ticket to 3.1 since it is (a) non-trivial and (b) not a feature we have even considered for 3.0.
I don't think this idea is really catching on, so I'm closing this as LATER.