SA Bugzilla – Bug 5257
RFE: adaptive autolearning thresholds
Last modified: 2019-09-25 03:33:41 UTC
I think we need to reduce how often mail is autolearned as ham. It doesn't seem to cause major trouble for me, at least, but anecdotally it's not good. Worth investigating in the 3.2.0 mass-check/rescoring, anyway.
This takes place after the perceptron run.
OK, I've set the autolearn ham threshold to -1.0, which collects 1.21% of ham. The autolearn spam threshold is then 12.0, which collects 81% of spam.
fixed...
After reading the comments in bug 5497, which discuss the complaints on the user list about Bayes performance after this change, and the comments here, which show that the change was made on the supposition that it was needed, without clear statistics, I propose that it be reverted in time for the 3.2.1 release.
OK; +1. I still think we're probably autolearning too much spam as ham, but if learning too little ham is itself having effects that are worse than that, we can revert to the 3.1.x behaviour.
+1
Committed to branch 3.2 revision 545281. Committed to trunk revision 545287.
I'm reopening this because if there was a reason to open it in the first place, that reason still exists now that we have reverted the change that was supposed to fix it. I think we should consider an adaptive autolearning threshold based on sampling a configurable percentage of the best-scoring configurable percentage of the ham and spam. To clarify: identify the threshold score that gives us the lowest-scoring X% of the ham, then autolearn Y% of those hams. X is set to a value that is unlikely to result in spam being learned as ham. Y is configurable in case the volume of mail is too high to learn everything below the threshold, while still letting us learn a representative sample of ham, not just the very lowest scoring. That protects against an effect such as all mail of a certain type triggering a 1.0-score rule and Bayes then incorrectly learning that mail of that type is always spam.
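To make the X%/Y% proposal concrete, here is a rough Python sketch of the ham side only (the spam side would mirror it using the highest-scoring messages). This is not SpamAssassin code: the function names `ham_threshold` and `should_autolearn_ham`, the parameters `x_percent` and `y_percent`, and the idea of keeping a list of recent ham scores are all assumptions made for illustration.

```python
import random


def ham_threshold(ham_scores, x_percent):
    """Return the score at or below which the lowest-scoring X% of ham falls.

    ham_scores: scores of messages already believed to be ham (hypothetical input).
    x_percent:  the X from the proposal, in the range 0-100.
    """
    if not ham_scores:
        raise ValueError("need at least one ham score")
    ordered = sorted(ham_scores)
    # Number of messages in the lowest X%, but never fewer than one.
    count = max(1, int(len(ordered) * x_percent / 100.0))
    return ordered[count - 1]


def should_autolearn_ham(score, threshold, y_percent, rng=random):
    """Decide whether to autolearn one message as ham.

    Messages above the threshold are never learned. Messages at or below
    it are learned with probability Y%, so the learned set is a random
    sample of everything under the threshold, not only the lowest scores.
    """
    if score > threshold:
        return False
    return rng.random() < y_percent / 100.0


# Usage: ten hypothetical ham scores, X = 30, Y = 50.
scores = [-3.0, -2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
threshold = ham_threshold(scores, 30)
learned = [s for s in scores if should_autolearn_ham(s, threshold, 50)]
```

Because the sampling is uniform across everything below the threshold, a message that triggers a single 1.0-score rule has the same chance of being learned as one scoring far lower, provided the threshold chosen by X sits above it.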
3.2.3 was released without these fixed, moving to 3.2.3
er, 3.2.4. ;)
no movement -> pushing out to 3.3.0, optimistically
pushing out further