SA Bugzilla – Bug 3499
MPART_ALT_DIFF should deal with # of words in text and html parts
Last modified: 2004-11-14 02:14:48 UTC
If the HTML part of a message has only a few words and the text part has many, it's not difficult to push the current difference value below the threshold. For instance, one spam had 395 text words and 52 HTML words, resulting in:

    debug: madiff: left: 35, orig: 52, max-difference: 67.31%

Some difference in word count between the text and HTML parts is expected, but a large difference may be a good enough signal on its own, without checking whether the words themselves differ.
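To make the proposal concrete, here is a minimal Python sketch. It assumes the difference is the share of HTML-part words that have no match in the text part. That assumption reproduces the debug numbers above (35 / 52 ≈ 67.31%). The names `AltDiff`, `alt_part_diff`, `alt_parts_differ`, `max_diff_pct` and `max_ratio` are hypothetical, and this is not SpamAssassin's actual Perl implementation. The sketch adds the proposed second signal: the ratio of the larger word count to the smaller. With that signal, a message can trip the rule even when the word-level difference stays under the threshold.

```python
from dataclasses import dataclass


@dataclass
class AltDiff:
    orig: int           # number of words in the HTML part
    left: int           # HTML words with no match in the text part
    difference: float   # left / orig, as a percentage
    count_ratio: float  # larger word count / smaller word count


def alt_part_diff(text_words: list[str], html_words: list[str]) -> AltDiff:
    """Hypothetical sketch comparing the text/plain and text/html alternatives."""
    text_set = set(text_words)
    orig = len(html_words)
    # Count HTML words that do not appear anywhere in the text part.
    left = sum(1 for w in html_words if w not in text_set)
    difference = 100.0 * left / orig if orig else 0.0
    small, large = sorted((len(text_words), len(html_words)))
    # Empty vs. non-empty part counts as an unbounded ratio; both empty as equal.
    if small:
        count_ratio = large / small
    else:
        count_ratio = float("inf") if large else 1.0
    return AltDiff(orig, left, difference, count_ratio)


def alt_parts_differ(d: AltDiff, max_diff_pct: float, max_ratio: float) -> bool:
    # Fire on either signal: the words differ, or the word counts are far apart.
    return d.difference >= max_diff_pct or d.count_ratio >= max_ratio


# Synthetic input shaped like the sample: 52 HTML words, 17 of them also in
# a 395-word text part (17 shared + 378 filler words).
html = [f"w{i}" for i in range(52)]
text = html[:17] + [f"filler{i}" for i in range(378)]
d = alt_part_diff(text, html)
print(d.left, d.orig, round(d.difference, 2))  # 35 52 67.31
```

On this synthetic input, the word difference stays at 67.31%, but the count ratio is 395 / 52 ≈ 7.6. That ratio could catch the message even if 67.31% is under the difference threshold. How low `max_ratio` can go without adding false positives is the open question here.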
Created attachment 2022: sample spam
Moving accuracy and some other bugs to the 3.1.0 milestone.
More accuracy and performance bugs going to the 3.1.0 milestone.
It turns out this doesn't work terrifically well, but one incarnation did catch an additional ~0.2% of spam without any extra false positives (FPs). Committed some rules for testing in r65614.