Bug 3499 - MPART_ALT_DIFF should deal with # of words in text and html parts
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests)
Version: unspecified
Hardware: Other other
Importance: P5 normal
Target Milestone: 3.1.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2004-06-10 14:16 UTC by Theo Van Dinter
Modified: 2004-11-14 02:14 UTC
Attachment Type Modified Status Actions Submitter/CLA Status
sample spam text/plain None Theo Van Dinter [HasCLA]

Description Theo Van Dinter 2004-06-10 14:16:44 UTC
If the HTML part of a message has few words and the text part has many, it is not
difficult to push the current difference value below the threshold. For instance, one spam had 395
text words and 52 HTML words, resulting in:

debug: madiff: left: 35, orig: 52, max-difference: 67.31%

Some difference in word count between the text and HTML parts is expected, but a large
difference can be good enough evidence on its own, without checking whether the words themselves differ.
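
A rough sketch of the idea, in Python rather than SpamAssassin's Perl. The 67.31% in the debug line is consistent with left / orig (35 / 52); everything else here is an assumption made for illustration: the function names, the thresholds of 90.0 and 5.0, and the way the two checks are combined are hypothetical and are not taken from the actual MPART_ALT_DIFF code.

```python
def madiff_percent(left: int, orig: int) -> float:
    """Percentage reported as max-difference, assumed here to be left / orig."""
    return 100.0 * left / orig if orig else 0.0


def word_count_ratio(text_words: int, html_words: int) -> float:
    """Ratio of the larger word count to the smaller one (1.0 means equal)."""
    smaller = min(text_words, html_words)
    larger = max(text_words, html_words)
    if larger == 0:
        return 1.0           # both parts empty: treat as equal
    if smaller == 0:
        return float("inf")  # one part empty, the other not
    return larger / smaller


def looks_mismatched(text_words: int, html_words: int, left: int, orig: int,
                     diff_threshold: float = 90.0,   # hypothetical value
                     ratio_threshold: float = 5.0    # hypothetical value
                     ) -> bool:
    """Flag the message if either the word difference or the count ratio is large."""
    return (madiff_percent(left, orig) >= diff_threshold
            or word_count_ratio(text_words, html_words) >= ratio_threshold)


# Numbers from the sample spam above: 395 text words, 52 HTML words.
print(round(madiff_percent(35, 52), 2))
print(round(word_count_ratio(395, 52), 2))
print(looks_mismatched(395, 52, left=35, orig=52))
```

With these assumed thresholds, the sample would slip under a difference-only check but would be flagged by the word-count ratio, which is the gap this bug describes.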
Comment 1 Theo Van Dinter 2004-06-10 14:18:37 UTC
Created attachment 2022
sample spam
Comment 2 Daniel Quinlan 2004-08-27 17:00:23 UTC
moving accuracy and some bugs to 3.1.0 milestone
Comment 3 Daniel Quinlan 2004-08-27 17:19:50 UTC
more accuracy and performance bugs going to 3.1.0 milestone
Comment 4 Theo Van Dinter 2004-11-14 11:14:48 UTC
It turns out this doesn't work terrifically well, but one incarnation did catch another ~0.2% of spam
without any extra FPs.

committed some rules for testing, r65614