Bug 2931 - HTML font matching
Summary: HTML font matching
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: unspecified
Hardware: Other other
Importance: P5 normal
Target Milestone: 3.1.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2004-01-15 11:04 UTC by Robert J. Accettura
Modified: 2005-02-06 15:10 UTC

Description Robert J. Accettura 2004-01-15 11:04:07 UTC
A new tactic used by spammers is to use HTML to embed spam words in an
otherwise normal article.  Something like:

<font>he short drive <B>BUY</B>begins a trek that could take the craft to a
variety of sites of scientifi<B>VIAGRA</B>c interest during the next three
months, including shallow depressions and nearby hills that it observed in
earlier photos.The successful rolloff by Spirit, which came almost two weeks
after its risky landing in Gusev Crater <B>TODAY</B>near the Martian equator,
left mission controllers at NASA's Jet Propulsion Laboratory ecstatic.</font>

The aim is to trip up the Bayesian filter and avoid detection.

My proposal is this:

Extract words according to their font description:

Hence, in the above testcase, all the bold words (<B>) would be put together:
BUY VIAGRA TODAY
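
A rough sketch of that grouping, using only Python's standard html.parser.
This is illustrative, not SpamAssassin code: the FontGrouper class, the
group_by_font helper and the STYLE_TAGS set are all made-up names, and only
tag nesting is considered (no attributes, no CSS).

```python
from collections import defaultdict
from html.parser import HTMLParser


class FontGrouper(HTMLParser):
    """Collect text fragments keyed by the stack of open styling tags."""

    # Assumed set of tags that change how text is rendered.
    STYLE_TAGS = {"b", "strong", "i", "em", "u", "font"}

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open styling tags
        self.groups = defaultdict(list)  # style key -> text fragments

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <B> arrives as "b".
        if tag in self.STYLE_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag and anything opened after it;
        # stray closing tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.groups[tuple(self.stack)].append(text)


def group_by_font(html):
    """Return {style key: text rendered in that style, joined by spaces}."""
    parser = FontGrouper()
    parser.feed(html)
    parser.close()
    return {key: " ".join(parts) for key, parts in parser.groups.items()}
```

Fed a shortened version of the testcase, the fragments under ("font", "b")
come out together while the surrounding article text stays under ("font",).
Joining with spaces means a word split by a tag (as in
"scientifi<B>VIAGRA</B>c") is not glued back together; that would need
extra handling.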

It would need to be somewhat advanced to do this properly: it must be
aware of CSS and know that, for example:
#fff = #ffffff = rgb(255,255,255) = rgb(100%,100%,100%)
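
A minimal sketch of that equivalence check, assuming only the four notations
listed above need to be handled. The normalize_color name is hypothetical,
and named colors ("white") and other CSS forms are deliberately left out.

```python
def normalize_color(value):
    """Map a CSS color string to an (r, g, b) tuple, or None if unrecognized."""
    v = value.strip().lower()

    if v.startswith("#"):
        digits = v[1:]
        if any(ch not in "0123456789abcdef" for ch in digits):
            return None
        if len(digits) == 3:
            # Short form: each hex digit is doubled (#fff -> #ffffff).
            return tuple(int(ch * 2, 16) for ch in digits)
        if len(digits) == 6:
            return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
        return None

    if v.startswith("rgb(") and v.endswith(")"):
        parts = [p.strip() for p in v[4:-1].split(",")]
        if len(parts) != 3:
            return None
        channels = []
        for part in parts:
            try:
                if part.endswith("%"):
                    # Percentages scale onto the 0-255 range.
                    number = round(float(part[:-1]) * 255 / 100)
                else:
                    number = int(part)
            except ValueError:
                return None
            # Clamp out-of-range values.
            channels.append(max(0, min(255, number)))
        return tuple(channels)

    return None
```

With every notation reduced to the same tuple, two font descriptions can be
compared for equality regardless of how the spammer wrote the color.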

But this method could prove successful in helping to eliminate this spamming
tactic.  With the text ordered by font description, the filter would no
longer be vulnerable to learning bogus data.
Comment 1 Daniel Quinlan 2004-08-27 17:18:03 UTC
more accuracy and performance bugs going to 3.1.0 milestone
Comment 2 Justin Mason 2005-02-07 00:10:33 UTC
I don't think we need to do this; I haven't seen mails like this get past SA at
all successfully.