SA Bugzilla – Bug 2931
HTML font matching
Last modified: 2005-02-06 15:10:33 UTC
A new tactic used by spammers is to use HTML to embed spam in a normal article. Something like:

<font>he short drive <B>BUY</B>begins a trek that could take the craft to a variety of sites of scientifi<B>VIAGRA</B>c interest during the next three months, including shallow depressions and nearby hills that it observed in earlier photos.The successful rolloff by Spirit, which came almost two weeks after its risky landing in Gusev Crater <B>TODAY</B>near the Martian equator, left mission controllers at NASA's Jet Propulsion Laboratory ecstatic.</font>

The aim is to trip up the Bayesian filter and prevent detection.

My proposal is this: extract words according to their font description. In the test case above, all the bold words (<B>) would be put together:

BUY VIAGRA TODAY

The extraction would need to be somewhat advanced to truly perform this task: it would have to be aware of CSS and know that, for example:

#fff = #ffffff = rgb(255,255,255) = rgb(100%,100%,100%)

But this method could prove successful in helping to eliminate this spamming tactic. With the text grouped by font description, the filter would no longer be vulnerable to learning bogus data.
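To make the proposal concrete, here is a minimal sketch in Python (chosen only for illustration; it is not SpamAssassin code). The names FontGrouper, group_by_font, and normalize_color are hypothetical. The sketch assumes that the enclosing stack of styling tags is a good enough stand-in for a "font description", and that only the four color notations listed above need to be unified; real CSS handling would be considerably more involved.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class FontGrouper(HTMLParser):
    """Collect text runs keyed by the stack of styling tags that enclose them."""

    # Assumption: these tags are treated as the "font description".
    STYLE_TAGS = {"b", "strong", "i", "em", "u", "font"}

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open styling tags
        self.groups = defaultdict(list)  # style key -> list of text runs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <B> arrives here as "b".
        if tag in self.STYLE_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: close the most recent matching tag, if any.
        if tag in self.stack:
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx]

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.groups[tuple(self.stack)].append(text)


def group_by_font(html):
    """Return {style key: text}, joining runs that share the same styling."""
    parser = FontGrouper()
    parser.feed(html)
    parser.close()
    return {key: " ".join(runs) for key, runs in parser.groups.items()}


def normalize_color(value):
    """Map #rgb, #rrggbb, rgb(n,n,n) and rgb(p%,p%,p%) to one (r, g, b) tuple.

    Returns None for any notation this sketch does not recognise.
    """
    v = value.strip().lower()
    m = re.fullmatch(r"#([0-9a-f]{3})", v)
    if m:
        return tuple(int(ch * 2, 16) for ch in m.group(1))
    m = re.fullmatch(r"#([0-9a-f]{6})", v)
    if m:
        h = m.group(1)
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
    m = re.fullmatch(r"rgb\(\s*(\d+%?)\s*,\s*(\d+%?)\s*,\s*(\d+%?)\s*\)", v)
    if m:
        return tuple(
            min(round(int(p[:-1]) * 255 / 100) if p.endswith("%") else int(p), 255)
            for p in m.groups()
        )
    return None
```

Feeding the test case above to group_by_font should put the three bold words under the key ("font", "b"), with the article text under ("font",). One limitation of joining runs with spaces: a word the spammer split around an inserted tag ("scientifi" and "c") comes back as two tokens, so a real implementation would need a rule for rejoining such fragments.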
More accuracy and performance bugs going to the 3.1.0 milestone.
I don't think we need to do this; I haven't seen mail like this get past SA successfully.