Bug 4773 - Minor suggestion for spam rule regarding Pharmacies
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.0.4
Hardware: All All
Importance: P5 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2006-01-30 23:22 UTC by Tim Alberts
Modified: 2006-02-09 16:32 UTC
Description Tim Alberts 2006-01-30 23:22:52 UTC
First, thank you for an OUTSTANDING program.
Second, I saw on the website that anyone with a suggestion for new rules can
submit it here (sorry if this is incorrect).

I run an email server for a 30 account domain and I see Pharmacy and
Pharmaceutical spam with the words jumbled a couple hundred times a day.  I have
written a couple rules that find most of these.  I wish to suggest them to
Spamassassin.

Note that these rules only look for common ways of misspelling the words
Pharmacy and Pharmaceutical; the correctly spelled words will not be marked as
spam.

CONTAINS_JUMBLED_PHARAMACY
\bP([\s\w]?)h([\s\w]?)a([\s\w]?)r([\s\w]?)a([\s\w]?)m([\s\w]?)a([\s\w]?)c([\s\w]?)y\b

CONTAINS_JUMBLED_PHRMACEUTICAL
\bPhrm([\s\w]?)a([\s\w]?)c([\s\w]?)e([\s\w]?)u([\s\w]?)t([\s\w]?)i([\s\w]?)c([\s\w]?)a([\s\w]?)l\b

CONTAINS_JUMBLED_PHARAMACEUTICAL
\bP([\s\w]?)h([\s\w]?)a([\s\w]?)r([\s\w]?)a([\s\w]?)m([\s\w]?)a([\s\w]?)c([\s\w]?)e([\s\w]?)u([\s\w]?)t([\s\w]?)i([\s\w]?)c([\s\w]?)a([\s\w]?)l\b


The following is the set of examples I've used to test against.  The above rules
catch most of these.

Pharmacy
P armacy
P harmacy
Phad ramacy
Phae ramacy
Pharrmacy
pharamacy
Pharamacy
P haramacy
P harcamacy
Pharamga cy
Phara maecy
Pharam acy
Pharamac y
Pharama cy
Ph armamacy
Ph aramacy
Pha ramacy
Phar amacy
Pharfama cy
Pghara macy
Ptharamacy

Phamaceutical
Phharmyaceutical
pfhr maceutical
Pharaamaceutical
Pharamaceuetical
phrmaceuteic al
Phrmaceutica ul
Phrmace uticjal
Phrmaceuticma l
Phrmaceuticm al
Phrmaceuti cyal
Phrmacteutic al
Phrmac teutical
Phrm adceutical
Phrmeac eutical
Phrmlaceut ical
Phadramaceutical
Pharmacceutoical

I understand that if these rules (or something similar) were published in the
Spamassassin distribution, the spammers would just use them to come up with
new ways of misspelling.  Until that time, these rules do help considerably,
on my domain at least.
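
For readers who want to try the three patterns against the sample list, here is a minimal Python sketch. The patterns are copied from the report above. The case-insensitive flag is an assumption (the report gives no regex flags, but the sample list contains lowercase variants), and the helper name matching_rules is hypothetical.

```python
import re

# Patterns copied verbatim from the report above.
RULES = {
    "CONTAINS_JUMBLED_PHARAMACY":
        r"\bP([\s\w]?)h([\s\w]?)a([\s\w]?)r([\s\w]?)a([\s\w]?)m([\s\w]?)a([\s\w]?)c([\s\w]?)y\b",
    "CONTAINS_JUMBLED_PHRMACEUTICAL":
        r"\bPhrm([\s\w]?)a([\s\w]?)c([\s\w]?)e([\s\w]?)u([\s\w]?)t([\s\w]?)i([\s\w]?)c([\s\w]?)a([\s\w]?)l\b",
    "CONTAINS_JUMBLED_PHARAMACEUTICAL":
        r"\bP([\s\w]?)h([\s\w]?)a([\s\w]?)r([\s\w]?)a([\s\w]?)m([\s\w]?)a([\s\w]?)c([\s\w]?)e([\s\w]?)u([\s\w]?)t([\s\w]?)i([\s\w]?)c([\s\w]?)a([\s\w]?)l\b",
}

# ASSUMPTION: case-insensitive matching; the report does not state flags.
COMPILED = {name: re.compile(rx, re.IGNORECASE) for name, rx in RULES.items()}


def matching_rules(text):
    """Return the names of all rules whose pattern is found in text."""
    return [name for name, rx in COMPILED.items() if rx.search(text)]


# A few samples taken from the test list in the report.
for sample in ["Pharmacy", "Pharamacy", "Pha ramacy",
               "Pharmaceutical", "Phrm adceutical", "Phrmac teutical"]:
    print(f"{sample!r}: {matching_rules(sample)}")
```

Under these assumptions the correctly spelled words match none of the patterns, because each pattern requires either the extra "a" after "r" or the "Phrm" prefix. Some samples, such as "Phrmac teutical", are also missed, which is consistent with the rules catching most but not all of the list.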
Comment 1 Theo Van Dinter 2006-01-30 23:37:01 UTC
Hi,

Thank you for your suggestion.  You may want to take a look at the FUZZY_* rules in 3.1 which utilize the 
ReplaceTags plugin to do this kind of thing on a generic level.  There's a FUZZY_PHARMACY rule already, 
but I don't know if we tried pharmaceutical.
Comment 2 Tim Alberts 2006-01-31 20:49:36 UTC
(In reply to comment #1)
> Hi,
> 
> Thank you for your suggestion.  You may want to take a look at the FUZZY_*
> rules in 3.1 which utilize the ReplaceTags plugin to do this kind of thing
> on a generic level.  There's a FUZZY_PHARMACY rule already, but I don't
> know if we tried pharmaceutical.

I'm sorry, I was working from the SA version 3.0.4 rules.  I took a look at the
new ReplaceTags plugin and it looks like an outstanding addition.
Unfortunately, I won't have time to test these new capabilities soon.  I will
have to upgrade to the new version ASAP and see how things go.

As you mentioned, I don't see any rules for 'Pharmaceutical' so it probably
could be added to the FUZZY_* rules.  I think something like:

body FUZZY_PHARMACEUTICAL /<inter W2><post P2>(?!pharmaceutical)<P><H><A><R><M><A><C><E><U><T><I><C><A><L>/i
describe FUZZY_PHARMACEUTICAL	Attempt to obfuscate words in spam
replace_rules FUZZY_PHARMACEUTICAL

added to the 25_replace.cf file would do it.

I admit, I don't completely understand how the ReplaceTags plugin works, but it
looks like it still tries to find all the letters of the word in the correct
order.  The rules I suggest instead look for common misspellings of the
original words, with letters added or dropped:

Pharmacy -> Pharamacy
Pharmaceutical -> Phrmaceutical, Pharamaceutical
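
The difference can be illustrated with a rough Python sketch. It mimics the general idea of a fuzzy rule (every letter of the word, in order, with a filler allowed between letters and a quantifier after each letter) and compares it with the "Phrm..." pattern from the original description. The INTER and POST expansions below are hypothetical, heavily simplified stand-ins and are NOT the real W2 and P2 definitions from 25_replace.cf; real letter tags also cover look-alike characters, which this sketch omits.

```python
import re

WORD = "pharmaceutical"

# HYPOTHETICAL stand-ins, not the definitions shipped in 25_replace.cf:
INTER = r"[^a-z]{0,2}"   # assumed filler allowed between letters
POST = r"{1,2}"          # assumed quantifier appended after each letter

# Every letter of the word, in order, each followed by POST, joined by INTER.
# The negative lookahead skips the correctly spelled word, as in the rule above.
fuzzy = re.compile(
    r"(?!" + WORD + r")" + INTER.join(ch + POST for ch in WORD),
    re.IGNORECASE,
)

# Pattern from the original description: expects the misspelled "Phrm" prefix.
jumbled_phr = re.compile(
    r"\bPhrm([\s\w]?)a([\s\w]?)c([\s\w]?)e([\s\w]?)u([\s\w]?)t"
    r"([\s\w]?)i([\s\w]?)c([\s\w]?)a([\s\w]?)l\b",
    re.IGNORECASE,  # assumption: flags are not given in the report
)

for sample in ["Pharmaceutical", "Pharma ceutical", "Pharrmaceutical",
               "Phrmaceutical", "Pharamaceutical"]:
    print(f"{sample!r}: fuzzy={bool(fuzzy.search(sample))} "
          f"jumbled={bool(jumbled_phr.search(sample))}")
```

With these assumed expansions, the all-letters-in-order pattern accepts spaced or doubled letters but not a dropped or inserted letter, so "Phrmaceutical" is found only by the misspelling-specific pattern. Whether the real tag definitions behave the same way would need to be checked against the shipped 25_replace.cf.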

So in conclusion, I will do my best to upgrade to the current version of SA and
evaluate how the new rules catch these words.  If I find that the current
version of SA does not do a good job of finding these, I will re-post to this
report (or start a new report).
Comment 3 Theo Van Dinter 2006-02-10 01:32:43 UTC
ok, I put a version into the sandbox for testing.  it works well for my corpus:

  0.748   0.8599   0.0000    1.000   0.77    0.01  TVD_FUZZY_PHARMACEUTICAL

thanks for the suggestion! :)