Bug 6171 - Drugs rules don't detect runtogether words
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P4 minor
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-04 18:56 UTC by Cedric Knight
Modified: 2009-08-04 19:12 UTC
CC List: 0 users



Attachments:
Anonymised sample from 090730 showing full body not hitting drug rules. Type: message/rfc822. Status: None. Submitter: Cedric Knight [HasCLA]
Provisional patch to detect runtogether erectile drug names; fix rule description. Type: patch. Status: None. Submitter: Cedric Knight [HasCLA]

Description Cedric Knight 2009-08-04 18:56:53 UTC
Created attachment 4502
Anonymised sample from 090730 showing full body not hitting drug rules

The rules in 20_drugs.cf try to detect both the straightforward names of particular drugs and obfuscated versions of those names, in each case bounded by \b. An obfuscation seen recently runs the drug name straight into the following word, as in "CialisSuper" and "ViagraAs low as $1.85". Because there is no \b boundary after the name, neither kind of rule hits, and the message scores nothing on content. Care is particularly needed with 'cialis', which occurs within words such as 'specialism'.
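
To illustrate the boundary problem, here is a minimal Python sketch. The pattern is a simplified stand-in, not the actual regex from 20_drugs.cf, and the names BOUNDED and hits_bounded are made up for this example.

```python
import re

# Simplified stand-in for a \b-bounded drug-name rule.
# This is NOT the real 20_drugs.cf pattern, only an illustration.
BOUNDED = re.compile(r"\b(?:viagra|cialis|levitra)\b", re.IGNORECASE)

def hits_bounded(text: str) -> bool:
    """Return True if a drug name appears as a whole \\b-bounded word."""
    return BOUNDED.search(text) is not None

if __name__ == "__main__":
    samples = [
        "Buy Cialis now",         # name stands alone as a word
        "CialisSuper",            # name runs into the next word
        "ViagraAs low as $1.85",  # name runs into the next word
        "specialism",             # innocent word containing 'cialis'
    ]
    for s in samples:
        print(f"{s!r}: {hits_bounded(s)}")
```

Only the first sample matches. The two runtogether samples slip through for the same reason that 'specialism' is correctly ignored: a letter directly adjacent to the name means there is no \b there.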
Comment 1 Cedric Knight 2009-08-04 19:12:22 UTC
Created attachment 4503
Provisional patch to detect runtogether erectile drug names; fix rule description

The patch detects a change of case after 'levitra' and 'cialis', so that 'Ci@lisAs', for example, scores as an obfuscated drug name. For 'viagra' a different approach is used, so that 'viagralike', for example, scores as obfuscated even though it is in consistent case. The patch also simplifies the definition of a word boundary to (\b|_). It is more of an example than a finished fix: similar changes should perhaps be made to the endings of other drug words, and a similar check could be made at the leading boundary to catch, e.g., 'BuyCIALIS!'.
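
As a rough illustration of the idea (not the regexes in the attached patch), the following Python sketch flags a case change straight after 'cialis' or 'levitra', and any letter straight after 'viagra'. The 'viagra' branch is only one guess at what a "different approach" could look like. All pattern and function names are hypothetical, and character substitutions such as 'Ci@lis' are left out to keep it short.

```python
import re

# Leading boundary in the spirit of (\b|_): a regex word boundary or an underscore.
START = r"(?:\b|_)"

# Zero-width test for a change of case between two adjacent letters:
# lower followed by upper, or upper followed by lower.
CASE_CHANGE = r"(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[a-z]))"

# 'cialis' / 'levitra' in any case, immediately followed by a case change.
# (?i:...) scopes case-insensitivity to the name, so CASE_CHANGE stays case-sensitive.
CASE_RUN = re.compile(START + r"(?i:cialis|levitra)" + CASE_CHANGE)

# Assumed treatment of 'viagra': any letter directly after the name counts.
VIAGRA_RUN = re.compile(START + r"(?i:viagra)(?=[A-Za-z])")

def is_runtogether(text: str) -> bool:
    """Return True if a drug name runs straight into following letters."""
    return bool(CASE_RUN.search(text) or VIAGRA_RUN.search(text))

if __name__ == "__main__":
    for s in ["CialisSuper", "LEVITRAonline", "viagralike",
              "specialism", "Buy Cialis now"]:
        print(f"{s!r}: {is_runtogether(s)}")
```

Requiring a boundary before the name keeps 'specialism' from matching, and requiring a case change after 'cialis' avoids flagging ordinary lowercase continuations. A real rule would also need the obfuscation character classes already used in 20_drugs.cf.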

The patch also gives a more accurate description for the SUBJECT_FUZZY_MEDS rule, which the sample *did* hit, even though the word in the subject line was not actually obfuscated.