Bug 6171

Summary: Drugs rules don't detect runtogether words
Product: Spamassassin
Reporter: Cedric Knight <cedric>
Component: Rules
Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW ---    
Severity: minor    
Priority: P4    
Version: SVN Trunk (Latest Devel Version)   
Target Milestone: Undefined   
Hardware: All   
OS: All   
Whiteboard:
Attachments: Anonymised sample from 090730 showing full body not hitting drug rules
Provisional patch to detect runtogether erectile drug names; fix rule description

Description Cedric Knight 2009-08-04 18:56:53 UTC
Created attachment 4502 [details]
Anonymised sample from 090730 showing full body not hitting drug rules

Rules in 20_drugs.cf try to detect both the straightforward names of particular drugs and obfuscated versions of those names, in each case bounded with \b.  An obfuscation seen recently runs the drug name straight into the following word, as in "CialisSuper" and "ViagraAs low as $1.85".  These hit neither kind of rule because there is no \b boundary after the name, so the message scores nothing on content.  Care is particularly needed with 'cialis', which occurs within words such as 'specialism', so the boundary cannot simply be dropped.
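
To illustrate the boundary problem described above, here is a minimal Python sketch.  The two patterns are simplified stand-ins written for this example, not the actual rules from 20_drugs.cf, and Python's \b is assumed to behave like Perl's for these ASCII strings.

```python
import re

# Simplified stand-in for a \b-bounded drug-name rule (an assumption for
# illustration, not the real pattern from 20_drugs.cf).
BOUNDED = re.compile(r"\b(?:viagra|cialis|levitra)\b", re.IGNORECASE)

# Naive alternative: the same names with no boundaries at all.
UNBOUNDED = re.compile(r"(?:viagra|cialis|levitra)", re.IGNORECASE)

samples = [
    "Buy Cialis now",          # plain name, properly bounded
    "CialisSuper",             # runtogether: no \b between 's' and 'S'
    "ViagraAs low as $1.85",   # runtogether: no \b between 'a' and 'A'
    "a medical specialism",    # innocent word containing 'cialis'
]

for text in samples:
    bounded = bool(BOUNDED.search(text))
    unbounded = bool(UNBOUNDED.search(text))
    print(f"{text!r}: bounded={bounded} unbounded={unbounded}")
```

The bounded pattern matches only the first sample, so both runtogether strings get through.  The unbounded pattern catches them but also matches 'specialism', which is the false positive to avoid.
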
Comment 1 Cedric Knight 2009-08-04 19:12:22 UTC
Created attachment 4503 [details]
Provisional patch to detect runtogether erectile drug names; fix rule description

The patch detects a change of case immediately after 'levitra' and 'cialis', so 'Ci@lisAs', for example, scores as an obfuscated drug name.  For 'viagra', a different approach is used, so that 'viagralike', for example, scores as obfuscated even though it is in consistent case.  The patch also simplifies the definition of a word boundary to (\b|_).  It is meant more as an example: similar changes should perhaps be made for other drug word endings, and a similar check could be applied at the leading boundary to catch, e.g., 'BuyCIALIS!'.
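
The two approaches described above might look roughly like the following Python sketch.  The patterns and the helper name is_runtogether are hypothetical and written for this comment only; the attached patch may differ, and this sketch does not handle obfuscated spellings such as 'Ci@lis'.

```python
import re

# Hypothetical patterns for illustration; the attached patch may differ.
# Leading boundary: \b, or a literal underscore (which \b treats as a
# word character, so '_Levitra' has no \b before the 'L').
START = r"(?:\b|_)"

# 'cialis' / 'levitra': lower-case ending followed directly by an
# upper-case letter, i.e. a change of case.
CASE_CHANGE = re.compile(START + r"(?:[Cc]ialis|[Ll]evitra)(?=[A-Z])")

# 'viagra': any letter running straight on from the name, in any case.
RUN_ON = re.compile(START + r"viagra(?=[a-z])", re.IGNORECASE)

def is_runtogether(text: str) -> bool:
    """Return True if a drug name runs into the following word."""
    return bool(CASE_CHANGE.search(text) or RUN_ON.search(text))

for text in ["CialisSuper", "ViagraAs low as $1.85", "viagralike",
             "_LevitraNow", "specialism", "Buy Cialis now", "BuyCIALIS!"]:
    print(f"{text!r}: {is_runtogether(text)}")
```

In this sketch the first four samples are flagged and the last three are not.  'specialism' is skipped because there is neither a leading boundary nor a change of case, and 'BuyCIALIS!' is missed because nothing is checked at the leading boundary.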

This patch also provides a more accurate description of the SUBJECT_FUZZY_MEDS rule, which the sample *did* hit, although the word in the title was not actually obfuscated.