SA Bugzilla – Full Text Bug Listing
Summary: | SUBJECT_FUZZY_MEDS triggers on un-obfuscated meds and meds in a word | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Michael Bietenholz <mfb6> |
Component: | Rules | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 3.1.8 | ||
Target Milestone: | 3.3.0 | ||
Hardware: | PC | ||
OS: | Linux | ||
Whiteboard: |
Description
Michael Bietenholz
2007-03-14 11:53:24 UTC
will try to fix for 3.3.0

looks like it will fp on anything with meds in the subject line, inside a word, etc

the following (small snippet) is enough to trigger this: (save to a file, yes, just these lines are enough)

------------begin---
Subject: Someone: Review Meds
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0-1130195460-1249928373=:51768"
--0-1130195460-1249928373=:51768
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
-<<<EOF

the rule also matches on czech/slovak words "medzi" (inter), "obmedzit" (to limit).

Yes, I'd be glad if we had a way to cut FPs down.

btw if we can get some samples of what it's _supposed_ to hit, that would help too. (this is a new approach to rule regression testing I'm working on.)

The problem is that the rule as-is will hit on any sub-string "meds". No word boundaries, no exclusion of the NON-obfuscated meds.

25_replace.cf:
header SUBJECT_FUZZY_MEDS Subject =~ /<M><E><D><S>/i

Given comment 3, the proposed limiting in comment 0 seems entirely sensible. No plain non-obfuscated "meds". No half-assed obfuscated one within a longer word.

Maybe using (\b|_) rather than \b, to catch that pathetic this-is-a-word-char non-real-word char that underscore is.

(In reply to comment #5)
> Given comment 3, the proposed limiting in comment 0 seems entirely sensible. No
> plain non-obfuscated "meds". No half-assed obfuscated one within a longer word.
>
> Maybe using (\b|_) rather than \b, to catch that pathetic this-is-a-word-char
> non-real-word char that underscore is.

+1 to both.

Performance question: which is more efficient?

(?:\b|_)x(?:\b|_)
\b_*x_*\b

btw, check the rescoring bug; most of the FUZZY ruleset got zeroed scores.

(In reply to comment #6)
> Performance question: which is more efficient?

They are not equivalent.

"a_x" =~ /(?:\b|_)x(?:\b|_)/ && "a_x" !~ /\b_*x_*\b/

if we want to change this for 3.3.0, it needs to be in SVN by this Thursday; see bug 6155.
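The non-equivalence pointed out in the reply to comment #6 is easy to check. A minimal sketch, using Python's `re` module as a stand-in for Perl (an assumption; for this ASCII input both engines treat `_` as a word character, which is all the example relies on). The names `alt_boundary`, `underscore_run` and `hits` are made up for illustration:

```python
import re

# The two candidate patterns from the thread, with "x" standing in for the rule body.
alt_boundary = re.compile(r"(?:\b|_)x(?:\b|_)")
underscore_run = re.compile(r"\b_*x_*\b")


def hits(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


# "_" is a word character, so "a_x" has no \b between "a" and "_",
# nor between "_" and "x".
sample = "a_x"
print(hits(alt_boundary, sample))    # the "_" alternative can start the match
print(hits(underscore_run, sample))  # needs a real \b before any underscores
```

So the choice between the two forms is a question of behaviour first and performance second: only the first one treats an underscore glued to a preceding word as a separator.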
svn commit -m 'bug 5380: fix SUBJECT_FUZZY_MEDS FP on unobfuscated "meds"'
Sending rules/25_replace.cf
Transmitting file data .
Committed revision 809780.
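The diff itself is not quoted in this bug, so what follows is not the committed change. It is only a rough sketch of the kind of tightening discussed in the comments (boundaries on both sides, underscore allowed as a separator, plain "meds" excluded), written in Python rather than as a SpamAssassin rule. The character classes standing in for the `<M>`, `<E>`, `<D>`, `<S>` replace tags are hypothetical and much narrower than the real tag definitions in 25_replace.cf:

```python
import re

# Hypothetical, simplified stand-ins for the <M><E><D><S> replace tags.
M, E, D, S = r"m", r"[e3]", r"d", r"[s5]"
FUZZY = M + E + D + S

# Rule shape as quoted in the report: no boundaries, plain "meds" not excluded.
loose = re.compile(FUZZY, re.I)

# One possible tightening: boundary or underscore on each side, and a
# negative lookahead that rejects the unobfuscated spelling.
SEP = r"(?:\b|_)"
tight = re.compile(SEP + r"(?!meds" + SEP + r")" + FUZZY + SEP, re.I)


def fires(rule, subject):
    """Return True if the rule would hit on this Subject text."""
    return rule.search(subject) is not None


for subject in ("Someone: Review Meds", "Premeds welcome",
                "cheap m3ds here", "cheap_m3ds_now"):
    print(subject, fires(loose, subject), fires(tight, subject))
```

With these assumed classes, the loose pattern fires on all four subjects, while the tightened one fires only on the two obfuscated spellings.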