Bug 744 - rule broken: BREAKTHROUGH
Summary: rule broken: BREAKTHROUGH
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P4 trivial
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-08-26 10:41 UTC by Justin Mason
Modified: 2002-09-20 20:37 UTC



Description Justin Mason 2002-08-26 10:41:30 UTC
Found to have very low frequencies in the pre-2.40 mass-check, so it was
commented out and moved to '70_broken_rules.cf'.  If anyone wants to revive
it, please use this bug as a tracker.

Freqs:

  0.000    0.000    0.000    0.00    1.00  BREAKTHROUGH
Comment 1 Daniel Quinlan 2002-09-21 04:27:39 UTC
A rule worth investigating.  BFD.

Initial test run:

(It looks like someone overwrote an old rule by accident at some point.  If
you haven't figured out the basic process for these bugs: I extracted all
past versions of the rules, fixed them up a bit, added a few that seemed
obvious, and ran a mass-check.)

body Q_BREAKTHROUGH1            /(?:revolutionary|medical) breakthrough/i
score Q_BREAKTHROUGH1 0.01
body Q_BREAKTHROUGH2            /\bbreakthrough price/i
score Q_BREAKTHROUGH2 0.01
body Q_BREAKTHROUGH3            /\bbreakthrough price\b/i
score Q_BREAKTHROUGH3 0.01
body Q_BREAKTHROUGH4            /breakthrough price/i
score Q_BREAKTHROUGH4 0.01
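
For anyone who wants to poke at these patterns outside SpamAssassin, here is a rough Python sketch of how the four regexes differ. The sample strings are made up for illustration, the helper name hits() is hypothetical, and Python's re module stands in for Perl's regex engine (the two should agree on the constructs used here). It does not reproduce SpamAssassin's body rendering.

```python
import re

# The four rule patterns from above, with /i mapped to re.IGNORECASE.
RULES = {
    "Q_BREAKTHROUGH1": r"(?:revolutionary|medical) breakthrough",
    "Q_BREAKTHROUGH2": r"\bbreakthrough price",
    "Q_BREAKTHROUGH3": r"\bbreakthrough price\b",
    "Q_BREAKTHROUGH4": r"breakthrough price",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in RULES.items()}


def hits(text):
    """Return the sorted names of the rules whose pattern matches text."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(text))


# Hypothetical sample lines, not taken from any corpus.
SAMPLES = [
    "A revolutionary breakthrough in weight loss",
    "Now at a BREAKTHROUGH PRICE",
    "breakthrough prices all week",
    "nobreakthrough price",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample!r}: {hits(sample)}")
```

Rules 2, 3 and 4 differ only in their word-boundary anchors, so anything matched by 3 is matched by 2, and anything matched by 2 is matched by 4.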

results:

OVERALL%   SPAM% NONSPAM%     S/O    RANK   SCORE  NAME
  0.158    0.458    0.013    0.97    0.54    0.01  Q_BREAKTHROUGH1
  0.000    0.000    0.000    0.00    0.00    0.01  Q_BREAKTHROUGH4
  0.000    0.000    0.000    0.00    0.00    0.01  Q_BREAKTHROUGH3
  0.000    0.000    0.000    0.00    0.00    0.01  Q_BREAKTHROUGH2

Now testing an expanded version of the first rule.  I did a quick search
of all words preceding and following "breakthrough", compared how often each
appears in spam vs. nonspam, and am testing the result.
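
The neighbouring-word search described above could be approximated along these lines. This is only a sketch under stated assumptions: the function names, the crude tokenizer, and the toy messages are all hypothetical, and it is not the script that was actually used. It compares raw counts, so with corpora of different sizes the counts would need normalizing first.

```python
import re
from collections import Counter


def neighbour_counts(messages):
    """Count the words directly before and after 'breakthrough'."""
    before, after = Counter(), Counter()
    for text in messages:
        # Crude tokenizer (assumption): lowercase runs of letters/apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        for i, word in enumerate(words):
            if word != "breakthrough":
                continue
            if i > 0:
                before[words[i - 1]] += 1
            if i + 1 < len(words):
                after[words[i + 1]] += 1
    return before, after


def spam_ratio(spam_counts, nonspam_counts):
    """Map each word to spam_hits / (spam_hits + nonspam_hits)."""
    words = set(spam_counts) | set(nonspam_counts)
    return {
        w: spam_counts[w] / (spam_counts[w] + nonspam_counts[w]) for w in words
    }


if __name__ == "__main__":
    # Toy messages for illustration only.
    spam = ["A medical breakthrough today", "Medical breakthrough price!",
            "major breakthrough"]
    nonspam = ["The team made a major breakthrough yesterday"]
    spam_before, _ = neighbour_counts(spam)
    nonspam_before, _ = neighbour_counts(nonspam)
    ranked = sorted(spam_ratio(spam_before, nonspam_before).items(),
                    key=lambda kv: kv[1], reverse=True)
    for word, ratio in ranked:
        print(f"{ratio:.2f}  {word}")
```

Words with a ratio near 1.0 are candidates for the alternation in an expanded rule; words near 0.5 or below are probably not worth including.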
Comment 2 Daniel Quinlan 2002-09-21 04:37:00 UTC
body T_SOME_BREAKTHROUGH        /(?:science|medical|major|scientific|fundamental|technology|revolutionary)\s+breakthrough/i
score T_SOME_BREAKTHROUGH       0.5

Results:

OVERALL%   SPAM% NONSPAM%     S/O    RANK   SCORE  NAME
  11410     3715     7695    0.33    0.00    0.00  (all messages)
100.000   32.559   67.441    0.33    0.00    0.00  (all messages as %)
  0.482    1.454    0.013    0.99    0.61    0.50  T_SOME_BREAKTHROUGH
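
As a sanity check on how to read these columns, the sketch below recomputes S/O and OVERALL% from the SPAM% and NONSPAM% figures. It assumes S/O is SPAM% / (SPAM% + NONSPAM%) and that OVERALL% is the hit rate over the combined corpus. Both assumptions are consistent with the rows above, but the formulas were not checked against the mass-check tooling itself, and the function names are hypothetical.

```python
def s_o(spam_pct, nonspam_pct):
    """Spam share of the hit rates (assumed definition of the S/O column)."""
    total = spam_pct + nonspam_pct
    return spam_pct / total if total else 0.0


def overall_pct(spam_pct, nonspam_pct, n_spam, n_nonspam):
    """Hit rate over the whole corpus, given per-class rates and sizes."""
    hit_count = spam_pct / 100 * n_spam + nonspam_pct / 100 * n_nonspam
    return 100 * hit_count / (n_spam + n_nonspam)


if __name__ == "__main__":
    # Corpus sizes from the "(all messages)" row.
    n_spam, n_nonspam = 3715, 7695
    # (name, SPAM%, NONSPAM%) as reported in this bug.
    rows = [
        ("T_SOME_BREAKTHROUGH", 1.454, 0.013),
        ("Q_BREAKTHROUGH1", 0.458, 0.013),
    ]
    for name, spam_pct, nonspam_pct in rows:
        overall = overall_pct(spam_pct, nonspam_pct, n_spam, n_nonspam)
        ratio = s_o(spam_pct, nonspam_pct)
        print(f"{overall:7.3f} {ratio:5.2f}  {name}")
```

Under those assumptions the recomputed values round to the reported ones (S/O 0.99 and OVERALL% 0.482 for T_SOME_BREAKTHROUGH).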

Checking into 70_cvs_rules_under_test_this_name_is_too_long.cf for further
testing.