SA Bugzilla – Bug 744
rule broken: BREAKTHROUGH
Last modified: 2002-09-20 20:37:00 UTC
Found to have very low frequencies in the pre-2.40 mass-check, so it was commented out and moved to '70_broken_rules.cf'. If anyone wants to revive it, please use this bug as a tracker.

Freqs:

0.000   0.000   0.000   0.00   1.00  BREAKTHROUGH
A rule worth investigating. BFD.

Initial test run. (It looks like someone overwrote an old rule by accident at some point. If you haven't figured out the basic process for these bugs: I extracted all past versions of the rules, fixed them up a bit, added a few that seemed obvious, and ran a mass-check.)

body Q_BREAKTHROUGH1 /(?:revolutionary|medical) breakthrough/i
score Q_BREAKTHROUGH1 0.01

body Q_BREAKTHROUGH2 /\bbreakthrough price/i
score Q_BREAKTHROUGH2 0.01

body Q_BREAKTHROUGH3 /\bbreakthrough price\b/i
score Q_BREAKTHROUGH3 0.01

body Q_BREAKTHROUGH4 /breakthrough price/i
score Q_BREAKTHROUGH4 0.01

Results:

0.158   0.458   0.013   0.97   0.54   0.01  Q_BREAKTHROUGH1
0.000   0.000   0.000   0.00   0.00   0.01  Q_BREAKTHROUGH4
0.000   0.000   0.000   0.00   0.00   0.01  Q_BREAKTHROUGH3
0.000   0.000   0.000   0.00   0.00   0.01  Q_BREAKTHROUGH2

Now testing an expanded version of the first rule. I did a quick search of all words preceding and following "breakthrough", compared the frequency of spam instances vs. nonspam, and am testing the result.
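For anyone wanting to repeat the neighbour-word search, here is a rough Python sketch of one way it could be done. This is not the script actually used for this bug; the sample messages, the `NEIGHBOURS` pattern and the `neighbour_counts` helper are all made up for illustration, and a real run would read the spam and nonspam corpora instead of two short lists.

```python
import re
from collections import Counter

# Hypothetical sample messages (assumption: stand-ins for the real corpora).
spam = [
    "A revolutionary breakthrough in weight loss",
    "This medical breakthrough will change your life",
    "Revolutionary breakthrough price on toner",
]
nonspam = [
    "The team reported a major breakthrough in the parser rewrite",
    "No breakthrough yet on bug 744",
]

# Optionally capture the word right before and the word right after "breakthrough".
NEIGHBOURS = re.compile(
    r"(?:(\w+)\s+)?\bbreakthrough\b(?:\s+(\w+))?", re.IGNORECASE
)

def neighbour_counts(messages):
    """Count the words seen immediately before and after 'breakthrough'."""
    before, after = Counter(), Counter()
    for text in messages:
        for m in NEIGHBOURS.finditer(text):
            if m.group(1):
                before[m.group(1).lower()] += 1
            if m.group(2):
                after[m.group(2).lower()] += 1
    return before, after

spam_before, spam_after = neighbour_counts(spam)
ham_before, ham_after = neighbour_counts(nonspam)

# List each preceding word with its spam and nonspam counts side by side.
for word in sorted(set(spam_before) | set(ham_before)):
    print(word, spam_before[word], ham_before[word])
```

Words that show up mostly on the spam side are candidates for the alternation in an expanded rule; the same comparison can be made with `spam_after` and `ham_after` for following words.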
body T_SOME_BREAKTHROUGH /(?:science|medical|major|scientific|fundamental|technology|revolutionary)\s+breakthrough/i
score T_SOME_BREAKTHROUGH 0.5

Results:

OVERALL%    SPAM%  NONSPAM%    S/O   RANK  SCORE  NAME
   11410     3715      7695   0.33   0.00   0.00  (all messages)
 100.000   32.559    67.441   0.33   0.00   0.00  (all messages as %)
   0.482    1.454     0.013   0.99   0.61   0.50  T_SOME_BREAKTHROUGH

Checking into 70_cvs_rules_under_test_this_name_is_too_long.cf for further testing.
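As a sanity check on the numbers above, here is a small Python sketch that recomputes S/O and OVERALL% from the other columns. It assumes S/O is the spam hit percentage divided by the sum of the spam and nonspam hit percentages, which does reproduce the figures quoted in this bug. The function names are hypothetical and not part of the mass-check tools.

```python
def s_o_ratio(spam_pct, nonspam_pct):
    """S/O under the assumption S/O = SPAM% / (SPAM% + NONSPAM%)."""
    total = spam_pct + nonspam_pct
    if total == 0:
        # Rule never hit; the zero-hit rows in this bug show 0.00.
        return 0.0
    return spam_pct / total

def overall_pct(spam_pct, nonspam_pct, n_spam, n_nonspam):
    """Percentage of all messages hit, from per-class percentages and corpus sizes."""
    hits = spam_pct / 100 * n_spam + nonspam_pct / 100 * n_nonspam
    return 100 * hits / (n_spam + n_nonspam)

# Corpus sizes from the "(all messages)" row: 3715 spam, 7695 nonspam.
t_some_so = round(s_o_ratio(1.454, 0.013), 2)
q1_so = round(s_o_ratio(0.458, 0.013), 2)
t_some_overall = round(overall_pct(1.454, 0.013, 3715, 7695), 3)

print(t_some_so)       # T_SOME_BREAKTHROUGH, reported as 0.99
print(q1_so)           # Q_BREAKTHROUGH1, reported as 0.97
print(t_some_overall)  # T_SOME_BREAKTHROUGH, reported as 0.482
```

Dividing percentages rather than raw hit counts keeps the ratio from being skewed by the corpus being about two-thirds nonspam.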