Bug 3863 - enhance SUBJECT_DIET with LOSE_PCT
Summary: enhance SUBJECT_DIET with LOSE_PCT
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.0.0
Hardware: Other other
Importance: P5 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords: triage
Depends on:
Blocks:
 
Reported: 2004-10-03 16:51 UTC by Bob Menschel
Modified: 2005-04-28 12:34 UTC
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
Submitted rule text/plain None Bob Menschel [HasCLA]

Description Bob Menschel 2004-10-03 16:51:32 UTC
Attached is a SARE rule I developed that strongly overlaps SUBJECT_DIET. The 
main difference is that SUBJECT_DIET hits four ham messages here, while LOSE_PCT 
hits none. It may be beneficial to fold LOSE_PCT into the SUBJECT_DIET rule.
Comment 1 Bob Menschel 2004-10-03 16:52:18 UTC
Created attachment 2417
Submitted rule
Comment 2 Bob Menschel 2005-04-06 22:04:42 UTC
header SUBJECT_DIET       Subject =~ /\bLose .*(?:pounds|lbs|weight)/i

header SARE_SUB_LOSE_PCT  Subject =~ /lose.{1,20}(?:\d+\%.{1,25}weight|weight.{1,40}\d+\%)/i

header N1_SUBJECT_DIET  Subject =~ /lose.{1,20}(?:\d+\%.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\d+\%)/i

header N2_SUBJECT_DIET  Subject =~ /lose.{1,20}(?:\d+.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\d+)/i

header N3_SUBJECT_DIET  Subject =~ /lose.{1,20}(?:\%.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\%)/i

NEEDSMC
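
To make the differences between these five patterns easier to see, here is a rough sketch that runs them against a few sample subjects. The sample subject lines and the helper name matching_rules are made up for illustration and are not from any corpus. The sketch uses Python's re module instead of SpamAssassin itself, on the assumption that these particular expressions behave the same way in both engines.

```python
import re

# Patterns copied from the rules in this comment. Python's re module is
# assumed to treat these particular expressions the same way Perl does.
RULES = {
    "SUBJECT_DIET": r"\bLose .*(?:pounds|lbs|weight)",
    "SARE_SUB_LOSE_PCT": r"lose.{1,20}(?:\d+\%.{1,25}weight|weight.{1,40}\d+\%)",
    "N1_SUBJECT_DIET": r"lose.{1,20}(?:\d+\%.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\d+\%)",
    "N2_SUBJECT_DIET": r"lose.{1,20}(?:\d+.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\d+)",
    "N3_SUBJECT_DIET": r"lose.{1,20}(?:\%.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\%)",
}


def matching_rules(subject):
    """Return the names of the rules whose pattern matches the subject.

    re.IGNORECASE stands in for the /i modifier on the original rules.
    """
    return [
        name
        for name, pattern in RULES.items()
        if re.search(pattern, subject, re.IGNORECASE)
    ]


# Hypothetical subject lines, chosen only to exercise the patterns.
SAMPLES = [
    "Lose 20% of your weight in a month",
    "Lose 10 pounds fast",
    "How not to lose sleep over weight gain",
]

for subject in SAMPLES:
    print(subject, "->", matching_rules(subject))
```

In this sketch the third subject, which has no number or percent sign, matches only SUBJECT_DIET. That is the kind of ham hit the narrower variants are meant to avoid.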


Comment 3 Auto-Mass-Checker 2005-04-28 16:28:23 UTC
# [automatically generated by automc: start]
# DONEMC 2: completed request from comment 2

  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_SARE_SUB_LOSE_PCT_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N1_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N2_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N3_SUBJECT_DIET_b3863_c2

above freqs using data from "/home/automc/corpus/html/DETAILS.new" as of Thu Apr 28 16:28:20 2005:

T_MC_SUBJECT_DIET_b3863_c2 = SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_SUBJECT_DIET_b3863_c2&date=20050428

T_MC_SARE_SUB_LOSE_PCT_b3863_c2 = SARE_SUB_LOSE_PCT from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_SARE_SUB_LOSE_PCT_b3863_c2&date=20050428

T_MC_N1_SUBJECT_DIET_b3863_c2 = N1_SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N1_SUBJECT_DIET_b3863_c2&date=20050428

T_MC_N2_SUBJECT_DIET_b3863_c2 = N2_SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N2_SUBJECT_DIET_b3863_c2&date=20050428

T_MC_N3_SUBJECT_DIET_b3863_c2 = N3_SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N3_SUBJECT_DIET_b3863_c2&date=20050428
# ham results used: ham-bzoetekouw.log ham-cthielen.log ham-parkerm.log ham-quinlan.log ham-rODbegbie.log ham-theo.log
# spam results used: spam-bzoetekouw.log spam-cthielen.log spam-parkerm.log spam-quinlan.log spam-rODbegbie.log spam-theo.log
 456579   350293   106286    0.767   0.00    0.00  (all messages)
100.000  76.7212  23.2788    0.767   0.00    0.00  (all messages as %)

bug 3863 cmt 1: ignored, lint failed

# [automatically generated by automc: end]
Comment 4 Auto-Mass-Checker 2005-04-28 16:28:27 UTC
# [automatically generated by automc: start]
# DONEMC 2: completed request from comment 2

  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_SARE_SUB_LOSE_PCT_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N1_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N2_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N3_SUBJECT_DIET_b3863_c2

above freqs using data from "/home/automc/corpus/html/DETAILS.new" as of Thu Apr 28 16:28:24 2005:

T_MC_SUBJECT_DIET_b3863_c2 = SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_SUBJECT_DIET_b3863_c2&date=20050428

T_MC_SARE_SUB_LOSE_PCT_b3863_c2 = SARE_SUB_LOSE_PCT from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_SARE_SUB_LOSE_PCT_b3863_c2&date=20050428

T_MC_N1_SUBJECT_DIET_b3863_c2 = N1_SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N1_SUBJECT_DIET_b3863_c2&date=20050428

T_MC_N2_SUBJECT_DIET_b3863_c2 = N2_SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N2_SUBJECT_DIET_b3863_c2&date=20050428

T_MC_N3_SUBJECT_DIET_b3863_c2 = N3_SUBJECT_DIET from bug 3863 comment 2
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N3_SUBJECT_DIET_b3863_c2&date=20050428
# ham results used: ham-bzoetekouw.log ham-cthielen.log ham-parkerm.log ham-quinlan.log ham-rODbegbie.log ham-theo.log
# spam results used: spam-bzoetekouw.log spam-cthielen.log spam-parkerm.log spam-quinlan.log spam-rODbegbie.log spam-theo.log
 456579   350293   106286    0.767   0.00    0.00  (all messages)
100.000  76.7212  23.2788    0.767   0.00    0.00  (all messages as %)

bug 3863 cmt 1: ignored, lint failed

# [automatically generated by automc: end]
Comment 5 Bob Menschel 2005-04-28 18:36:59 UTC
Results from my own personal mass-check: 


OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 300736   127038   173698    0.422   0.00    0.00  (all messages)
100.000  42.2424  57.7576    0.422   0.00    0.00  (all messages as %)
  0.035   0.0771   0.0046    0.944   0.51    1.35  SUBJECT_DIET

Counts: 
    106       98        8    0.944   0.51   1.35  SUBJECT_DIET

SARE_SUB_LOSE_PCT hits zero emails.

Spammers have moved on? 
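
For anyone checking the numbers above, this small sketch recomputes the SUBJECT_DIET percentages and S/O from the raw counts. It assumes S/O is SPAM% / (SPAM% + HAM%), which is consistent with the 0.944 in the table (the raw-count ratio 98/106 would be about 0.925 instead). The function name hit_stats and the 0.5 fallback for rules with no hits are illustrative choices; the fallback mirrors the 0.500 shown for zero-hit rules in the automc output.

```python
def hit_stats(spam_hits, ham_hits, spam_total, ham_total):
    """Return (overall%, spam%, ham%, S/O) for one rule.

    Assumption: S/O is computed from the percentages, not the raw counts,
    so it is not skewed by the spam/ham balance of the corpus.
    """
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    overall_pct = 100.0 * (spam_hits + ham_hits) / (spam_total + ham_total)
    total_pct = spam_pct + ham_pct
    # A rule that hits nothing carries no information either way.
    s_o = spam_pct / total_pct if total_pct > 0 else 0.5
    return overall_pct, spam_pct, ham_pct, s_o


# Counts for SUBJECT_DIET and corpus totals taken from the tables above.
overall, spam, ham, s_o = hit_stats(
    spam_hits=98, ham_hits=8, spam_total=127038, ham_total=173698
)
print(f"{overall:7.3f} {spam:8.4f} {ham:8.4f} {s_o:8.3f}  SUBJECT_DIET")
```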
Comment 6 Bob Menschel 2005-04-28 20:34:49 UTC
Closing since this suggestion no longer has significant value.