SA Bugzilla – Bug 3863
enhance SUBJECT_DIET with LOSE_PCT
Last modified: 2005-04-28 12:34:49 UTC
Attached is a SARE rule I developed that strongly overlaps SUBJECT_DIET. The main difference is that SUBJECT_DIET hits four ham messages here, while LOSE_PCT hits none. It may be beneficial to fold LOSE_PCT into the SUBJECT_DIET rule.
Created attachment 2417: Submitted rule
header SUBJECT_DIET Subject =~ /\bLose .*(?:pounds|lbs|weight)/i
header SARE_SUB_LOSE_PCT Subject =~ /lose.{1,20}(?:\d+\%.{1,25}weight|weight.{1,40}\d+\%)/i
header N1_SUBJECT_DIET Subject =~ /lose.{1,20}(?:\d+\%.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\d+\%)/i
header N2_SUBJECT_DIET Subject =~ /lose.{1,20}(?:\d+.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\d+)/i
header N3_SUBJECT_DIET Subject =~ /lose.{1,20}(?:\%.{1,25}(?:pounds|lbs|weight)|(?:pounds|lbs|weight).{1,40}\%)/i

NEEDSMC
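To see how SUBJECT_DIET and SARE_SUB_LOSE_PCT overlap, here is a rough Python sketch that runs the two patterns against a few subject lines. The patterns are copied from the rules above; the sample subjects and the `rules_hit` helper are made up for illustration and are not taken from any corpus. It also assumes Python's `re` module treats these particular patterns the same way Perl does, which should hold for single-line subjects.

```python
import re

# Patterns copied from the rules in this bug; re.IGNORECASE mirrors the trailing /i.
RULES = {
    "SUBJECT_DIET": r"\bLose .*(?:pounds|lbs|weight)",
    "SARE_SUB_LOSE_PCT": r"lose.{1,20}(?:\d+\%.{1,25}weight|weight.{1,40}\d+\%)",
}
COMPILED = {name: re.compile(pattern, re.IGNORECASE) for name, pattern in RULES.items()}


def rules_hit(subject):
    """Return the sorted names of the rules whose pattern matches the subject."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(subject))


# Hypothetical subject lines, invented for this sketch.
SAMPLES = [
    "Lose 25% of your weight in 30 days",  # diet pitch with a percentage
    "Did you lose weight on holiday?",     # plausible personal mail, no percentage
    "Meeting moved to 3pm",                # unrelated
]

if __name__ == "__main__":
    for subject in SAMPLES:
        print(f"{subject!r}: {rules_hit(subject)}")
```

The second sample shows the kind of subject where the broader SUBJECT_DIET pattern fires but the percentage-based rule does not, which is the difference described in the original report.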
# [automatically generated by automc: start]
# DONEMC 2: completed request from comment 2

  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_SARE_SUB_LOSE_PCT_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N1_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N2_SUBJECT_DIET_b3863_c2
  0.000   0.0000   0.0000    0.500   0.46    0.01  T_MC_N3_SUBJECT_DIET_b3863_c2

above freqs using data from "/home/automc/corpus/html/DETAILS.new" as of Thu Apr 28 16:28:20 2005:

T_MC_SUBJECT_DIET_b3863_c2 = SUBJECT_DIET from bug 3863 comment 2
    full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_SUBJECT_DIET_b3863_c2&date=20050428
T_MC_SARE_SUB_LOSE_PCT_b3863_c2 = SARE_SUB_LOSE_PCT from bug 3863 comment 2
    full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_SARE_SUB_LOSE_PCT_b3863_c2&date=20050428
T_MC_N1_SUBJECT_DIET_b3863_c2 = N1_SUBJECT_DIET from bug 3863 comment 2
    full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N1_SUBJECT_DIET_b3863_c2&date=20050428
T_MC_N2_SUBJECT_DIET_b3863_c2 = N2_SUBJECT_DIET from bug 3863 comment 2
    full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N2_SUBJECT_DIET_b3863_c2&date=20050428
T_MC_N3_SUBJECT_DIET_b3863_c2 = N3_SUBJECT_DIET from bug 3863 comment 2
    full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_N3_SUBJECT_DIET_b3863_c2&date=20050428

# ham results used: ham-bzoetekouw.log ham-cthielen.log ham-parkerm.log ham-quinlan.log ham-rODbegbie.log ham-theo.log
# spam results used: spam-bzoetekouw.log spam-cthielen.log spam-parkerm.log spam-quinlan.log spam-rODbegbie.log spam-theo.log

 456579   350293   106286    0.767   0.00    0.00  (all messages)
100.000  76.7212  23.2788    0.767   0.00    0.00  (all messages as %)

bug 3863 cmt 1: ignored, lint failed
# [automatically generated by automc: end]
Results from my own personal mass-check:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 300736   127038   173698    0.422   0.00    0.00  (all messages)
100.000  42.2424  57.7576    0.422   0.00    0.00  (all messages as %)
  0.035   0.0771   0.0046    0.944   0.51    1.35  SUBJECT_DIET

Counts:
    106       98        8    0.944   0.51    1.35  SUBJECT_DIET

SARE_SUB_LOSE_PCT hits zero emails. Have spammers moved on?
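For anyone checking the arithmetic, the percentage row for SUBJECT_DIET can be reproduced from the raw counts. The sketch below assumes S/O is computed from the normalised rates, spam% / (spam% + ham%), rather than from raw hit counts; that is an inference from the numbers above (98 / (98 + 8) would give 0.925, not 0.944), not something confirmed against the mass-check tooling. The `hit_rates` function name is hypothetical, and returning 0.5 for a rule with no hits simply mirrors the 0.500 shown for zero-hit rules in the automc output.

```python
def hit_rates(spam_hits, ham_hits, spam_total, ham_total):
    """Return (overall%, spam%, ham%, S/O) for one rule.

    Assumption: S/O is spam% / (spam% + ham%), i.e. normalised for corpus size.
    """
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    overall_pct = 100.0 * (spam_hits + ham_hits) / (spam_total + ham_total)
    if spam_pct + ham_pct == 0:
        # No hits at all: report 0.5, matching the 0.500 shown for zero-hit rules.
        so = 0.5
    else:
        so = spam_pct / (spam_pct + ham_pct)
    return overall_pct, spam_pct, ham_pct, so


if __name__ == "__main__":
    # Counts for SUBJECT_DIET and corpus totals taken from the table above.
    overall, spam, ham, so = hit_rates(98, 8, 127038, 173698)
    print(f"{overall:7.3f} {spam:8.4f} {ham:8.4f} {so:8.3f}  SUBJECT_DIET")
```

With 98 spam hits out of 127038 and 8 ham hits out of 173698, this gives roughly 0.035 / 0.0771 / 0.0046 / 0.944, which agrees with the row reported above.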
Closing since this suggestion no longer has significant value.