SA Bugzilla – Bug 3250
MARKETING_PARTNERS false positive
Last modified: 2005-04-09 17:38:21 UTC
The regex for the MARKETING_PARTNERS rule is:
  /\b(?:marketing|network) partner|\bpartner (?:web)?site/i
This matched a confirmation message for some plane tickets I just bought and together with the HTML tests pushed it over the limit. It's fairly common for airlines to refer to partner web sites (for hotels, car rental, etc.) and this is liable to match a lot of such confirmations. This could be disastrous for someone who failed to note down the reference number for a ticketless booking.
Either the regex should be changed to require a longer phrase or the default weights for this rule should be reduced from the current values (up to 3.5).
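To illustrate why the pattern is so broad, here is a minimal sketch that runs the regex from the report against a few sentences. Python's re module is used as a stand-in for the Perl regex engine, and the sample sentences are hypothetical, not taken from the attached message.

```python
import re

# Pattern copied from the report; re.IGNORECASE mirrors the /i flag.
MARKETING_PARTNERS = re.compile(
    r"\b(?:marketing|network) partner|\bpartner (?:web)?site",
    re.IGNORECASE,
)

# Hypothetical sample sentences (assumptions for illustration only).
samples = [
    "Visit our partner website for hotel deals.",
    "Book car rental through a partner site.",
    "Join our network partner programme today!",
    "Your booking reference is ABC123.",
]

# True where the rule would fire on the sentence.
hits = [bool(MARKETING_PARTNERS.search(s)) for s in samples]

for sentence, hit in zip(samples, hits):
    print(f"{'HIT ' if hit else 'miss'} {sentence}")
```

The first two sentences are the kind of wording a legitimate travel confirmation might plausibly use, and both trigger the rule.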
The weight for this rule is going to only be about 0.75 to 1.9. Seems reasonable. If you attach an example message, maybe we could attempt to fix the false match. Otherwise, I suggest we close this as WONTFIX.
not even close to being a major problem
Created attachment 2180
Confirmation message from Easyjet that matches this rule
I have replaced some of the personal information in the mail with the text "[deleted]" and hope that this doesn't affect the result.
moving accuracy and some bugs to 3.1.0 milestone
wow, I was considering whitelisting easyjet -- but they don't even have reverse DNS in that message!
ok, here's a potential replacement with a negative lookahead to block the Easyjet FP:
  body T_MARKETING_PARTNERS /\b(?:marketing|network) partner|\bpartner (?:web)?site\b(?! for more information)/i
NEEDSMC
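A rough sketch of how the negative lookahead in the proposed replacement behaves, again using Python's re module as a stand-in for Perl. The test phrases are hypothetical; the actual wording of the Easyjet message is only inferred from the lookahead text.

```python
import re

# Proposed replacement pattern from the comment above.
T_MARKETING_PARTNERS = re.compile(
    r"\b(?:marketing|network) partner"
    r"|\bpartner (?:web)?site\b(?! for more information)",
    re.IGNORECASE,
)

def fires(text: str) -> bool:
    """Return True if the proposed rule would match the text."""
    return T_MARKETING_PARTNERS.search(text) is not None

# Blocked by the lookahead: the phrase is followed by " for more information".
blocked = fires("Please see our partner website for more information.")

# Still matches: same phrase, different continuation.
still_hits = fires("Check out our partner website for great deals.")

# Side effect of the added \b: the plural no longer matches.
plural = fires("Browse our partner sites today.")

print(blocked, still_hits, plural)
```

Note that the lookahead only suppresses one exact continuation, so other ham wording around "partner website" would still hit, and that the added `\b` also stops the rule matching plurals such as "partner sites".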
# [automatically generated by automc: start]
# DONEMC 6: completed request from comment 6
  0.145   0.1810   0.0029    0.984   0.59    0.01  T_MC_MARKETING_PARTNERS_b3250_c6
above freqs using data from "/home/automc/corpus/html/DETAILS.new" as of Sun Mar 13 15:50:05 2005:
T_MC_MARKETING_PARTNERS_b3250_c6 = T_MARKETING_PARTNERS from bug 3250 comment 6
full freqs: http://bugzilla.spamassassin.org/ruleqa?rule=T_MC_MARKETING_PARTNERS_b3250_c6&date=20050313
# ham results used: ham-cthielen.log ham-daf.log ham-quinlan.log ham-rODbegbie.log ham-theo.log
# spam results used: spam-cthielen.log spam-daf.log spam-quinlan.log spam-rODbegbie.log spam-theo.log
 346646   276788    69858    0.798   0.00    0.00  (all messages)
100.000  79.8475  20.1525    0.798   0.00    0.00  (all messages as %)
# [automatically generated by automc: end]
hmm, that's not too helpful:
  0.235   0.2926   0.0086    0.971   0.63    2.02  MARKETING_PARTNERS
  0.145   0.1810   0.0029    0.984   0.59    0.01  T_MC_MARKETING_PARTNERS_b3250_c6
I checked in some test rules.
splitting the rule has no effect on efficacy, closing as WORKSFORME
OVERALL%    SPAM%     HAM%      S/O   RANK   SCORE  NAME
 279798   212073    67725    0.758   0.00    0.00  (all messages)
100.000  79.7950  24.2050    0.758   0.00    0.00  (all messages as %)
  0.201   0.2626   0.0059    0.978   0.58    2.02  MARKETING_PARTNERS
  0.128   0.1674   0.0030    0.983   0.54    0.01  T_MARKETING_PARTNERS
  0.073   0.0953   0.0030    0.970   0.50    0.01  T_PARTNER_SITE
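For readers unfamiliar with the S/O column, the sketch below recomputes it from the SPAM% and HAM% columns of the table above. It assumes S/O is defined as spam% / (spam% + ham%); that definition is an assumption here, though it is consistent with the figures in this bug. Because the inputs are already rounded, the recomputed values can differ from the table in the third decimal place.

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """S/O under the assumed definition spam% / (spam% + ham%)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit no messages")
    return spam_pct / total

# (SPAM%, HAM%) pairs copied from the table in the closing comment.
rows = {
    "MARKETING_PARTNERS": (0.2626, 0.0059),
    "T_MARKETING_PARTNERS": (0.1674, 0.0030),
    "T_PARTNER_SITE": (0.0953, 0.0030),
}

so = {name: spam_over_overall(s, h) for name, (s, h) in rows.items()}

for name, value in so.items():
    print(f"{name:22s} {value:.3f}")
```

Under this reading, both split rules hit ham at the same rate (0.0030%), which is why splitting does not help separate the false positives.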