SA Bugzilla – Bug 6247
Replace HABEAS, BSP, SSC with RP Whitelist rules
Last modified: 2021-03-27 01:25:23 UTC
Please remove the following rules:

- HABEAS_ACCREDITED_COI
- HABEAS_ACCREDITED_SOI
- HABEAS_CHECKED
- RCVD_IN_BSP_TRUSTED
- RCVD_IN_BSP_OTHER
- RCVD_IN_SSC_TRUSTED_COI

...and replace them with:

# Return Path Certified:
# http://www.returnpath.net/internetserviceprovider/certification/
# (replaces RCVD_IN_BSP_TRUSTED, RCVD_IN_BSP_OTHER, RCVD_IN_SSC_TRUSTED_COI)
header RCVD_IN_RP_CERTIFIED eval:check_rbl_txt('ssc-firsttrusted', 'sa-trusted.bondedsender.org.')
describe RCVD_IN_RP_CERTIFIED Sender is in Return Path Certified (trusted relay)
tflags RCVD_IN_RP_CERTIFIED net nice

# Return Path Safe:
# http://www.returnpath.net/internetserviceprovider/certification/
# (replaces HABEAS_ACCREDITED_COI, HABEAS_ACCREDITED_SOI, HABEAS_CHECKED)
header RCVD_IN_RP_SAFE eval:check_rbl_txt('ssc-firsttrusted', 'sa-accredit.habeas.com.')
describe RCVD_IN_RP_SAFE Sender is in Return Path Safe (trusted relay)
tflags RCVD_IN_RP_SAFE net nice

# Return Path Reputation Network Blacklist (RNBL):
# https://senderscore.org/blacklistlookup/
header RCVD_IN_RP_RNBL eval:check_rbl('rbl', 'bl.score.senderscore.com.')
describe RCVD_IN_RP_RNBL Relay in RNBL, https://senderscore.org/blacklistlookup/
tflags RCVD_IN_RP_RNBL net

score RCVD_IN_RP_CERTIFIED 0 -6 0 -6
score RCVD_IN_RP_SAFE 0 -4 0 -4
score RCVD_IN_RP_RNBL 0 5 0 5
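Each `score` line above carries four values, one per SpamAssassin score set (0: Bayes off, network tests off; 1: Bayes off, network tests on; 2: Bayes on, network tests off; 3: both on). That is why these `net` rules score 0 in sets 0 and 2. The minimal Python sketch below is only an illustration of how the applicable value is picked, not SpamAssassin's own config parser; the function names are hypothetical.

```python
# Illustrative sketch, not SpamAssassin's own config parser.
# Score sets: 0 = no Bayes, no net; 1 = no Bayes, net;
#             2 = Bayes, no net;    3 = Bayes and net.

def parse_score_line(line):
    """Split 'score NAME s0 [s1 s2 s3]' into (name, [four scores])."""
    parts = line.split()
    if not parts or parts[0] != "score" or len(parts) not in (3, 6):
        raise ValueError(f"unexpected score line: {line!r}")
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value is used for every score set.
        values *= 4
    return parts[1], values


def score_set(use_bayes, use_net):
    """Map the two feature switches onto the score-set index 0-3."""
    return (2 if use_bayes else 0) + (1 if use_net else 0)


def effective_score(line, use_bayes, use_net):
    """Return the score this rule contributes under the given setup."""
    _, values = parse_score_line(line)
    return values[score_set(use_bayes, use_net)]


line = "score RCVD_IN_RP_CERTIFIED 0 -6 0 -6"
print(effective_score(line, use_bayes=True, use_net=True))    # -6.0
print(effective_score(line, use_bayes=True, use_net=False))   # 0.0
```

With network tests disabled, the whitelist and blacklist lookups never run, so a zero score in those sets simply reflects that the rule cannot fire there.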
(apologies for the continued variation in zones; we're planning to merge them all to one domain, but haven't gotten around to it yet.)
Are these scores equivalent to the old rules? We put a significant amount of effort into scoring the entire ruleset, and that scoring was influenced by your old rules. http://ruleqa.spamassassin.org/20090930-r808953-n OTOH, the old rules all have fairly few hits, so it might not matter much from the GARescorer's POV.
RCVD_IN_BSP_TRUSTED was -4.3; we recommend RP_CERTIFIED be -6.
HABEAS_ACCREDITED_COI was -8; we recommend RP_SAFE be -4.

It's more difficult for senders to get on (and stay on) Certified than Safe, so Certified should have more impact.
What rule is RCVD_IN_RP_RNBL replacing?

header RCVD_IN_RP_RNBL eval:check_rbl('rbl', 'bl.score.senderscore.com.')
(In reply to comment #4)
> What rule is RCVD_IN_RP_RNBL replacing?

That's a new one we're offering, if y'all would like to use it.
http://www.returnpath.net/internetserviceprovider/blacklist/
re: RCVD_IN_RP_RNBL

According to the link you provided: "All users of the blacklist contribute data to the Return Path Reputation Network. For specific details and access instructions, contact us today."

Is this blacklist free for personal use? Free for commercial use? The same question may apply to the other rule changes as well.
(In reply to comment #6)
> re: RCVD_IN_RP_RNBL
>
> According to the link you provided: "All users of the blacklist contribute data
> to the Return Path Reputation Network. For specific details and access
> instructions, contact us today."
>
> Is this blacklist free for personal use? Free for commercial use?

We'll be implementing query volume limits (similar to Spamhaus) some time next year, but most SpamAssassin users should be fine.

> This question might also extrapolate to the other rule changes as well.

The Certified and Safe lists are free for anyone to query.
One of the tasks I have been struggling with is developing a policy for rules that implement query volume limits in the default installation.

I am by no means against commercial applications, but I will tell you that the issue of enabling rules by default that might have query limits has been discussed.

When you say next year, can you be more specific? Also, if a query limit is reached, will your service time out, return false negatives, return false positives, or something else?
I assume Return Path has the capacity to responsibly serve the entire SpamAssassin community with a new blacklist?

But we will not add a new blacklist with a non-informational score right before the release. For example, PSBL went through months of testing in weekly masscheck before it was enabled for the upcoming 3.3.0 release. Furthermore, none of our blacklists have a score as high as 5 points; blacklist scores are set by the GARescorer.

I propose that we add this new blacklist to the weekly masscheck with a score of 0.001. We can decide after N weeks of testing what score to give it in sa-update.

Also, do you really intend for RNBL not to be lastexternal? That has always seemed to be a mistake with other blacklists.
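The lastexternal question is about which relays get looked up. In SpamAssassin this is chosen through the `check_rbl` set name: a set name ending in `-lastexternal` restricts the lookup to the last external relay (the host that handed the message to your own network), while a plain set name such as `'rbl'` checks the untrusted relays more broadly. The toy model below is a hypothetical sketch (the `ips_to_query` function and the addresses are invented, and it ignores SpamAssassin's real handling of trusted/internal networks and private IPs). It shows why a list containing dynamic IPs is risky without lastexternal: the sender's dynamic address behind a legitimate ISP smarthost would also be queried.

```python
# Hypothetical model: `untrusted` holds the untrusted relay IPs parsed from
# Received: headers, ordered from the host that handed the message to our
# trusted network (index 0) back toward the original sender.

def ips_to_query(untrusted, lastexternal):
    """Return the relay IPs a DNSBL rule would look up under this model."""
    if lastexternal:
        # Only the last external hop: the host that connected to our MX.
        return untrusted[:1]
    # Every untrusted hop, including a dynamic client address sitting
    # behind its ISP's smarthost.
    return list(untrusted)


# Documentation-range addresses (RFC 5737), invented for this example.
relays = ["198.51.100.7",   # ISP smarthost that delivered to our MX
          "203.0.113.45"]   # sender's dynamic DSL address
print(ips_to_query(relays, lastexternal=True))   # ['198.51.100.7']
print(ips_to_query(relays, lastexternal=False))  # ['198.51.100.7', '203.0.113.45']
```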
(In reply to comment #8)
> One of the tasks I have been struggling with is developing a policy for rules
> that implement query volume limits in default installation.
>
> I am by no means anti-commercial applications but I will tell you that the
> issue of implementing rules by default that might have query limits has been
> discussed.

*nod* I wish we could give a blanket exception to SpamAssassin, but there's no reliable way to differentiate between queries.

> When you say next year, can you be more specific?

Unfortunately, no -- the work's not scheduled yet.

> Also, if a query limit is reached, will your service timeout, return false
> negatives, false positives, ??

I can promise we won't return anything false. Beyond that, I'm not sure yet.

(In reply to comment #9)
> But we will not add a new blacklist with a non-informational score right before
> the release. For example, PSBL went through months of testing in weekly
> masscheck before it became enabled for the upcoming 3.3.0 release.

Totally understandable.

> Also, do you really intend for RNBL to be not lastexternal? That seems to have
> always been a mistake with other blacklists.

Our RNBL model won't blacklist a dynamic IP (for example) simply because it's dynamic; we have to have seen spam from it recently, as well. But if you're concerned, we'll certainly bow to your experience.
Furthermore, I've never been comfortable with artificial negative scores as high as -4 through -8 for any whitelist. I think we should be more conservative and halve such numbers.

Could you also explain the former SOI list: does this really mean you expect us to assign negative scores to lists that are single opt-in? Does single opt-in mean people who have entered their e-mail address into a form, without e-mail confirmation of the subscription?

Please explain how the old rules map to the new rules.
my vote would be to comment out ALL negative scored rules which require network lookups. make a huge blinking sign in readme/install/etc files and put the recurring & endless debate to an end. those who want to enable may do so.
(In reply to comment #11)
> Furthermore, I've never been comfortable with such high artificial negative
> scores like -4 through -8 for any whitelist. I think we should be more
> conservative and halve such numbers.

As long as you do the same to the other whitelists....

> Please explain how the old rules map to the new rules.

I'll ask a colleague to chime in.

Short version is that we've condensed all those old levels (quite some time ago, actually), and now there are only the two: Certified, and Safe. Both are based on actual measurable results, rather than self-assertion of practices like other whitelists.
(In reply to comment #11)
> Furthermore, I've never been comfortable with such high artificial negative
> scores like -4 through -8 for any whitelist. I think we should be more
> conservative and halve such numbers.
>
> Could you also explain, the former SOI list, does this really mean you expect
> us to assign negative scores to lists that are single opt in? Does single opt
> in mean people who have put their e-mail address into a form, without e-mail
> confirmation for subscription?
>
> Please explain how the old rules map to the new rules.

Warren,

It's pretty obvious that such high negative scores are supposed to override possible SBL/URI listings.

imo, if we cannot get rid of all the negative scores, settle for something closer to:

CERTIFIED = 0.0001 (informational)
SAFE = -2.5

score RCVD_IN_RP_RNBL 5.0 without thorough testing and a clear policy??? -1
(In reply to comment #14)
> CERTIFIED = 0.0001 (informational)
> SAFE = -2.5

Please don't. The criteria for being on the Certified list is more difficult for senders to achieve than the Safe list (though of course both are more difficult than not reaching any whitelisting criteria at all.)

If you're only going to give one of our lists any benefit, it should be Certified. Not sure why you'd do that, though, when other whitelists with much less stringent criteria already receive higher scores in the default configuration.
(In reply to comment #15)
> (In reply to comment #14)
>
> > CERTIFIED = 0.0001 (informational)
> > SAFE = -2.5
>
> Please don't. The criteria for being on the Certified list is more difficult
> for senders to achieve than the Safe list (though of course both are more
> difficult than not reaching any whitelisting criteria at all.)
>
> If you're only going to give one of our lists any benefit, it should be
> Certified. Not sure why you'd do that, though, when other whitelists with much
> less stringent criteria already receive higher scores in the default
> configuration.

Sorry, my mix-up:

CERTIFIED = -2.5
SAFE = 0.0001

As to other lists, I'd *assume* they either have very low coverage or their performance hasn't been debated as much as yours.

For several reasons, my personal choice is to always comment out all the certifying services, but I can't expect everyone to agree.
JD, your timing is atrocious -- we have _just_ completed a new rescoring run, and changing rules now is not something we would do lightly. :(

btw, 'Its pretty obvious that such high negative scores are supposed to override possible SBL/URI listings.' Nope. That's certainly not been the plan, or indeed my experience ;)

However, there's a good chance that the FPs these rules protect against no longer need negative scores as high as they used to. It would be interesting to check the mass-check logs and see what/who
(In reply to comment #13)
> (In reply to comment #11)
> > Furthermore, I've never been comfortable with such high artificial negative
> > scores like -4 through -8 for any whitelist. I think we should be more
> > conservative and halve such numbers.
>
> As long as you do the same to the other whitelists....
>
> > Please explain how the old rules map to the new rules.
>
> I'll ask a colleague to chime in.
>
> Short version is that we've condensed all those old levels (quite some time
> ago, actually), and now there are only the two: Certified, and Safe. Both are
> based on actual measurable results, rather than self-assertion of practices
> like other whitelists.

Hi,

First off, the scoring for Sender Score Certified (now called Return Path Certified) was part of a legacy of the Bonded Sender scoring, and that of Habeas Safelist was done under the auspices of that defunct company. As well, we have drastically improved checking on both lists, and will be continually adding data streams to our checks. (There are two massive ones currently in test that I can't disclose (sorry, wish I could), but rest assured you've heard of them, and you don't have access to that data via SA. They will give us yet another huge jump forward. Stay tuned.)

As to the timing: Justin, a million pardons, and we completely understand if we must wait until proper testing on the scoring is done to consider these requests. A lot of our development has been pushed around the calendar; it had been my hope to coincide with the new release, but it was not to be.

I did want to comment on the suggestion that the scoring was designed to get around SURBL or URBL checks: that simply isn't the case. Both our whitelists were in existence and scored as such before either list existed, TTBOMK.
All of our documentation is listed here:
http://www.returnpath.net/commercialsender/certification/certification_documents.php

Let me summarize our checks.

ENTITY - Certified & Safe
- PACER (lawsuits)
- Dun & Bradstreet
- ROKSO
- SBL, URIBL & other DNSBLs as listed below
- Sign-up disclosure
- Privacy policy clarity, presence

INFRASTRUCTURE - Certified & Safe
- Role accounts (abuse@ and postmaster@ for all domains in From:, Reply-To: and Return-Path)
- SPF
- DNS integrity (recursive 'open' DNS)
- Spamtraps (we currently have five spamtrap feeds in play, including those from Spamcop and Project Honey Pot)
- Unsubscribe functionality (currently measured by Lashback)
- Bounce processing
- Nameserver reputation and snowshoeing
- rDNS
- DNSBLs: CBL, Lashback, NJABL, Return Path BL, SORBS, Spamcop, Spamhaus Zen, VIRBL

PERFORMANCE - Certified
- Hotmail complaints
- Yahoo complaints
- Windows Live Sender Reputation Data (1)
- Two anonymous webmail providers' complaints
- Clients kept to an average number of IPs according to their assigned volume tier (anti-snowshoeing measure)
- Volume profile examination (we expect the dispersion of a sender's email across our receiver metrics to match that shown by all senders in the programme), to curtail 'selective sending'.

(1) http://www.google.com/search?client=safari&rls=en&q=windows+live+sender+reputation+data&ie=UTF-8&oe=UTF-8

--
Neil Schwartzman
Director, Certification Security & Standards
Return Path Inc.
0142002038
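As listed, the entity and infrastructure checks apply to both lists, while the performance checks apply only to Certified. A minimal, hypothetical model of that structure follows (the class and function names are invented, and real listing decisions involve thresholds and periodic re-checks that this ignores). It makes one consequence explicit that matters for scoring: Certified senders form a subset of Safe senders, so if each DNS zone simply mirrors its list, a message from a Certified sender would hit both rules.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class SenderChecks:
    # Hypothetical pass/fail summary of the three groups of checks above.
    entity_ok: bool          # PACER, D&B, ROKSO, DNSBLs, disclosures
    infrastructure_ok: bool  # role accounts, SPF, rDNS, traps, bounces, DNSBLs
    performance_ok: bool     # webmail complaints, SRD, volume profile


def is_safe(c: SenderChecks) -> bool:
    """Safe: entity and infrastructure checks only."""
    return c.entity_ok and c.infrastructure_ok


def is_certified(c: SenderChecks) -> bool:
    """Certified: everything Safe requires, plus the performance checks."""
    return is_safe(c) and c.performance_ok


# Because Certified is defined on top of Safe, no sender can be
# Certified without also being Safe.
for flags in product([False, True], repeat=3):
    c = SenderChecks(*flags)
    assert not is_certified(c) or is_safe(c)
print("every Certified sender is also Safe")
```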
Even without GA & rescoring thru masschecks, I suggest we trash, as per Neil:

- HABEAS_ACCREDITED_COI
- HABEAS_ACCREDITED_SOI
- HABEAS_CHECKED
- RCVD_IN_BSP_TRUSTED
- RCVD_IN_BSP_OTHER
- RCVD_IN_SSC_TRUSTED_COI

and add:

RCVD_IN_RP_*

with scores

score RCVD_IN_RP_CERTIFIED 0.0 -5.0 0.0 -5.0
score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0

This seems to me like a good compromise between being over-generous and staying close to Neil's request. :-)

Future masschecks or "human" requirements can change scores via sa-update.

Comments? Votes?
> RCVD_IN_RP_*
>
> with scores
>
> score RCVD_IN_RP_CERTIFIED 0.0 -5.0 0.0 -5.0
> score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0
>

Are you specifically excluding RCVD_IN_RP_RNBL from consideration?
(In reply to comment #20)
> > RCVD_IN_RP_*
> >
> > with scores
> >
> > score RCVD_IN_RP_CERTIFIED 0.0 -5.0 0.0 -5.0
> > score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0
> >
>
> Are you specifically excluding RCVD_IN_RP_RNBL for consideration?

For 3.3.0: yes. This seems to be very new to most of us, and we have zero performance data. As Warren put it, a testing period should be mandatory.
Created attachment 4591
Return Path Rules testing configuration file

I've added the attached cf file to my configuration so I can gather some anecdotal data on the rules for further comment.
> - DNSBLS: CBL, Lashback, NJABL, Return Path BL, SORBS, Spamcop, Spamhaus
> Zen, VIRBL

Could you please check PSBL too? PSBL has the benefit of being able to show you "Evidence" content of spam coming from a particular IP address.
(In reply to comment #23)
> > - DNSBLS: CBL, Lashback, NJABL, Return Path BL, SORBS, Spamcop, Spamhaus
> > Zen, VIRBL
>
> Could you please check PSBL too? PSBL has the benefit of being able to show
> you "Evidence" content of spam coming from a particular IP address.

As an aside, I think the PSBL project has significant merit, and I assist it with a public mirror. I've also been doing anecdotal testing of PSBL and thought it was already slated for the 3.3.0 release (bug 6156). What would you like me to check it for?
btw Kevin please make sure those rules are all marked "tflags nopublish" to ensure they don't get into updates until they're tested...

(In reply to comment #19)
> Even without GA & rescoring thru masschecks, I suggest we trash:
>
> as per Neil:
>
> - HABEAS_ACCREDITED_COI
> - HABEAS_ACCREDITED_SOI
> - HABEAS_CHECKED
> - RCVD_IN_BSP_TRUSTED
> - RCVD_IN_BSP_OTHER
> - RCVD_IN_SSC_TRUSTED_COI
>
> and add:
>
> RCVD_IN_RP_*
>
> with scores
>
> score RCVD_IN_RP_CERTIFIED 0.0 -5.0 0.0 -5.0
> score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0
>
> seems to me like a good compromise between being over generous and close to
> Neil's request. :-)
>
> future masschecks or "human" requirements can change scores via sa-update.
>
> comments? votes?

I'm vaguely in favour -- assuming the rules overlap closely enough with the old rules in the weekly net-rules masscheck, we can safely do that at this stage.

btw Kevin, until we're sure of that, though, please make sure those rules are all marked "tflags nopublish" to ensure they don't get into updates...
(In reply to comment #23)
> > - DNSBLS: CBL, Lashback, NJABL, Return Path BL, SORBS, Spamcop, Spamhaus
> > Zen, VIRBL
>
> Could you please check PSBL too? PSBL has the benefit of being able to show
> you "Evidence" content of spam coming from a particular IP address.

My bad - we already do check PSBL.

BTW: our DNSBL checks were arrived at by asking some MAAWG senior technical advisors, some senders, and some DNSBL operators to rate a bunch of blacklists and score them according to footprint, conservative listing policies, and ease of delisting. IOW, we wanted the sanest lists out there to intersect with our whitelistings. Ironically, the senders were among the most aggressive in their respective stances.

Oh! I should have added some information about the frequency of checks: entity checks are periodic, infrastructure checks periodic to hourly (DNSBL, spamtraps and rDNS are hourly, for example), and performance checks are done daily.

I don't know if I get a vote wrt weighting, but I would like to say that currently Safe is accorded a -8, and I believe it is fair that Certified, a significantly more stringent list, be given its due with the score requested, namely -6.

Thanks for considering this!
Created attachment 4592
rev2 Return Path Rules testing configuration file

> btw Kevin please make sure those rules are all marked "tflags nopublish" to
> ensure they don't get into updates until they're tested...(In reply to comment
> #19)

> btw Kevin until we're sure of that, though, please make sure those rules are
> all marked "tflags nopublish" to ensure they don't get into updates...

The attached cf file includes the nopublish flag. However, I did NOT submit the cf file as a patch for inclusion in SVN, but rather for anecdotal research by committers and interested developers. It is not marked as a patch.
header RCVD_IN_RP_RNBL eval:check_rbl('rbl','bl.score.senderscore.com.')

I believe we want this to read:

header RCVD_IN_RP_RNBL eval:check_rbl('rnbl-lastexternal','bl.score.senderscore.com.')

Please confirm?
(In reply to comment #19)
> with scores
>
> score RCVD_IN_RP_CERTIFIED 0.0 -5.0 0.0 -5.0
> score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0

+1

http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_MED/detail
http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_LOW/detail

I also think we should reduce the DNSWL scores. It seems that policing at DNSWL might be lax, as they don't offer any easy and obvious way to report violations. (No, I will not report violations of whitelists, or blacklist FPs, because doing so would artificially affect the measurements that we do.)
> http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_MED/detail
> http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_LOW/detail
> I also think we should reduce the DNSWL scores. It seems that policing at
> DNSWL might be lax as they don't offer any easy and obvious way to report
> violations. (No, I will not report violations of whitelists, or blacklist
> FP's, because doing so would artificially effect the measurements that we do.)

RCVD_IN_DNSWL_HI -5
RCVD_IN_DNSWL_MED -2
RCVD_IN_DNSWL_LOW -0.5

Based upon current performance, I believe these scores would be appropriate. I think we should adjust these and Return Path's scores on-the-fly with sa-update as we examine their performance more carefully over time.

Any objections?
(In reply to comment #30)
> > http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_MED/detail
> > http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_LOW/detail
> > I also think we should reduce the DNSWL scores. It seems that policing at
> > DNSWL might be lax as they don't offer any easy and obvious way to report
> > violations. (No, I will not report violations of whitelists, or blacklist
> > FP's, because doing so would artificially effect the measurements that we do.)
>
> RCVD_IN_DNSWL_HI -5
> RCVD_IN_DNSWL_MED -2
> RCVD_IN_DNSWL_LOW -0.5
>
> Based upon current performance I believe these scores would be appropriate. I
> think we should adjust these and returnpath's scores on-the-fly with sa-update
> as we have more carefully examined their performance over time.
>
> Any objections?

objection to hijacking a bug for something else? yes
pls open a separate bug for DNSWL adjustments/issues.
this is about HABEAS and BSP
(In reply to comment #31)
> (In reply to comment #30)
> > > http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_MED/detail
> > > http://ruleqa.spamassassin.org/20091205-r887515-n/RCVD_IN_DNSWL_LOW/detail
> > > I also think we should reduce the DNSWL scores. It seems that policing at
> > > DNSWL might be lax as they don't offer any easy and obvious way to report
> > > violations. (No, I will not report violations of whitelists, or blacklist
> > > FP's, because doing so would artificially effect the measurements that we do.)
> >
> > RCVD_IN_DNSWL_HI -5
> > RCVD_IN_DNSWL_MED -2
> > RCVD_IN_DNSWL_LOW -0.5
> >
> > Based upon current performance I believe these scores would be appropriate. I
> > think we should adjust these and returnpath's scores on-the-fly with sa-update
> > as we have more carefully examined their performance over time.
> >
> > Any objections?
>
> objection to hijacking a bug for something else? yes
> pls open a separate bug for DNSWL adjustments/issues.
> this is about HABEAS and BSP

yep, please open a new bug for the DNSWL discussion (link to this one, of course).

(other than that, +1)
Targeting for 3.3.0.

Please be sure the following is done before Saturday's weekly masscheck.

* These three rules added as nopublish so they can be tested.
* Keep the old rules for now so their performance can be compared to the new whitelist rules, so we can possibly estimate how relevant the overlap is between the GA scored old rules and these new rules.
(In reply to comment #33)
> Targeting for 3.3.0.
>
> Please be sure the following is done before Saturday's weekly masscheck.
>
> * These three rules added as nopublish so they can be tested.
> * Keep the old rules for now so their performance can be compared to the new
> whitelist rules, so we can possibly estimate how relevant the overlap is
> between the GA scored old rules and these new rules.

will gladly take over but need a bit of JM's guidance to remain within "SA standard procedures"
(In reply to comment #33)
> Targeting for 3.3.0.
>
> Please be sure the following is done before Saturday's weekly masscheck.
>
> * These three rules added as nopublish so they can be tested.
> * Keep the old rules for now so their performance can be compared to the new
> whitelist rules, so we can possibly estimate how relevant the overlap is
> between the GA scored old rules and these new rules.

hi Alex --

1. take Kevin's file, add it in your sandbox, add "tflags nopublish" to any full rule (ie not meta subrule) that doesn't yet have it

2. ensure the rules don't interfere with the existing old rules, so they can both coexist for one mass-check

(3. ensure "make test" passes the basic_lint.t test ;)

4. check 'em in!

by the way -- I don't know if we can safely do this in the 3.3.0 release. We need to ensure the changes to these rules don't push FP%/FN% rates far from the levels measured during rescoring. (This can happen if the new rules hit different mails, or if the scores don't sufficiently compensate for the FPs/FNs measured in the rescore mass-check.)

So even if the rules overlap closely enough to 100% that we can just drop 'em in as replacements, a final, pre-release step for this bug will be to measure the _new_ scores using the rescore mass-check's logfiles (with s/HABEAS_ACCREDITED_COI/RCVD_IN_RP_CERTIFIED/g etc.), determine FP/FN%, and make sure they match the old rates (or are at least still acceptable).
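The final pre-release step described above (rename rule hits in the rescore mass-check logs, then recompute FP/FN at the threshold) can be sketched as follows. The record layout, the `rescore` function and the sample messages are all hypothetical: the real mass-check log format and the project's rescoring scripts are not reproduced here, and this simply adjusts each message's stored total instead of re-running SpamAssassin.

```python
# Hypothetical, simplified mass-check records: (is_spam, total_score, hits).
# Real mass-check logs have their own format; parsing them is not shown.

def rescore(records, old_scores, new_scores, rename, threshold=5.0):
    """Rename rule hits (like s/OLD/NEW/g on the logs), adjust each
    message's total by the score change, and count FPs and FNs."""
    fp = fn = 0
    for is_spam, total, hits in records:
        renamed = {rename.get(rule, rule) for rule in hits}
        dropped = hits - renamed   # old rules that no longer fire
        added = renamed - hits     # replacement rules that now fire
        total += sum(new_scores.get(r, 0.0) for r in added)
        total -= sum(old_scores.get(r, 0.0) for r in dropped)
        flagged = total >= threshold  # SpamAssassin tags at score >= threshold
        if flagged and not is_spam:
            fp += 1
        elif not flagged and is_spam:
            fn += 1
    return fp, fn


old = {"HABEAS_ACCREDITED_COI": -8.0}
new = {"RCVD_IN_RP_CERTIFIED": -5.0}
rename = {"HABEAS_ACCREDITED_COI": "RCVD_IN_RP_CERTIFIED"}
records = [
    (False, 2.0, {"HABEAS_ACCREDITED_COI"}),  # ham saved only by the -8
    (False, 1.0, set()),                      # ordinary ham
    (True, 9.0, set()),                       # caught spam
    (True, 4.0, set()),                       # missed spam
]
print(rescore(records, old, new, rename))  # (1, 1)
```

Working on the rename map as a set of hit names means that if two old rules map onto the same replacement, the new score is only added once, which matches how a rule fires at most once per message.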
Created attachment 4593
SVN ready version of the ReturnPath Test Rules

> hi Alex --
>
> 1. take Kevin's file, add it in your sandbox, add "tflags nopublish" to any
> full rule (ie not meta subrule) that doesn't yet have it
>
> 2. ensure the rules don't interfere with the existing old rules, so they can
> both coexist for one mass-check
>
> (3. ensure "make test" passes the basic_lint.t test ;)
>
> 4. check 'em in!

5. Use the attached file, which leaves the previous rules enabled, per #2 above.

6. Consider whether the new RP_RNBL rule is added or not. I'm +1 for adding it, as ReturnPath is a credible community member. We may have to change this if their query limits are too low, etc., but that's an issue to be dealt with at the larger community level.

NOTE: This rule is in the attached cf file and uses a -lastexternal rbl check.

KAM
Just commit it. During RTC you can commit rules like this; nopublish rules especially are very safe. Just do it.
(In reply to comment #37)
> Just commit it. During RTC you can commit rules like this, especially
> nopublish are very safe. Just do it.

I don't know that I would consider a new rule trivial under RTC, and I believe this needs a vote. I just updated the attachment to a patch.
(In reply to comment #38)
> (In reply to comment #37)
> > Just commit it. During RTC you can commit rules like this, especially
> > nopublish are very safe. Just do it.
>
> I don't know that I would consider a new rule as trivial under RTC and believe
> this needs a vote. I just updated the attachment to a patch.

by our procedures, rule changes can be committed without review under RTC. but a brand new DNSBL lookup is certainly not trivial, and would need more discussion.

having said that, the rules file attached uses "nopublish" so is safe to just commit.
> by our procedures, rule changes can be committed without review under RTC. but
> a brand new DNSBL lookup is certainly not trivial, and would need more
> discussion.
>
> having said that, the rules file attached uses "nopublish" so is safe to just
> commit.

From my perspective, this ticket currently encompasses 2 new rules, 1 really new rule, and disabling 6 existing rules. I don't consider it a trivial rule change, but I think that, other than RP_RNBL, you, I, Justin and Alex are agreeing to submit the patch I attached and see what masscheck shows.
The patch only adds new rules to the weekly masscheck, when we know the target server can handle the load. Where is the controversy? Just do it. =)
Adding wtogami/20_rp_certified.cf
Transmitting file data .
Committed revision 889883.

Added for weekly masscheck tomorrow.
http://ruleqa.spamassassin.org/20091212-r889898-n/T_RCVD_IN_RP_CERTIFIED/detail
http://ruleqa.spamassassin.org/20091212-r889898-n/T_RCVD_IN_RP_SAFE/detail

Results from Saturday masscheck. There are some problems.

* RCVD_IN_RP_CERTIFIED seems to be a subset of RCVD_IN_RP_SAFE. If so, then the existing and proposed scores are inappropriate, as every CERTIFIED hit would add both numbers.

Options:
1) Is anyone else using these zones for lookups now? If not, could you please change them so they are non-overlapping?
2) Make SAFE one score, and CERTIFIED a smaller score. CERTIFIED always triggers with SAFE, so CERTIFIED only adds a small number on top of SAFE to reach the intended total weight.

The overlap analysis shows near 100% overlap with the old rules, but I'm not sure we can trust those numbers. I'm not sure that ruleqa is behaving properly in the overlap analysis, as SAFE and CERTIFIED claim to be 100% overlapping in both directions when this is clearly incorrect.
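If CERTIFIED really is a subset of SAFE, a Certified sender collects both scores. A minimal sketch of the arithmetic behind option 2 follows; the function names are hypothetical, the -4/-6 pair is the originally requested scoring, and the -5/-3 targets are the compromise scores proposed earlier in this bug.

```python
def certified_sender_total(safe_score, certified_score):
    """CERTIFIED never fires without SAFE, so a Certified sender gets both."""
    return safe_score + certified_score


def certified_increment(target_total, safe_score):
    """Option 2: pick CERTIFIED's own score so that SAFE + CERTIFIED
    equals the intended total for Certified senders."""
    return target_total - safe_score


# Originally requested scores (SAFE -4, CERTIFIED -6) would stack to -10.
print(certified_sender_total(-4.0, -6.0))   # -10.0
# For an intended -5 total with SAFE at -3, CERTIFIED should add only -2.
print(certified_increment(-5.0, -3.0))      # -2.0
```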
(In reply to comment #43)
> Options:
> 1) Is anyone else using these zones for lookups now? If not, could you please
> change it so they are non-overlapping?
> 2) Make SAFE one score, and CERTIFIED a smaller score. CERTIFIED always
> triggers with safe, so CERTIFIED only adds a small number on top of SAFE to
> reach the intended total weight.

(1) is preferable imo, but (2) is acceptable.

> The overlap analysis shows near 100% overlap with the old rules, but I'm not
> sure we can trust those numbers. I'm not sure that ruleqa is behaving properly
> in the overlap analysis, as SAFE and CERTIFIED claim to be 100% overlapping in
> both directions when this is clearly incorrect.

... on the spam corpora, where they do indeed have exactly the same hit-rate, and I can believe they are hitting exactly the same spams. However, these rules are intended to hit ham, and that's where they differ:

overlap ham: 99% of T_RCVD_IN_RP_SAFE hits also hit HABEAS_ACCREDITED_SOI; 99% of HABEAS_ACCREDITED_SOI hits also hit T_RCVD_IN_RP_SAFE
overlap ham: 100% of T_RCVD_IN_RP_CERTIFIED hits also hit RCVD_IN_BSP_TRUSTED; 97% of RCVD_IN_BSP_TRUSTED hits also hit T_RCVD_IN_RP_CERTIFIED
overlap ham: 100% of T_RCVD_IN_RP_CERTIFIED hits also hit T_RCVD_IN_RP_SAFE; 17% of T_RCVD_IN_RP_SAFE hits also hit T_RCVD_IN_RP_CERTIFIED
overlap ham: 99% of T_RCVD_IN_RP_CERTIFIED hits also hit HABEAS_ACCREDITED_SOI; 17% of HABEAS_ACCREDITED_SOI hits also hit T_RCVD_IN_RP_CERTIFIED

that's useful (and imo credible) data.
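The asymmetric ham figures (100% one way, 17% the other) are what a subset relationship looks like, while identical hit sets, as on the spam corpora, read 100% in both directions. A minimal sketch of a pairwise overlap measure computed over sets of message IDs follows; it is not ruleqa's code, and the message IDs are made up.

```python
def overlap(a_hits, b_hits):
    """Return (% of A's hits that also hit B, % of B's hits that also hit A)."""
    both = len(a_hits & b_hits)
    pct_a = 100.0 * both / len(a_hits) if a_hits else 0.0
    pct_b = 100.0 * both / len(b_hits) if b_hits else 0.0
    return pct_a, pct_b


# Made-up message IDs: CERTIFIED hits are a subset of SAFE hits.
safe = {f"ham{i}" for i in range(10)}
certified = {"ham0", "ham1"}
print(overlap(certified, safe))        # (100.0, 20.0)
# Identical hit sets (as on the spam corpora) read 100% both ways.
print(overlap({"spam1"}, {"spam1"}))   # (100.0, 100.0)
```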
Let's see how ReturnPath responds about Comment #43 option 1 or 2 before proceeding.
(In reply to comment #43)
> http://ruleqa.spamassassin.org/20091212-r889898-n/T_RCVD_IN_RP_CERTIFIED/detail
> http://ruleqa.spamassassin.org/20091212-r889898-n/T_RCVD_IN_RP_SAFE/detail
>
> Results from Saturday masscheck. There are some problems.
>
> * RCVD_IN_RP_CERTIFIED seems to be a subset of RCVD_IN_RP_SAFE. If so then the
> existing and proposed scores are inappropriate, as it would add both numbers on
> every hit.
>
> Options:
> 1) Is anyone else using these zones for lookups now? If not, could you please
> change it so they are non-overlapping?
> 2) Make SAFE one score, and CERTIFIED a smaller score. CERTIFIED always
> triggers with safe, so CERTIFIED only adds a small number on top of SAFE to
> reach the intended total weight.
>
> The overlap analysis shows near 100% overlap with the old rules, but I'm not
> sure we can trust those numbers. I'm not sure that ruleqa is behaving properly
> in the overlap analysis, as SAFE and CERTIFIED claim to be 100% overlapping in
> both directions when this is clearly incorrect.

Hi,

With regard to the overlap of Certified and Safe: yes, you have it right. Everything on Certified is on Safe, but not everything on Safe is on Certified. As I said earlier, it is much tougher to get and stay on Certified; we use complaint performance metrics*, all unavailable to SpamAssassin users, to determine day-to-day compliance with listing standards.

Certified (née Sender Score Certified, née Bonded Sender) had, for a very long time, a -4.5. We are asking for the new score because we believe our performance metrics have gotten that much better over the years.

In terms of numbers, they break down like this:

            Active   Suspended   Total
Certified   4407     1300        5707
Safe        6561     283         6844

We feel our performance metrics well justify placing the onus, and the larger benefit, on Certified.

Safe is, in essence, known senders who have rDNS in place, aren't blacklisted, and don't hit spamtraps often.
Certified senders are all of that, and they also maintain low complaint levels at major receiving sites and have a sender reputation showing that what they send is not junk.

--
Neil Schwartzman
Director, Certification Security & Standards
Return Path Inc.

* Windows Live Sender Reputation Data, Hotmail user complaints, Yahoo! user complaints, and complaints from two confidential webmail providers; in the coming months we will add two additional data sources, and we will be swapping out one of the confidential sources for a larger provider as well.
** trap hits, rDNS, performance, lack of measurable volume, etcetera
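A quick calculation on the reported counts gives the suspended share of each list at the time of the comment. This is only arithmetic on that snapshot (the function name is hypothetical): it is consistent with the point that Certified is harder to stay on, but it does not by itself measure the quality of the mail either list lets through.

```python
def suspended_pct(active, suspended):
    """Share of all listed senders that are currently suspended, in percent."""
    return 100.0 * suspended / (active + suspended)


# Counts as reported in the comment above.
for name, active, suspended in [("Certified", 4407, 1300), ("Safe", 6561, 283)]:
    print(f"{name}: {active + suspended} listed, "
          f"{suspended_pct(active, suspended):.1f}% suspended")
```

This prints roughly 22.8% suspended for Certified (5707 listed) against 4.1% for Safe (6844 listed).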
Given the stacking nature of your rules, it seems our best choice is Comment #43 option 2. It is problematic that we were doubling up large scores in an unintended fashion in the past. This might complicate changing the rules/scores now given the huge impact such a doubled up score might have had on the GA rescoring. I will analyze the effect on the FP/FN ratios of these proposed changes in a similar manner that closed Bug #6251 before proceeding.
(In reply to comment #47)
> Given the stacking nature of your rules, it seems our best choice is Comment
> #43 option 2.
> [ . . . ]
> I will analyze the effect on the FP/FN ratios of these proposed changes in a
> similar manner that closed Bug #6251 before proceeding.

I look forward to seeing the results of your analysis.

We'd prefer to see Certified receive a larger score adjustment than Safe, because the bar for being listed on Certified is higher. You've seen how difficult it's been to explain to the users list that the old HABEAS rules didn't involve suing people over X-Habeas headers; explaining that Certified is more difficult than Safe when Safe gets more points would be impossible.
Weekly Masscheck 20091216
=======================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 246432 99.33%
# Correctly spam: 174605 93.82%
# False positives: 1670 0.67%
# False negatives: 11501 6.18%
# TCR(l=50): 1.958990 SpamRecall: 93.820% SpamPrec: 99.053%

Old Scores
score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3
score RCVD_IN_SSC_TRUSTED_COI 0 -3.7 0 -3.7

Weekly Masscheck 20091216, with HABEAS and BSP Disabled
==================================================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 246431 99.33%
# Correctly spam: 174698 93.87%
# False positives: 1671 0.67%
# False negatives: 11408 6.13%
# TCR(l=50): 1.959877 SpamRecall: 93.870% SpamPrec: 99.053%

Interestingly, we perform BETTER with the whitelists turned off. This indicates that SpamAssassin is well balanced and fairly safe against FPs even before the whitelists come into play.

Weekly Masscheck 20091216, new RP rules
===================================
New Scores
score RCVD_IN_RP_CERTIFIED 0.0 -2.0 0.0 -2.0
score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0

# SUMMARY for threshold 5.0:
# Correctly non-spam: 246432 99.33%
# Correctly spam: 174651 93.84%
# False positives: 1670 0.67%
# False negatives: 11455 6.16%
# TCR(l=50): 1.959939 SpamRecall: 93.845% SpamPrec: 99.053%

Enabling the new RP rules made things slightly worse again, about midway between the disabled run and the old rules. This is an effective score of -5 for CERTIFIED and -3 for SAFE. This is the scoreset that we have voted to include in spamassassin-3.3.0, where the cumulative score of CERTIFIED is -5.
Weekly Masscheck 20091216, new RP rules, doubled up
==============================================
New Scores
score RCVD_IN_RP_CERTIFIED 0.0 -5.0 0.0 -5.0
score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0

# SUMMARY for threshold 5.0:
# Correctly non-spam: 246432 99.33%
# Correctly spam: 174631 93.83%
# False positives: 1670 0.67%
# False negatives: 11475 6.17%
# TCR(l=50): 1.959526 SpamRecall: 93.834% SpamPrec: 99.053%

Just to satisfy your request, here is the same test with -5 for CERTIFIED. That is an effective score of -8 for CERTIFIED and -3 for SAFE. Results are not improved over the above.

Weekly Masscheck 20091216, all DNS whitelists disabled
=================================================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 246431 99.33%
# Correctly spam: 174698 93.87%
# False positives: 1671 0.67%
# False negatives: 11408 6.13%
# TCR(l=50): 1.959877 SpamRecall: 93.870% SpamPrec: 99.053%

Out of curiosity, I ran the same test with all RP and DNSWL whitelists disabled. The results improved even further. It seems that SpamAssassin is just fine without the whitelists in terms of FP safety. I suspect, however, that the whitelists help push total scores over the edge needed to trigger auto-learn.

Given this analysis, I suspect the lower whitelist scores in 3.3.0 are entirely appropriate. Next, I will attempt to repeat this analysis with the larger mcsnapshot masscheck logs instead of the most recent weekly masscheck.
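The derived figures in each summary block can be recomputed from the four counts. The sketch below assumes TCR is the usual total cost ratio: total spam divided by (λ·FP + FN), with λ = 50 to match "TCR(l=50)". Under that assumption, it reproduces the 1.958990, 93.820% and 99.053% reported for the old-scores run. It is a reconstruction for checking numbers, not code taken from the masscheck tools, and the class name is illustrative.

```python
from dataclasses import dataclass


@dataclass
class MasscheckSummary:
    ham_ok: int   # "Correctly non-spam"
    spam_ok: int  # "Correctly spam"
    fp: int       # "False positives" (ham scored as spam)
    fn: int       # "False negatives" (spam scored as ham)

    def tcr(self, lam: float = 50.0) -> float:
        """Total cost ratio: total spam / (lam * FP + FN)."""
        n_spam = self.spam_ok + self.fn
        return n_spam / (lam * self.fp + self.fn)

    def spam_recall(self) -> float:
        """Share of all spam that was scored as spam."""
        return self.spam_ok / (self.spam_ok + self.fn)

    def spam_precision(self) -> float:
        """Share of messages scored as spam that really were spam."""
        return self.spam_ok / (self.spam_ok + self.fp)


old_rules = MasscheckSummary(ham_ok=246432, spam_ok=174605, fp=1670, fn=11501)
print(f"TCR(l=50): {old_rules.tcr():.6f}")
print(f"SpamRecall: {old_rules.spam_recall():.3%}")
print(f"SpamPrec: {old_rules.spam_precision():.3%}")
```

Because λ = 50 weights each FP fifty times as heavily as an FN, a single extra FP can cancel out dozens of FNs avoided. This matters when comparing runs whose FP counts differ by only one or two.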
> You've seen how difficult it's been to explain to the users list that the old
> HABEAS rules didn't involve suing people over X-Habeas headers; explaining that
> Certified is more difficult than Safe when Safe gets more points would be
> impossible.

I suppose that Safe could be a meta rule that only fires when Certified does not fire. Then Certified could have the higher score that users might expect. On the other hand, explaining that Certified is an additional score on top of Safe might not really be that hard.
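The two designs can produce identical totals per relay; they differ only in which per-rule scores appear in reports. The following is a sketch, assuming Safe would become a meta rule equivalent to "SAFE && !CERTIFIED". The function names and the example totals of -5 (Certified) and -2 (Safe-only) are illustrative.

```python
from typing import Dict


def cumulative_scores(certified_total: float, safe_total: float) -> Dict[str, float]:
    """Per-rule scores when a Certified relay fires both rules."""
    return {
        "RCVD_IN_RP_SAFE": safe_total,
        "RCVD_IN_RP_CERTIFIED": certified_total - safe_total,
    }


def exclusive_scores(certified_total: float, safe_total: float) -> Dict[str, float]:
    """Per-rule scores if SAFE were a hypothetical meta: SAFE && !CERTIFIED."""
    return {
        "RCVD_IN_RP_SAFE": safe_total,
        "RCVD_IN_RP_CERTIFIED": certified_total,
    }


def relay_score(on_certified: bool, scores: Dict[str, float], exclusive: bool) -> float:
    """Total RP score for a relay on Safe, and optionally also on Certified."""
    # In the exclusive design the SAFE meta is suppressed for Certified relays.
    safe_fires = not (exclusive and on_certified)
    total = scores["RCVD_IN_RP_SAFE"] if safe_fires else 0.0
    if on_certified:
        total += scores["RCVD_IN_RP_CERTIFIED"]
    return total


for on_certified in (True, False):
    a = relay_score(on_certified, cumulative_scores(-5.0, -2.0), exclusive=False)
    b = relay_score(on_certified, exclusive_scores(-5.0, -2.0), exclusive=True)
    print(f"on_certified={on_certified}: cumulative={a}, exclusive={b}")
```

In the cumulative design, a report shows a smaller number next to CERTIFIED than next to SAFE, which is the source of the confusion raised above. The exclusive design shows the full total on CERTIFIED but needs an extra meta rule.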
rescore masscheck logs with mcsnapshot + today's trunk Scores
=======================================================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 703979 99.95%
# Correctly spam: 2562432 98.39%
# False positives: 387 0.05%
# False negatives: 41888 1.61%
# TCR(l=50): 42.527842 SpamRecall: 98.392% SpamPrec: 99.985%

score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3
score RCVD_IN_SSC_TRUSTED_COI 0 -3.7 0 -3.7

rescore masscheck logs with mcsnapshot + today's trunk Scores
HABEAS, BSP and SSC Disabled
=========================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 703971 99.94%
# Correctly spam: 2562681 98.40%
# False positives: 395 0.06%
# False negatives: 41639 1.60%
# TCR(l=50): 42.423235 SpamRecall: 98.401% SpamPrec: 99.985%

rescore masscheck logs with mcsnapshot + today's trunk Scores
HABEAS, BSP and SSC and DNSWL Disabled
==========================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 703899 99.93%
# Correctly spam: 2565063 98.49%
# False positives: 467 0.07%
# False negatives: 39257 1.51%
# TCR(l=50): 41.597904 SpamRecall: 98.493% SpamPrec: 99.982%

I did not test the new RP rules that replace the old rules, because that would not be an apples-to-apples comparison. The comparison between these three runs, however, confirms the trend from the weekly results in Comment #49: the whitelists really don't make much of a difference. The old rules with doubled-up scores were actually slightly harmful. For these reasons, it seems correct to give the RP rules these modest scores:

score RCVD_IN_RP_CERTIFIED 0.0 -2.0 0.0 -2.0
score RCVD_IN_RP_SAFE 0.0 -3.0 0.0 -3.0
Oops, the weekly masscheck was actually from Dec 12th.

Weekly Masscheck 20091212
=======================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 246432 99.33%
# Correctly spam: 174605 93.82%
# False positives: 1670 0.67%
# False negatives: 11501 6.18%
# TCR(l=50): 1.958990 SpamRecall: 93.820% SpamPrec: 99.053%

Old Scores
score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3
score RCVD_IN_SSC_TRUSTED_COI 0 -3.7 0 -3.7

Weekly Masscheck 20091212 with RP Rules
===================================
# SUMMARY for threshold 5.0:
# Correctly non-spam: 246432 99.33%
# Correctly spam: 174651 93.84%
# False positives: 1670 0.67%
# False negatives: 11455 6.16%
# TCR(l=50): 1.959939 SpamRecall: 93.845% SpamPrec: 99.053%

Scores are flipped:
score RCVD_IN_RP_CERTIFIED 0.0 -3.0 0.0 -3.0
score RCVD_IN_RP_SAFE 0.0 -2.0 0.0 -2.0

SAFE is -2 and CERTIFIED is cumulatively -5. To address your concern about people being confused, this should be easier for people to understand. I'm going ahead with this score.
r891460 | wtogami | 2009-12-16 17:36:28 -0500 (Wed, 16 Dec 2009) | 3 lines

Bug #6247: Replace HABEAS, BSP and SSC with RP CERTIFIED.
Note: CERTIFIED is cumulative with SAFE.

Sending rules/20_dnsbl_tests.cf
Sending rules/50_scores.cf
Sending rulesrc/10_force_active.cf
Sending rulesrc/sandbox/wtogami/20_rp_certified.cf
Transmitting file data ....
Committed revision 891460.

I tested this very thoroughly and checked in these changes. Closing.
Thanks for measuring this, Warren.

(In reply to comment #51)
> rescore masscheck logs with mcsnapshot + today's trunk Scores
> =======================================================
> # SUMMARY for threshold 5.0:
> # Correctly non-spam: 703979 99.95%
> # Correctly spam: 2562432 98.39%
> # False positives: 387 0.05%
> # False negatives: 41888 1.61%
> # TCR(l=50): 42.527842 SpamRecall: 98.392% SpamPrec: 99.985%

vs.

> rescore masscheck logs with mcsnapshot + today's trunk Scores
> HABEAS, BSP and SSC and DNSWL Disabled
> ==========================
> # SUMMARY for threshold 5.0:
> # Correctly non-spam: 703899 99.93%
> # Correctly spam: 2565063 98.49%
> # False positives: 467 0.07%
> # False negatives: 39257 1.51%
> # TCR(l=50): 41.597904 SpamRecall: 98.493% SpamPrec: 99.982%

So that means that

467-387
--> 80
80/(703899+467)
--> 0.000113577316338381

=> 0.01135% of hams were rescued from being FPs by the DNSWL rules.

41888-39257
--> 2631
2631/(2565063+39257)
--> 0.00101024451680285

=> 0.101% of spams were, conversely, allowed through by them.

Good to know, and good to get an idea of the problem. FWIW, I think it's better to use the rescore data for this measurement: more contributors, more varied logs, and (hopefully) the data will have received more hand-checking before submission.

By the way, I don't know whether it's safe to say that this is "statistically significant". We don't know what the null hypothesis is in this case to use that terminology.

--j.
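The arithmetic above can be written out as a small script. It also checks that both runs cover the same corpus, which is what makes subtracting their FP and FN counts meaningful. One caveat, taken from the run headers: the second run disabled HABEAS, BSP and SSC as well as DNSWL. The difference therefore reflects all of those whitelist rules together. The variable names are illustrative.

```python
# Counts from the two rescore runs quoted above.
WITH_WHITELISTS = {"ham_ok": 703979, "spam_ok": 2562432, "fp": 387, "fn": 41888}
WITHOUT_WHITELISTS = {"ham_ok": 703899, "spam_ok": 2565063, "fp": 467, "fn": 39257}

# Both runs score the same corpus, so the per-class totals must match.
total_ham = WITHOUT_WHITELISTS["ham_ok"] + WITHOUT_WHITELISTS["fp"]
total_spam = WITHOUT_WHITELISTS["spam_ok"] + WITHOUT_WHITELISTS["fn"]
assert total_ham == WITH_WHITELISTS["ham_ok"] + WITH_WHITELISTS["fp"]
assert total_spam == WITH_WHITELISTS["spam_ok"] + WITH_WHITELISTS["fn"]

# Ham rescued from FP status, and spam let through, by the whitelist rules.
hams_rescued = WITHOUT_WHITELISTS["fp"] - WITH_WHITELISTS["fp"]
spams_let_through = WITH_WHITELISTS["fn"] - WITHOUT_WHITELISTS["fn"]

print(f"hams rescued: {hams_rescued} ({hams_rescued / total_ham:.5%} of all ham)")
print(f"spams let through: {spams_let_through} "
      f"({spams_let_through / total_spam:.3%} of all spam)")
```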
(In reply to comment #54)
> > rescore masscheck logs with mcsnapshot + today's trunk Scores
> > HABEAS, BSP and SSC and DNSWL Disabled
> > ==========================
> > # SUMMARY for threshold 5.0:
> > # Correctly non-spam: 703899 99.93%
> > # Correctly spam: 2565063 98.49%
> > # False positives: 467 0.07%
> > # False negatives: 39257 1.51%
> > # TCR(l=50): 41.597904 SpamRecall: 98.493% SpamPrec: 99.982%
>
> So that means that
>
> 467-387
> --> 80
> 80/(703899+467)
> --> 0.000113577316338381
>
> => 0.01135% of hams were rescued from being FPs by the DNSWL rules.
>
> 41888-39257
> --> 2631
> 2631/(2565063+39257)
> --> 0.00101024451680285
>
> => 0.101% of spams were, conversely, allowed through by them.

Actually, I've realised this may be measuring the wrong thing. Since the FP% is much lower (by design) than the FN%, it's quite likely that we'll see a comparatively high rate of hits against spam, even if the rules are useful. A better way to think about it is to compare those values against the existing FP and FN counts:

=> 80 hams rescued from the 467 FPs = 17.13% of FPs rescued
=> 2631 spams added to the 39257 FNs = 6.70% additional FNs

Looked at this way, they seem more practical. A reliable way to cut the FP rate by 17% is definitely useful, although obviously it'd be nice if the 6.7% additional FNs were reduced.
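The same deltas can be re-expressed relative to the error counts rather than the corpus size. The sketch below does exactly that division; the helper name is illustrative.

```python
def share_of(delta: int, baseline: int) -> float:
    """Express a change as a fraction of an existing error count."""
    return delta / baseline


FP_WITHOUT_WHITELISTS = 467    # FPs with the whitelist rules disabled
FN_WITHOUT_WHITELISTS = 39257  # FNs with the whitelist rules disabled
HAMS_RESCUED = 467 - 387       # FPs avoided when the whitelists are enabled
SPAMS_ADDED = 41888 - 39257    # extra FNs when the whitelists are enabled

fp_cut = share_of(HAMS_RESCUED, FP_WITHOUT_WHITELISTS)
fn_added = share_of(SPAMS_ADDED, FN_WITHOUT_WHITELISTS)
print(f"{fp_cut:.2%} of FPs rescued, {fn_added:.2%} additional FNs")
```

Dividing by the FP and FN counts removes the effect of the very different base rates. The raw corpus percentages hide that effect, which is why the rescued FPs look negligible there but substantial here.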
(In reply to comment #55)
>
> when looked at this way, they look more practical. a reliable way to cut the
> FP rate by 17% is definitely useful, although obviously it'd be nice if the
> 6.7% additional FNs was reduced.

Perhaps the suggested metas with the DNSBLs might help with that.

http://ruleqa.spamassassin.org/?rule=%2F[N_]_RP_

Unfortunately, the last network masscheck seems to have used only one corpus, so the results aren't interesting yet. They certainly don't track the earlier quick analysis I did by pulling the overlaps from the masscheck results (before I wrote the rules).
I just got an FN that hit both RCVD_IN_BSP_TRUSTED and HABEAS_ACCREDITED_SOI. I use 3.2.5. Could you please push these changes to the updates for older SA releases? Thank you.
I've got another FN that hit both of these rules:

 * -4.3 RCVD_IN_BSP_TRUSTED RBL: Sender is in Sender Score Certified
 *      (trusted relay)
 *      [Return Path SenderScore Certified (formerly]
 *      [Bonded Sender) - <http://www.senderscorecertified.com>]
 * -4.3 HABEAS_ACCREDITED_SOI RBL: Habeas Accredited Opt-In or Better
 *      [193.34.88.98 listed in sa-accredit.habeas.com]

Could you please push these changes to the 3.2.5 updates? Thank you.
Updated for current ownership and branding on 2021-03-26 in revision 1888086.
A summary of the recent changes:

RCVD_IN_RP_CERTIFIED -> RCVD_IN_VALIDITY_CERTIFIED
RCVD_IN_RP_SAFE -> RCVD_IN_VALIDITY_SAFE
RCVD_IN_RP_RNBL -> RCVD_IN_VALIDITY_RPBL

Contact information in the descriptions was updated to certification@validity.com.
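Sites with local score overrides or meta rules that reference the old names may need to update them. The following is a sketch, assuming the overrides live in plain-text .cf files. The function name is hypothetical, and the script does only a whole-word textual rename using the mapping listed above.

```python
import re

# Renames listed in the summary above.
RENAMES = {
    "RCVD_IN_RP_CERTIFIED": "RCVD_IN_VALIDITY_CERTIFIED",
    "RCVD_IN_RP_SAFE": "RCVD_IN_VALIDITY_SAFE",
    "RCVD_IN_RP_RNBL": "RCVD_IN_VALIDITY_RPBL",
}

# Whole-word match, so e.g. a local rule named RCVD_IN_RP_SAFE_X is left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_rules(cf_text: str) -> str:
    """Rewrite old RP rule names in a .cf fragment to the Validity names."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], cf_text)


local_cf = "score RCVD_IN_RP_SAFE 0 -2.0 0 -2.0\n"
print(rename_rules(local_cf), end="")
```

Running the result through `spamassassin --lint` afterwards is a cheap way to catch any reference that was missed.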