Bug 6344

Summary: ReturnPath and DNSWL rules should not autolearn
Product: SpamAssassin
Component: Rules
Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW
Severity: normal
Priority: P5
Version: unspecified
Target Milestone: Undefined
Hardware: Other
OS: All
Reporter: Jason Bertoch <jason>
CC: ahayes, jason, jhardin, kmcgrail, rwmaillists, wtogami
Whiteboard:

Description Jason Bertoch 2010-02-23 15:18:18 UTC
Due to the risk of false positives poisoning Bayes, and given that the other SA whitelist and blacklist rules already skip autolearning, the noautolearn tflag should be appended to the ReturnPath (RP) and DNSWL rules.

Suggest modifying the following rules:

tflags RCVD_IN_RP_CERTIFIED     net nice
tflags RCVD_IN_RP_SAFE          net nice
tflags RCVD_IN_DNSWL_HI         nice net
tflags RCVD_IN_DNSWL_LOW        nice net
tflags RCVD_IN_DNSWL_MED        nice net

with:

tflags RCVD_IN_RP_CERTIFIED     net nice noautolearn
tflags RCVD_IN_RP_SAFE          net nice noautolearn
tflags RCVD_IN_DNSWL_HI         nice net noautolearn
tflags RCVD_IN_DNSWL_LOW        nice net noautolearn
tflags RCVD_IN_DNSWL_MED        nice net noautolearn
Comment 1 ahayes 2010-12-15 10:18:03 UTC
I have just set up SpamAssassin and have had several pieces of spam get through with "autolearn=ham" thanks to the default Return Path whitelist rules.

I cannot find anywhere to report these offending messages to Return Path, and I have struggled to build a system for easily telling SpamAssassin to forget them. I'm also struggling to disable the rules in my install while still benefiting from sa-update.

So +1 from me (a new user) for this bug.
Comment 2 Jason Bertoch 2010-12-15 13:53:40 UTC
While I still believe this bug is legitimate, you should understand that adding "noautolearn" to the rules' tflags doesn't prevent a message from being auto-learned. Instead, it only means the test is ignored when calculating the score used by the learning system. While that may help prevent messages from being auto-learned, it doesn't guarantee it.

In the meantime, feel free to add the suggestions above to your local.cf, or even disable the RP rules by setting their scores to zero.
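For example, a minimal local.cf sketch combining both suggestions (the tflags lines are from the proposal above; zeroing the RP scores is illustrative, not a recommendation):

```
# Exclude these rules from the auto-learn score calculation
tflags RCVD_IN_RP_CERTIFIED     net nice noautolearn
tflags RCVD_IN_RP_SAFE          net nice noautolearn
tflags RCVD_IN_DNSWL_HI         nice net noautolearn
tflags RCVD_IN_DNSWL_LOW        nice net noautolearn
tflags RCVD_IN_DNSWL_MED        nice net noautolearn

# Or, to disable the Return Path rules entirely:
score RCVD_IN_RP_CERTIFIED 0
score RCVD_IN_RP_SAFE 0
```

Local settings like these survive sa-update, since local.cf is read after the updated distribution rules.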
Comment 3 RW 2012-08-15 21:47:11 UTC
There is an apparent case of this in the users list thread "Very spammy messages yield BAYES_00". A lot of people are reporting problems with DNSWL. I think it would be a good idea to implement this.
Comment 4 AXB 2012-08-15 22:12:19 UTC
(In reply to comment #3)
> There is an apparent case of this in the users list "Very spammy messages
> yield BAYES_00". A lot of people are reporting problems with DNSWL. I think
> it would be a good idea to implement this.

This thread is mainly about Bayes; DNSWL may be decreasing the score somewhat, but DNSWL is not the culprit.

Users can easily implement the rule modifications in their site config.

-1 for such a change.
Comment 5 John Hardin 2012-08-15 22:22:00 UTC
See also Bug 6828
Comment 6 RW 2012-08-15 23:13:06 UTC
(In reply to comment #4)
> This thread is mainly about Bayes and that DNSWL may be decreasing the score
> somewhat, but DNSWL is not the culprit.

The Bayes score is a symptom.

I think it's very likely that DNSWL is the reason Bayes is failing in the first place. If you ignore Bayes and look at the other rules hit, the messages would all have had scores well above the threshold if it weren't for DNSWL.
Comment 7 Jason Bertoch 2012-08-16 03:53:04 UTC
> 
> This thread is mainly about Bayes and that DNSWL may be decreasing the score
> somewhat, but DNSWL is not the culprit.
> 
> Users can easily implement the rule modifications in their site config.
> 
> -1 for such a change.


I've seen this argument numerous times throughout the development of SA, but it's extremely arrogant. It assumes that all SA users follow the dev process from beginning to end and are also subscribed to all of the mailing lists. The truth is that this product is more far-reaching than some people here seem to respect. Just because it may be trivial for someone on the list to implement (or adjust) some feature doesn't mean it's trivial everywhere SA may be deployed. Even though I've followed this project from the beginning, I still think we have a duty to make sane decisions on default configs. Just because you may want the defaults to fit your situation doesn't mean those defaults are appropriate for the project as a whole. In fact, since you are clearly able to modify the settings, the defaults should likely differ greatly from your situation.
Comment 8 Kevin A. McGrail 2012-08-16 13:09:49 UTC
If this is a discussion on the efficacy and scoring of RP, DNSWL or other rules, so be it. But a discussion of excluding specific rules from autolearning sounds flawed and unmaintainable to me. Here are my thoughts:

First, to my understanding, the noautolearn setting in question is a masscheck setting.  It doesn't change production systems.

Second, it would seem to me that if you don't trust the set of rules to score very high, you change the scores.

Third, if you think the scores are not accurate, we should get more people assisting with rule QA and improve the scores.

Finally, the concept of not learning for the bayesian system based on certain rules hitting/not-hitting for production systems seems to have little merit to me.  


Regards,
KAM
Comment 9 RW 2012-08-16 19:24:51 UTC
(In reply to comment #8)
> If this is a discussion on the efficacy and scoring of RP, DNSWL or other
> rules, sobeit.  But a discussion of not autolearning specific rules, that
> sounds flawed and unmaintainable to me. Here's my thoughts:
> 
> First, to my understanding, the noautolearn setting in question is a
> masscheck setting.  It doesn't change production systems.

No, autolearning uses a non-Bayes score set and additionally ignores rules marked as noautolearn or userconf.


> Second, It would seem to me that if you don't trust the set of rules to
> score very high, you change the scores.  

The scores are assigned to distinguish spam from what is not proven to be spam.

> Third, If you think the scores are not accurate, we get more people
> assisting with rule QA and improve the scores.

That works for spam because we optimize for a threshold and then add a safety margin. It won't work for ham because we don't have a three-way classification.

Even if we did have a three-way classification, we don't have enough "nice" rules to positively identify ham.

> Finally, the concept of not learning for the bayesian system based on
> certain rules hitting/not-hitting for production systems seems to have
> little merit to me.  

It's more the DNS whitelist rules that are the anomaly: if I add an authenticated address to a whitelist, it's ignored for autolearning, but if a direct marketer pays money to Return Path, that does contribute.

The DNS whitelists should be seen as a way of avoiding FPs, not as a way of positively identifying ham.
Comment 10 John Hardin 2012-08-16 20:16:18 UTC
(In reply to comment #8)
> First, to my understanding, the noautolearn setting in question is a
> masscheck setting.  It doesn't change production systems.

Apparently that's not true. Per the documentation:

    $score = $status->get_autolearn_points()
        Return the message's score as computed for auto-learning. Certain
        tests are ignored:

          - rules with tflags set to 'learn' (the Bayesian rules)

          - rules with tflags set to 'userconf' (user white/black-listing rules, etc)

          - rules with tflags set to 'noautolearn'
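A rough sketch, in Python rather than SpamAssassin's actual Perl, of the filtering the documentation describes; the rule names, scores, and helper function are made up for illustration:

```python
# Rules whose tflags include any of these are excluded from the
# auto-learning score, per the get_autolearn_points() documentation.
SKIP_TFLAGS = {"learn", "userconf", "noautolearn"}

def autolearn_points(hits, tflags):
    """Sum the scores of hit rules, skipping rules whose tflags
    mark them as ignored for auto-learning.

    hits:   {rule_name: score} for the rules that matched
    tflags: {rule_name: set of tflag strings}
    """
    total = 0.0
    for rule, score in hits.items():
        if SKIP_TFLAGS & tflags.get(rule, set()):
            continue  # ignored when computing the learn score
        total += score
    return total

# Example: a DNSWL hit tagged noautolearn no longer drags the
# auto-learn score negative; only the spammy body rule counts.
hits = {"BAYES_00": -1.9, "RCVD_IN_DNSWL_HI": -5.0, "SPAMMY_BODY": 4.2}
tflags = {
    "BAYES_00": {"learn", "nice"},
    "RCVD_IN_DNSWL_HI": {"net", "nice", "noautolearn"},
}
print(autolearn_points(hits, tflags))  # → 4.2
```

Without the noautolearn tflag on RCVD_IN_DNSWL_HI, the same message would total -2.7 and look hammy to the auto-learner, which is the failure mode this bug describes.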

I'd suggest that as a general practice _any_ DNS-based rule having a negative score should have the "noautolearn" tflag set. It's not so much a matter of mistrust as a recognition that a temporary mistake by the DNS service could cause Bayes to go off the rails.

> Finally, the concept of not learning for the bayesian system based on
> certain rules hitting/not-hitting for production systems seems to have
> little merit to me.  

It's not so much that a DNSWL rule hit would suppress autolearning; rather, if the message is _still hammy_ when DNSWL is not considered, it should be autolearned.


So, +1 from me on the initial suggestion, plus review of other DNS-based standard rules for the same change (which should be quick; I don't think many reduce the score). I agree with Jason: "users can easily implement the rule modifications in their site config" is not an appropriate response to this particular case.
Comment 11 AXB 2012-08-16 20:53:42 UTC
I'm not convinced this will solve what most people are seeing:
few rule hits = low scores, and one of the rules hit is DNSWL.

From the reports, Bayes alone would seldom have raised the score above the threshold either, unless they're constantly feeding Bayes from traps or some other automated method.

Imo, we're all focusing on DNSWL and RCVD_IN_RP_* but the problem is somewhere else, and unless we see more samples of the messages which cause these false negatives, we're pretty much guessing at what could help.

I'd prefer to question the trust & scores we give DNSWL and RCVD_IN_RP_*.
Comment 12 Kevin A. McGrail 2012-08-17 16:11:53 UTC
(In reply to comment #10)
> I'd suggest that as a general practice _any_ DNS-based rule having a
> negative score should have the "noautolearn" tflag set. It's not so much a
> matter of mistrust as a recognition that a temporary mistake by the DNS
> service could cause Bayes to go off the rails.

Thanks, btw, for checking that noautolearn impacts Bayes learning. I missed that.

I disagree with this. I keep circling back to the fact that rules should score appropriately with minimal false hits. That includes hammy rules.

You are saying that negative DNS-based tests should not impact Bayes, and I agree that this is more of a symptom. We should look at lowering the scores of those rules if they are rippling that badly.

> 
> > Finally, the concept of not learning for the bayesian system based on
> > certain rules hitting/not-hitting for production systems seems to have
> > little merit to me.  
> 
> It's not so much that a DNSWL rule hit would suppress autolearning as, if
> the message is _still hammy_ when DNSWL is not considered, it should be
> autolearned.

To me this implies a lack of trust in the rules' efficacy and scoring, which is what needs to be adjusted, not the Bayesian system.

> So, +1 from me on the initial suggestion, plus review of other DNS-based
> standard rules for the same change (which will be quick, I don't think many
> reduce the score). I agree with Jason, "users can easily implement the rule
> modifications in their site config" is not an appropriate response to this
> particular case.

Sorry, at best I'm a 0, and I'm not going to stand in your way if you do the work, submit the code, and follow up on it with some analysis.
Comment 13 RW 2012-08-17 19:43:42 UTC
(In reply to comment #12)

>  I continually keep circling back to the fact that
> rules should score appropriately with minimal false hits.  That includes
> hammy rules.

As I said before, we don't have any meaningful QA mechanism for this.

It's not possible to optimize for two things simultaneously. The score-set that optimizes the TP rate at 5.0 with an FP constraint isn't going to be an optimal score-set for maximizing ham learning at 0.1 with a mislearning constraint.

In theory it is possible to do it with a single optimization if you close the loop and allow mistraining to affect the scores at 5.0, but that means that all the BAYES results would need to be dynamically recomputed from a fresh database for each set of rule scores, and that's simply impractical.
Comment 14 Kevin A. McGrail 2012-08-17 21:02:16 UTC
(In reply to comment #13)
> (In reply to comment #12)
> 
> >  I continually keep circling back to the fact that
> > rules should score appropriately with minimal false hits.  That includes
> > hammy rules.
> 
> As I said before, we don't have any meaningful QA mechanism for this.

Barring an automated mechanism, I think someone needs to perform SOME analysis. I don't think anyone disagrees that the scores are perhaps too highly weighted, but I see some issues with modifying Bayes to accommodate scores.

At the very worst, making this noautolearn change and/or adjusting the scores, and reporting back on the impact you think it has had, would be better than where we are now.