Bug 6510

Summary: Mass-check generating nightly rules updates with 0 scores for rules with dependencies
Product: Spamassassin Reporter: Marc Sherman <msherman>
Component: Score GenerationAssignee: SpamAssassin Developer Mailing List <dev>
Status: NEW ---    
Severity: normal CC: jhardin, john, kmcgrail, merijnvdk
Priority: P2    
Version: 3.3.1   
Target Milestone: Undefined   
Hardware: PC   
OS: other   
Whiteboard:
Attachments: Aviod publishing zero scores for published sandbox rules

Description Marc Sherman 2010-11-05 09:37:08 UTC
The nightly masscheck which generates the scores for production rules updates has recently started generating 0 scores for rules which have dependencies (leading to warnings).

Examples:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6504
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6509
Comment 1 John Hardin 2010-11-05 21:38:39 UTC
Quick and dirty patch suggestion, from total ignorance of the internals of the sandbox score generation process:

===================================================================
--- build/mkupdates/mkupdate-with-scores	(revision 1031656)
+++ build/mkupdates/mkupdate-with-scores	(working copy)
@@ -150,7 +150,7 @@
 
   cd ..
 
-  cp trunk-rulesrc-scores/72_scores.cf trunk/rules/72_scores.cf
+  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf > trunk/rules/72_scores.cf
 
   # note: one of set0 or set1 stats might be incorrect (not all of their rules
   #       are included in the update) I can't remember if we eliminate dropped
Comment 2 John Hardin 2010-11-05 21:39:55 UTC
Created attachment 4820 [details]
Aviod publishing zero scores for published sandbox rules

Q'n'D fix for the symptoms, not the cause.
Comment 3 Karsten Bräckelmann 2010-11-05 23:49:03 UTC
*** Bug 6509 has been marked as a duplicate of this bug. ***
Comment 4 Daryl C. W. O'Shea 2010-11-05 23:54:43 UTC
(In reply to comment #1)
> +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> trunk/rules/72_scores.cf

Yeah, that won't work well.  It'll enable net rules in non-net scoresets.

I think I'll change the script to abort the update if lint detects dependency issues.  I missed tonight's update (I think there's yet another rule or more that will have zeroing issues)... so I'll look at it tomorrow.
Comment 5 John Hardin 2010-11-06 11:53:00 UTC
(In reply to comment #4)
> (In reply to comment #1)
> > +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> > trunk/rules/72_scores.cf
> 
> Yeah, that won't work well.  It'll enable net rules in non-net scoresets.

Would that cause problems given the presence of "tflags net" on net rules? Surely SA does not rely on the score being zero to disable net rules when run in local mode...

> I think I'll change the script to abort the update if lint detects dependency
> issues.  I missed tonight's update (I think there's yet another rule or more
> that will have zeroing issues)... so I'll look at it tomorrow.

That would keep the update from breaking things at the cost of possibly blocking rule and score updates for extended periods while the rescorer is misbehaving.

If we can put a check into the rescorer that does not allow scores to go to zero on published rules (taking into account net vs. non-net as my hack does not, if that's indeed critical) then that should be done. As I said earlier, if a rule is performing well enough to be published then it should not be assigned a zero score.

Any idea why the scores are jumping around so much? That bothers me too, though it's not as problematic as scores going to zero for no discernible reason.
Comment 6 Daryl C. W. O'Shea 2010-11-06 17:38:49 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #1)
> > > +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> > > trunk/rules/72_scores.cf
> > 
> > Yeah, that won't work well.  It'll enable net rules in non-net scoresets.
> Would that cause problems given the presence of "tflags net" on net rules?
> Surely SA does not rely on the score being zero to disable net rules when run
> in local mode...

Good point, investigating now.

> > I think I'll change the script to abort the update if lint detects dependency
> > issues.  I missed tonight's update (I think there's yet another rule or more
> > that will have zeroing issues)... so I'll look at it tomorrow.
> That would keep the update from breaking things at the cost of possibly
> blocking rule and score updates for extended periods while the rescorer is
> misbehaving.

Yeah.  I'm not sure what's better.  No update or people complaining about the update.

> If we can put a check into the rescorer that does not allow scores to go to
> zero on published rules (taking into account net vs. non-net as my hack does
> not, if that's indeed critical) then that should be done. As I said earlier, if
> a rule is performing well enough to be published then it should not be assigned
> a zero score.

Perhaps we could use the non-mutable flag to stop the GA from doing this.  Really, though, in promoting the rules we don't look at whether there's any point in having the rule (super-redundant, or whatever).

> Any idea why the scores are jumping around so much? That bothers me too, though
> it's not as problematic as scores going to zero for no discernible reason.

I'm not sure they're really changing that much.  A few tenths here and there and the occasional large jumps are to be expected due to the nature of the GA.
Comment 7 Daryl C. W. O'Shea 2010-11-06 18:14:33 UTC
(In reply to comment #0)
> The nightly masscheck which generates the scores for production rules updates
> has recently started generating 0 scores for rules which have dependencies
> (leading to warnings).
> Examples:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6504
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6509

OK, I've confirmed that this doesn't prevent sa-update from working, or from SA from working.  In fact, I would say this (or at least the possibility for it) is by design.  The GA is deciding (apparently) that rules aren't worth having (due to overlaps/whatever that makes the rules have no real bearing on the outcome across the entire corpus) even though they are technically good enough to be promoted.

Historically, before sa-update, this wasn't an issue since we just wouldn't ship those rules or rules that depended on them.  Updates via sa-update open up the possibility of problems, it seems (although I'm not sure why yet, maybe something is broken).

(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > (In reply to comment #1)
> > > > +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> > > > trunk/rules/72_scores.cf
> > > 
> > > Yeah, that won't work well.  It'll enable net rules in non-net scoresets.
> > Would that cause problems given the presence of "tflags net" on net rules?
> > Surely SA does not rely on the score being zero to disable net rules when run
> > in local mode...
> Good point, investigating now.

Looks like the 'sed' is actually fine as long as net rules have their tflags net (which, if they don't is a different problem).

I'm looking at getting the GA to not 0 them, or if it does, make sure there are no dependencies.
Comment 8 Daryl C. W. O'Shea 2010-11-06 20:57:43 UTC
[dos@dev new-rule-score-gen]$ svn ci -m "bug 6510: temporarily force all generated scores to be non-zero"
Sending        new-rule-score-gen/do-nightly-rescore-example
Sending        new-rule-score-gen/merge-scoresets
Transmitting file data ..
Committed revision 1032191.
[dos@dev new-rule-score-gen]$

I've temporarily added code to force all generated scores to be non-zero and mark the scores-set# files to indicate that the score was changed from 0.000 to 0.0001 with a "# forced non-zero" notation.


I've discovered that most, but not all, of the rules that have problems have been promoted due to "tflags publish"... which means they weren't necessarily selected for promotion due to mass-check results (which may explain their low or zero scores).

I've also found that the script generate-new-scores checks out a copy of tags/spamassassin_release_3_2_0_rc_2/rules from the svn repos.  I'm not sure yet what it uses it for and what it should use now that we've split rules out of the base release.
Comment 9 Tom Schulz 2010-11-08 11:15:39 UTC
> I've discovered that most, but not all, of the rules that have problems have
> been promoted due to "tflags publish"... which means they weren't necessarily
> selected for promotion due to mass-check results (which may explain their low
> or zero scores).
No doubt a rule can be or become useless by itself but still be useful as part
of another rule. Perhaps the fix should be to allow meta rules to use a rule with
a score of 0 without a warning.
Comment 10 Kevin A. McGrail 2010-11-08 11:20:42 UTC
(In reply to comment #9)
> > I've discovered that most, but not all, of the rules that have problems have
> > been promoted due to "tflags publish"... which means they weren't necessarily
> > selected for promotion due to mass-check results (which may explain their low
> > or zero scores).
> No doubt a rule can be or become useless by itself but still be useful as part
> of another rule. Perhaps the fix should be to allow meta rules to use a rule
> with
> a score of 0 without a warning.

If a rule has a score of 0, I've always treated that as 100% disabled and we would need a flag to disable a rule.

KAM
Comment 11 John Wilcock 2010-11-08 15:08:08 UTC
(In reply to comment #9)
> No doubt a rule can be or become useless by itself but still be useful as part
> of another rule. Perhaps the fix should be to allow meta rules to use a rule
> with a score of 0 without a warning.

Or perhaps more usefully, automatically prepend them with __.
Comment 12 Merijn van den Kroonenberg 2017-10-27 13:35:34 UTC
> Committed revision 1032191.
> [dos@dev new-rule-score-gen]$
> 
> I've temporarily added code to force all generated scores to be non-zero and
> mark the scores-set# files to indicate that the score was changed from 0.000
> to 0.0001 with a "# forced non-zero" notation.
> 

I checked svn and right after this patch was applied in /rulesrc/sandbox/dos/new-rule-score-gen/ the scoresets had this change, so it went live right away.

this tells us that even in nov. 2010 the /rulesrc/sandbox/dos/new-rule-score-gen/ was used instead of /masses/rule-update-score-gen/ which is something we (davej mostly) ran into when reconstructing the new massgen servers.

But I am also wondering why this issue is still open? Is the behaviour still incorrect?

> 
> I've discovered that most, but not all, of the rules that have problems have
> been promoted due to "tflags publish"... which means they weren't
> necessarily selected for promotion due to mass-check results (which may
> explain their low or zero scores).
>