SA Bugzilla – Bug 3627
[review] score generation for 3.0.0
Last modified: 2004-08-09 12:11:09 UTC
This bug is to track progress on 3.0.0's score generation process...
fixing milestone etc.
Created attachment 2167 [details] comments T_ rules so they do not show up in stats etc. first patch for the scores; comment the T_ testing rules while we're doing this.
Created attachment 2168 [details] fix several bugs in masses OK, a few bugs that I threw up when working on this. - score-ranges-from-freqs: if a rule has a score of 0, it will be forced to 0 in the next perceptron run. IMO, this is not correct; it should be allowed to use non-0 scores. I haven't changed this, but I strongly think it *should* be changed. thoughts? - logs-to-c: rules that are immutable with a score of 0 are not recorded in the logs; therefore, will not be used for score generation at all. Unfortunately, 'score-ranges-from-freqs' will set a rule to 0 and immutable if it hits less than 0.01%. This then means that the rule is not recorded, and when "rewrite-cf-with-new-scores" runs, it uses the default score (1.0) for those rules. oops! Fixed this by assuming that if there's a rule in the "generated scores" section, but the GA didn't use it, its score should be set to 0. And if the rule exists according to "parse-rules-for-masses", but the GA didn't use it and there's no score, the same applies.
oh, attachment 2168 [details] depends on the code from Henry's mods in bug 3584 btw.
Created attachment 2169 [details] scores for set 0 and set 1 OK, here's the results of getting the scores and STATISTICS files updated from Henry's results for set 0 and set 1. Manual changes: - locked RCVD_IN_BSP_TRUSTED to -4.3, it's previous setting, as the Perceptron had given it a score of -0.438. (presumably that worked for the mails in the corpus, but isn't much use as a whitelisting value.) - ditto for HABEAS_USER, -8 instead of -0.695 -- also set "userconf" on the rule to avoid that in future. (might be worth doing to RCVD_IN_BSP_TRUSTED as well.) - set the low-hitrate rules that had been disabled by score-ranges-from-freqs to score 0.001, so they'll still be used for the next 2 mass-checks. - note: SPF_FAIL has been disabled. it got a truly crappy hitrate. We have a chicken-and-egg problem here, in that SPF failures won't be noticed unless SpamAssassin gives it a score, though....
+1 I think it makes little sense for HABEAS_USER to have a different score from RCVD_IN_BSP_TRUSTED. (Incidentally, note that HABEAS_USER is not using -trusted, it probably should.) Note: the HABEAS_USER ham hit rate is artificially high because Craig Hughes is listed and I've corresponded directly with Habeas. In contrast, Bonded Sender doesn't list IronPort. In fact, it doesn't list anyone who has not put up a bond. (*All* of my hits are Craig Hughes and Habeas personnel.) Re: SPF_FAIL, I'd give it a 0.001 score since it's basically free if SPF is run at all. Re: the bugs you note, let's track those separately since they'll take more thought to fix right, I think.
Subject: Re: score generation for 3.0.0 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I think it makes little sense for HABEAS_USER to have a different score from > RCVD_IN_BSP_TRUSTED. (Incidentally, note that HABEAS_USER is not using > -trusted, it probably should.) Good point -- it should use -trusted. However on the other issue, it has historically got higher points than BSP; it's worth noting also that Habeas are still required confirmed opt-in, whereas BSP have relaxed that. IMO, that gives it a bit of ground for a manually-set, higher whitelisting score. > Re: SPF_FAIL, I'd give it a 0.001 score since it's basically free if SPF is > run at all. yeah, agreed. pity about the crappy hitrate :( > Re: the bugs you note, let's track those separately since they'll take more > thought to fix right, I think. well, the 'rewrite-cf' bug has been fixed already in patch 2168. the others probably deserve their own bug alright. - --j. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFA/1h9QTcbUG5Y7woRAgoZAJ4g2RpEmLRMzFYkrC1ItRy2X+fi7ACfdiwn AOV1AkxwXw1vUi0vAleqhRo= =27BB -----END PGP SIGNATURE-----
please review, esp. the scores. This has to go out today as pre3...
+1
....aaand I've just noticed that "make test" fails because of some over-zealous score-checking: warning: score set for non-existent rule LEY_CHILE_ES_02 warning: score set for non-existent rule CONTRA_REEMBOLSO_ES warning: score set for non-existent rule CLICK_ES warning: score set for non-existent rule REMOVE_ES_05 warning: score set for non-existent rule NO_MAS_MAIL_1_ES warning: score set for non-existent rule PORNO_GRATIS_ES warning: score set for non-existent rule GRATIS_ES warning: score set for non-existent rule REGALO_ES the _ES ruleset doesn't have scores set, therefore the new scripts (which don't understand locales) output scores of 1.0 (their default). however, the "score" lines are in no locale, whereas the rule definitions are in the "lang es" locale. so --lint fails. ick...
Created attachment 2170 [details] fixed scores for set 0 and set 1 this scores file removes the scores for the _ES rules that make "make test" barf, and the score for AWL, which seemed to break whitelist_addrs.t. also included: new t/data/spam files for several tests that failed using the new scoreset.
BTW, there's a couple of bugs remaining in the masses script that need to be watched out for: - the new "rewrite-cf-with-new-scores" will add a score for AWL. this seems to break the whitelist_addrs.t test. - it also adds scores for "lang xx" locale-specific rules. these also need to be omitted or "make test" will break when it does a --lint.
Created attachment 2171 [details] scores, now without 0's in set2/3 another tweak; from IRC: <felicity> jmason23, one thing with the scores is that there shouldn't be any "generated" 0.0 scores. <felicity> especially for sets 2 and 3. <felicity> BARELY_LEGAL, for instance. <felicity> DIRECT_EMAIL, etc. so now there aren't any 0.000s in sets 2 or 3.
+1 Are the STATISTICS files generated against the validation set? They should be, I think. Anyhow, this is fine for now, we can tweak STATISTICS later.
+1 :)
ok, applied! pre3 here we come...
oops. that shouldn't have been closed, there's still an issue open (as in comment 12). BTW re Daniel's comment; yep, the stats are against the validation set, but the hit-frequencies are against the full mass-check.
Created attachment 2195 [details] fixes bugs from comment 12 this fixes these bugs: '- the new "rewrite-cf-with-new-scores" will add a score for AWL. this seems to break the whitelist_addrs.t test. - it also adds scores for "lang xx" locale-specific rules. these also need to be omitted or "make test" will break when it does a --lint.' In addition, it'll set 'tflags net' rules scores to 0 for scoresets 0 and 2, instead of defaulting them to 1 (which makes no sense). see URIBL_OB_SURBL previously.
*** Bug 3622 has been marked as a duplicate of this bug. ***
Created attachment 2207 [details] Generated scores for sets 2 and 3. Statistics files for sets 0 through 3. It seems that some of the scores that are supposed to be immutable in this scoreset have been made mutable. Aside from that, these scores are ready. Any thoughts on why HABEAS_USER and HABEAS_INFRINGER would have been made mutable?
> Any thoughts on why HABEAS_USER and HABEAS_INFRINGER would have been made > mutable? Hmmm... I thought they were actually mutable? It's just that when they didn't have enough hits, we'd have to enter our own scores. Maybe I'm misremembering. RCVD_IN_BSP_* also were changed.
I think Justin's patch really is needed before rewriting the scores: the bugs it fixes (I hope) appear here: AWL is rewritten, other languages scores are rewritten, etc.
Subject: Re: [review] score generation for 3.0.0 On Tue, Aug 03, 2004 at 01:28:32PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > Hmmm... I thought they were actually mutable? It's just that when they > didn't have enough hits, we'd have to enter our own scores. Maybe I'm > misremembering. HABEAS and BSP are supposed to have non-mutable scores so that we "guarantee" a benefit to the respective companies.
Subject: Re: [review] score generation for 3.0.0 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > > Any thoughts on why HABEAS_USER and HABEAS_INFRINGER would have been made > > mutable? > > Hmmm... I thought they were actually mutable? It's just that when they > didn't have enough hits, we'd have to enter our own scores. Maybe I'm > misremembering. IIRC, all of the scores outside of the "generated scores" section were initially nonmutable. Then at some point someone changed the scripts to allow those to change as well, but that never quite got to the "cross the i's, dot the t's" stage, in that since then we've often had to hand-tweak the scores that were not supposed to be mutable, after GA runs. This, if you ask me, was a bug ;) - --j. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBD/oxQTcbUG5Y7woRAi79AKC9wX7FSWS0PdLclgDFQf/0q+ahrACgvfHq WaVvn6dSjfDCV+1zm2rVXww= =nR0+ -----END PGP SIGNATURE-----
Created attachment 2208 [details] Revised generated scores with manual corrections for HABEAS and BSP.
wierd -- there's a lot of stuff I thought I'd fixed there; the _ES rules are in there, for example. also: HTML_MESSAGE is being set to 0. RAZOR2_CF_RANGE_51_100 is getting 0, which seems harsh. the bayes scores seem quite small, too.
we need to apply patch 2195 to fix those rewrite bugs. In fact, we need to fix the rewriting bugs in general.... votes please! ;)
Created attachment 2211 [details] use "End of generated scores" divider as immutable/mutable flag This will simply the fix for the bugs mentioned in comment 12 and generally make things saner.
+1 (background: myself and Dan thought through the situation of hacking-in "this rule shouldn't be mutable" code in att 2195, and realized it makes more sense and is a lot more maintainable to just return to the pre-2.50 situation of using the location in the scores file as the indicator as to whether a rule's score is mutable or not.)
btw, both patches 2195 and 2211 are still applicable and need votes
+1 2195 +1 2211 more work is needed after 2195: - consolidate zeroing of "off" rules (Bayes vs. network) - remove any code not needed from 2195
Subject: Re: [review] score generation for 3.0.0 +1 for both
ok, all the code is needed from 2195, so I applied the lot.
right -- this bug is getting to be a mess, with so many patches and attachments of various type. to avoid future messiness, I suggest bugs that affect the masses scripts, but are not actual *scores* to be applied for 3.0.0, be created as their own bugs (like the new bug 3662).
no longer in review. once the patch from 3662 is applied, we're good to go on redoing the score generation (AFAICS).
ok -- that's the lot -- the masses scripts should now all work sanely. ;)
Created attachment 2224 [details] Latest (final?) revision of generated scores
Subject: Re: score generation for 3.0.0 On Fri, Aug 06, 2004 at 02:59:48PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > Created an attachment (id=2224) > --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2224&action=view) > Latest (final?) revision of generated scores Hrm. +score 0 0 +score 0 ?? +score RCVD_IN_BL_SPAMCOP_NET 0 0 0 1.216 +score RCVD_IN_DSBL 0 0 0 3.805 +score RCVD_IN_NJABL_DUL 0 0 0 0.088 +score RCVD_IN_NJABL_PROXY 0 0 0 0.438 +score RCVD_IN_NJABL_RELAY 0 0 0 1.397 +score RCVD_IN_NJABL_SPAM 0 0 0 1.841 +score RCVD_IN_RFC_IPWHOIS 0 0 0 1.664 +score RCVD_IN_RSL 0 0 0 1.720 +score RCVD_IN_SBL 0 0 0 0.107 +score RCVD_IN_SORBS_DUL 0 0 0 1.987 +score RCVD_IN_SORBS_HTTP 0 0 0 0.043 +score RCVD_IN_SORBS_MISC 0 0 0 0.338 +score RCVD_IN_SORBS_SMTP 0 0 0 2.493 +score RCVD_IN_SORBS_SOCKS 0 0 0 2.054 +score RCVD_IN_SORBS_WEB 0 0 0 0.007 +score RCVD_IN_SORBS_ZOMBIE 0 +score RCVD_IN_XBL 0 0 0 3.076 +score URIBL_AB_SURBL 0 0 0 0.417 +score URIBL_OB_SURBL 0 0 0 3.213 +score URIBL_PH_SURBL 0 0 0 2.000 +score URIBL_SBL 0 0 0 0.996 +score URIBL_SC_SURBL 0 0 0 4.263 +score URIBL_WS_SURBL 0 0 0 1.462 0 in set 1?!? there's a bunch of rules which are 0'ed in some sets that just shouldn't be.
Created attachment 2227 [details] new rewrite-cf-with-new-scores This should fix the rewriting bugs. In addition,if there are further bugs, they'll be easier to find now. I think this is a lot easier to follow. Other than fixing bugs, it adds one new feature: comment tags. These allow us to identify rules that are never rewritten by the perceptron and act on them by either removing them or moving them to the non-gen section of the file (for example, NO_DNS_FOR_FROM might want to be moved with the score still set to zero since we plan on fix it later). It's a major rewrite, so I suggest patching and looking at the result (and compare it to the original). If there are any minor issues, let's +1 and fix from this. Hint. ;-)
Henry, I'll let you submit the rewritten scores patch. I'm about to collapse. Let me know if you have any problems with the new rewrite script.
Created attachment 2228 [details] Patch created with fixed rewrite-cf-with-new-scores I really hope that this is the last time that we need to do this. ;)
+1 on 2227; haven't looked at the scores yet...
+1 on 2227
Created attachment 2229 [details] One more time It would have helped had I applied the right patch.
applied 2227
+1 2229
+1 on 2229
+1 on 2299 as well. looks very nice! although I'm a bit confused; are sets 0,1 using a lot more data than 2,3? we'll need to do a final pass to zero low-scoring rules, ie. score HTML_SHOUTING9 0 0.016 0.041 0.000 # n=0 I think...
Applied 2299. Theo didn't submit his spamtrap data for sets 2,3 which is why it is smaller. We should make some improvements to mass-check (caching of rule hits) to make it easier to run the bayes-(no)?net runs in the future.
Subject: Re: [review] score generation for 3.0.0 On Mon, Aug 09, 2004 at 03:26:10PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > +1 on 2299 as well. looks very nice! although I'm a bit confused; are sets 0,1 > using a lot more data than 2,3? Yeah. At a minimum, for sets 0,1, I included my spamtraps for ~2-3 weeks. That was an additional 300k messages. I didn't have the time to run those through for sets 2 + 3. One of the goals for 3.1 is to get mass-check to understand that sets 2+3 are the same as 0+1, but it should add bayes, then do the right thing. The network tests take _forever_. :(
so is this now closeable?
fixed