Bug 3627 - [review] score generation for 3.0.0
Summary: [review] score generation for 3.0.0
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Score Generation (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
: P1 blocker
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
: 3622 (view as bug list)
Depends on: 3584 3662
Blocks:
  Show dependency tree
 
Reported: 2004-07-21 19:53 UTC by Justin Mason
Modified: 2004-08-09 12:11 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status
comments T_ rules so they do not show up in stats etc. patch None Justin Mason [HasCLA]
fix several bugs in masses patch None Justin Mason [HasCLA]
scores for set 0 and set 1 patch None Justin Mason [HasCLA]
fixed scores for set 0 and set 1 patch None Justin Mason [HasCLA]
scores, now without 0's in set2/3 patch None Justin Mason [HasCLA]
fixes bugs from comment 12 patch None Justin Mason [HasCLA]
Generated scores for sets 2 and 3. Statistics files for sets 0 through 3. patch None Henry Stern [HasCLA]
Revised generated scores with manual corrections for HABEAS and BSP. patch None Henry Stern [HasCLA]
use "End of generated scores" divider as immutable/mutable flag patch None Daniel Quinlan [HasCLA]
Latest (final?) revision of generated scores patch None Henry Stern [HasCLA]
new rewrite-cf-with-new-scores patch None Daniel Quinlan [HasCLA]
Patch created with fixed rewrite-cf-with-new-scores patch None Henry Stern [HasCLA]
One more time patch None Henry Stern [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Mason 2004-07-21 19:53:21 UTC
This bug is to track progress on 3.0.0's score generation process...
Comment 1 Justin Mason 2004-07-21 19:53:53 UTC
fixing milestone etc.
Comment 2 Justin Mason 2004-07-21 20:17:34 UTC
Created attachment 2167 [details]
comments T_ rules so they do not show up in stats etc.

first patch for the scores; comment the T_ testing rules while we're doing
this.
Comment 3 Justin Mason 2004-07-21 20:20:34 UTC
Created attachment 2168 [details]
fix several bugs in masses

OK, a few bugs that I threw up when working on this.

- score-ranges-from-freqs: if a rule has a score of 0, it will be forced to 0
  in the next perceptron run.	IMO, this is not correct; it should be allowed
  to use non-0 scores.	 I haven't changed this, but I strongly think it
  *should* be changed. thoughts?

- logs-to-c: rules that are immutable with a score of 0 are not recorded
  in the logs; therefore, will not be used for score generation at all.

  Unfortunately, 'score-ranges-from-freqs' will set a rule to 0 and
  immutable if it hits less than 0.01%.  This then means that the rule
  is not recorded, and when "rewrite-cf-with-new-scores" runs, it uses
  the default score (1.0) for those rules. oops!

  Fixed this by assuming that if there's a rule in the "generated scores"
  section, but the GA didn't use it, its score should be set to 0.
  And if the rule exists according to "parse-rules-for-masses", but
  the GA didn't use it and there's no score, the same applies.
Comment 4 Justin Mason 2004-07-21 20:21:01 UTC
oh, attachment 2168 [details] depends on the code from Henry's mods in bug 3584 btw.
Comment 5 Justin Mason 2004-07-21 20:28:05 UTC
Created attachment 2169 [details]
scores for set 0 and set 1

OK, here's the results of getting the scores and STATISTICS files updated from
Henry's results for set 0 and set 1.

Manual changes:

- locked RCVD_IN_BSP_TRUSTED to -4.3, it's previous setting, as the Perceptron
  had given it a score of -0.438.  (presumably that worked for the mails
  in the corpus, but isn't much use as a whitelisting value.)

- ditto for HABEAS_USER, -8 instead of -0.695 -- also set "userconf" on the
  rule to avoid that in future.  (might be worth doing to RCVD_IN_BSP_TRUSTED
  as well.)

- set the low-hitrate rules that had been disabled by score-ranges-from-freqs  
     to score 0.001, so they'll still be used for the next 2 mass-checks.

- note: SPF_FAIL has been disabled.  it got a truly crappy hitrate.
  We have a chicken-and-egg problem here, in that SPF failures won't be noticed

  unless SpamAssassin gives it a score, though....
Comment 6 Daniel Quinlan 2004-07-21 21:51:38 UTC
+1

I think it makes little sense for HABEAS_USER to have a different score from
RCVD_IN_BSP_TRUSTED.  (Incidentally, note that HABEAS_USER is not using
-trusted, it probably should.)

Note: the HABEAS_USER ham hit rate is artificially high because Craig Hughes
is listed and I've corresponded directly with Habeas.  In contrast, Bonded
Sender doesn't list IronPort.  In fact, it doesn't list anyone who has not put
up a bond.  (*All* of my hits are Craig Hughes and Habeas personnel.)

Re: SPF_FAIL, I'd give it a 0.001 score since it's basically free if SPF is
run at all.

Re: the bugs you note, let's track those separately since they'll take more
thought to fix right, I think.
Comment 7 Justin Mason 2004-07-21 23:03:16 UTC
Subject: Re:  score generation for 3.0.0 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> I think it makes little sense for HABEAS_USER to have a different score from
> RCVD_IN_BSP_TRUSTED.  (Incidentally, note that HABEAS_USER is not using
> -trusted, it probably should.)

Good point -- it should use -trusted.

However on the other issue, it has historically got higher points than
BSP; it's worth noting also that Habeas are still required confirmed
opt-in, whereas BSP have relaxed that.  IMO, that gives it a bit of
ground for a manually-set, higher whitelisting score.

> Re: SPF_FAIL, I'd give it a 0.001 score since it's basically free if SPF is
> run at all.

yeah, agreed.  pity about the crappy hitrate :(

> Re: the bugs you note, let's track those separately since they'll take more
> thought to fix right, I think.

well, the 'rewrite-cf' bug has been fixed already in patch 2168.
the others probably deserve their own bug alright.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFA/1h9QTcbUG5Y7woRAgoZAJ4g2RpEmLRMzFYkrC1ItRy2X+fi7ACfdiwn
AOV1AkxwXw1vUi0vAleqhRo=
=27BB
-----END PGP SIGNATURE-----

Comment 8 Justin Mason 2004-07-22 12:07:23 UTC
please review, esp. the scores.  This has to go out today as pre3...
Comment 9 Michael Parker 2004-07-22 12:18:38 UTC
+1
Comment 10 Justin Mason 2004-07-22 14:50:47 UTC
....aaand I've just noticed that "make test" fails because of some over-zealous
score-checking:

warning: score set for non-existent rule LEY_CHILE_ES_02
warning: score set for non-existent rule CONTRA_REEMBOLSO_ES
warning: score set for non-existent rule CLICK_ES
warning: score set for non-existent rule REMOVE_ES_05
warning: score set for non-existent rule NO_MAS_MAIL_1_ES
warning: score set for non-existent rule PORNO_GRATIS_ES
warning: score set for non-existent rule GRATIS_ES
warning: score set for non-existent rule REGALO_ES

the _ES ruleset doesn't have scores set, therefore the new scripts (which don't
understand locales) output scores of 1.0 (their default).  however, the "score"
lines are in no locale, whereas the rule definitions are in the "lang es"
locale.  so --lint fails.

ick...
Comment 11 Justin Mason 2004-07-22 18:03:15 UTC
Created attachment 2170 [details]
fixed scores for set 0 and set 1

this scores file removes the scores for the _ES rules that make "make test"
barf, and the score for AWL, which seemed to break whitelist_addrs.t.  also
included: new t/data/spam files for several tests that failed using the new
scoreset.
Comment 12 Justin Mason 2004-07-22 18:04:46 UTC
BTW, there's a couple of bugs remaining in the masses script that need to be
watched out for:

- the new "rewrite-cf-with-new-scores" will add a score for AWL.  this seems to
break the whitelist_addrs.t test.

- it also adds scores for "lang xx" locale-specific rules.  these also need to
be omitted or "make test" will break when it does a --lint.
Comment 13 Justin Mason 2004-07-22 18:15:21 UTC
Created attachment 2171 [details]
scores, now without 0's in set2/3

another tweak; from IRC:

<felicity> jmason23, one thing with the scores is that there shouldn't be any
"generated" 0.0 scores.
<felicity> especially for sets 2 and 3.
<felicity> BARELY_LEGAL, for instance.
<felicity> DIRECT_EMAIL, etc.

so now there aren't any 0.000s in sets 2 or 3.
Comment 14 Michael Parker 2004-07-22 18:28:37 UTC
+1
Comment 15 Daniel Quinlan 2004-07-22 19:06:56 UTC
+1

Are the STATISTICS files generated against the validation set?  They should
be, I think.

Anyhow, this is fine for now, we can tweak STATISTICS later.
Comment 16 Theo Van Dinter 2004-07-22 19:32:45 UTC
+1 :)
Comment 17 Justin Mason 2004-07-22 19:41:45 UTC
ok, applied! pre3 here we come...
Comment 18 Justin Mason 2004-07-23 17:46:02 UTC
oops.  that shouldn't have been closed, there's still an issue open (as in
comment 12).

BTW re Daniel's comment; yep, the stats are against the validation set, but the
hit-frequencies are against the full mass-check.
Comment 19 Justin Mason 2004-07-28 23:58:31 UTC
Created attachment 2195 [details]
fixes bugs from comment 12

this fixes these bugs:

'- the new "rewrite-cf-with-new-scores" will add a score for AWL.  this seems
to
break the whitelist_addrs.t test.
- it also adds scores for "lang xx" locale-specific rules.  these also need to
be omitted or "make test" will break when it does a --lint.'

In addition, it'll set 'tflags net' rules scores to 0 for scoresets 0 and 2,
instead of defaulting them to 1 (which makes no sense).  see URIBL_OB_SURBL
previously.
Comment 20 Henry Stern 2004-08-03 13:14:51 UTC
*** Bug 3622 has been marked as a duplicate of this bug. ***
Comment 21 Henry Stern 2004-08-03 13:20:21 UTC
Created attachment 2207 [details]
Generated scores for sets 2 and 3.  Statistics files for sets 0 through 3.

It seems that some of the scores that are supposed to be immutable in this
scoreset have been made mutable.  Aside from that, these scores are ready.

Any thoughts on why HABEAS_USER and HABEAS_INFRINGER would have been made
mutable?
Comment 22 Daniel Quinlan 2004-08-03 13:28:31 UTC
> Any thoughts on why HABEAS_USER and HABEAS_INFRINGER would have been made
> mutable?

Hmmm... I thought they were actually mutable?  It's just that when they
didn't have enough hits, we'd have to enter our own scores.  Maybe I'm
misremembering.

RCVD_IN_BSP_* also were changed.
Comment 23 Daniel Quinlan 2004-08-03 13:31:47 UTC
I think Justin's patch really is needed before rewriting the scores: the
bugs it fixes (I hope) appear here: AWL is rewritten, other languages scores are
rewritten, etc.
Comment 24 Theo Van Dinter 2004-08-03 13:42:47 UTC
Subject: Re:  [review] score generation for 3.0.0

On Tue, Aug 03, 2004 at 01:28:32PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Hmmm... I thought they were actually mutable?  It's just that when they
> didn't have enough hits, we'd have to enter our own scores.  Maybe I'm
> misremembering.

HABEAS and BSP are supposed to have non-mutable scores so that we
"guarantee" a benefit to the respective companies.

Comment 25 Justin Mason 2004-08-03 13:49:39 UTC
Subject: Re:  [review] score generation for 3.0.0 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> > Any thoughts on why HABEAS_USER and HABEAS_INFRINGER would have been made
> > mutable?
> 
> Hmmm... I thought they were actually mutable?  It's just that when they
> didn't have enough hits, we'd have to enter our own scores.  Maybe I'm
> misremembering.

IIRC, all of the scores outside of the "generated scores" section were
initially nonmutable.

Then at some point someone changed the scripts to allow those to change as
well, but that never quite got to the "cross the i's, dot the t's" stage,
in that since then we've often had to hand-tweak the scores that were not
supposed to be mutable, after GA runs.

This, if you ask me, was a bug ;)

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBD/oxQTcbUG5Y7woRAi79AKC9wX7FSWS0PdLclgDFQf/0q+ahrACgvfHq
WaVvn6dSjfDCV+1zm2rVXww=
=nR0+
-----END PGP SIGNATURE-----

Comment 26 Henry Stern 2004-08-03 13:53:00 UTC
Created attachment 2208 [details]
Revised generated scores with manual corrections for HABEAS and BSP.
Comment 27 Justin Mason 2004-08-03 15:30:11 UTC
wierd -- there's a lot of stuff I thought I'd fixed there; the _ES rules are in
there, for example.

also: HTML_MESSAGE is being set to 0.  RAZOR2_CF_RANGE_51_100 is getting 0,
which seems harsh.  the bayes scores seem quite small, too.
Comment 28 Justin Mason 2004-08-03 18:29:03 UTC
we need to apply patch 2195 to fix those rewrite bugs.   In fact, we need to fix
the rewriting bugs in general....

votes please! ;)
Comment 29 Daniel Quinlan 2004-08-03 18:55:28 UTC
Created attachment 2211 [details]
use "End of generated scores" divider as immutable/mutable flag

This will simply the fix for the bugs mentioned in comment 12 and generally
make things saner.
Comment 30 Justin Mason 2004-08-03 19:01:21 UTC
+1

(background: myself and Dan thought through the situation of hacking-in "this
rule shouldn't be mutable" code in att 2195, and realized it makes more sense
and is a lot more maintainable to just return to the pre-2.50 situation of using
the location in the scores file as the indicator as to whether a rule's score is
mutable or not.)
Comment 31 Justin Mason 2004-08-03 19:29:11 UTC
btw, both patches 2195 and 2211 are still applicable and need votes
Comment 32 Daniel Quinlan 2004-08-03 20:04:33 UTC
+1 2195
+1 2211

more work is needed after 2195:
 - consolidate zeroing of "off" rules (Bayes vs. network)
 - remove any code not needed from 2195
Comment 33 Michael Parker 2004-08-03 20:20:59 UTC
Subject: Re:  [review] score generation for 3.0.0

+1 for both

Comment 34 Justin Mason 2004-08-03 20:28:17 UTC
ok, all the code is needed from 2195, so I applied the lot.
Comment 35 Justin Mason 2004-08-03 23:34:55 UTC
right -- this bug is getting to be a mess, with so many patches and attachments
of various type.  to avoid future messiness, I suggest bugs that affect the
masses scripts, but are not actual *scores* to be applied for 3.0.0, be created
as their own bugs (like the new bug 3662).
Comment 36 Justin Mason 2004-08-04 19:53:59 UTC
no longer in review.  once the patch from 3662 is applied, we're good to go on
redoing the score generation (AFAICS).
Comment 37 Justin Mason 2004-08-04 20:12:23 UTC
ok -- that's the lot -- the masses scripts should now all work sanely. ;)
Comment 38 Henry Stern 2004-08-06 14:59:47 UTC
Created attachment 2224 [details]
Latest (final?) revision of generated scores
Comment 39 Theo Van Dinter 2004-08-06 15:05:02 UTC
Subject: Re:  score generation for 3.0.0

On Fri, Aug 06, 2004 at 02:59:48PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Created an attachment (id=2224)
>  --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2224&action=view)
> Latest (final?) revision of generated scores

Hrm.

+score 0 0
+score  0

??

+score RCVD_IN_BL_SPAMCOP_NET 0 0 0 1.216
+score RCVD_IN_DSBL 0 0 0 3.805
+score RCVD_IN_NJABL_DUL 0 0 0 0.088
+score RCVD_IN_NJABL_PROXY 0 0 0 0.438
+score RCVD_IN_NJABL_RELAY 0 0 0 1.397
+score RCVD_IN_NJABL_SPAM 0 0 0 1.841
+score RCVD_IN_RFC_IPWHOIS 0 0 0 1.664
+score RCVD_IN_RSL 0 0 0 1.720
+score RCVD_IN_SBL 0 0 0 0.107
+score RCVD_IN_SORBS_DUL 0 0 0 1.987
+score RCVD_IN_SORBS_HTTP 0 0 0 0.043
+score RCVD_IN_SORBS_MISC 0 0 0 0.338
+score RCVD_IN_SORBS_SMTP 0 0 0 2.493
+score RCVD_IN_SORBS_SOCKS 0 0 0 2.054
+score RCVD_IN_SORBS_WEB 0 0 0 0.007
+score RCVD_IN_SORBS_ZOMBIE 0
+score RCVD_IN_XBL 0 0 0 3.076
+score URIBL_AB_SURBL 0 0 0 0.417
+score URIBL_OB_SURBL 0 0 0 3.213
+score URIBL_PH_SURBL 0 0 0 2.000
+score URIBL_SBL 0 0 0 0.996
+score URIBL_SC_SURBL 0 0 0 4.263
+score URIBL_WS_SURBL 0 0 0 1.462

0 in set 1?!?  there's a bunch of rules which are 0'ed in some sets that
just shouldn't be.

Comment 40 Daniel Quinlan 2004-08-09 01:13:25 UTC
Created attachment 2227 [details]
new rewrite-cf-with-new-scores

This should fix the rewriting bugs.  In addition,if there are further bugs,
they'll be easier to find now. I think this is a lot easier to follow.

Other than fixing bugs, it adds one new feature: comment tags.	These allow
us to identify rules that are never rewritten by the perceptron and act
on them by either removing them or moving them to the non-gen section of
the file (for example, NO_DNS_FOR_FROM might want to be moved with the score
still set to zero since we plan on fix it later).

It's a major rewrite, so I suggest patching and looking at the result
(and compare it to the original).

If there are any minor issues, let's +1 and fix from this.  Hint.  ;-)
Comment 41 Daniel Quinlan 2004-08-09 01:14:46 UTC
Henry, I'll let you submit the rewritten scores patch.  I'm about to collapse.

Let me know if you have any problems with the new rewrite script.
Comment 42 Henry Stern 2004-08-09 05:32:14 UTC
Created attachment 2228 [details]
Patch created with fixed rewrite-cf-with-new-scores

I really hope that this is the last time that we need to do this. ;)
Comment 43 Justin Mason 2004-08-09 11:52:40 UTC
+1 on 2227; haven't looked at the scores yet...
Comment 44 Henry Stern 2004-08-09 11:54:40 UTC
+1 on 2227
Comment 45 Henry Stern 2004-08-09 11:57:00 UTC
Created attachment 2229 [details]
One more time

It would have helped had I applied the right patch.
Comment 46 Daniel Quinlan 2004-08-09 13:19:53 UTC
applied 2227
Comment 47 Daniel Quinlan 2004-08-09 14:07:34 UTC
+1 2229
Comment 48 Theo Van Dinter 2004-08-09 14:42:03 UTC
+1 on 2229
Comment 49 Justin Mason 2004-08-09 15:26:09 UTC
+1 on 2299 as well.  looks very nice! although I'm a bit confused; are sets 0,1
using a lot more data than 2,3?

we'll need to do a final pass to zero low-scoring rules, ie. 

score HTML_SHOUTING9 0 0.016 0.041 0.000 # n=0

I think...
Comment 50 Henry Stern 2004-08-09 15:28:46 UTC
Applied 2299.  Theo didn't submit his spamtrap data for sets 2,3 which is why it
is smaller.  We should make some improvements to mass-check (caching of rule
hits) to make it easier to run the bayes-(no)?net runs in the future.
Comment 51 Theo Van Dinter 2004-08-09 15:38:12 UTC
Subject: Re:  [review] score generation for 3.0.0

On Mon, Aug 09, 2004 at 03:26:10PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> +1 on 2299 as well.  looks very nice! although I'm a bit confused; are sets 0,1
> using a lot more data than 2,3?

Yeah.  At a minimum, for sets 0,1, I included my spamtraps for ~2-3 weeks.
That was an additional 300k messages.

I didn't have the time to run those through for sets 2 + 3.  One of
the goals for 3.1 is to get mass-check to understand that sets 2+3
are the same as 0+1, but it should add bayes, then do the right thing.
The network tests take _forever_. :(

Comment 52 Justin Mason 2004-08-09 15:58:27 UTC
so is this now closeable?
Comment 53 Daniel Quinlan 2004-08-09 20:11:09 UTC
fixed