SA Bugzilla – Bug 4099
ALL_TRUSTED changed semantics since 3.0.0
Last modified: 2005-04-26 03:39:38 UTC
mass-check results:

    OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
       0.001  0.0000   0.0049   0.000   0.44   -2.40  ALL_TRUSTED:theo
       0.004  0.0000   0.0106   0.000   0.45   -2.40  ALL_TRUSTED:quinlan
       0.004  0.0000   0.0138   0.000   0.45   -2.40  ALL_TRUSTED:parkerm
       0.025  0.0000   0.0500   0.000   0.49   -2.40  ALL_TRUSTED:jm
       0.000  0.0000   0.0000   0.500   0.00   -2.40  ALL_TRUSTED:hbayle
       0.253  0.0000   0.4580   0.000   0.55   -2.40  ALL_TRUSTED:daf
       0.035  0.0000   0.2589   0.000   0.48   -2.40  ALL_TRUSTED:bzoetekouw
3.1.0 milestone
Subject: Re: New: ALL_TRUSTED broken: barely hitting any ham

On Sun, Jan 23, 2005 at 04:28:14PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> 0.001 0.0000 0.0049 0.000 0.44 -2.40 ALL_TRUSTED:theo
> 0.004 0.0000 0.0106 0.000 0.45 -2.40 ALL_TRUSTED:quinlan
> 0.004 0.0000 0.0138 0.000 0.45 -2.40 ALL_TRUSTED:parkerm
> 0.025 0.0000 0.0500 0.000 0.49 -2.40 ALL_TRUSTED:jm
> 0.000 0.0000 0.0000 0.500 0.00 -2.40 ALL_TRUSTED:hbayle
> 0.253 0.0000 0.4580 0.000 0.55 -2.40 ALL_TRUSTED:daf
> 0.035 0.0000 0.2589 0.000 0.48 -2.40 ALL_TRUSTED:bzoetekouw

So is the issue that we think it's not hitting as much as it should, or that it doesn't hit much at all? What surprises me here is how high the numbers for daf and bzoetekouw are.
Subject: Re: ALL_TRUSTED broken: barely hitting any ham

> So is the issue that we think it's not hitting as much as it should,
> or that it doesn't hit much at all?

Either, or both. It used to hit much more. The rule broke, so it stopped hitting as much as it should, and now it barely hits at all.
Subject: Re: ALL_TRUSTED broken: barely hitting any ham

On Sun, Jan 23, 2005 at 06:37:37PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Either, both. It used to hit much more. The rule broke and stopped
> hitting as much as it should and it doesn't hit much at all.

I'm not really surprised by the current numbers. My corpus has very little mail sent completely through trusted hosts.
Subject: Re: ALL_TRUSTED broken: barely hitting any ham

> I'm not really surprised by the current numbers. My corpus has very
> little mail sent completely through trusted hosts.

When I looked at rule efficacy changes (RANK changes) between 3.0 and today, ALL_TRUSTED was the rule that declined the most, by a rather large margin, so I opened this bug.

3.0 mass-check:

    1.045  0.0138  3.0537  0.005  0.67  -2.40  ALL_TRUSTED

submit-3.0.0-sets01/{spam,ham}-nobayes-net-theo.log:

    0.325  0.0030  2.6253  0.001  0.68  -2.82  ALL_TRUSTED

I had a similar drop, and my corpus composition has not changed significantly. Certainly not everyone's volume of ALL_TRUSTED mail has dropped by a factor of 500.

I am a developer of SpamAssassin, by the way.
ok, I've mass-checked 94142 ham messages with 3.0.0 and trunk. The differences are entirely messages where there are NO Received headers. In 3.0.0 it's:

    my ($self) = @_;
    if ($self->{num_relays_untrusted} > 0) {
      return 0;
    } else {
      return 1;
    }

In 3.1.0 it's:

    my ($self) = @_;
    return $self->{num_relays_trusted} &&
           !$self->{num_relays_untrusted} &&
           !$self->{num_relays_unparseable};

By changing that to

    my ($self) = @_;
    return $self->{num_relays_trusted} >= 0 &&
           !$self->{num_relays_untrusted} &&
           !$self->{num_relays_unparseable};

I get *identical* hits on that 90k-message corpus to what I get with a mass-check from 3.0.0.
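The three predicates above can be compared side by side on a few made-up relay counts. This is a minimal Perl sketch: the function names and the plain hashref standing in for `$self` are hypothetical, and each result is coerced to 0/1 for printing (the 3.1.0 expression itself can return an empty string).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the three ALL_TRUSTED bodies quoted above.
# Each takes a hashref shaped like the relay counters on $self.
sub all_trusted_300 {
    my ($self) = @_;
    return $self->{num_relays_untrusted} > 0 ? 0 : 1;
}

sub all_trusted_310 {
    my ($self) = @_;
    return ($self->{num_relays_trusted}
        && !$self->{num_relays_untrusted}
        && !$self->{num_relays_unparseable}) ? 1 : 0;
}

sub all_trusted_patched {
    my ($self) = @_;
    # ">= 0" is always true for a count, so this reduces to
    # "no untrusted and no unparseable relays".
    return ($self->{num_relays_trusted} >= 0
        && !$self->{num_relays_untrusted}
        && !$self->{num_relays_unparseable}) ? 1 : 0;
}

# Toy cases; real counts come from SpamAssassin's Received parsing.
my %cases = (
    'no Received headers'    => { num_relays_trusted => 0, num_relays_untrusted => 0, num_relays_unparseable => 0 },
    'one trusted relay'      => { num_relays_trusted => 1, num_relays_untrusted => 0, num_relays_unparseable => 0 },
    'trusted + untrusted'    => { num_relays_trusted => 1, num_relays_untrusted => 1, num_relays_unparseable => 0 },
    'only unparseable relay' => { num_relays_trusted => 0, num_relays_untrusted => 0, num_relays_unparseable => 1 },
);

for my $name (sort keys %cases) {
    my $c = $cases{$name};
    printf "%-24s 3.0.0=%d 3.1.0=%d patched=%d\n", $name,
        all_trusted_300($c), all_trusted_310($c), all_trusted_patched($c);
}
```

In this toy model the patched version agrees with 3.0.0 on the no-Received-headers case, which is where the corpus differences came from. It still returns 0 when a message's only relays are unparseable (assuming those are not also counted as untrusted), so an exact match with 3.0.0 depends on the corpus having no such messages.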
Created attachment 2630: patch to restore 3.0.0 behaviour

Here's the patch that restores 3.0.0's behaviour. I can't recall why we might have changed it to what it is in current SVN, but I suggest we change it *back* ;)
ok, ignore the patch -- I've added some new T_ rules with similar effect instead.
gotcha:

    r56643 | felicity | 2004-11-04 18:54:04 -0800 (Thu, 04 Nov 2004) | 1 line

    bug 3949: make ALL_TRUSTED test for 'only and at least 1 trusted relay', not 'no untrusted' which could mean no relays at all (just because, or failure to parse headers, or ...)
http://bugzilla.spamassassin.org/ruleqa?s_details=on&s_all=on&date=20050205&rule=T_NO_RELAYS&g=Change

    OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
       0.629  0.0325   2.6482   0.012   0.62   -0.01  T_NO_RELAYS
       0.302  0.1091   1.1949   0.084   0.34   -0.01  T_NO_RELAYS:bzoetekouw
       0.000  0.0000   0.0000   0.500   0.47   -0.01  T_NO_RELAYS:daf
       0.000  0.0000   0.0000   0.500   0.48   -0.01  T_NO_RELAYS:jm
       0.000  0.0000   0.0000   0.500   0.46   -0.01  T_NO_RELAYS:parkerm
       2.484  0.0021   6.6431   0.000   0.83   -0.01  T_NO_RELAYS:quinlan
       0.127  0.0000   0.9095   0.000   0.49   -0.01  T_NO_RELAYS:rODbegbie
       0.006  0.0070   0.0000   1.000   0.39   -0.01  T_NO_RELAYS:theo

quinlan, theo, can you check your hits?

http://bugzilla.spamassassin.org/ruleqa?s_details=on&s_all=on&date=20050205&rule=%2FT_UNPARSEABLE&g=Change

    OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
      22.873  26.7186   9.8548   0.731   0.52    0.01  T_UNPARSEABLE_RELAY
      19.822  22.2412   8.6383   0.720   0.54    0.01  T_UNPARSEABLE_RELAY:bzoetekouw
      12.888  16.3503   5.5636   0.746   0.59    0.01  T_UNPARSEABLE_RELAY:daf
      10.931  20.6582   1.2127   0.945   0.70    0.01  T_UNPARSEABLE_RELAY:jm
      14.652  20.6082   1.3176   0.940   0.64    0.01  T_UNPARSEABLE_RELAY:parkerm
      12.105  16.8323   4.1842   0.801   0.53    0.01  T_UNPARSEABLE_RELAY:quinlan
      99.709  99.9894  97.9861   0.505   0.45    0.01  T_UNPARSEABLE_RELAY:rODbegbie
      13.349  14.5281   4.0181   0.783   0.55    0.01  T_UNPARSEABLE_RELAY:theo

Not so interesting IMO, but still, ham hits mean our parser is missing something (quite a lot in Rod's case).
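Judging from the rows above, the S/O column appears to be SPAM% / (SPAM% + HAM%), with 0.500 reported for rules that hit nothing at all (e.g. T_NO_RELAYS:daf). A minimal Perl sketch for parsing a row and recomputing S/O as a sanity check; the helper names are hypothetical and the formula is inferred from these tables, not taken from the mass-check tools.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one hit-frequencies row into named fields. Column order is
# OVERALL% SPAM% HAM% S/O RANK SCORE NAME, as in the tables above.
sub parse_freq_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return undef unless @f == 7;
    my %row;
    @row{qw(overall spam ham so rank score name)} = @f;
    return \%row;
}

# Inferred S/O: share of a rule's hits that were spam, computed from
# the SPAM% and HAM% columns; 0.5 when the rule hit nothing.
sub spam_over_total {
    my ($spam_pct, $ham_pct) = @_;
    my $total = $spam_pct + $ham_pct;
    return 0.5 if $total == 0;
    return $spam_pct / $total;
}

for my $line (
    '10.931 20.6582 1.2127 0.945 0.70 0.01 T_UNPARSEABLE_RELAY:jm',
    '0.000 0.0000 0.0000 0.500 0.47 -0.01 T_NO_RELAYS:daf',
) {
    my $r = parse_freq_line($line) or next;
    printf "%-28s reported S/O=%s recomputed=%.3f\n",
        $r->{name}, $r->{so}, spam_over_total($r->{spam}, $r->{ham});
}
```

Both sample rows recompute to their reported S/O values, which is what supports reading the column this way.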
Subject: Re: ALL_TRUSTED broken: barely hitting any ham

On Sun, Feb 06, 2005 at 11:34:42PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> 0.006 0.0070 0.0000 1.000 0.39 -0.01 T_NO_RELAYS:theo
>
> quinlan, theo, can you check your hits?

The ham all legitimately doesn't relay. It's all on the local machine and has a single Received header:

    Received: by eclectic.kluge.net (Postfix, from userid 501) id 93E0741469C; Sun, 12 Dec 2004 12:27:11 -0500 (EST)

The spam hits are all on mails that shouldn't be in the corpus, unfortunately (blowback-related). :( Have to work out a way to filter that stuff.

> http://bugzilla.spamassassin.org/ruleqa?s_details=on&s_all=on&date=20050205&rule=%2FT_UNPARSEABLE&g=Change
>
> 13.349 14.5281 4.0181 0.783 0.55 0.01 T_UNPARSEABLE_RELAY:theo
>
> not so interesting IMO. but still, ham hits means our parser is missing
> something (quite a lot in Rod's case)

Hrm. Is there an easy script, for instance, that would report out the lines that aren't parsing? Going through a couple by hand:

    Received: from dc-mail-3102.iad3.amazon.com by mail-store-2001.amazon.com with ESMTP (peer crosscheck: dc-mail-3102.iad3.amazon.com)
    Received: from GWGC6-MTA by gc6.jefferson.co.us with Novell_GroupWise; Tue, 30 Nov 2004 10:09:15 -0700
    Received: from no.name.available by [165.224.43.143] via smtpd (for [165.224.216.89]) with ESMTP; Fri, 28 Jan 2005 13:06:39 -0500
    Received: from no.name.available by [165.224.216.88] via smtpd (for lists.sourceforge.net [66.35.250.206]) with ESMTP; Fri, 28 Jan 2005 15:42:30 -0500
Here are the Received headers from a randomly plucked email in my corpus:

    Received: (qmail 22147 invoked by uid 526); 6 Feb 2005 21:11:38 -0000
    Received: from 156.56.111.196 by blazing.arsecandle.org (envelope-from <gentoo-announce-return-530-rod=arsecandle.org@lists.gentoo.org>, uid 502) with qmail-scanner-1.24 (clamdscan: 0.80/594. f-prot: 4.4.2/3.14.11. Clear:RC:0(156.56.111.196):. Processed in 0.288806 secs); 06 Feb 2005 21:11:38 -0000
    DomainKey-Status: no signature
    Received: from lists.gentoo.org (HELO parrot.gentoo.org) (156.56.111.196) by blazing.arsecandle.org with (DHE-RSA-AES256-SHA encrypted) SMTP; 6 Feb 2005 21:11:37 -0000
    Received: (qmail 3988 invoked by uid 89); 6 Feb 2005 21:11:12 +0000
Subject: Re: ALL_TRUSTED changed semantics since 3.0.0

> Hrm. Is there an easy script, for instance, that would report out the lines
> that aren't parsing?

Suggestion: in debug mode, log unparseable Received headers as literally as possible, so that someone can see what is getting missed. Possibly include some sort of link back to the original message (the Message-ID?).
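The debug-logging suggestion could look something like the Perl sketch below. It takes the parser as a code reference rather than assuming any SpamAssassin internals; `report_unparseable` and the toy IP-matching parser are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Log every Received header the supplied parser rejects, verbatim,
# tagged with the Message-ID so the original message can be found.
# $parses is any code ref returning true for a header it understood.
sub report_unparseable {
    my ($msgid, $parses, @received) = @_;
    my @bad = grep { !$parses->($_) } @received;
    # Brackets make leading/trailing whitespace visible in the log.
    warn "unparseable Received in $msgid: [$_]\n" for @bad;
    return @bad;
}

# Toy parser for demonstration only (NOT SpamAssassin's): accepts a
# header if it contains an IPv4 address in brackets or parentheses.
my $toy_parser = sub { $_[0] =~ /[\[(]\d{1,3}(?:\.\d{1,3}){3}[\])]/ };

my @bad = report_unparseable(
    '<example@id>',
    $toy_parser,
    'from lists.gentoo.org (HELO parrot.gentoo.org) (156.56.111.196) by blazing.arsecandle.org with SMTP',
    'from GWGC6-MTA by gc6.jefferson.co.us with Novell_GroupWise',
);
printf "%d header(s) flagged\n", scalar @bad;
```

Keeping the parser pluggable means the same reporting loop works whether it wraps the real parser or an ad-hoc check run over a corpus.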
ok, the cases listed here are now taken care of; also, unparseable headers are now logged in debug output. As a bonus, qmail-scanner's "envelope-from" data is added to the correct relays entry, too. ;)

r157200
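For the qmail-scanner case, the envelope sender sits inside the parenthesised comment of the header quoted earlier in this bug. A minimal Perl sketch of extracting it follows; the regex is an assumption derived from that one example, not the pattern used in r157200, and the header string below is abridged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the address inside "(envelope-from <...>" in a qmail-scanner
# Received header, or undef if there is none. Pattern is assumed from
# a single sample header.
sub qmail_scanner_envelope_from {
    my ($received) = @_;
    return $1 if $received =~ /\(envelope-from <([^>]*)>/;
    return undef;
}

# Abridged copy of the qmail-scanner header from the sample message.
my $hdr = 'from 156.56.111.196 by blazing.arsecandle.org '
        . '(envelope-from <gentoo-announce-return-530-rod=arsecandle.org@lists.gentoo.org>, uid 502) '
        . 'with qmail-scanner-1.24';

my $from = qmail_scanner_envelope_from($hdr);
print defined $from ? "$from\n" : "(none)\n";
```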
see bug 4283 for a proposal to make these T_ rules informational only