Bug 4099 - ALL_TRUSTED changed semantics since 3.0.0
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P3 minor
Target Milestone: 3.1.0
Assignee: SpamAssassin Developer Mailing List
Blocks: 3949
Reported: 2005-01-23 16:28 UTC by Daniel Quinlan
Modified: 2005-04-26 03:39 UTC (History)

Attachment: patch to restore 3.0.0 behaviour (type: patch, status: None, submitter: Justin Mason [HasCLA])

Description Daniel Quinlan 2005-01-23 16:28:12 UTC
mass-check results (columns: OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME):

  0.001   0.0000   0.0049    0.000   0.44   -2.40  ALL_TRUSTED:theo
  0.004   0.0000   0.0106    0.000   0.45   -2.40  ALL_TRUSTED:quinlan
  0.004   0.0000   0.0138    0.000   0.45   -2.40  ALL_TRUSTED:parkerm
  0.025   0.0000   0.0500    0.000   0.49   -2.40  ALL_TRUSTED:jm
  0.000   0.0000   0.0000    0.500   0.00   -2.40  ALL_TRUSTED:hbayle
  0.253   0.0000   0.4580    0.000   0.55   -2.40  ALL_TRUSTED:daf
  0.035   0.0000   0.2589    0.000   0.48   -2.40  ALL_TRUSTED:bzoetekouw
Comment 1 Daniel Quinlan 2005-01-23 16:28:58 UTC
3.1.0 milestone
Comment 2 Theo Van Dinter 2005-01-23 18:13:05 UTC
Subject: Re:   New: ALL_TRUSTED broken: barely hitting any ham

On Sun, Jan 23, 2005 at 04:28:14PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
>   0.001   0.0000   0.0049    0.000   0.44   -2.40  ALL_TRUSTED:theo
>   0.004   0.0000   0.0106    0.000   0.45   -2.40  ALL_TRUSTED:quinlan
>   0.004   0.0000   0.0138    0.000   0.45   -2.40  ALL_TRUSTED:parkerm
>   0.025   0.0000   0.0500    0.000   0.49   -2.40  ALL_TRUSTED:jm
>   0.000   0.0000   0.0000    0.500   0.00   -2.40  ALL_TRUSTED:hbayle
>   0.253   0.0000   0.4580    0.000   0.55   -2.40  ALL_TRUSTED:daf
>   0.035   0.0000   0.2589    0.000   0.48   -2.40  ALL_TRUSTED:bzoetekouw

So is the issue that we think it's not hitting as much as it should,
or that it doesn't hit much at all?

What surprises me here is how high the numbers for daf and bzoetekouw are.

Comment 3 Daniel Quinlan 2005-01-23 18:37:37 UTC
Subject: Re:  ALL_TRUSTED broken: barely hitting any ham

> So is the issue that we think it's not hitting as much as it should,
> or that it doesn't hit much at all?

Either, both.  It used to hit much more.  The rule broke and stopped
hitting as much as it should and it doesn't hit much at all.

Comment 4 Theo Van Dinter 2005-01-23 19:03:02 UTC
Subject: Re:  ALL_TRUSTED broken: barely hitting any ham

On Sun, Jan 23, 2005 at 06:37:37PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Either, both.  It used to hit much more.  The rule broke and stopped
> hitting as much as it should and it doesn't hit much at all.

I'm not really surprised by the current numbers.  My corpus has very
little mail sent completely through trusted hosts.

Comment 5 Daniel Quinlan 2005-01-23 19:16:05 UTC
Subject: Re:  ALL_TRUSTED broken: barely hitting any ham

> I'm not really surprised by the current numbers.  My corpus has very
> little mail sent completely through trusted hosts.

When I previously looked at rule efficacy changes (RANK changes) between
3.0 and today, ALL_TRUSTED was the one rule that declined the most
(by a rather large margin), and thus I opened this bug.

3.0 mass-check:

  1.045   0.0138   3.0537    0.005   0.67   -2.40  ALL_TRUSTED

submit-3.0.0-sets01/{spam,ham}-nobayes-net-theo.log:

  0.325   0.0030   2.6253    0.001   0.68   -2.82  ALL_TRUSTED

I had a similar drop and my corpus composition has not significantly
changed and certainly not everyone's ALL_TRUSTED email corpus-by-volume
has dropped by a factor of 500x.

I am a developer of SpamAssassin, by the way.

Comment 6 Justin Mason 2005-01-25 23:08:52 UTC
ok, I've mass-checked 94142 ham messages with 3.0.0 and trunk.

the differences are entirely messages where there are NO received headers -- in
3.0.0 it's:

  my ($self) = @_;
  if ($self->{num_relays_untrusted} > 0) {
    return 0;
  } else {
    return 1;
  }

in 3.1.0 it's:

  my ($self) = @_;
  return $self->{num_relays_trusted}
        && !$self->{num_relays_untrusted}
        && !$self->{num_relays_unparseable};

by changing that to

  my ($self) = @_;
  return $self->{num_relays_trusted} >= 0
        && !$self->{num_relays_untrusted}
        && !$self->{num_relays_unparseable};

I get *identical* hits on that 90k-message corpus to what I get with a
mass-check from 3.0.0.
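
To make the difference concrete, here is a minimal standalone sketch of the three
expressions quoted above, run against hand-made relay counts. The sub names and
the plain hashref standing in for $self are hypothetical, chosen for
illustration only; this is not the code in the tree. The "only unparseable" row
just shows what the quoted expressions imply; it says nothing about how often
such mail occurs in a real corpus.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: each sub takes a hashref of relay counts
# instead of the real per-message object.

sub all_trusted_300 {        # 3.0.0: "no untrusted relays"
    my ($m) = @_;
    return $m->{num_relays_untrusted} > 0 ? 0 : 1;
}

sub all_trusted_trunk {      # trunk: "at least one trusted relay, nothing else"
    my ($m) = @_;
    return ( $m->{num_relays_trusted}
          && !$m->{num_relays_untrusted}
          && !$m->{num_relays_unparseable} ) ? 1 : 0;
}

sub all_trusted_patched {    # proposed patch: ">= 0" is always true
    my ($m) = @_;
    return ( $m->{num_relays_trusted} >= 0
          && !$m->{num_relays_untrusted}
          && !$m->{num_relays_unparseable} ) ? 1 : 0;
}

# Made-up relay counts: [ label, trusted, untrusted, unparseable ]
my @cases = (
    [ 'no Received headers', 0, 0, 0 ],
    [ 'one trusted relay',   1, 0, 0 ],
    [ 'one untrusted relay', 0, 1, 0 ],
    [ 'only unparseable',    0, 0, 1 ],
);

printf "%-20s %6s %6s %8s\n", 'case', '3.0.0', 'trunk', 'patched';
for my $c (@cases) {
    my ($label, $t, $u, $p) = @$c;
    my $m = {
        num_relays_trusted     => $t,
        num_relays_untrusted   => $u,
        num_relays_unparseable => $p,
    };
    printf "%-20s %6d %6d %8d\n", $label,
        all_trusted_300($m), all_trusted_trunk($m), all_trusted_patched($m);
}
```

The "no Received headers" row is the one where 3.0.0 and the patched version
hit but trunk does not, which matches the difference described above.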

Comment 7 Justin Mason 2005-01-25 23:10:03 UTC
Created attachment 2630 [details]
patch to restore 3.0.0 behaviour

here's the patch that restores 3.0.0's behaviour.  I can't recall why we might
have changed it to what it is in current SVN, but I suggest we change it *back*
;)
Comment 8 Justin Mason 2005-01-25 23:57:14 UTC
ok, ignore the patch -- I've added some new T_ rules with similar effect instead.
Comment 9 Justin Mason 2005-01-26 16:59:31 UTC
gotcha:

r56643 | felicity | 2004-11-04 18:54:04 -0800 (Thu, 04 Nov 2004) | 1 line

bug 3949: make ALL_TRUSTED test for 'only and at least 1 trusted relay', not 'no
untrusted' which could mean no relays at all (just because, or failure to parse
headers, or ...)
Comment 10 Justin Mason 2005-02-06 23:34:41 UTC
http://bugzilla.spamassassin.org/ruleqa?s_details=on&s_all=on&date=20050205&rule=T_NO_RELAYS&g=Change

  0.629   0.0325   2.6482    0.012   0.62   -0.01  T_NO_RELAYS
  0.302   0.1091   1.1949    0.084   0.34   -0.01  T_NO_RELAYS:bzoetekouw
  0.000   0.0000   0.0000    0.500   0.47   -0.01  T_NO_RELAYS:daf
  0.000   0.0000   0.0000    0.500   0.48   -0.01  T_NO_RELAYS:jm
  0.000   0.0000   0.0000    0.500   0.46   -0.01  T_NO_RELAYS:parkerm
  2.484   0.0021   6.6431    0.000   0.83   -0.01  T_NO_RELAYS:quinlan
  0.127   0.0000   0.9095    0.000   0.49   -0.01  T_NO_RELAYS:rODbegbie
  0.006   0.0070   0.0000    1.000   0.39   -0.01  T_NO_RELAYS:theo

quinlan, theo, can you check your hits?

http://bugzilla.spamassassin.org/ruleqa?s_details=on&s_all=on&date=20050205&rule=%2FT_UNPARSEABLE&g=Change

 22.873  26.7186   9.8548    0.731   0.52    0.01  T_UNPARSEABLE_RELAY
 19.822  22.2412   8.6383    0.720   0.54    0.01  T_UNPARSEABLE_RELAY:bzoetekouw
 12.888  16.3503   5.5636    0.746   0.59    0.01  T_UNPARSEABLE_RELAY:daf
 10.931  20.6582   1.2127    0.945   0.70    0.01  T_UNPARSEABLE_RELAY:jm
 14.652  20.6082   1.3176    0.940   0.64    0.01  T_UNPARSEABLE_RELAY:parkerm
 12.105  16.8323   4.1842    0.801   0.53    0.01  T_UNPARSEABLE_RELAY:quinlan
 99.709  99.9894  97.9861    0.505   0.45    0.01  T_UNPARSEABLE_RELAY:rODbegbie
 13.349  14.5281   4.0181    0.783   0.55    0.01  T_UNPARSEABLE_RELAY:theo

not so interesting IMO.  but still, ham hits means our parser is missing
something (quite a lot in Rod's case)
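
For anyone reading the tables above: assuming the second, third and fourth
columns are SPAM%, HAM% and S/O, the S/O figure looks like the spam share of the
two percentages, with 0.500 shown when the rule hit nothing. A small sketch of
that arithmetic follows; s_o_ratio is a hypothetical helper name, and the
published percentages are already rounded, so the last digit can differ on some
rows.

```perl
use strict;
use warnings;

# Hypothetical helper: spam / (spam + ham), or 0.5 when the rule never hit.
sub s_o_ratio {
    my ($spam_pct, $ham_pct) = @_;
    my $total = $spam_pct + $ham_pct;
    return 0.5 if $total == 0;
    return $spam_pct / $total;
}

printf "%.3f\n", s_o_ratio(26.7186, 9.8548);   # T_UNPARSEABLE_RELAY: prints 0.731
printf "%.3f\n", s_o_ratio(0.0325,  2.6482);   # T_NO_RELAYS:         prints 0.012
printf "%.3f\n", s_o_ratio(0.0000,  0.0000);   # no hits at all:      prints 0.500
```

An S/O near 0.5 means the rule says little either way, which fits the
T_UNPARSEABLE_RELAY rows for corpora where nearly everything hits.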
Comment 11 Theo Van Dinter 2005-02-07 10:40:34 UTC
Subject: Re:  ALL_TRUSTED broken: barely hitting any ham

On Sun, Feb 06, 2005 at 11:34:42PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
>   0.006   0.0070   0.0000    1.000   0.39   -0.01  T_NO_RELAYS:theo
> 
> quinlan, theo, can you check your hits?

The ham all legitimately doesn't relay.  It's all on the local machine and has
a single Received header:

Received: by eclectic.kluge.net (Postfix, from userid 501)
        id 93E0741469C; Sun, 12 Dec 2004 12:27:11 -0500 (EST)

The spam hits are all on mails that shouldn't be in the corpus, unfortunately
(blowback-related).  :(   Have to work out a way to filter that stuff.

> http://bugzilla.spamassassin.org/ruleqa?s_details=on&s_all=on&date=20050205&rule=%2FT_UNPARSEABLE&g=Change
> 
>  13.349  14.5281   4.0181    0.783   0.55    0.01  T_UNPARSEABLE_RELAY:theo
> 
> not so interesting IMO.  but still, ham hits means our parser is missing
> something (quite a lot in Rod's case)

Hrm.  Is there an easy script, for instance, that would report out the lines
that aren't parsing?

Going through a couple by hand:

Received: from dc-mail-3102.iad3.amazon.com by mail-store-2001.amazon.com with ESMTP 
        (peer crosscheck: dc-mail-3102.iad3.amazon.com)

Received: from GWGC6-MTA by gc6.jefferson.co.us
        with Novell_GroupWise; Tue, 30 Nov 2004 10:09:15 -0700

Received: from no.name.available by [165.224.43.143]
        via smtpd (for [165.224.216.89]) with ESMTP; Fri, 28 Jan 2005 13:06:39 -0500

Received: from no.name.available by [165.224.216.88]
          via smtpd (for lists.sourceforge.net [66.35.250.206]) with ESMTP; Fri, 28 Jan 2005 15:42:30 -0500
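
A rough sketch of the kind of reporting script asked about above. It unfolds the
Received headers of one raw message and flags any "from ... by ..." header whose
"from" part carries no dotted-quad IP. This is a toy check with hypothetical
sub names (received_headers, classify), NOT SpamAssassin's actual parser, so it
will disagree with the real thing on plenty of formats.

```perl
use strict;
use warnings;

# Return the Received header values of a raw message, with folded
# continuation lines joined onto one line each.
sub received_headers {
    my ($raw) = @_;
    my ($head) = split /\r?\n\r?\n/, $raw, 2;   # headers end at first blank line
    $head = '' unless defined $head;
    $head =~ s/\r?\n[ \t]+/ /g;                 # unfold continuation lines
    return map { /^Received:\s*(.*)$/i ? $1 : () } split /\r?\n/, $head;
}

# Toy classification (an assumption, not the real parser's rules):
#   local    - no "from" clause, e.g. a local hand-off
#   ok       - the "from" part contains an IPv4 address
#   unparsed - anything else; these are the lines to report
sub classify {
    my ($hdr) = @_;
    return 'local' unless $hdr =~ /^from\s/i;
    if ($hdr =~ /^from\s+(.*?)\s+by\s/i) {
        my $from = $1;
        return 'ok' if $from =~ /\b\d{1,3}(?:\.\d{1,3}){3}\b/;
    }
    return 'unparsed';
}

# Sample message built from headers quoted in this bug.
my $msg = <<'EOM';
Received: from GWGC6-MTA by gc6.jefferson.co.us
        with Novell_GroupWise; Tue, 30 Nov 2004 10:09:15 -0700
Received: from lists.gentoo.org (HELO parrot.gentoo.org) (156.56.111.196)
  by blazing.arsecandle.org with (DHE-RSA-AES256-SHA encrypted) SMTP; 6 Feb 2005 21:11:37 -0000
Received: by eclectic.kluge.net (Postfix, from userid 501)
        id 93E0741469C; Sun, 12 Dec 2004 12:27:11 -0500 (EST)
Subject: test

body
EOM

for my $hdr (received_headers($msg)) {
    printf "%-9s %s\n", classify($hdr), $hdr;
}
```

On the sample, the GroupWise header comes out as "unparsed", the gentoo.org one
as "ok", and the Postfix local hand-off as "local".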

Comment 12 Rod Begbie 2005-02-07 12:37:19 UTC
Here's the Received headers from a randomly plucked email in my corpus: 

Received: (qmail 22147 invoked by uid 526); 6 Feb 2005 21:11:38 -0000
Received: from 156.56.111.196 by blazing.arsecandle.org (envelope-from
<gentoo-announce-return-530-rod=arsecandle.org@lists.gentoo.org>, uid 502) with
qmail-scanner-1.24
 (clamdscan: 0.80/594. f-prot: 4.4.2/3.14.11.
 Clear:RC:0(156.56.111.196):.
 Processed in 0.288806 secs); 06 Feb 2005 21:11:38 -0000
DomainKey-Status: no signature
Received: from lists.gentoo.org (HELO parrot.gentoo.org) (156.56.111.196)
  by blazing.arsecandle.org with (DHE-RSA-AES256-SHA encrypted) SMTP; 6 Feb 2005
21:11:37 -0000
Received: (qmail 3988 invoked by uid 89); 6 Feb 2005 21:11:12 +0000
Comment 13 Loren Wilton 2005-02-07 22:58:10 UTC
Subject: Re:  ALL_TRUSTED changed semantics since 3.0.0

> Hrm.  Is there an easy script, for instance, that would report out the
> lines that aren't parsing?

Suggestion: in debug mode log unparsable received headers as literally as
possible so that someone can see what is getting missed.  Possibly include
some sort of link (the message id?) to the original message.

Comment 14 Justin Mason 2005-03-11 17:23:24 UTC
ok, the cases listed here are now taken care of; also, unparseable headers are
output in debug output.  as a bonus, qmail-scanner's "envelope-from" data is
added in to the correct relays entry, too. ;)

r157200
Comment 15 Justin Mason 2005-04-26 11:39:38 UTC
see bug 4283 for a proposal to make these T_ rules informational only