Bug 3949 - [review] ALL_TRUSTED misfires when Received: parsing fails.
Summary: [review] ALL_TRUSTED misfires when Received: parsing fails.
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests) (show other bugs)
Version: 3.0.1
Hardware: All All
: P5 normal
Target Milestone: 3.0.5
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on: 4099
Blocks:
  Show dependency tree
 
Reported: 2004-11-04 06:13 UTC by Matt Kettler
Modified: 2005-12-14 21:26 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
suggested patch patch None Theo Van Dinter [HasCLA]
count unparseable hdrs too, and do not fire ALL_TRUSTED if > 0 patch None Justin Mason [HasCLA]
Patch for 3.0.2 patch None Tom Schulz [NoCLA]
spamassassin-3.0.4-3949-ALL_TRUSTED-unparsable-misfire.patch patch None Warren Togami [HasCLA]
New patch that adds the informational rules NO_RELAYS, UNPARSEABLE_RELAYS, and NO_RECEIVED patch None Sidney Markowitz [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Matt Kettler 2004-11-04 06:13:39 UTC
ALL_TRUSTED currently only checks to make sure no untrusted relays exist.
However, it does not check if any trusted relays exist, causing it to
false-match any time Received: header parsing fails, contributing to FP problems.

Example debug bits from a list post:

debug: received-header: unknown format:
debug: received-header: unknown format:
debug: received-header: unknown format:
debug: received-header: unknown format:
debug: metadata: X-Spam-Relays-Trusted:
debug: metadata: X-Spam-Relays-Untrusted:

debug: tests=ALL_TRUSTED,DRUGS_ERECTILE ...


Might I suggest a minor change to the eval, something like this:

sub check_all_trusted {
  my ($self) = @_;
  if ($self->{num_relays_untrusted} > 0) {
    return 0;
  } else {
    if ($self->{num_relays_trusted} > 0) {
        return 1;
    } else {
       return 0;
    }
  }
}


I'd make a patch, but I'm not perl coder, so I'm not even sure if the above is
valid, it's just an illustration of what I think the code should do.
Comment 1 Justin Mason 2004-11-04 09:28:43 UTC
however, this means that a mail that never passes through SMTP will not be
considered "local" or trusted.  consider a mail delivered from one user on host
A, directly to another on host A -- that should be trusted.
Comment 2 Theo Van Dinter 2004-11-04 10:12:38 UTC
Subject: Re:  ALL_TRUSTED misfires when Received: parsing fails.

On Thu, Nov 04, 2004 at 09:28:44AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> however, this means that a mail that never passes through SMTP will not be
> considered "local" or trusted.  consider a mail delivered from one user on host
> A, directly to another on host A -- that should be trusted.

Arguably though, mails from host A to host A shouldn't be passed through
SpamAssassin.  I know that's not easy if run from procmail or something,
but ...  Thought I'd throw it out there.

Comment 3 Justin Mason 2004-11-04 10:43:49 UTC
ok.  perhaps this isn't such a bad idea, when weighed against the FPs... I'm
coming around to the idea.   anyone else?
Comment 4 Theo Van Dinter 2004-11-04 11:23:46 UTC
Subject: Re:  ALL_TRUSTED misfires when Received: parsing fails.

On Thu, Nov 04, 2004 at 10:43:50AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> ok.  perhaps this isn't such a bad idea, when weighed against the FPs... I'm
> coming around to the idea.   anyone else?

I'm +1 on it.  I never liked ALL_TRUSTED meaning "no untrusted relays",
and not "the relays used were all trusted", which is basically what the
suggested code implements.

Comment 5 Matt Kettler 2004-11-04 13:22:51 UTC
As for Justin's "local, no relay" situation, one could always implement a
separate no_relays eval test that matches when both fields are 0, and set up a
rule for it with a fixed default score of 0.

Those who need this feature can turn that rule on manually, and those who are
having problems can use it as a debugging tool. 

(it might also be helpful on the devel end by allowing mass-check to  do sweeps
of corpi looking for unsupported Received: formats and other parsing issues that
muck up Received paths.)

I have no +/- weight in the devel choices here, but I do feel Theo's got
more-or-less the same thoughts as I do. The only other point I'd add is that in
general a comp rule should be something earned, and not subject to being a
"default" if some part of SA fails (parsing, queries, or whatever).


Comment 6 Theo Van Dinter 2004-11-04 18:51:25 UTC
Created attachment 2506 [details]
suggested patch

implements the "only and at least 1 trusted relay was used".  it's actually
really	trivial. ;)
Comment 7 Daniel Quinlan 2004-11-04 22:14:05 UTC
I'm -0.5 on this change for both trunk and 3.0 branch.

If there really are _no_ Received: headers, then the message should be
trusted, so an exemption should be added for that.  I suppose that's
really unlikely, though...
Comment 8 Justin Mason 2004-11-05 10:39:58 UTC
thinking about this.  how's about we count the number of Received headers found,
parseable and non-parseable, and only allow ALL_TRUSTED to hit if

  (num_untrusted == 0 && num_trusted == num_rcvd)

then if there's an unparseable header, ALL_TRUSTED won't fire.  that would clear
up the problem cases and still maintain the current semantics.
Comment 9 Daniel Quinlan 2004-11-05 11:13:28 UTC
Subject: Re:  [review] ALL_TRUSTED misfires when Received: parsing fails.

> thinking about this.  how's about we count the number of Received
> headers found, parseable and non-parseable, and only allow ALL_TRUSTED
> to hit if
> 
>   (num_untrusted == 0 && num_trusted == num_rcvd)
> 
> then if there's an unparseable header, ALL_TRUSTED won't fire.  that
> would clear up the problem cases and still maintain the current
> semantics.

works for me

Comment 10 Matt Kettler 2004-11-05 12:22:33 UTC
I like that solution Justin.

My alternative was to suggest that you split the rule into ALL_TRUSTED and
NO_RELAYS. The latter rule would just be a negated exists: header rule, and
would catch any mail with no Received: headers.

However, I like Justin's solution a bit better, as it covers a wider variety of
parsing problems (ie: only one Received: header being unparsable).
Comment 11 Justin Mason 2004-11-05 15:47:22 UTC
Created attachment 2508 [details]
count unparseable hdrs too, and do not fire ALL_TRUSTED if > 0

OK, here's an implementation of that; it adds a new counter,
num_relays_unparseable, which is incremented when an unparseable "from"
Received line is encountered.

it still ignores "by" or "(comment)"-only Received lines, btw, but so far that
seems totally safe; and otherwise ALL_TRUSTED would never fire under qmail ;)

here's pre-patch hit-frequencies for ALL_TRUSTED.  compare against tomorrow's
results to see if there's changes, and what they are.

  0.028   0.0008   0.1337    0.006   0.51   -2.40  ALL_TRUSTED
  0.033   0.0029   0.2322    0.012   0.46   -2.40  ALL_TRUSTED:bzoetekouw
  1.981   0.0000   2.0593    0.000   0.94   -2.40  ALL_TRUSTED:cthielen
  0.013   0.0000   0.0250    0.000   0.49   -2.40  ALL_TRUSTED:jm
  0.001   0.0000   0.0035    0.000   0.45   -2.40  ALL_TRUSTED:quinlan
  0.000   0.0000   0.0000    0.500   0.45   -2.40  ALL_TRUSTED:rODbegbie
  0.000   0.0000   0.0000    0.500   0.47   -2.40  ALL_TRUSTED:theo
Comment 12 Justin Mason 2004-11-10 12:50:45 UTC
last night's aLL_TRUSTED results:

  0.022   0.0016   0.1158    0.014   0.49   -2.40  ALL_TRUSTED
  0.030   0.0014   0.2225    0.006   0.46   -2.40  ALL_TRUSTED:bzoetekouw
  0.979   0.0000   1.4774    0.000   0.64   -2.40  ALL_TRUSTED:cthielen
  0.013   0.0000   0.0250    0.000   0.49   -2.40  ALL_TRUSTED:jm
  0.001   0.0000   0.0035    0.000   0.46   -2.40  ALL_TRUSTED:quinlan
  0.000   0.0000   0.0000    0.500   0.38   -2.40  ALL_TRUSTED:rODbegbie
  0.002   0.0027   0.0000    1.000   0.46   -2.40  ALL_TRUSTED:theo

FNs down (good!), except for Theo.  Theo, did you add more mails to yr corpus?
Comment 13 Theo Van Dinter 2004-11-10 13:14:45 UTC
Subject: Re:  [review] ALL_TRUSTED misfires when Received: parsing fails.

On Wed, Nov 10, 2004 at 12:50:46PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
>   0.002   0.0027   0.0000    1.000   0.46   -2.40  ALL_TRUSTED:theo
> 
> FNs down (good!), except for Theo.  Theo, did you add more mails to yr corpus?

My corpus is updated daily.  Let's see what the FPs were ...

Hrm.  4 messages, all bounces that were sent to one of my users who then
reported it as spam.  The bounces were all generated from a secondary MX, so
the ALL_TRUSTED hit is valid.

The problem is getting my users to stop reporting bounces... <sigh>

Comment 14 Daniel Quinlan 2004-11-10 13:18:13 UTC
Subject: Re:  [review] ALL_TRUSTED misfires when Received: parsing fails.

>   0.022   0.0016   0.1158    0.014   0.49   -2.40  ALL_TRUSTED
>   0.030   0.0014   0.2225    0.006   0.46   -2.40  ALL_TRUSTED:bzoetekouw
>   0.979   0.0000   1.4774    0.000   0.64   -2.40  ALL_TRUSTED:cthielen
>   0.013   0.0000   0.0250    0.000   0.49   -2.40  ALL_TRUSTED:jm
>   0.001   0.0000   0.0035    0.000   0.46   -2.40  ALL_TRUSTED:quinlan
>   0.000   0.0000   0.0000    0.500   0.38   -2.40  ALL_TRUSTED:rODbegbie
>   0.002   0.0027   0.0000    1.000   0.46   -2.40  ALL_TRUSTED:theo

Dude, there are virtually no ham hits either!  Not good.

Comment 15 Theo Van Dinter 2004-11-10 13:18:18 UTC
Subject: Re:  [review] ALL_TRUSTED misfires when Received: parsing fails.

On Wed, Nov 10, 2004 at 01:14:46PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Hrm.  4 messages, all bounces that were sent to one of my users who then
> reported it as spam.  The bounces were all generated from a secondary MX, so
> the ALL_TRUSTED hit is valid.

Oh, it's interesting to note that these mails were reported in August.  FWIW.

Comment 16 Justin Mason 2004-11-10 13:28:30 UTC
'Dude, there are virtually no ham hits either!  Not good.'

there weren't beforehand, either.  cthielen dropped about 0.5%, the others by
something on the order of a couple of messages each.  I don't htink any of us
are setting "trusted_networks" in our mass-check runs, so that's not surprising IMO.
Comment 17 Daniel Quinlan 2004-11-14 16:54:13 UTC
Just to be clear, -1 for current code/patch.  ;-)

Reason above.
Comment 18 Justin Mason 2004-11-14 17:13:05 UTC
Daniel's sure there were higher ALL_TRUSTED hit-rates before.   I've asked him
to go do a binary search, because it's not this patch that's causing this ;)
Comment 19 Justin Mason 2004-11-16 16:55:32 UTC
moving to 3.0.3 pending further investigation
Comment 20 Tom Schulz 2004-12-07 11:36:24 UTC
Is there a version of this patch that will apply to 3.0.1?  It looks like the
patch in the attachment is meant to apply to the development version.
Comment 21 Tom Schulz 2004-12-16 14:45:10 UTC
I have hand applied these patches to 3.0.2 and have generated a patch file.
Is anyone interested in having that patch?  If so, I could attach it to this
bug.  The patch includes some of the reordering of the checks in Received.pm
that are in the version in trunk. I did that to be sure that I was making the
changes in the right places.
Comment 22 Justin Mason 2005-02-06 23:28:52 UTC
oh -- btw -- update for people following just this bug.  bug 4099 revealed that
the ALL_TRUSTED drop in hitrates was due to an unrelated change that went into
3.1.0, namely r56643:

  'bug 3949: make ALL_TRUSTED test for 'only and at least 1 trusted relay', not
'no untrusted' which could mean no relays at all (just because, or failure to
parse headers, or ...)'

so (a) this can be applied to 3.0.x if we release another update there, so
please vote; and (b) that change will be fixed in 3.1.0 by adding new rules
which do what ALL_TRUSTED used to do (ie. deal with the no-relays case).
Comment 23 era eriksson 2005-02-23 04:37:44 UTC
(In reply to comment #21)
> Is anyone interested in having that patch?  If so, I could attach it to this

Please do. There's no harm in adding an attachment and I'm sure there are folks
who'd like to try it out with 3.0.2 (including m'self).
Comment 24 Matt Kettler 2005-02-23 07:56:31 UTC
+0

I'm not a developer, so I have no vote. However, this is a rather severe bug for
those who suffer from it, and a fix is long overdue.

It's also, IMO, good logic. The original code took no preparations for error
cases. This code deals with the unparseable header case and the no header case. 

Justin's patch in particular prevents SA from ever firing ALL_TRUSTED for a
message containing mangled received: headers. That's probably a good thing, as
it will prevent spammers from trying to abuse SA by using crafted HELO strings
to try to confuse SA's Received parsing..
Comment 25 Tom Schulz 2005-02-23 08:16:37 UTC
Created attachment 2662 [details]
Patch for 3.0.2
Comment 26 Tom Schulz 2005-02-23 08:19:18 UTC
I forgot the comment!
Here is the patch for 3.0.2.  Note that it is just the previous 2 attachments
made to fit 3.0.2 and does nothing about the problem (if it is one for you)
from comment #22.
Comment 27 Justin Mason 2005-02-23 09:44:55 UTC
oh, FWIW, I should point out clearly that this code is in 3.1.0 already.
Comment 28 Daryl C. W. O'Shea 2005-03-11 23:28:04 UTC
The eval test...

sub check_all_trusted {
  my ($self) = @_;
  return $self->{num_relays_trusted}
        && !$self->{num_relays_untrusted}
        && !$self->{num_relays_unparseable};
}

... doesn't allow for mail between users on a local system (no SMTP).  I'd be +1
on the patch for 3.0.3 if $self->{num_relays_trusted} was removed from the
condition (doing the same in trunk would be good too).

The latter two conditions handle the "all relays trusted" condition by
themselves... if there are no unparsable relays and no untrusted relays, then
all the relays are therefore trusted (where 'all the relays' might be that there
are no relays).
Comment 29 Justin Mason 2005-04-26 11:28:33 UTC
btw, the no-relays-at-all case is dealt with in 3.1.0 using T_NO_RELAYS, see bug
4099.
Comment 30 Michael Parker 2005-04-27 13:58:44 UTC
Move to 3.0.4 milestone.
Comment 31 Malte S. Stretz 2005-05-31 10:04:23 UTC
I'm not sure about adding this patch to 3.0.4.  I always thought the same what 
Theo said in comment 4.  But this is currently expected behaviour and I don't 
think we should change that in the stable branch.  And it's yet another change 
which might break the stability.  So I'm +-0: If you all think this should go 
in I won't veto it but I won't give a +1 either. 
Comment 32 Justin Mason 2005-05-31 10:16:22 UTC
I'm +0 on this, too.   I don't think it's essential in 3.0.x.
Comment 33 Theo Van Dinter 2005-06-05 19:42:47 UTC
moving to 3.0.5 queue
Comment 34 Warren Togami 2005-10-11 13:12:35 UTC
Created attachment 3182 [details]
spamassassin-3.0.4-3949-ALL_TRUSTED-unparsable-misfire.patch

Tom Schulz patch rediffed against 3.0.4 and checked manually.  Note that this
patch does not add the no_relays case that 3.1.x handles.

<warren> jmason, what is the risk of applying this to 3.0.5?
<sabot> Spamassassin bug #3949: [review] ALL_TRUSTED misfires when Received:
parsing fails. Product: Spamassassin, Component: Rules (Eval Tests), Severity:
normal, Assigned to: dev@spamassassin.apache.org, Status: NEW
<jmason> I would say it's worthwhile -- while it's a change in behaviour
halfway through the release cycle, it'd clean up a lot of ALL_TRUSTED fp
reports
Comment 35 Warren Togami 2005-11-07 20:17:02 UTC
I have been testing this patch in production for several weeks and it seems fine.

3 votes needed
Comment 36 Sidney Markowitz 2005-11-07 23:32:05 UTC
Daniel, does the information in comment #22 eliminate the reason for your veto
in comment #17?

And if that is the case, here's my +1
Comment 37 Justin Mason 2005-11-07 23:43:03 UTC
+1 for 3182
Comment 38 Justin Mason 2005-11-08 23:54:48 UTC
'Daniel, does the information in comment #22 eliminate the reason for your veto
in comment #17?'

I would be inclined to think it invalidates the veto, unless Daniel wakes up and
comments to the contrary.  we could be waiting for an answer for quite a while,
otherwise ;)
Comment 39 Duncan Findlay 2005-11-09 02:11:37 UTC
Dan?

(sometimes Dan likes to be ccd on stuff that requires direct action from him...
not to mention gentle prodding over IRC :-) )
Comment 40 Daryl C. W. O'Shea 2005-11-09 04:22:35 UTC
+1 for 3182
Comment 41 Daniel Quinlan 2005-11-13 23:51:05 UTC
I conditionally revoke my -1 now; I would like to see how the nightly results
change after applying this patch *before* we release.  If they improve or remain
about the same, then my -1 is revoked.  If they don't, I want to look into it.
Comment 42 Daryl C. W. O'Shea 2005-11-14 00:54:00 UTC
Nightly results on the 3.0 branch???  This all went into 3.1 (then trunk) over a
year ago.
Comment 43 Justin Mason 2005-11-18 06:14:25 UTC
Daniel, that veto makes no sense.  There are no nightly mass-checks of the 3.0
branch.
Comment 44 Henry Stern 2005-11-19 00:54:21 UTC
Patch applies cleanly and make test is happy.  +1 for 3.0.5 release.
Comment 45 Matt Kettler 2005-11-19 04:38:41 UTC
Suggestion, one that is probably more what DQ is looking for.

Even though the code is a backport, someone probably should run a quick-sanity
test mass-check to make sure the numbers don't do anything strange.

Even a smallish corpus would be sufficient, but a quick comparison of the hits
of ALL_TRUSTED on two runs (one with the patch, one without) would be worthwhle
before puting this patch into a release of a mature branch.

Of course, you're all free to do as you like, but I think the spirit of DQ's
request makes a lot of sense.
Comment 46 Daniel Quinlan 2005-11-19 16:31:23 UTC
Okay.  I tested it myself:

SVN head:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
   6428     3214     3214    0.500   0.00    0.00  (all messages)
100.000  50.0000  50.0000    0.500   0.00    0.00  (all messages as %)
  1.836   0.0000   3.6714    0.000   0.72   -2.40  ALL_TRUSTED

proton:~/devel/svn/3.0-3949/masses $ ./hit-frequencies -xpa -m ALL_TRUST
OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  12293     6147     6146    0.500   0.00    0.00  (all messages)
100.000  50.0041  49.9959    0.500   0.00    0.00  (all messages as %)
  0.000   0.0000   0.0000    0.500   0.44   -2.40  ALL_TRUSTED

same corpus (only difference was how far I let it run), same tree with vs.
without patch, etc.

THIS is why I wanted this patch tested, the code is very easy to break.
Comment 47 Sidney Markowitz 2005-11-19 20:10:45 UTC
Daniel, can you look at some of those ham messages that matched ALL_TRUSTED
before the patch to see if they are valid?

This bug is about ALL_TRUSTED firing when it should not. If your ham does not go
through all trusted relays, then you would never see the rule hit. It may be
that a ham hit of 0% on your corpus is the correct result.

Just because the unpatched rule found more ham doesn't mean that it is better,
not if the rule should not have matched those hams in the first place.

This could be a bug in the patch, but we don't know yet until you look at those
hams.
Comment 48 Daniel Quinlan 2005-11-19 20:27:19 UTC
They are all locally sent messages:

From quinlan@pathname.com Fri Jul 02 11:19:14 2004
Return-path: <quinlan@pathname.com>
Envelope-to: quinlan@pathname.com
Delivery-date: Fri, 02 Jul 2004 11:19:14 -0700
Received: from quinlan by proton.pathname.com with local (Exim 3.35 #1 (Debian))
        id 1BgScz-0003FE-00; Fri, 02 Jul 2004 11:19:13 -0700
From: Daniel Quinlan <quinlan@pathname.com>
[...]
Comment 49 Sidney Markowitz 2005-11-19 20:51:13 UTC
Try that in 3.1 or trunk. It triggers

NO_RELAYS  Informational: message was not relayed via SMTP

and does trigger ALL_TRUSTED, which I think is correct, and is what this patch
is supposed to be doing for 3.0.
Comment 50 Sidney Markowitz 2005-11-19 20:52:11 UTC
Typo, I meant to say "and does _not_ trigger ALL_TRUSTED"
Comment 51 Sidney Markowitz 2005-11-19 22:22:21 UTC
I'm not sure I was clear in my last comments. I'll try again:

3.1 and trunk says that the Received header does not show relayed via SMTP, and
as such does not qualify for a hit for ALL_TRUSTED.

This patch is supposed to prevent FPs caused by having no trusted or untrusted
Received headers resulting in a trigger of ALL_TRUSTED.

Your example is showing this patch behaving as expected, causing 3.0 to have the
same behavior as 3.1 and trunk by no longer triggering ALL_TRUSTED on a local
non-SMTP delivery.
Comment 52 Daniel Quinlan 2005-11-19 23:07:04 UTC
Shouldn't we ship a NO_RELAYS with 3.0, then?
Comment 53 Sidney Markowitz 2005-11-20 00:12:03 UTC
> Shouldn't we ship a NO_RELAYS with 3.0, then?

That's easy and safe. With the patch in place all that is needed is the eval
test in 3.1 that checks for unparseable_relays. The rule is only -0.001
informational so it can't affect any scores.

There are two other similar informational rules, UNPARSEABLE_RELAYS and
NO_RECEIVED. Do you think we should add those in as well?

I can post a patch that includes those three. It's just a few lines that test
things that are already being calculated by the current patch.
Comment 54 Sidney Markowitz 2005-11-20 02:56:10 UTC
Created attachment 3258 [details]
New patch that adds the informational rules NO_RELAYS, UNPARSEABLE_RELAYS, and NO_RECEIVED

Here's an updated patch that includes the three related informational only
rules. It should get the same results as 3.1 on those hams that ALL_TRUSTED did
not fire on in Daniel's test. This is a conservative change that copies the few
lines in EvalTests.pm that does the eval for two of the rules, adds the rules
themselves which check for conditions already calculated by the patch we have
been testing, and score the rules as -0.001 informational.

We're now back to needing more votes. Here's my +1. Daniel, if this satisfies
you, could you put in a +1 as well as retract your -1?
Comment 55 Daniel Quinlan 2005-11-20 23:13:54 UTC
perfect!

+1, use same scores as 3.1 if they are non-0.001-type scores in 3.1
Comment 56 Sidney Markowitz 2005-11-20 23:25:27 UTC
Great! They are all -0.001 in 3.1.

1 more vote needed again :-)

Comment 57 Henry Stern 2005-11-21 00:19:16 UTC
Patch applies cleanly.  Make test passes.  +1
Comment 58 Sidney Markowitz 2005-11-21 00:29:25 UTC
Committed revision 345792.
Comment 59 Tomoki AONO 2005-12-15 06:26:38 UTC
(In reply to comment #54)

I'm using 3.0.5 now (with some funny Received: environment),
but UNPARSEABLE_RELAY does not occur with unparseable Received:
header.

> Created an attachment (id=3258) [edit]
> New patch that adds the informational rules NO_RELAYS, UNPARSEABLE_RELAYS, and
> NO_RECEIVED

$self->{num_relays_unparseable} increments in
$self->parse_received_line(), but current (3.0.x) code
clears it after parsing. We should move initialize
statement of num_relays_unparseable before parse_received_line() ...