SA Bugzilla – Bug 3949
[review] ALL_TRUSTED misfires when Received: parsing fails.
Last modified: 2005-12-14 21:26:38 UTC
ALL_TRUSTED currently only checks to make sure no untrusted relays exist. However, it does not check if any trusted relays exist, causing it to false-match any time Received: header parsing fails, contributing to FP problems. Example debug bits from a list post: debug: received-header: unknown format: debug: received-header: unknown format: debug: received-header: unknown format: debug: received-header: unknown format: debug: metadata: X-Spam-Relays-Trusted: debug: metadata: X-Spam-Relays-Untrusted: debug: tests=ALL_TRUSTED,DRUGS_ERECTILE ... Might I suggest a minor change to the eval, something like this: sub check_all_trusted { my ($self) = @_; if ($self->{num_relays_untrusted} > 0) { return 0; } else { if ($self->{num_relays_trusted} > 0) { return 1; } else { return 0; } } } I'd make a patch, but I'm not perl coder, so I'm not even sure if the above is valid, it's just an illustration of what I think the code should do.
however, this means that a mail that never passes through SMTP will not be considered "local" or trusted. consider a mail delivered from one user on host A, directly to another on host A -- that should be trusted.
Subject: Re: ALL_TRUSTED misfires when Received: parsing fails. On Thu, Nov 04, 2004 at 09:28:44AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote: > however, this means that a mail that never passes through SMTP will not be > considered "local" or trusted. consider a mail delivered from one user on host > A, directly to another on host A -- that should be trusted. Arguably though, mails from host A to host A shouldn't be passed through SpamAssassin. I know that's not easy if run from procmail or something, but ... Thought I'd throw it out there.
ok. perhaps this isn't such a bad idea, when weighed against the FPs... I'm coming around to the idea. anyone else?
Subject: Re: ALL_TRUSTED misfires when Received: parsing fails. On Thu, Nov 04, 2004 at 10:43:50AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote: > ok. perhaps this isn't such a bad idea, when weighed against the FPs... I'm > coming around to the idea. anyone else? I'm +1 on it. I never liked ALL_TRUSTED meaning "no untrusted relays", and not "the relays used were all trusted", which is basically what the suggested code implements.
As for Justin's "local, no relay" situation, one could always implement a separate no_relays eval test that matches when both fields are 0, and set up a rule for it with a fixed default score of 0. Those who need this feature can turn that rule on manually, and those who are having problems can use it as a debugging tool. (it might also be helpful on the devel end by allowing mass-check to do sweeps of corpi looking for unsupported Received: formats and other parsing issues that muck up Received paths.) I have no +/- weight in the devel choices here, but I do feel Theo's got more-or-less the same thoughts as I do. The only other point I'd add is that in general a comp rule should be something earned, and not subject to being a "default" if some part of SA fails (parsing, queries, or whatever).
Created attachment 2506 [details] suggested patch implements the "only and at least 1 trusted relay was used". it's actually really trivial. ;)
I'm -0.5 on this change for both trunk and 3.0 branch. If there really are _no_ Received: headers, then the message should be trusted, so an exemption should be added for that. I suppose that's really unlikely, though...
thinking about this. how's about we count the number of Received headers found, parseable and non-parseable, and only allow ALL_TRUSTED to hit if (num_untrusted == 0 && num_trusted == num_rcvd) then if there's an unparseable header, ALL_TRUSTED won't fire. that would clear up the problem cases and still maintain the current semantics.
Subject: Re: [review] ALL_TRUSTED misfires when Received: parsing fails. > thinking about this. how's about we count the number of Received > headers found, parseable and non-parseable, and only allow ALL_TRUSTED > to hit if > > (num_untrusted == 0 && num_trusted == num_rcvd) > > then if there's an unparseable header, ALL_TRUSTED won't fire. that > would clear up the problem cases and still maintain the current > semantics. works for me
I like that solution Justin. My alternative was to suggest that you split the rule into ALL_TRUSTED and NO_RELAYS. The latter rule would just be a negated exists: header rule, and would catch any mail with no Received: headers. However, I like Justin's solution a bit better, as it covers a wider variety of parsing problems (ie: only one Received: header being unparsable).
Created attachment 2508 [details] count unparseable hdrs too, and do not fire ALL_TRUSTED if > 0 OK, here's an implementation of that; it adds a new counter, num_relays_unparseable, which is incremented when an unparseable "from" Received line is encountered. it still ignores "by" or "(comment)"-only Received lines, btw, but so far that seems totally safe; and otherwise ALL_TRUSTED would never fire under qmail ;) here's pre-patch hit-frequencies for ALL_TRUSTED. compare against tomorrow's results to see if there's changes, and what they are. 0.028 0.0008 0.1337 0.006 0.51 -2.40 ALL_TRUSTED 0.033 0.0029 0.2322 0.012 0.46 -2.40 ALL_TRUSTED:bzoetekouw 1.981 0.0000 2.0593 0.000 0.94 -2.40 ALL_TRUSTED:cthielen 0.013 0.0000 0.0250 0.000 0.49 -2.40 ALL_TRUSTED:jm 0.001 0.0000 0.0035 0.000 0.45 -2.40 ALL_TRUSTED:quinlan 0.000 0.0000 0.0000 0.500 0.45 -2.40 ALL_TRUSTED:rODbegbie 0.000 0.0000 0.0000 0.500 0.47 -2.40 ALL_TRUSTED:theo
last night's aLL_TRUSTED results: 0.022 0.0016 0.1158 0.014 0.49 -2.40 ALL_TRUSTED 0.030 0.0014 0.2225 0.006 0.46 -2.40 ALL_TRUSTED:bzoetekouw 0.979 0.0000 1.4774 0.000 0.64 -2.40 ALL_TRUSTED:cthielen 0.013 0.0000 0.0250 0.000 0.49 -2.40 ALL_TRUSTED:jm 0.001 0.0000 0.0035 0.000 0.46 -2.40 ALL_TRUSTED:quinlan 0.000 0.0000 0.0000 0.500 0.38 -2.40 ALL_TRUSTED:rODbegbie 0.002 0.0027 0.0000 1.000 0.46 -2.40 ALL_TRUSTED:theo FNs down (good!), except for Theo. Theo, did you add more mails to yr corpus?
Subject: Re: [review] ALL_TRUSTED misfires when Received: parsing fails. On Wed, Nov 10, 2004 at 12:50:46PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote: > 0.002 0.0027 0.0000 1.000 0.46 -2.40 ALL_TRUSTED:theo > > FNs down (good!), except for Theo. Theo, did you add more mails to yr corpus? My corpus is updated daily. Let's see what the FPs were ... Hrm. 4 messages, all bounces that were sent to one of my users who then reported it as spam. The bounces were all generated from a secondary MX, so the ALL_TRUSTED hit is valid. The problem is getting my users to stop reporting bounces... <sigh>
Subject: Re: [review] ALL_TRUSTED misfires when Received: parsing fails. > 0.022 0.0016 0.1158 0.014 0.49 -2.40 ALL_TRUSTED > 0.030 0.0014 0.2225 0.006 0.46 -2.40 ALL_TRUSTED:bzoetekouw > 0.979 0.0000 1.4774 0.000 0.64 -2.40 ALL_TRUSTED:cthielen > 0.013 0.0000 0.0250 0.000 0.49 -2.40 ALL_TRUSTED:jm > 0.001 0.0000 0.0035 0.000 0.46 -2.40 ALL_TRUSTED:quinlan > 0.000 0.0000 0.0000 0.500 0.38 -2.40 ALL_TRUSTED:rODbegbie > 0.002 0.0027 0.0000 1.000 0.46 -2.40 ALL_TRUSTED:theo Dude, there are virtually no ham hits either! Not good.
Subject: Re: [review] ALL_TRUSTED misfires when Received: parsing fails. On Wed, Nov 10, 2004 at 01:14:46PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote: > Hrm. 4 messages, all bounces that were sent to one of my users who then > reported it as spam. The bounces were all generated from a secondary MX, so > the ALL_TRUSTED hit is valid. Oh, it's interesting to note that these mails were reported in August. FWIW.
'Dude, there are virtually no ham hits either! Not good.' there weren't beforehand, either. cthielen dropped about 0.5%, the others by something on the order of a couple of messages each. I don't htink any of us are setting "trusted_networks" in our mass-check runs, so that's not surprising IMO.
Just to be clear, -1 for current code/patch. ;-) Reason above.
Daniel's sure there were higher ALL_TRUSTED hit-rates before. I've asked him to go do a binary search, because it's not this patch that's causing this ;)
moving to 3.0.3 pending further investigation
Is there a version of this patch that will apply to 3.0.1? It looks like the patch in the attachment is meant to apply to the development version.
I have hand applied these patches to 3.0.2 and have generated a patch file. Is anyone interested in having that patch? If so, I could attach it to this bug. The patch includes some of the reordering of the checks in Received.pm that are in the version in trunk. I did that to be sure that I was making the changes in the right places.
oh -- btw -- update for people following just this bug. bug 4099 revealed that the ALL_TRUSTED drop in hitrates was due to an unrelated change that went into 3.1.0, namely r56643: 'bug 3949: make ALL_TRUSTED test for 'only and at least 1 trusted relay', not 'no untrusted' which could mean no relays at all (just because, or failure to parse headers, or ...)' so (a) this can be applied to 3.0.x if we release another update there, so please vote; and (b) that change will be fixed in 3.1.0 by adding new rules which do what ALL_TRUSTED used to do (ie. deal with the no-relays case).
(In reply to comment #21) > Is anyone interested in having that patch? If so, I could attach it to this Please do. There's no harm in adding an attachment and I'm sure there are folks who'd like to try it out with 3.0.2 (including m'self).
+0 I'm not a developer, so I have no vote. However, this is a rather severe bug for those who suffer from it, and a fix is long overdue. It's also, IMO, good logic. The original code took no preparations for error cases. This code deals with the unparseable header case and the no header case. Justin's patch in particular prevents SA from ever firing ALL_TRUSTED for a message containing mangled received: headers. That's probably a good thing, as it will prevent spammers from trying to abuse SA by using crafted HELO strings to try to confuse SA's Received parsing..
Created attachment 2662 [details] Patch for 3.0.2
I forgot the comment! Here is the patch for 3.0.2. Note that it is just the previous 2 attachments made to fit 3.0.2 and does nothing about the problem (if it is one for you) from comment #22.
oh, FWIW, I should point out clearly that this code is in 3.1.0 already.
The eval test... sub check_all_trusted { my ($self) = @_; return $self->{num_relays_trusted} && !$self->{num_relays_untrusted} && !$self->{num_relays_unparseable}; } ... doesn't allow for mail between users on a local system (no SMTP). I'd be +1 on the patch for 3.0.3 if $self->{num_relays_trusted} was removed from the condition (doing the same in trunk would be good too). The latter two conditions handle the "all relays trusted" condition by themselves... if there are no unparsable relays and no untrusted relays, then all the relays are therefore trusted (where 'all the relays' might be that there are no relays).
btw, the no-relays-at-all case is dealt with in 3.1.0 using T_NO_RELAYS, see bug 4099.
Move to 3.0.4 milestone.
I'm not sure about adding this patch to 3.0.4. I always thought the same what Theo said in comment 4. But this is currently expected behaviour and I don't think we should change that in the stable branch. And it's yet another change which might break the stability. So I'm +-0: If you all think this should go in I won't veto it but I won't give a +1 either.
I'm +0 on this, too. I don't think it's essential in 3.0.x.
moving to 3.0.5 queue
Created attachment 3182 [details] spamassassin-3.0.4-3949-ALL_TRUSTED-unparsable-misfire.patch Tom Schulz patch rediffed against 3.0.4 and checked manually. Note that this patch does not add the no_relays case that 3.1.x handles. <warren> jmason, what is the risk of applying this to 3.0.5? <sabot> Spamassassin bug #3949: [review] ALL_TRUSTED misfires when Received: parsing fails. Product: Spamassassin, Component: Rules (Eval Tests), Severity: normal, Assigned to: dev@spamassassin.apache.org, Status: NEW <jmason> I would say it's worthwhile -- while it's a change in behaviour halfway through the release cycle, it'd clean up a lot of ALL_TRUSTED fp reports
I have been testing this patch in production for several weeks and it seems fine. 3 votes needed
Daniel, does the information in comment #22 eliminate the reason for your veto in comment #17? And if that is the case, here's my +1
+1 for 3182
'Daniel, does the information in comment #22 eliminate the reason for your veto in comment #17?' I would be inclined to think it invalidates the veto, unless Daniel wakes up and comments to the contrary. we could be waiting for an answer for quite a while, otherwise ;)
Dan? (sometimes Dan likes to be ccd on stuff that requires direct action from him... not to mention gentle prodding over IRC :-) )
I conditionally revoke my -1 now; I would like to see how the nightly results change after applying this patch *before* we release. If they improve or remain about the same, then my -1 is revoked. If they don't, I want to look into it.
Nightly results on the 3.0 branch??? This all went into 3.1 (then trunk) over a year ago.
Daniel, that veto makes no sense. There are no nightly mass-checks of the 3.0 branch.
Patch applies cleanly and make test is happy. +1 for 3.0.5 release.
Suggestion, one that is probably more what DQ is looking for. Even though the code is a backport, someone probably should run a quick-sanity test mass-check to make sure the numbers don't do anything strange. Even a smallish corpus would be sufficient, but a quick comparison of the hits of ALL_TRUSTED on two runs (one with the patch, one without) would be worthwhle before puting this patch into a release of a mature branch. Of course, you're all free to do as you like, but I think the spirit of DQ's request makes a lot of sense.
Okay. I tested it myself: SVN head: OVERALL% SPAM% HAM% S/O RANK SCORE NAME 6428 3214 3214 0.500 0.00 0.00 (all messages) 100.000 50.0000 50.0000 0.500 0.00 0.00 (all messages as %) 1.836 0.0000 3.6714 0.000 0.72 -2.40 ALL_TRUSTED proton:~/devel/svn/3.0-3949/masses $ ./hit-frequencies -xpa -m ALL_TRUST OVERALL% SPAM% HAM% S/O RANK SCORE NAME 12293 6147 6146 0.500 0.00 0.00 (all messages) 100.000 50.0041 49.9959 0.500 0.00 0.00 (all messages as %) 0.000 0.0000 0.0000 0.500 0.44 -2.40 ALL_TRUSTED same corpus (only difference was how far I let it run), same tree with vs. without patch, etc. THIS is why I wanted this patch tested, the code is very easy to break.
Daniel, can you look at some of those ham messages that matched ALL_TRUSTED before the patch to see if they are valid? This bug is about ALL_TRUSTED firing when it should not. If your ham does not go through all trusted relays, then you would never see the rule hit. It may be that a ham hit of 0% on your corpus is the correct result. Just because the unpatched rule found more ham doesn't mean that it is better, not if the rule should not have matched those hams in the first place. This could be a bug in the patch, but we don't know yet until you look at those hams.
They are all locally sent messages: From quinlan@pathname.com Fri Jul 02 11:19:14 2004 Return-path: <quinlan@pathname.com> Envelope-to: quinlan@pathname.com Delivery-date: Fri, 02 Jul 2004 11:19:14 -0700 Received: from quinlan by proton.pathname.com with local (Exim 3.35 #1 (Debian)) id 1BgScz-0003FE-00; Fri, 02 Jul 2004 11:19:13 -0700 From: Daniel Quinlan <quinlan@pathname.com> [...]
Try that in 3.1 or trunk. It triggers NO_RELAYS Informational: message was not relayed via SMTP and does trigger ALL_TRUSTED, which I think is correct, and is what this patch is supposed to be doing for 3.0.
Typo, I meant to say "and does _not_ trigger ALL_TRUSTED"
I'm not sure I was clear in my last comments. I'll try again: 3.1 and trunk says that the Received header does not show relayed via SMTP, and as such does not qualify for a hit for ALL_TRUSTED. This patch is supposed to prevent FPs caused by having no trusted or untrusted Received headers resulting in a trigger of ALL_TRUSTED. Your example is showing this patch behaving as expected, causing 3.0 to have the same behavior as 3.1 and trunk by no longer triggering ALL_TRUSTED on a local non-SMTP delivery.
Shouldn't we ship a NO_RELAYS with 3.0, then?
> Shouldn't we ship a NO_RELAYS with 3.0, then? That's easy and safe. With the patch in place all that is needed is the eval test in 3.1 that checks for unparseable_relays. The rule is only -0.001 informational so it can't affect any scores. There are two other similar informational rules, UNPARSEABLE_RELAYS and NO_RECEIVED. Do you think we should add those in as well? I can post a patch that includes those three. It's just a few lines that test things that are already being calculated by the current patch.
Created attachment 3258 [details] New patch that adds the informational rules NO_RELAYS, UNPARSEABLE_RELAYS, and NO_RECEIVED Here's an updated patch that includes the three related informational only rules. It should get the same results as 3.1 on those hams that ALL_TRUSTED did not fire on in Daniel's test. This is a conservative change that copies the few lines in EvalTests.pm that does the eval for two of the rules, adds the rules themselves which check for conditions already calculated by the patch we have been testing, and score the rules as -0.001 informational. We're now back to needing more votes. Here's my +1. Daniel, if this satisfies you, could you put in a +1 as well as retract your -1?
perfect! +1, use same scores as 3.1 if they are non-0.001-type scores in 3.1
Great! They are all -0.001 in 3.1. 1 more vote needed again :-)
Patch applies cleanly. Make test passes. +1
Committed revision 345792.
(In reply to comment #54) I'm using 3.0.5 now (with some funny Received: environment), but UNPARSEABLE_RELAY does not occur with unparseable Received: header. > Created an attachment (id=3258) [edit] > New patch that adds the informational rules NO_RELAYS, UNPARSEABLE_RELAYS, and > NO_RECEIVED $self->{num_relays_unparseable} increments in $self->parse_received_line(), but current (3.0.x) code clears it after parsing. We should move initialize statement of num_relays_unparseable before parse_received_line() ...