SA Bugzilla – Bug 5842
Should SPF rules be "tflags net" or not?
Last modified: 2015-04-12 14:14:30 UTC
Now that bug 5239 enables the SPF plugin to reuse results from Received-SPF headers, should SPF rules be "tflags net" or not? The rules can work without network tests enabled. For 3.2 we generated scores without "tflags net". My r588457. This was reverted in jm's r596095. Now I'm not sure what to do. We need to generate scores for the rules for set0 (so they shouldn't have tflags net) but those scores probably aren't going to be very accurate since I don't think many of the mass-check contributors have Received-SPF headers in their mail.
I say yes, they should be tflags net. it's exactly analogous to the DNSBL rules; they are network lookups, which can record their results in a header (the reuse data), which can then be reused in set 0 if --reuse is specified. IMO that makes sense for the SPF rules (although the recorded results are put in Received-SPF:). what do we need to do to record Received-SPF: btw?
(In reply to comment #1)
> I say yes, they should be tflags net. it's exactly analogous to the DNSBL
> rules; they are network lookups, which can record their results in a header (the
> reuse data), which can then be reused in set 0 if --reuse is specified.
> IMO that makes sense for the SPF rules (although the recorded results are
> put in Received-SPF:).

I'm not sure I'd consider them exactly analogous. With a mail system that adds Received-SPF headers before SA sees the message, SPF results are used (in all, even non-net, scoresets) without SA *ever* doing the actual network checks. Reuse of the data later in mass-checks, etc, isn't of concern. I'm not aware of --reuse applying to spamassassin or spamd.

> what do we need to do to record Received-SPF: btw?

Record Received-SPF? We don't. This would be added by a mail system that processes the mail before SA sees it. For example, see the headers of any mail I've sent to an ASF mailing list.
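To make the point about upstream Received-SPF headers concrete, here is a minimal Python sketch of reading an SPF result from a header that an MTA has already added, with no DNS lookup involved. It is an illustration only, not the SPF plugin's code (which is Perl). The function name `received_spf_result` and the sample message are hypothetical, and the result keywords are assumed to be the ones RFC 4408 defines.

```python
from email import message_from_string

# Result keywords assumed from RFC 4408, lower-cased for comparison.
SPF_RESULTS = {"none", "neutral", "pass", "fail", "softfail", "temperror", "permerror"}


def received_spf_result(raw_message):
    """Return the SPF result recorded in the first Received-SPF header, or None."""
    msg = message_from_string(raw_message)
    value = msg.get("Received-SPF")
    if value is None:
        return None  # no upstream SPF check was recorded
    tokens = value.split()
    if not tokens:
        return None  # header present but empty
    result = tokens[0].lower()
    return result if result in SPF_RESULTS else None


# Hypothetical message, as it might look after an MTA-side SPF check.
sample = (
    "Received-SPF: pass (mx.example.org: domain of user@example.com "
    "designates 192.0.2.1 as permitted sender)\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(received_spf_result(sample))
```

Because the result is read straight from the message, a check like this works the same whether or not network tests are enabled, which is the crux of the "tflags net" question.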
(In reply to comment #2)
> (In reply to comment #1)
> > I say yes, they should be tflags net. it's exactly analogous to the DNSBL
> > rules; they are network lookups, which can record their results in a header (the
> > reuse data), which can then be reused in set 0 if --reuse is specified.
> > IMO that makes sense for the SPF rules (although the recorded results are
> > put in Received-SPF:).
>
> I'm not sure I'd consider them exactly analogous. With a mail system that adds
> Received-SPF headers before SA sees the message SPF results are used (in all,
> even non-net, scoresets) without SA *ever* doing the actual network checks.

ok, not exactly analogous. Just partially ;)

> Reuse of the data later in mass-checks, etc, isn't of concern. I'm not aware of
> --reuse applying to spamassassin or spamd.

true.

> > what do we need to do to record Received-SPF: btw?
>
> Record Received-SPF? We don't. This would be added by a mail system that
> processes the mail before SA sees it. For example, see the headers of any mail
> I've sent to an ASF mailing list.

I meant, what do we need to do *in our MTA configurations* ;)

Now that I think about it though, I don't do any SPF lookups in any of my MTAs; I leave that to SpamAssassin. So maybe we should add support for recording it (if there isn't a header already there). Then we *can* use this header as a way to #reuse SPF lookups, *and* we are more standards-compliant (since I think that is dictated in the std). That would help with generating scores for set0 at least.

In the meantime I think it'd be acceptable to make these a special case. Generate their scores for set2/3, then simply copy those to set0/1. if we don't have the data, we can't trust the GA, but we should be able to trust that the S/O ratios will be the same (since it's the same domains and the same lookup logic!).

However that doesn't solve the core issue -- "tflags net". we need to keep the network lookup code running with "tflags net". This is necessary for the --reuse support, so that it knows to set the rule score to 0 when attempting to reuse hits. However as you note, this means the code doesn't run in set0 at all.

What I did in the past with the ROUND_THE_WORLD test was to split it into two rules, ROUND_THE_WORLD and ROUND_THE_WORLD_LOCAL; the latter was set0, the former set2. What about doing that with the SPF rules -- adding a duplicate ruleset for SPF_PASS_LOCAL, SPF_NEUTRAL_LOCAL etc.? (better names welcome of course.)

We could even move the current set to __SPF_PASS_LOCAL, __SPF_PASS_NET and combine them into a new SPF_PASS meta rule. This would be --reuse-compatible, and clearly delineate set0 and set2 rules.
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > what do we need to do to record Received-SPF: btw?
>
> I meant, what do we need to do *in our MTA configurations* ;)

Pick a milter, any milter:
http://svn.perl.org/viewvc/qpsmtpd/trunk/plugins/sender_permitted_from?view=markup
http://www.openspf.org/Implementations

> Now that I think about it though, I don't do any SPF lookups in any of my MTAs;
> I leave that to SpamAssassin. So maybe we should add support for recording it
> (if there isn't a header already there). Then we *can* use this header as a
> way to #reuse SPF lookups, *and* we are more standards-compliant (since I think
> that is dictated in the std). That would help with generating scores for set0
> at least.

"It is RECOMMENDED that SMTP receivers record the result of SPF processing in the message header."

We could, I guess. It'd give us a positive indication that the SPF checks were actually done for domains that don't publish SPF records.

> In the meantime I think it'd be acceptable to make these a special case.
> Generate their scores for set2/3, then simply copy those to set0/1.

1/3 and 0/2 :)

> if we
> don't have the data, we can't trust the GA, but we should be able to trust that
> the S/O ratios will be the same (since it's the same domains and the same
> lookup logic!).

Yeah, we're probably going to have to do some copying. I'm not sure whose corpus was responsible for the scores that were generated for 3.2.

> However that doesn't solve the core issue -- "tflags net". we need to keep the
> network lookup code running with "tflags net". This is necessary for the
> --reuse support, so that it knows to set the rule score to 0 when attempting to
> reuse hits.

Really? Isn't that what #reuse is supposed to be taking care of? Why a dual dependency on both #reuse and "tflags net"?

> However as you note, this means the code doesn't run in set0 at all.
>
> What I did in the past with the ROUND_THE_WORLD test was to split it into two
> rules, ROUND_THE_WORLD and ROUND_THE_WORLD_LOCAL; the latter was set0, the
> former set2. What about doing that with the SPF rules -- adding a duplicate
> ruleset for SPF_PASS_LOCAL, SPF_NEUTRAL_LOCAL etc.? (better names welcome of
> course.)

I *really* want to avoid different rules like above. It'll confuse people and cause those who aren't confused to write metas to combine the two versions.

> We could even move the current set to __SPF_PASS_LOCAL, __SPF_PASS_NET
> and combine them into a new SPF_PASS meta rule. This would be
> --reuse-compatible, and clearly delineate set0 and set2 rules.

This might complicate score generation further. :(
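The "1/3 and 0/2" correction refers to how the four score sets are numbered. As described for the `score` option in the Mail::SpamAssassin::Conf documentation, sets 1 and 3 have network tests enabled and sets 0 and 2 do not, while sets 2 and 3 have Bayes enabled. A small Python sketch of that numbering, and of the proposed copy of net-set scores into the non-net sets, could look like this. The function names and the example score values are hypothetical, and this is not SpamAssassin code.

```python
def scoreset(net_enabled, bayes_enabled):
    """Map the two switches to a score set index 0..3."""
    return (2 if bayes_enabled else 0) + (1 if net_enabled else 0)


def copy_net_scores_to_local(scores):
    """Given [set0, set1, set2, set3], reuse the net-set scores for the non-net sets.

    set1 -> set0 (Bayes off), set3 -> set2 (Bayes on).
    """
    if len(scores) != 4:
        raise ValueError("expected one score per score set (4 values)")
    _, set1, _, set3 = scores
    return [set1, set1, set3, set3]


# Made-up scores for illustration only.
example = [0.0, -0.5, 0.0, -0.3]
print(copy_net_scores_to_local(example))
```

The copy pairs each non-net set with the net set that has the same Bayes setting, so a Bayes-aware score is never reused in a set without Bayes.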
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (In reply to comment #1)
> > if we don't have the data, we can't trust the GA, but we should be able to
> > trust that the S/O ratios will be the same (since it's the same domains and
> > the same lookup logic!).
>
> Yeah, we're probably going to have to do some copying. I'm not sure whose
> corpus was responsible for the scores that were generated for 3.2.

well, we had no form of reuse for SPF, so the scores were based on whatever SPF records were in place at the time of mass-check. in my opinion, basing scores on this is better than on no data at all.

> > However that doesn't solve the core issue -- "tflags net". we need to keep
> > the network lookup code running with "tflags net". This is necessary for
> > the --reuse support, so that it knows to set the rule score to 0 when
> > attempting to reuse hits.
>
> Really? Isn't that what #reuse is supposed to be taking care of? Why a dual
> dependency on both #reuse and "tflags net"?

We could modify the code so that it knows that #reuse implies "tflags net", sure. basically I didn't want the effects of #reuse to be too widespread in the main Mail::SpamAssassin classes, since it's only supposed to affect mass-checks.

> > What I did in the past with the ROUND_THE_WORLD test was to split it into two
> > rules, ROUND_THE_WORLD and ROUND_THE_WORLD_LOCAL; the latter was set0, the
> > former set2. What about doing that with the SPF rules -- adding a duplicate
> > ruleset for SPF_PASS_LOCAL, SPF_NEUTRAL_LOCAL etc.? (better names welcome of
> > course.)
>
> I *really* want to avoid different rules like above. It'll confuse people and
> cause those who aren't confused to write metas to combine the two versions.

I see your point -- they are pretty messy. other suggestions?
(In reply to comment #5)
> well, we had no form of reuse for SPF, so the scores were based on whatever
> SPF records were in place at the time of mass-check. in my opinion, basing
> scores on this is better than on no data at all.

Actually we did have #reuse for the SPF tests. I was actually questioning where the set0/2 scores for the SPF rules came from, but of course, they were generated based on the results of a single set3 mass-check (just like any other set0/2 rules). So that's fine (assuming all/most of the corpus submitters have SPF checks enabled).

> We could modify the code so that it knows that #reuse implies "tflags net",
> sure. basically I didn't want the effects of #reuse to be too widespread
> in the main Mail::SpamAssassin classes, since it's only supposed to affect
> mass-checks.

I can't think of any reason why #reuse must or should only apply to net rules. If you wanted to #reuse bayes rules (which I've been considering for a while) I think you should be able to. Same goes for any other rule if it makes sense for that rule.

> I see your point -- they are pretty messy. other suggestions?

I don't see what the harm would be of *not* having "tflags net" for the SPF rules and just having mass-check #reuse whatever we tell it to.
I figured I would add a few thoughts since I'm working this issue in my mail system now. I think that this should be set to "tflags net" even if SA doesn't need to go do the actual lookup. The reason is because sometimes it's reasonable to omit the network rules for reasons other than performance. Suppose you want to allow a user to authenticate and relay without worrying about RBLs, SPF, or other network tests, but you don't want to completely omit the message from SA either. In this case turning off network tests works well, but only if SPF is a network test. If it isn't a network test, then the authenticated user will get hit with SPF rules unless you whitelist your domain which isn't desirable for other reasons. schu
(In reply to comment #7)
> Suppose you want to
> allow a user to authenticate and relay without worrying about RBLs, SPF, or
> other network tests, but you don't want to completely omit the message from SA
> either. In this case turning off network tests works well, but only if SPF is
> a network test. If it isn't a network test, then the authenticated user will
> get hit with SPF rules unless you whitelist your domain which isn't desirable
> for other reasons.

SA provides numerous ways to detect that a user is authenticated. Turning off net tests is not the correct, nor accurate, way to handle this situation.
moving most remaining 3.3.0 bugs to 3.3.1 milestone
reassigning, too
moving all open 3.3.1 bugs to 3.3.2
Moving back off of Security, which got changed by accident during the mass Target Milestone move.
Moving all open bugs where target is defined and 3.4.0 or lower to 3.4.1 target
Leaving the SPF rules as "tflags net".