SA Bugzilla – Bug 5895
[review] Configurable Originating Headers for RBL
Last modified: 2010-01-06 10:02:43 UTC
This patch will allow you to specify extra originating headers in your conf file for rbl checks.
Created attachment 4308 [details] Patch to allow configurable originating headers for rbl checks
I've been asked to consider this RFE for inclusion in 3.3.0, which it is already targeted for but at P5 so it would not have made the cut if not brought to our attention. The purpose of this enhancement, as I see it, is to add a config option to facilitate an ISP making use of their own heuristics for determining a sending IP address so that it can be checked against the DNSRBL rules in circumstances where the trusted_network heuristics don't do the job, for example some mail sent from a webmail interface or via some forwarding methods. My apologies to the original submitter if I didn't get that right. I have two questions for the other devs: 1) Is it too late -- Are we too frozen to consider adding this? 2) Look at finish_tests() defined in the proposed patch. The comments in Plugin.pm imply that finish_tests() is to be defined by a plugin to clean up an eval'd method that th plugin has pushed on a global array. That is how it is used in the plugins that currently define it. This patch only pushes string values from the configuration, not eval's methods, on to @EXTRA_RBL_HEADERS, so maybe finish_tests is not needed. However, there are no other plugins that use a global array in this way - Either they define such a global variable and push eval'd methods on it and then define finish_tests, or else they push the config values into a hash on $self such as the way $self->{uridnsbls} is used in URIDNSBL.pm. My question is, should this patch use $self->{dnsevalextras} (a new name I just made up) instead of @EXTRA_RBL_HEADERS, and if not, does it need to define finish_tests() to empty @EXTRA_RBL_HEADERS? I think that question 2 is more a matter of style than of substance and should not be used as a reason to reject this patch for 3.3.0, in case anyone is inclined to say that having such questions makes it too uncertain to go with at this point. I think the decision about including this should depend on whether or not it is too late to include what I think is a very safe and simple enhancement.
I was thinking about this sometime ago, tested adding other generic headers. The defaults are pretty much the most common, didn't get much that much better results. Aren't SA config variables are usually read with $pms->{conf}? The global array doesn't make much sense. Also after this change, there isn't any point to have hardcoded values. The "rbl" part doesn't sound good either, since what this really affects is received path. A quick propose: clear_originating_headers add_originating_header I can rewrite it quickly unless no one objects..
Duh.. make that: clear_originating_headers originating_headers To be more like trusted_networks, and add_* seems to mean adding a header into message.
Seems I remembered wrong. The headers are only processed rbl lookups. Though in the future I'd like to see that change. Maybe having X-Spam-Relays-Originating or such so the information could be used more generally. But for now we can go for the simple change..
(in reply to comment 5) > Seems I remembered wrong. The headers are only processed rbl lookups I was following you and agreeing until I saw that. I don't understand what's wrong with what you said in comment 3 and comment 4. I agree that what this option means is more like clear_originating_headers originating_headers which would allow us to have a default initial configuration of the ones that are hardcoded but allow full flexibility in the configuration. And these are a list of headers that specify the originating IP address, not necessarily just something for DNSRBL. Maybe original_ip_header would be better than originating_header. If you want to rewrite it to use $pms->{conf} instead of the array and with those names, I think that would be great.
Created attachment 4618 [details] originating_ip_headers option Please review, especially the documentation, since it's not my forte. ;-)
Reading it over, it all looks good. I'm still running tests. It looks like originating_ip_headers and clear_originating_ip_headers have to be added to the list in t/cross_user_config_leak.t just like trusted_networks and clear_trusted_networks or else that test fails. But that is just updating the test case, not a problem with the patch.
+1 to Henrik's patch. this isn't too late for 3.3.0, as it won't affect mass-check results/rescoring results for users who leave it at the defaults. minor typo: +These IP addresses virtually appended into the Received: chain, so they are +used in RBL checks where appropriate. => +These IP addresses are virtually appended into the Received: chain, so they are +used in RBL checks where appropriate. +Currently the IP addresses are not added into X-Spam-Relays-* headers, but +may do so in the future. => +Currently the IP addresses are not added into X-Spam-Relays-* headers, but +they may be in the future.
(In reply to comment #8) > It looks like originating_ip_headers and clear_originating_ip_headers have to > be added to the list in t/cross_user_config_leak.t just like trusted_networks > and clear_trusted_networks or else that test fails. But that is just updating > the test case, not a problem with the patch. The Parser.pm claims: It is good practice to set a 'type', if possible, describing how your settings are stored on the Conf object; this allows the SpamAssassin test suite to validate that the settings do not 'leak' between users. so I'd say that a: type => $CONF_TYPE_STRING, should be added to the originating_ip_headers and clear_originating_ip_headers setting, not the test tweaked. (or better yet, a $CONF_TYPE_NOARG type invented for settings like clear_headers, clear_trusted_networks, clear_internal_networks, clear_msa_networks, clear_originating_ip_headers, clear_report_template, clear_unsafe_report_template)
(in reply to comment 10) I tried adding type => $CONF_TYPE_STRING, to originating_ip_headers but it doesn't work because originating_ip_headers takes an array of strings, not a string. Similarly, there is no constant to use for options like clear_originating_ip_headers that take no argument. As you suggested, we can create new types and eliminate the corresponding entries in the list in t/cross_user_config_leak.t, ans I'm +1 for doing that, but that's outside the scope of this bug. For now, the right thing to do is to add originating_ip_headers and clear_originating_ip_headers to the exception list in t/cross_user_config_leak.t and deal with making new types in a separate bug. With the change to t/cross_user_config_leak.t and Justin's suggested change to the comments in comment #9 I'm +1 on committing this.
Fixed as per comments and committed. Sending lib/Mail/SpamAssassin/Conf.pm Sending lib/Mail/SpamAssassin/Plugin/DNSEval.pm Sending rules/10_default_prefs.cf Sending t/cross_user_config_leak.t Transmitting file data .... Committed revision 896064.
Thanks for the quick work on this guys and making this happen. This looks much cleaner then the version I submitted. So if I understand this right unless you specify "clear_originating_ip_headers" in your local.cf file you will get the default headers plus what ever you add with subsequent "originating_ip_headers" lines correct?
(In reply to comment #13) > > So if I understand this right unless you specify "clear_originating_ip_headers" > in your local.cf file you will get the default headers plus what ever you add > with subsequent "originating_ip_headers" lines correct? Correct.
(In reply to comment #11) > I tried adding type => $CONF_TYPE_STRING, to originating_ip_headers but it > doesn't work because originating_ip_headers takes an array of strings, not a > string. I don't understand - tried it myself too. The setting does its own split on whitespace. What happens (that I missed)?
(in reply to comment #15) I didn't look into the cause and it could very well have been my mistake when changing things to test it, but this is what I got when I declared originating_ip_headers as a string type, removed it from the test exclude list, then ran just the one test: $ make test TEST_FILES=t/cross_user_config_leak.t /opt/local/bin/perl build/mkrules --exit_on_no_src --src rulesrc --out rules --manifest MANIFEST --manifestskip MANIFEST.SKIP mkrules: no rules updated /opt/local/bin/perl build/preprocessor -Mvars -DVERSION="3.003000" -DPREFIX="/opt/local" -DDEF_RULES_DIR="/opt/local/share/spamassassin" -DLOCA\ L_RULES_DIR="/etc/opt/mail/spamassassin" -DLOCAL_STATE_DIR="/var/opt/spamassassin" -DINSTALLSITELIB="/opt/local/lib/perl5/site_perl/5.8.9" -DCON\ TACT_ADDRESS="sidney@example.com" -Msharpbang -Mconditional -DPERL_BIN="/opt/local/bin/perl" -DPERL_WARN="" -DPERL_TAINT="" -m755 -isa-update.raw\ -osa-update cp sa-update blib/script/sa-update /opt/local/bin/perl "-MExtUtils::MY" -e "MY->fixin(shift)" blib/script/sa-update PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/cross_user_config_leak.t t/cross_user_config_leak.t .. 1/6 Can't use string ("__test_expected_str") as an ARRAY ref while "strict refs" in use at ../lib/Mail/SpamAssassi\ n/Conf.pm line 3910. t/cross_user_config_leak.t .. Dubious, test returned 2 (wstat 512, 0x200) Failed 3/6 subtests
Created attachment 4620 [details] Suggested additional patch Some documentation change, one simplification (consistent with split(' ') in set_addrlist_value) and adding a default value [] to originating_ip_headers. As for the documentation change: at least for newer documentation I prefer to keep terminology consistent with RFC 5322 (ex RFC 2822). Namely, the loose term 'header' can mean any of the three things: - header section (a mail message consists of a header section and a body; header section consists of header fields); - header field (possibly multi-line element of a header section, consists of a field name, a colon, and a field body) - (header) field name - (header) filed body
(In reply to comment #16) > I didn't look into the cause and it could very well have been my mistake when > changing things to test it, but this is what I got when I declared > originating_ip_headers as a string type, removed it from the test exclude list, > then ran just the one test: > > $ make test TEST_FILES=t/cross_user_config_leak.t > t/cross_user_config_leak.t .. 1/6 Can't use string ("__test_expected_str") > as an ARRAY ref while "strict refs" in use > at ../lib/Mail/SpamAssassin/Conf.pm line 3910. Thanks. Now that rises a question, is a constant $CONF_TYPE_STRING (and its sisters) supposed to tell the syntax of arguments in a config file, or is it supposed to tell the internal data structure used to represent these arguments???
reopening to iron out details
(in reply to comment #18) Looking over the test code I think the type would have to indicate the internal representation. In order to read in the config file we have to call the routines that are set to read it, which contain embedded methods to parse the text file. This is especially true for this test which is checking for leaks between loads of the configuration, as it has to actually load the configuration and check that the internal data structures are not leaking through. Given that, the solution will be to define a string array type and a null type so that we can remove options such as trusted_networks, originating_ip_headers for the former, and the clear_* options for the latter. Of course the null type should cause the option to be skipped just like if it is in the ignore list. That will leave a few more options still in the ignore list, but it will be a good start.
+1 on the changes in comment #17 I think the question of config types and the leakage test can be in a new bug report, allowing us to close this one again, after we get another vote to commit this latest patch.
(In reply to comment #21) > +1 on the changes in comment #17 > > I think the question of config types and the leakage test can be in a new bug > report, allowing us to close this one again, after we get another vote to > commit this latest patch. Carrying over the out-of-scope topics to a new bug: Bug 6274 - Semantics of CONF_TYPE_* constants and additional config argument types Need one more vote for patch in comment 17.
sure +1 Though I don't get the value of split(' '), since all the others use /\s+/..
(in reply to comment 23) Oh, I didn't notice that change in the split from /\s+/ to ' ' I agree that it should be /\s+/ to be consistent with every other such config option. It would not be good for someone to use tabs in their config file and then have it break on that one option when it has always worked for them. I did a crude benchmark timing the two ways of calling split and couldn't measure a difference, so performance is not a reason to use ' '.
> Oh, I didn't notice that change in the split from /\s+/ to ' ' > > I agree that it should be /\s+/ to be consistent with every other such config > option. I'll apply the patch but keep the /\s+/. As for consistency, some code splits on ' ' (like set_addrlist_value), some does on /\s+/, there is no consistency right now, it would be nice to fix it some day. > It would not be good for someone to use tabs in their config file and > then have it break on that one option when it has always worked for them. This is a misinterpretation on what split(' ') does. There is no difference as far as tabs are concerned: man perlfunc: As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. [...] A "split" on "/\s+/" is like a "split(' ')" except that any leading whitespace produces a null first field.
> I'll apply the patch but keep the /\s+/. Bug 5895: a documentation change and a simplification Sending lib/Mail/SpamAssassin/Conf/Parser.pm Sending lib/Mail/SpamAssassin/Conf.pm Committed revision 896403.
> man perlfunc: > As a special case, specifying a PATTERN of space (' ') will > split on white space just as "split" with no arguments does. [...] > A "split" on "/\s+/" is like a "split(' ')" except that > any leading whitespace produces a null first field. ...which isn't desirable for us. However, I think previous code will deal with that case, so there's no real difference in results. resolving; this is now fixed, right?
(reply to comment #25 and comment @27) I didn't know either of those things about split -- I am not a perl expert in spite of working with it for SpamAssassin. The special case handling of ' ' in split feels to me like about the weirdest perl feature I've come across yet, and yes, I do know that perl has weird features. And to top it off whatever I used to search through Conf.pm for how it was being used missed the places do use split ' '. So I was wrong all around :-( I assume that when parsing the option line the arguments string begins with the non-whitespace after the initial keyword so there can be no leading whitespace that would produce a null first field if you use split /\s+/ instead of splitting on ' '. So it really doesn't make a difference either way.