Bug 5895 - [review] Configurable Originating Headers for RBL
Summary: [review] Configurable Originating Headers for RBL
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins (show other bugs)
Version: 3.2.4
Hardware: Other All
: P1 enhancement
Target Milestone: 3.3.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-25 08:07 UTC by William Taylor
Modified: 2010-01-06 10:02 UTC (History)
4 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Patch to allow configurable originating headers for rbl checks patch None William Taylor [NoCLA]
originating_ip_headers option patch None Henrik Krohns [HasCLA]
Suggested additional patch patch None Mark Martinec [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description William Taylor 2008-04-25 08:07:59 UTC
This patch will allow you to specify extra originating headers in your conf file for rbl checks.
Comment 1 William Taylor 2008-04-25 08:08:58 UTC
Created attachment 4308 [details]
Patch to allow configurable originating headers for rbl checks
Comment 2 Sidney Markowitz 2010-01-04 16:45:46 UTC
I've been asked to consider this RFE for inclusion in 3.3.0, which it is already targeted for but at P5 so it would not have made the cut if not brought to our attention.

The purpose of this enhancement, as I see it, is to add a config option to facilitate an ISP making use of their own heuristics for determining a sending IP address so that it can be checked against the DNSRBL rules in circumstances where the trusted_network heuristics don't do the job, for example some mail sent from a webmail interface or via some forwarding methods. My apologies to the original submitter if I didn't get that right.

I have two questions for the other devs:

1) Is it too late -- Are we too frozen to consider adding this?

2) Look at finish_tests() defined in the proposed patch. The comments in Plugin.pm imply that finish_tests() is to be defined by a plugin to clean up an eval'd method that th plugin has pushed on a global array. That is how it is used in the plugins that currently define it. This patch only pushes string values from the configuration, not eval's methods, on to @EXTRA_RBL_HEADERS, so maybe finish_tests is not needed. However, there are no other plugins that use a global array in this way - Either they define such a global variable and push eval'd methods on it and then define finish_tests, or else they push the config values into a hash on $self such as the way $self->{uridnsbls} is used in URIDNSBL.pm. My question is, should this patch use $self->{dnsevalextras} (a new name I just made up) instead of @EXTRA_RBL_HEADERS, and if not, does it need to define finish_tests() to empty @EXTRA_RBL_HEADERS?

I think that question 2 is more a matter of style than of substance and should not be used as a reason to reject this patch for 3.3.0, in case anyone is inclined to say that having such questions makes it too uncertain to go with at this point. I think the decision about including this should depend on whether or not it is too late to include what I think is a very safe and simple enhancement.
Comment 3 Henrik Krohns 2010-01-04 23:36:19 UTC
I was thinking about this sometime ago, tested adding other generic headers. The defaults are pretty much the most common, didn't get much that much better results.

Aren't SA config variables are usually read with $pms->{conf}? The global array doesn't make much sense. Also after this change, there isn't any point to have hardcoded values. The "rbl" part doesn't sound good either, since what this really affects is received path.

A quick propose:

clear_originating_headers
add_originating_header

I can rewrite it quickly unless no one objects..
Comment 4 Henrik Krohns 2010-01-04 23:43:40 UTC
Duh.. make that:

clear_originating_headers
originating_headers

To be more like trusted_networks, and add_* seems to mean adding a header into message.
Comment 5 Henrik Krohns 2010-01-04 23:53:57 UTC
Seems I remembered wrong. The headers are only processed rbl lookups. Though in the future I'd like to see that change. Maybe having X-Spam-Relays-Originating or such so the information could be used more generally. But for now we can go for the simple change..
Comment 6 Sidney Markowitz 2010-01-05 00:23:41 UTC
(in reply to comment 5)

> Seems I remembered wrong. The headers are only processed rbl lookups

I was following you and agreeing until I saw that. I don't understand what's wrong with what you said in comment 3 and comment 4.

I agree that what this option means is more like

clear_originating_headers
originating_headers

which would allow us to have a default initial configuration of the ones that are hardcoded but allow full flexibility in the configuration. And these are a list of headers that specify the originating IP address, not necessarily just something for DNSRBL. Maybe original_ip_header would be better than originating_header.

If you want to rewrite it to use $pms->{conf} instead of the array and with those names, I think that would be great.
Comment 7 Henrik Krohns 2010-01-05 01:11:55 UTC
Created attachment 4618 [details]
originating_ip_headers option


Please review, especially the documentation, since it's not my forte. ;-)
Comment 8 Sidney Markowitz 2010-01-05 03:12:04 UTC
Reading it over, it all looks good. I'm still running tests.

It looks like originating_ip_headers and clear_originating_ip_headers have to be added to the list in t/cross_user_config_leak.t just like trusted_networks and clear_trusted_networks or else that test fails. But that is just updating the test case, not a problem with the patch.
Comment 9 Justin Mason 2010-01-05 03:32:42 UTC
+1 to Henrik's patch.   this isn't too late for 3.3.0, as it won't affect mass-check results/rescoring results for users who leave it at the defaults.

minor typo:


+These IP addresses virtually appended into the Received: chain, so they are
+used in RBL checks where appropriate.

=>

+These IP addresses are virtually appended into the Received: chain, so they are
+used in RBL checks where appropriate.


+Currently the IP addresses are not added into X-Spam-Relays-* headers, but
+may do so in the future.

=>

+Currently the IP addresses are not added into X-Spam-Relays-* headers, but
+they may be in the future.
Comment 10 Mark Martinec 2010-01-05 05:21:24 UTC
(In reply to comment #8)
> It looks like originating_ip_headers and clear_originating_ip_headers have to
> be added to the list in t/cross_user_config_leak.t just like trusted_networks
> and clear_trusted_networks or else that test fails. But that is just updating
> the test case, not a problem with the patch.

The Parser.pm claims:
It is good practice to set a 'type', if possible, describing how your settings
are stored on the Conf object; this allows the SpamAssassin test suite to
validate that the settings do not 'leak' between users.

so I'd say that a:
  type => $CONF_TYPE_STRING,
should be added to the originating_ip_headers and clear_originating_ip_headers
setting, not the test tweaked.

(or better yet, a $CONF_TYPE_NOARG type invented for settings like
clear_headers, clear_trusted_networks, clear_internal_networks,
clear_msa_networks, clear_originating_ip_headers, clear_report_template,
clear_unsafe_report_template)
Comment 11 Sidney Markowitz 2010-01-05 06:01:20 UTC
(in reply to comment 10)

I tried adding type => $CONF_TYPE_STRING, to originating_ip_headers but it doesn't work because originating_ip_headers takes an array of strings, not a string. Similarly, there is no constant to use for options like clear_originating_ip_headers that take no argument.

As you suggested, we can create new types and eliminate the corresponding entries in the list in t/cross_user_config_leak.t, ans I'm +1 for doing that, but that's outside the scope of this bug.

For now, the right thing to do is to add originating_ip_headers and clear_originating_ip_headers to the exception list in t/cross_user_config_leak.t and deal with making new types in a separate bug.

With the change to t/cross_user_config_leak.t and Justin's suggested change to the comments in comment #9 I'm +1 on committing this.
Comment 12 Henrik Krohns 2010-01-05 06:27:43 UTC
Fixed as per comments and committed.

Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/Plugin/DNSEval.pm
Sending        rules/10_default_prefs.cf
Sending        t/cross_user_config_leak.t
Transmitting file data ....
Committed revision 896064.
Comment 13 William Taylor 2010-01-05 08:15:40 UTC
Thanks for the quick work on this guys and making this happen. This looks much cleaner then the version I submitted.

So if I understand this right unless you specify "clear_originating_ip_headers" in your local.cf file you will get the default headers plus what ever you add with subsequent "originating_ip_headers" lines correct?
Comment 14 Henrik Krohns 2010-01-05 08:23:17 UTC
(In reply to comment #13)
>
> So if I understand this right unless you specify "clear_originating_ip_headers"
> in your local.cf file you will get the default headers plus what ever you add
> with subsequent "originating_ip_headers" lines correct?

Correct.
Comment 15 Mark Martinec 2010-01-05 08:37:32 UTC
(In reply to comment #11)
> I tried adding type => $CONF_TYPE_STRING, to originating_ip_headers but it
> doesn't work because originating_ip_headers takes an array of strings, not a
> string.

I don't understand - tried it myself too. The setting does its own split
on whitespace. What happens (that I missed)?
Comment 16 Sidney Markowitz 2010-01-05 09:31:07 UTC
(in reply to comment #15)

I didn't look into the cause and it could very well have been my mistake when changing things to test it, but this is what I got when I declared originating_ip_headers as a string type, removed it from the test exclude list, then ran just the one test:

$ make test TEST_FILES=t/cross_user_config_leak.t
/opt/local/bin/perl build/mkrules --exit_on_no_src --src rulesrc --out rules --manifest MANIFEST --manifestskip MANIFEST.SKIP
mkrules: no rules updated
/opt/local/bin/perl build/preprocessor  -Mvars -DVERSION="3.003000" -DPREFIX="/opt/local" -DDEF_RULES_DIR="/opt/local/share/spamassassin" -DLOCA\
L_RULES_DIR="/etc/opt/mail/spamassassin" -DLOCAL_STATE_DIR="/var/opt/spamassassin" -DINSTALLSITELIB="/opt/local/lib/perl5/site_perl/5.8.9" -DCON\
TACT_ADDRESS="sidney@example.com" -Msharpbang -Mconditional -DPERL_BIN="/opt/local/bin/perl" -DPERL_WARN="" -DPERL_TAINT="" -m755 -isa-update.raw\
 -osa-update
cp sa-update blib/script/sa-update
/opt/local/bin/perl "-MExtUtils::MY" -e "MY->fixin(shift)" blib/script/sa-update
PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/cross_user_config_leak.t
t/cross_user_config_leak.t .. 1/6 Can't use string ("__test_expected_str") as an ARRAY ref while "strict refs" in use at ../lib/Mail/SpamAssassi\
n/Conf.pm line 3910.
t/cross_user_config_leak.t .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 3/6 subtests
Comment 17 Mark Martinec 2010-01-05 09:43:27 UTC
Created attachment 4620 [details]
Suggested additional patch

Some documentation change, one simplification (consistent with split(' ') in
set_addrlist_value) and adding a default value [] to originating_ip_headers.

As for the documentation change: at least for newer documentation I prefer
to keep terminology consistent with RFC 5322 (ex RFC 2822). Namely, the
loose term 'header' can mean any of the three things:
- header section (a mail message consists of a header section and a body;
    header section consists of header fields);
- header field (possibly multi-line element of a header section, consists
    of a field name, a colon, and a field body)
- (header) field name
- (header) filed body
Comment 18 Mark Martinec 2010-01-05 10:03:54 UTC
(In reply to comment #16)
> I didn't look into the cause and it could very well have been my mistake when
> changing things to test it, but this is what I got when I declared
> originating_ip_headers as a string type, removed it from the test exclude list,
> then ran just the one test:
> 
> $ make test TEST_FILES=t/cross_user_config_leak.t
> t/cross_user_config_leak.t .. 1/6 Can't use string ("__test_expected_str")
> as an ARRAY ref while "strict refs" in use
> at ../lib/Mail/SpamAssassin/Conf.pm line 3910.

Thanks. Now that rises a question, is a constant $CONF_TYPE_STRING
(and its sisters) supposed to tell the syntax of arguments in a config file,
or is it supposed to tell the internal data structure used to represent
these arguments???
Comment 19 Mark Martinec 2010-01-05 12:33:00 UTC
reopening to iron out details
Comment 20 Sidney Markowitz 2010-01-05 13:32:42 UTC
(in reply to comment #18)

Looking over the test code I think the type would have to indicate the internal representation. In order to read in the config file we have to call the routines that are set to read it, which contain embedded methods to parse the text file. This is especially true for this test which is checking for leaks between loads of the configuration, as it has to actually load the configuration and check that the internal data structures are not leaking through.

Given that, the solution will be to define a string array type and a null type so that we can remove options such as trusted_networks, originating_ip_headers for the former, and the clear_* options for the latter. Of course the null type should cause the option to be skipped just like if it is in the ignore list.

That will leave a few more options still in the ignore list, but it will be a good start.
Comment 21 Sidney Markowitz 2010-01-05 14:51:25 UTC
+1 on the changes in comment #17

I think the question of config types and the leakage test can be in a new bug report, allowing us to close this one again, after we get another vote to commit this latest patch.
Comment 22 Mark Martinec 2010-01-05 16:05:36 UTC
(In reply to comment #21)
> +1 on the changes in comment #17
> 
> I think the question of config types and the leakage test can be in a new bug
> report, allowing us to close this one again, after we get another vote to
> commit this latest patch.

Carrying over the out-of-scope topics to a new bug:
  Bug 6274 - Semantics of CONF_TYPE_* constants and additional
  config argument types


Need one more vote for patch in comment 17.
Comment 23 Henrik Krohns 2010-01-05 23:30:25 UTC
sure +1

Though I don't get the value of split(' '), since all the others use /\s+/..
Comment 24 Sidney Markowitz 2010-01-06 00:06:02 UTC
(in reply to comment 23)

Oh, I didn't notice that change in the split from /\s+/ to ' '

I agree that it should be /\s+/ to be consistent with every other such config option. It would not be good for someone to use tabs in their config file and then have it break on that one option when it has always worked for them.

I did a crude benchmark timing the two ways of calling split and couldn't measure a difference, so performance is not a reason to use ' '.
Comment 25 Mark Martinec 2010-01-06 03:20:22 UTC
> Oh, I didn't notice that change in the split from /\s+/ to ' '
> 
> I agree that it should be /\s+/ to be consistent with every other such config
> option.

I'll apply the patch but keep the /\s+/.

As for consistency, some code splits on ' ' (like set_addrlist_value),
some does on /\s+/, there is no consistency right now, it would be nice
to fix it some day.

> It would not be good for someone to use tabs in their config file and
> then have it break on that one option when it has always worked for them.

This is a misinterpretation on what split(' ') does.
There is no difference as far as tabs are concerned:

man perlfunc:
  As a special case, specifying a PATTERN of space (' ') will
  split on white space just as "split" with no arguments does. [...]
  A "split" on "/\s+/" is like a "split(' ')" except that
  any leading whitespace produces a null first field.
Comment 26 Mark Martinec 2010-01-06 03:44:38 UTC
> I'll apply the patch but keep the /\s+/.

Bug 5895: a documentation change and a simplification
Sending        lib/Mail/SpamAssassin/Conf/Parser.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Committed revision 896403.
Comment 27 Justin Mason 2010-01-06 04:12:20 UTC
> man perlfunc:
>   As a special case, specifying a PATTERN of space (' ') will
>   split on white space just as "split" with no arguments does. [...]
>   A "split" on "/\s+/" is like a "split(' ')" except that
>   any leading whitespace produces a null first field.

...which isn't desirable for us.  However, I think previous code will deal with that case, so there's no real difference in results.

resolving; this is now fixed, right?
Comment 28 Sidney Markowitz 2010-01-06 10:02:43 UTC
(reply to comment #25 and comment @27)

I didn't know either of those things about split -- I am not a perl expert in spite of working with it for SpamAssassin. The special case handling of ' ' in split feels to me like about the weirdest perl feature I've come across yet, and yes, I do know that perl has weird features.

And to top it off whatever I used to search through Conf.pm for how it was being used missed the places do use split ' '. So I was wrong all around :-(

I assume that when parsing the option line the arguments string begins with the non-whitespace after the initial keyword so there can be no leading whitespace that would produce a null first field if you use split /\s+/ instead of splitting on ' '. So it really doesn't make a difference either way.