Bug 7224 - Multiple issues with SPF Plugin
Summary: Multiple issues with SPF Plugin
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Linux
: P2 normal
Target Milestone: 3.4.3
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
: 6920 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-07-16 18:12 UTC by Alessandro Vesely
Modified: 2018-10-19 13:01 UTC (History)
7 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
patch for SPF plugin patch None Alessandro Vesely [HasCLA]
before text/plain None Alessandro Vesely [HasCLA]
after text/plain None Alessandro Vesely [HasCLA]
Apache Individual Contributor License Agreement (ICLA) text/plain None Alessandro Vesely [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Alessandro Vesely 2015-07-16 18:12:16 UTC
Created attachment 5315 [details]
patch for SPF plugin

Benny,
incorrect header parsing is also described in bug #6920 which seems old and inaccurate.  You may want to invalidate that and fix this.

First, the line:

    my @internal_hdrs = split("\n", $scanner->get('ALL-INTERNAL'));

does not do what it means.  Perhaps the PerMessageStatus::get() function returns different values depending on some other software?  I saved the string it returns to a file, and it compared equal to the output of:

    sed -n '1,20s/$/\n/p' < mailmessage

That is, doubled newlines and no unfolding (line 20 is the first Received:).  Thus, splitting at newlines makes no sense.  I attach a patch to unfold header fields correctly.  (NOTE: although the patch is taken against old stuff, the trunk doesn't seem to differ --see sub get_all_hdrs_in_rcvd_index_range at line 2897.)

Second, a single Authentication-Results field can contain multiple SPF results.  The plugin gets the first one only.  I just mention this in view of the third issue, since HELO is seldom used.

Third, it may be useless to check HELO results when an MFROM result was already found in an internal Received-SPF or Authentication-Results field.  It would be handy to have a configuration item to avoid a DNS lookup in such cases.

Ale
Comment 1 Dave Jones 2017-06-14 22:33:18 UTC
This seems like a valid issue with a patch.  Is this patched yet in later versions of SA?
Comment 2 Alessandro Vesely 2017-06-16 11:35:24 UTC
Created attachment 5448 [details]
before

Dumper(\@internal_hdrs)
Comment 3 Alessandro Vesely 2017-06-16 11:35:54 UTC
Created attachment 5449 [details]
after

Dumper(\@internal_hdrs)
Comment 4 Alessandro Vesely 2017-06-16 11:37:40 UTC
That doesn't seem to be the case.

I got a fresh trunk version, and added the following, using Data::Dumper:

open(my $hdbgfile, ">", "spf-dbg-hdrs-before") or die $!;
print $hdbgfile Dumper(\@internal_hdrs);
close($hdbgfile);

Then I scanned the notification mail I just received.  Then, I patched SPF.pm, changed the file name to "spf-dbg-hdrs-after" and scanned the message again.  The output was the same, except for

Jun 16 13:07:01.063 [29898] dbg: spf: checking to see if the message has a Received-SPF header that we can use
Jun 16 13:07:01.064 [29898] dbg: spf: found a Received-SPF header added by an internal host: Received-SPF: none (Address does not pass the Sender Policy Framework)
Jun 16 13:07:01.064 [29898] dbg: spf: re-using mfrom result from Received-SPF header: none
Jun 16 13:07:01.064 [29898] dbg: spf: found a Received-SPF header added by an internal host: Received-SPF: unknown (Address does not pass the Sender Policy Framework)
Jun 16 13:07:01.064 [29898] dbg: spf: could not parse result from existing Received-SPF header

turned into

Jun 16 13:20:36.638 [29989] dbg: spf: checking to see if the message has a Received-SPF header that we can use
Jun 16 13:20:36.639 [29989] dbg: spf: found a Received-SPF header added by an internal host: Received-SPF: none (Address does not pass the Sender Policy Framework) SPF=MAILFROM; sender=bugzilla-daemon@bugzilla.spamassassin.org; remoteip=209.188.14.142; remotehost=pnap-us-west-generic-nat.apache.org; helo=spamd3-us-west.apache.org; receiver=wmail.tana.it;
Jun 16 13:20:36.639 [29989] dbg: spf: re-using mfrom result from Received-SPF header: none
Jun 16 13:20:36.639 [29989] dbg: spf: found a Received-SPF header added by an internal host: Received-SPF: unknown (Address does not pass the Sender Policy Framework) SPF=HELO; sender=spamd3-us-west.apache.org; remoteip=209.188.14.142; remotehost=pnap-us-west-generic-nat.apache.org; helo=spamd3-us-west.apache.org; receiver=wmail.tana.it;
Jun 16 13:20:36.639 [29989] dbg: spf: could not parse result from existing Received-SPF header
Jun 16 13:20:36.666 [29989] dbg: spf: using Mail::SPF for SPF checks

The Received-SPF cannot be parsed in both cases, in the 2nd attempt because of its format.  I attach the debug files showing what the plugin was trying to parse.
Comment 5 Dave Jones 2018-01-28 16:52:27 UTC
Anyone been running with this patch in production for a while to test it in the real world?  The before and after attachments seem to be drastically different so I am concerned that this patch could cause unintended consequences.  I would need to run it live for a while to make sure there are no unforeseen problems.
Comment 6 Alessandro Vesely 2018-01-29 10:30:45 UTC
> Anyone been running with this patch in production for a while
> to test it in the real world?

Oops, not me.  My patched version must have been overwritten during some system upgrade.

> The before and after attachments seem to be drastically different

Yes, the ``split("\n", scanner->get'' approach assumes Received-SPF headers stay in a single line.  Since the header field is rather longish, and the standard used to impose 78-char limits, Received-SPF: are customarily folded into several lines.  Of course, only the first line contains the header field name "Received-SPF".  See "Long Header Fields" in the standard:
https://tools.ietf.org/html/rfc5322#section-2.2.3

The existing code catches only the SPF result ($1), only in the vast majority of cases.  The code looking for identity ($2) tag won't usually act.

For Authentication-Results: header fields, the most common habit is to fold after each semicolon, so the likelihood to catch an SPF result that way is very low.

An alternative, and more general way to fix this flaw is to equip the scanner object with a method to retrieve unfolded header fields.  Unfolding is the purpose of the substitution ``s/\n\n\s+/ /sg'', in the patch.  In that respect, the presence of two consecutive line feeds should be considered a bug in its own right, since only a single line feed is present in either folding white space (FWS) or between consecutive header fields.

In the same respect, since Authentication-Results: is a generic header field used also by other authentication methods (such as DKIM and DMARC) it should be parsed once.

My understanding is that the SA plugins which deal with those protocols are lagging in a useless state limbo.  In fact, the way email authentication is currently deployed, makes it relevant only to giant mailbox providers, who have such a wide user base that they can keep reliable statistics about identified actors.  The rest of us have not yet understood how to use SPF and DKIM results.

Ale
Comment 7 Kevin A. McGrail 2018-08-29 01:44:51 UTC
Alessandro, can you submit an ICLA so we can consider your patch? https://www.apache.org/licenses/icla.pdf
Comment 8 Alessandro Vesely 2018-08-29 11:10:34 UTC
Created attachment 5589 [details]
Apache Individual Contributor License Agreement (ICLA)

After Kevin's request I compile and submit an ICLA.
Comment 9 Kevin A. McGrail 2018-08-29 11:20:36 UTC
The content of attachment 5589 [details] has been deleted for the following reason:

ICLA not intended for BZ
Comment 10 Alessandro Vesely 2018-08-29 11:37:31 UTC
(In reply to Kevin A. McGrail from comment #9)
> The content of attachment 5589 [details] has been deleted for the following
> reason:
> 
> ICLA not intended for BZ

What BZ?
Comment 11 Kevin A. McGrail 2018-08-29 11:42:57 UTC
(In reply to Alessandro Vesely from comment #10)
> (In reply to Kevin A. McGrail from comment #9)
> > The content of attachment 5589 [details] has been deleted for the following
> > reason:
> > 
> > ICLA not intended for BZ
> 
> What BZ?

Shorthand for Bugzilla.  I forwarded your ICLA to the secretary and removed it here. :-)
Comment 12 Kevin A. McGrail 2018-08-29 15:48:18 UTC
Your ICLA is all filed and reflected now on your Bugzilla account.  Thanks!
Comment 13 Bill Cole 2018-09-05 00:16:18 UTC
I've applied and tested the provided patch to both 3.4 and trunk, but I am concerned by what seems to be a deeper problem of the excess linefeeds in ALL-INTERNAL. I am not comfortable committing the working patch which is reliant on that perversely (buggy?) behavior. 

Unless I find something illuminating in the code, I don't expect this fix will make it to 3.4.2.
Comment 14 Bill Cole 2018-09-05 00:36:13 UTC
Root bug: M:S:PMS:get_all_hdrs_in_rcvd_index_range() calls M:S:Message:get_pristine_header() and splits the returned text block on '/^/m' which leaves trailing linefeeds intact on each line. It then concatenates them with added linefeeds. 

To detangle this may take a bit of time.
Comment 15 Alessandro Vesely 2018-09-05 14:19:08 UTC
(In reply to Bill Cole from comment #14)
> Root bug: M:S:PMS:get_all_hdrs_in_rcvd_index_range() calls
> M:S:Message:get_pristine_header() and splits the returned text block on
> '/^/m' which leaves trailing linefeeds intact on each line. It then
> concatenates them with added linefeeds. 
> 
> To detangle this may take a bit of time.

Likely, callers of that function are somewhat buggy too.

The correct way, according to rfc5322, would be to unwrap header fields, possibly leaving newlines in place.  In fact, the semantics of field contents is independent on how they are wrapped, so CRLFs could be removed as well.  Note that some servers, e.g. Courier-MTA, replace CRLFs with plain LFs.

jm2c
Comment 16 Henrik Krohns 2018-10-19 12:53:28 UTC
*** Bug 6920 has been marked as a duplicate of this bug. ***
Comment 17 Henrik Krohns 2018-10-19 12:57:51 UTC
$pms->get(ALL) returns unfolded header lines. The problem was ALL-INTERNAL etc not doing the same. Now they return unfolded lines unless :raw called.

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending        trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1844334.

This would have been caught in t/spf.t, if it didn't use test messages with unfolded headers for some reason? (*DUHHH*)
Comment 18 Henrik Krohns 2018-10-19 13:01:22 UTC
Fixed unfolded test files in trunk. Won't bother with 3.4.

Sending        MANIFEST
Sending        t/data/nice/spf1
Sending        t/data/nice/spf2
Sending        t/data/nice/spf3
Sending        t/data/nice/spf3-received-spf
Adding         t/data/nice/spf4-received-spf-nofold
Sending        t/spf.t
Transmitting file data .......done
Committing transaction...
Committed revision 1844336.