Bug 6549 - Squirrelmail headers should be parsed
Summary: Squirrelmail headers should be parsed
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Linux
: P2 blocker
Target Milestone: 3.4.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-04 11:58 UTC by Cedric Knight
Modified: 2012-05-15 12:10 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
A few lines of Received.pm to deal with squirrelmail patch None Cedric Knight [HasCLA]
A simpler patch to find authenticated Squirrelmail users application/octet-stream None Daniel J McDonald [NoCLA]
Patch that parses headers for Squirrelmail again patch None Kevin A. McGrail [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Cedric Knight 2011-03-04 11:58:00 UTC
Created attachment 4847 [details]
A few lines of Received.pm to deal with squirrelmail

The resolution to bug 3236 simply ignores headers added by SquirrelMail.  The intention presumably was to trust SquirrelMail authentication so that local users did not hit dynamic IP checks.  I've seen a number of false negatives (eg loan offers, webmail phishing) sent through compromised webmail accounts which originate from web clients with IP addresses in West Africa.  If these had been correctly included in X-Spam-Relays-Untrusted, then they would have at least hit uceprotect-level2 and uceprotect-level3.

As DO'S said prophetically in 2004 in bug 3302 "I wonder if the header would be useful having (for something at some point) now that we can extend the trust boundary to it (bug 2462)."  (I would have reopened that but don't think I have editbugs privileges).

Attached patch is not thoroughly tested on live server, but does add to Trusted when needed and do lookups when not.
Comment 1 Daniel J McDonald 2011-06-25 17:54:30 UTC
Created attachment 4929 [details]
A simpler patch to find authenticated Squirrelmail users

I've also noticed false negatives from Squirrelmail users.  Recommend parsing the headers in the same manner as other webmail clients
Comment 2 Adam Katz 2011-06-28 02:03:18 UTC
Let's look at the reason we "fixed" bug 3236:

If the Squirrelmail server is in the trust path, the webmail client's IP becomes the last-external "server" and can therefore hit DNSBLs and dynamic detectors (although dynamic detectors should be safe due to their dependence on __LAST_EXTERNAL_RELAY_NO_AUTH, assuming our Squirrelmail parser correctly recognizes the authentication note).

We fixed that with a workaround which is now coming back to bite us.

This goes right to the issue of whether or not an authenticated last-external relay should live in the trust path.

The relevant scenario is:  trusted relay -> trusted webmail server -> client
where the IP of the client (the end-user) can be:  trusted, clean, or dirty (DNSBL, dynamic detection, etc).  When trusted, ALL_TRUSTED fires and there is no problem.  When clean, there are similarly no problems.  When dirty, we either have a spammer or a legitimate user at e.g. a hotel.

(The other scenario, in which the trust path ends earlier, effects rules that examine beyond the last-external/last-untrusted, but those rules are written with this in mind and therefore are not problems.)

Without parsing squirrelmail's inserted Received header, we can't trace the sender.  This gives dirty IPs a free pass, enabling email from the hotel.  The fix to bug 3236 took this final header out of the equation so that HAM sent from a dirty client IP could still go out.  However, it opened the door for SPAM to benefit in the same way, enabling Nigerian scams.


There are four approaches that I see.

  1. Do not parse the header (keep things as they are)
  2. Parse the header (put things back the way they were)
  3. Parse the header and extend trust to a so-called authenticated client
  4. Require most last-external rules match __LAST_EXTERNAL_RELAY_NO_AUTH

In order to understand what to do, we need to look at more than just Squirrelmail and Horde (which appears to do this too, given bug 3236 comment 3 as exemplified in attachment 2511 [details]).  Let's look at the larger webmail providers:

Hotmail uses the X-Originating-IP header rather than forging a Received header.  I've also seen X-WebmailclientIP used for this purpose.  GMail does not log the webmail client's IP anywhere in the email.

Yahoo works like Squirrelmail, adding a Received header containing "via HTTP" like this:

Received: from [192.0.2.169] by web33701.mail.mud.yahoo.com via HTTP;
        Thu, 12 May 2011 10:38:16 PDT

We currently parse this normally, so I would argue that we should similarly parse the Squirrelmail and Horde headers.

The difference is that nobody is going to add Yahoo's webmail servers to their trust path (or at least, nobody should!), so the issue never comes up there, whereas it isn't uncommon for a company to add its webmail servers to its own internal networks list, which is where the third and fourth approaches may be of use.

Note that the fourth approach is already partially in effect; a large number of rules (e.g. all dynamic host detection rules) already depend on __LAST_EXTERNAL_RELAY_NO_AUTH.  What do the pre- bug 3236 FPs' X-Spam-Relays-External pseudoheaders look like?  Do they recognize the authentication?  What are the actual FP'ing rules?

To further the __LAST_EXTERNAL_RELAY_NO_AUTH workaround (rather than altering the trust path), we could merely add it to meta rules that wrap around the RCVD_IN_* rules, mimicking things like RDNS_DYNAMIC.  This has the side effect of nuking some of the helpful text provided by DNSBL hits...
Comment 3 Kevin A. McGrail 2011-11-02 00:55:48 UTC
Created attachment 5004 [details]
Patch that parses headers for Squirrelmail again

I believe DOS' comment on bug 3236 was correct as Adam points out.  Trusted Networks appears to be the correct answer and the code for IMP/Squirrelmail should be reintroduced back to Received.pm.

Then as Adam says, we need to see what rules are causing FPs and if they are caused by incorrect configuration.

In fact, as best I can tell the bug 3236 horde/imp code no longer exists or was never committed.

Here is a patch that turns back on Squirrelmail processing thanks to Daniel J. McDonald's efficient rule change. I've also fixed the rcvd_parser test so it passes and moved one of the tests I found in the wrong category at the same time.

+1 to commit to trunk from anyone?
Comment 4 Kevin A. McGrail 2011-12-10 14:40:33 UTC
svn commit -m 'Bug 6549 fix for received headers for authenticated squirrelmail
senders and small test fixes for rcvd_parser.t' t/rcvd_parser.t
lib/Mail/SpamAssassin/Message/Metadata/Received.pm 
Sending        lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Sending        t/rcvd_parser.t
Transmitting file data ..
Committed revision 1212802.
Comment 5 Adam Katz 2011-12-10 19:36:47 UTC
My only hesitation with that code is the   /(${IP_ADDRESS}).{10,80}/   part, where ".{10,80}" could conceivably continue the IP given the lack of a delimiter.  The IP_ADDRESS constant ends with   (?![a-f0-9:])   but lacks a IPv4-ish equivalent, thus allowing mathces like "1.2.3.4.example.com" or even "1.2.3.4.5"

This regex is the only one that contains an ambiguous ending to an IP matcher, at least that I could find with   /IP_ADDRESS\}?\)?[\[.]/


This really should be a change to IP_ADDRESS itself, but for consistency, I'd like to change the regex here to include a space after ${IP_ADDRESS}
Comment 6 Kevin A. McGrail 2011-12-12 17:07:48 UTC
(In reply to comment #5)
> My only hesitation with that code is the   /(${IP_ADDRESS}).{10,80}/   part,
> where ".{10,80}" could conceivably continue the IP given the lack of a
> delimiter.  The IP_ADDRESS constant ends with   (?![a-f0-9:])   but lacks a
> IPv4-ish equivalent, thus allowing mathces like "1.2.3.4.example.com" or even
> "1.2.3.4.5"
> 
> This regex is the only one that contains an ambiguous ending to an IP matcher,
> at least that I could find with   /IP_ADDRESS\}?\)?[\[.]/
> 
> 
> This really should be a change to IP_ADDRESS itself, but for consistency, I'd
> like to change the regex here to include a space after ${IP_ADDRESS}

Sounds good to me.  The test cases should show if that works and perhaps add an extra test that is bogus.
Comment 7 Kevin A. McGrail 2012-04-02 23:23:41 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > My only hesitation with that code is the   /(${IP_ADDRESS}).{10,80}/   part,
> > where ".{10,80}" could conceivably continue the IP given the lack of a
> > delimiter.  The IP_ADDRESS constant ends with   (?![a-f0-9:])   but lacks a
> > IPv4-ish equivalent, thus allowing mathces like "1.2.3.4.example.com" or even
> > "1.2.3.4.5"
> > 
> > This regex is the only one that contains an ambiguous ending to an IP matcher,
> > at least that I could find with   /IP_ADDRESS\}?\)?[\[.]/
> > 
> > 
> > This really should be a change to IP_ADDRESS itself, but for consistency, I'd
> > like to change the regex here to include a space after ${IP_ADDRESS}
> 
> Sounds good to me.  The test cases should show if that works and perhaps add an
> extra test that is bogus.

Following up on this, is this the change you are asking for?

Index: lib/Mail/SpamAssassin/Message/Metadata/Received.pm
===================================================================
--- lib/Mail/SpamAssassin/Message/Metadata/Received.pm  (revision 1308610)
+++ lib/Mail/SpamAssassin/Message/Metadata/Received.pm  (working copy)
@@ -449,7 +449,7 @@
     # Expanded to NaSMail Bug 6783
     if (/ \((?:SquirrelMail|NaSMail) authenticated user /) {
       #REVERTING bug 3236 and implementing re: bug 6549
-      if (/(${IP_ADDRESS}).{10,80}by (\S+) with HTTP/) {
+      if (/(${IP_ADDRESS}) .{10,80}by (\S+) with HTTP/) {
         $ip = $1; $by = $2; goto enough;
       }
     }

If so, that change causes this failure which frankly baffles me a bit but appears to have to do with a reverse lookup.  Any thoughts?

# Failed test 65 in t/rcvd_parser.t at line 497 fail #63
not ok 65
expected: [ ip=143.166.226.16 rdns= helo= by=www.penguintowne.org ident= envfrom= id= auth=Sendmail msa=0 ]
got     : [ ip=143.166.226.16 rdns= helo=ausisaps301-dmz.aus.amer.dell.com by=www.penguintowne.org ident= envfrom= id= auth=Sendmail msa=0 ]
hdr sample: -------------------------------------------------------------------
from ausisaps301-dmz.aus.amer.dell.com ([143.166.226.16]) (SquirrelMail authenticated user hoolis); by www.penguintowne.org with HTTP; Mon, 22 Mar 2004 12:54:13 -0600 (CST)
------------------------------------------------------------------------------
Comment 8 Adam Katz 2012-04-06 15:08:56 UTC
(In reply to comment #7)
> Following up on this, is this the change you are asking for?
> 
> Index: lib/Mail/SpamAssassin/Message/Metadata/Received.pm
> ===================================================================
> --- lib/Mail/SpamAssassin/Message/Metadata/Received.pm  (revision 1308610)
> +++ lib/Mail/SpamAssassin/Message/Metadata/Received.pm  (working copy)
> @@ -449,7 +449,7 @@
>      # Expanded to NaSMail Bug 6783
>      if (/ \((?:SquirrelMail|NaSMail) authenticated user /) {
>        #REVERTING bug 3236 and implementing re: bug 6549
> -      if (/(${IP_ADDRESS}).{10,80}by (\S+) with HTTP/) {
> +      if (/(${IP_ADDRESS}) .{10,80}by (\S+) with HTTP/) {
>          $ip = $1; $by = $2; goto enough;
>        }
>      }

Not quite; you didn't allow for brackets or parens.  How about this:

-      if (/(${IP_ADDRESS}).{10,80}by (\S+) with HTTP/) {
+      if (/(${IP_ADDRESS})\b(?![.-]).{10,80}by (\S+) with HTTP/) {

That should prevent 1.2.3.4.example.com and 1.2.3.4-dyn.example.com but still allow [1.2.3.4] and (1.2.3.4)    (No other nonword characters are legal in domains or IPv4.)

> If so, that change causes this failure which frankly baffles me a bit but
> appears to have to do with a reverse lookup.  Any thoughts?
> 
> # Failed test 65 in t/rcvd_parser.t at line 497 fail #63
> not ok 65
> expected: [ ip=143.166.226.16 rdns= helo= by=www.penguintowne.org ident=
> envfrom= id= auth=Sendmail msa=0 ]
> got     : [ ip=143.166.226.16 rdns= helo=ausisaps301-dmz.aus.amer.dell.com
> by=www.penguintowne.org ident= envfrom= id= auth=Sendmail msa=0 ]
> hdr sample: -------------------------------------------------------------------
> from ausisaps301-dmz.aus.amer.dell.com ([143.166.226.16]) (SquirrelMail
> authenticated user hoolis); by www.penguintowne.org with HTTP; Mon, 22 Mar
> 2004 12:54:13 -0600 (CST)
> ------------------------------------------------------------------------------

Presumably, it got parsed by another portion of the code; your edit requires a space after the IP but that sample has a square bracket.
Comment 9 Kevin A. McGrail 2012-05-15 12:10:03 UTC
Thanks Adam.

svn commit -m 'more parsing cleanup thanks to Adam Katz per bug 6549' lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Sending        lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Transmitting file data .
Committed revision 1338665.