Bug 7834

Summary: get_envelope_from may return junk with multiple return-path headers
Product: Spamassassin
Reporter: Rob Mosher <nyt-apachebz>
Component: Libraries
Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW
Severity: normal
CC: apache, nyt-apachebz
Priority: P2    
Version: 3.4 SVN branch   
Target Milestone: Undefined   
Hardware: PC   
OS: Windows NT   
Whiteboard:

Description Rob Mosher 2020-07-06 02:19:15 UTC
I ran into an issue with multiple Return-Path headers. When there are several, get_envelope_from may return a mashed-up concatenation of the header values.

I think a simple fix may be changing this line:

https://github.com/apache/spamassassin/blob/3.4/lib/Mail/SpamAssassin/PerMsgStatus.pm#L3047
  $envf =~ s/>*\s*\z//s;        # remove >, whitespace, newlines

To this:

  $envf =~ s/[>\015\012].*//s;        # remove > and trailing data

This simply discards everything from the first >, CR, or LF onward.
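To see why the original substitution produces the mangled value, here is a minimal standalone Perl sketch. It assumes, as the debug output suggests, that with two Return-Path headers the value reaching this code is both header values joined together with their trailing newlines, and that a leading `<` is stripped just before this line (the `pms:from:raw` output has no leading `<`). The helper names `old_strip` and `new_strip` are hypothetical; they only mirror the two substitutions quoted above.

```perl
use strict;
use warnings;

# Assumed input: two Return-Path values joined with their trailing newlines.
my $joined = "<user\@domain.com>\n<user\@domain.com>\n";

# Original behavior: strip leading '<', then '>' and whitespace at the very end.
sub old_strip {
  my ($envf) = @_;
  $envf =~ s/^<*//s;            # remove leading <
  $envf =~ s/>*\s*\z//s;        # only trims the end of the whole string
  return $envf;
}

# Proposed behavior: strip leading '<', then cut at the first '>', CR or LF.
sub new_strip {
  my ($envf) = @_;
  $envf =~ s/^<*//s;            # remove leading <
  $envf =~ s/[>\015\012].*//s;  # remove > and trailing data
  return $envf;
}

my $old = old_strip($joined);
my $new = new_strip($joined);
(my $shown = $old) =~ s/\n/\\n/g;   # make the embedded newline visible
print "old: $shown\n";   # old: user@domain.com>\n<user@domain.com
print "new: $new\n";     # new: user@domain.com
```

The old pattern is anchored with `\z`, so it only cleans up the tail of the combined string and leaves the `>` and newline between the two values in place; the `:addr` parse then turns that into `user@domain.com> <user@domain.com`. With a single Return-Path both versions return `user@domain.com`.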

This is the behavior I was seeing without the patch, using these debug statements:

    $sender = $scanner->get("EnvelopeFrom:addr");
    dbg("spf: pms:from:addr " . $sender);
    dbg("spf: pms:from:raw " . $scanner->get("EnvelopeFrom:raw"));
    dbg("spf: pms:from " . $scanner->get("EnvelopeFrom"));
    dbg("spf: pms:get:rp " . $scanner->get("Return-Path"));

$ grep Return-Path: mail2.txt
Return-Path: <user@domain.com>
Return-Path: <user@domain.com>

dbg: spf: pms:from:addr user@domain.com> <user@domain.com
dbg: spf: pms:from:raw user@domain.com>
dbg: spf: pms:from user@domain.com>
dbg: spf: pms:get:rp <user@domain.com>

But with only one Return-Path:
dbg: spf: pms:from:addr user@domain.com
dbg: spf: pms:from:raw user@domain.com
dbg: spf: pms:from user@domain.com
dbg: spf: pms:get:rp <user@domain.com>
Comment 1 Henrik Krohns 2020-07-09 08:34:20 UTC
I can't quickly reproduce it.

Please post the exact version you are using and a complete test email file.
Comment 2 Rob Mosher 2020-07-09 09:36:26 UTC
This was on 3.4.4, but I believe I saw this on 3.2 as well.

$ spamassassin -V
SpamAssassin version 3.4.4
  running on Perl version 5.26.1

I've sanitized this message a bit but left the important parts in place. Apparently something in qmail/vmailmgr causes the Return-Path header to be written twice at the top of the message, which trips up the parser as shown here:

$ cat file | spamassassin -x -t -D 2>&1 | grep 'pms:'
Jul  9 05:14:29.398 [25536] dbg: spf: pms:from:addr user@example.com> <user@example.com
Jul  9 05:14:29.398 [25536] dbg: spf: pms:from:raw user@example.com>
Jul  9 05:14:29.398 [25536] dbg: spf: pms:from user@example.com>
Jul  9 05:14:29.398 [25536] dbg: spf: pms:get:rp <user@example.com>

Return-Path: <user@example.com>
Delivered-To: vmailacct-rmosher@example.com
Return-Path: <user@example.com>
Delivered-To: vmailacct-rmosher@example.com
Received: (qmail 7982 invoked from network); 5 Jul 2020 20:47:56 -0000
Received: from mail.example.com (1.1.1.1)
  by mail2.example.com with SMTP; 5 Jul 2020 20:47:56 -0000
Received: from mail.example.com (mail.example.com [2.2.2.2])
  by example.com
  for <rmosher@example.com>; Sun, 5 Jul 2020 13:47:45 -0700
From: Sending User <user@example.com>
Subject: Quick test mail
Date: Sun, 5 Jul 2020 13:47:43 -0700
To: Rob Mosher <rmosher@example.com>



With just one Return-Path (top two lines removed), it works fine:

$ cat file | spamassassin -x -t -D 2>&1 | grep 'pms:'
Jul  9 05:16:01.288 [25777] dbg: spf: pms:from:addr user@example.com
Jul  9 05:16:01.288 [25777] dbg: spf: pms:from:raw user@example.com
Jul  9 05:16:01.288 [25777] dbg: spf: pms:from user@example.com
Jul  9 05:16:01.288 [25777] dbg: spf: pms:get:rp <user@example.com>


However, in some cases, such as mailing lists or forwarders, there may be another Return-Path header further down, after the Received lines. In that case I'm seeing empty data returned:

Return-Path: <mailing-list-admin@example.com>
Delivered-To: vmailacct-rmosher@example.com
Received: (qmail 7982 invoked from network); 5 Jul 2020 20:47:56 -0000
Received: from mail.example.com (1.1.1.1)
  by mail2.example.com with SMTP; 5 Jul 2020 20:47:56 -0000
Return-Path: <user@example.com>
Received: from mail.example.com (mail.example.com [2.2.2.2])
        by example.com
        for <rmosher@example.com>; Sun, 5 Jul 2020 13:47:45 -0700
From: Sending User <user@example.com>
Subject: Quick test mail
Date: Sun, 5 Jul 2020 13:47:43 -0700
To: Rob Mosher <rmosher@example.com>


$ cat file | spamassassin -x -t -D 2>&1 | grep 'pms:'
Jul  9 05:17:11.015 [25866] dbg: spf: pms:from:addr
Jul  9 05:17:11.015 [25866] dbg: spf: pms:from:raw
Jul  9 05:17:11.016 [25866] dbg: spf: pms:from
Jul  9 05:17:11.016 [25866] dbg: spf: pms:get:rp <mailing-list-admin@example.com>


Specifying 'envelope_sender_header Return-Path' in the config seems to fix both cases, since that portion of the code is then never reached, but the bug is present when that option is not set.

Changing the regex as indicated in the original bug report fixes the issue for the first case.
  $envf =~ s/[>\015\012].*//s;        # remove > and trailing data

The second case, with empty data, appears to be related to this logic, which is never reached if envelope_sender_header is set:

    if ($self->get("ALL") =~ /^Received:.*?^Return-Path:/smi) {
      dbg("message: Return-Path header found after 1 or more Received lines, cannot trust envelope-from");
    } else {
      goto ok;
    }
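A quick standalone check of that regex against the two sample headers from this comment shows why the second case ends up empty. This is a minimal sketch: the header text is abbreviated from the samples above, and `return_path_after_received` is a hypothetical helper name. The duplicated sample passes the check and reaches the stripping code at `ok` (where the first bug bites), while in the forwarder sample a Return-Path follows a Received line, so envelope-from is treated as untrusted.

```perl
use strict;
use warnings;

# Same test as the snippet above: does any Return-Path header appear after
# a Received header?  /s lets .*? cross newlines, /m makes ^ match at the
# start of every header line, /i ignores case.
sub return_path_after_received {
  my ($all_headers) = @_;
  return $all_headers =~ /^Received:.*?^Return-Path:/smi ? 1 : 0;
}

# Abbreviated qmail/vmailmgr sample: duplicate Return-Path at the top.
my $duplicated = <<'EOF';
Return-Path: <user@example.com>
Delivered-To: vmailacct-rmosher@example.com
Return-Path: <user@example.com>
Delivered-To: vmailacct-rmosher@example.com
Received: (qmail 7982 invoked from network); 5 Jul 2020 20:47:56 -0000
From: Sending User <user@example.com>
EOF

# Abbreviated forwarder sample: second Return-Path below a Received line.
my $forwarded = <<'EOF';
Return-Path: <mailing-list-admin@example.com>
Delivered-To: vmailacct-rmosher@example.com
Received: (qmail 7982 invoked from network); 5 Jul 2020 20:47:56 -0000
Return-Path: <user@example.com>
From: Sending User <user@example.com>
EOF

for my $case ([duplicated => $duplicated], [forwarded => $forwarded]) {
  my ($name, $hdrs) = @$case;
  printf "%s: %s\n", $name,
    return_path_after_received($hdrs) ? 'cannot trust envelope-from' : 'goto ok';
}
# duplicated: goto ok
# forwarded: cannot trust envelope-from
```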
Comment 3 Henrik Krohns 2020-07-09 10:51:28 UTC
At least the first case is already fixed in trunk; backporting the changes:

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/PerMsgStatus.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1879700.

Need to look at the forwarder stuff..
Comment 4 Henrik Krohns 2020-07-09 10:58:36 UTC
(In reply to Henrik Krohns from comment #3)
>
> Need to look at the forwarder stuff..

There are probably discussions in old bugs, but what should we do about this? I think it's by design that EnvelopeFrom isn't used in forwarder situations. Should we trust the next Return-Path if it's inside trusted networks, or something else?