Bug 2787 - [review] Type of received line misunderstood
Summary: [review] Type of received line misunderstood
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin (show other bugs)
Version: 2.61
Hardware: PC Linux
: P5 normal
Target Milestone: 2.62
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 2896
  Show dependency tree
 
Reported: 2003-11-27 17:03 UTC by Niels Teglsbo
Modified: 2004-01-13 14:43 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
Insert my received rule above the other patch None Niels Teglsbo [NoCLA]
replacement patch patch None Justin Mason [HasCLA]
Insert my new rule above the Content Technologies rule patch None Niels Teglsbo [NoCLA]
implementation take 2 patch None Justin Mason [HasCLA]
impl take 3 patch None Justin Mason [HasCLA]
impl 4 patch None Justin Mason [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Niels Teglsbo 2003-11-27 17:03:46 UTC
I receive mails with a received line that looks like this:

Received: from heloname (212.54.64.170) by cpmail.dk.tiscali.com (6.7.018) id 
3FC619BA30000383B for user@example.com; Fri, 28 Nov 2003 01:25:17 +0100

Unfortunately the received line is matched by this rule in Received.pm:

    # Received: from scv3.apple.com (scv3.apple.com) by mailgate2.apple.com 
(Content Technologies SMTPRS 4.2.1) with ESMTP id  
<T61095998e1118164e13f8@mailgate2.apple.com>; Mon, 17 Mar 2003 17:04:54 -0800
    if (/^from (\S+) \((\S+)\) by (\S+) \(/) {
      $helo = $1; $rdns = $2; $by = $3; goto enough;
    }

And that rule doesn't assign the IP address correctly (or at all).

I have added a new rule above that rule to catch "my" type of received lines:

    # Received: from heloname (212.54.64.170) by cpmail.dk.tiscali.com (6.7.018) 
id 3FC619A30000383B for user@example.com; Fri, 28 Nov 2003 01:25:17 +0100
    if (/^from (\S+) \((${IP_ADDRESS})\) by (\S+) \(\S+\) id \S+ for \S+/) {
      $helo = $1; $ip = $2; $by = $3; goto enough;
    }

As far as I can tell this works.

I don't know if you would also want to narrow down the regular expression for 
the Content Technologies received header.

Best regards
Niels Teglsbo
Comment 1 Niels Teglsbo 2003-11-27 17:06:54 UTC
Created attachment 1585 [details]
Insert my received rule above the other

This patch inserts my rule above the other rule in Received.pm
Comment 2 Justin Mason 2003-12-01 23:55:58 UTC
Created attachment 1603 [details]
replacement patch

Good point, it does need that.	However, we already have a regexp to match it. 
oops!  So this (replacement) patch basically moves that existing line above the
Content Technologies line.
Comment 3 Malte S. Stretz 2003-12-08 04:49:41 UTC
+1 makes sense 
Comment 4 Theo Van Dinter 2003-12-08 16:07:40 UTC
+1  sure.  merry christmas. :)
Comment 5 Justin Mason 2003-12-08 22:16:28 UTC
applied -- merry christmas yerself! ;)

(hope you're not too snowed in over there)
Comment 6 Niels Teglsbo 2004-01-01 19:23:41 UTC
I am sorry to report that I have problems with SA 2.61 and received lines from 
the same mail server:

Received: from dyn-81-166-39-132.ppp.tiscali.fr (81.166.39.132) by cpmail.dk.
tiscali.com (6.7.018)
        id 3FE6899B004FE7A4; Thu, 1 Jan 2004 05:28:49 +0100

The line is not caught by the replacement patch's regex:
  /^from (\S+) \((${IP_ADDRESS})\) by (\S+) with /
because there is no "with" in the line.

Also my own patch would not have caught it either because there is no "for 
user@example.com" in this instance of the header.

So I will propose a path similar to my first one, that will make a special rule 
for that kind of mail server.


Comments on Received.pm in general:

The Content Technologies rule is very broad and doesn't even assign $ip. When 
control reaches the enough label the parser will just return because there is no 
$ip defined - without any debugging information.

Also the tests for defined $ip above the call to extract_ipv4_addr_from_string 
are bogus, the $ip variable is initialized with
 my $ip = '';
So it is always defined below that, the tests should be
 $ip eq ''
And you might want to add tests just after the enough label, maybe something 
like:

  if ($ip eq '') {
    dbg ("received-header: known format, unknown IP address: $_");
  }

At least that would have made it easier for me to track down the bug.

Another nice thing would have been easy identification of the rule that matched 
the header, that might be done simply by a dbg call for every rule.

Also some regression testing for header lines would be nice, it's pretty hard to 
tell if you are ruining some other rule when you add or modify a rule.
Comment 7 Niels Teglsbo 2004-01-01 19:25:55 UTC
Created attachment 1648 [details]
Insert my new rule above the Content Technologies rule
Comment 8 Theo Van Dinter 2004-01-01 21:53:50 UTC
moving to 2.62, haven't looked into the issue yet.
Comment 9 Justin Mason 2004-01-07 18:47:10 UTC
Created attachment 1663 [details]
implementation take 2

Argh.  Thanks for spotting that -- again :(

The attached patch fixes the previous regexp to support that mailserver style
(correctly this time!) -- and adds debugging along the lines of what you
suggest, because it makes sense.
Comment 10 Theo Van Dinter 2004-01-07 18:58:41 UTC
+1   looks like it's ok.
Comment 11 Niels Teglsbo 2004-01-08 09:00:11 UTC
It should be noted that "implementation take 2" is incompatible with the patch 
proposed for bug #2896 (which I think was made in response to my comments in 
this bug).

The patch for #2896 makes $ip undefined, and "implementation take 2" tests if 
$ip eq ''.

So only one of them should be applied, or one of them should be changed.
Comment 12 Niels Teglsbo 2004-01-08 09:49:57 UTC
With the suggested regex:

/^from ([^\d]\S+) \((${IP_ADDRESS})\) by (\S+) /

You won't catch this actual (but anonymized) header:

Received: from 2mail.example.org (180.136.3.16) by cpmail.dk.tiscali.com (6.7.
018)
        id 3FE6899B00861222 for example@example.dk; Thu, 8 Jan 2004 18:37:02 
+0100

Domains may start with a number, and the kind of mail server in question accepts 
a helo with such a domain name.
Comment 13 Justin Mason 2004-01-11 21:08:51 UTC
Created attachment 1680 [details]
impl take 3

ok, here's another go ;)

This also takes bug 2896 into account.
Comment 14 Justin Mason 2004-01-11 21:42:56 UTC
Created attachment 1681 [details]
impl 4

I'm losing it; that one doesn't even compile.
Comment 15 Malte S. Stretz 2004-01-13 16:26:56 UTC
+1 1618: It looks good, 'make' and 'make test' succeed. (That's all I tested 
though.) 
Comment 16 Niels Teglsbo 2004-01-13 17:09:20 UTC
impl 4 seems to hit the received headers in question.

The regex seems awfully general, I can't tell if it hits some received headers 
it shouldn't hit.

Also the Content Technologies regex is still very general.
Comment 17 Justin Mason 2004-01-13 23:43:27 UTC
OK, applied.  it's general, but that far down the list I think it should work
quite well.