SA Bugzilla – Bug 2787
[review] Type of received line misunderstood
Last modified: 2004-01-13 14:43:27 UTC
I receive mails with a received line that looks like this: Received: from heloname (212.54.64.170) by cpmail.dk.tiscali.com (6.7.018) id 3FC619BA30000383B for user@example.com; Fri, 28 Nov 2003 01:25:17 +0100 Unfortunately the received line is matched by this rule in Received.pm: # Received: from scv3.apple.com (scv3.apple.com) by mailgate2.apple.com (Content Technologies SMTPRS 4.2.1) with ESMTP id <T61095998e1118164e13f8@mailgate2.apple.com>; Mon, 17 Mar 2003 17:04:54 -0800 if (/^from (\S+) \((\S+)\) by (\S+) \(/) { $helo = $1; $rdns = $2; $by = $3; goto enough; } And that rule doesn't assign the IP address correctly (or at all). I have added a new rule above that rule to catch "my" type of received lines: # Received: from heloname (212.54.64.170) by cpmail.dk.tiscali.com (6.7.018) id 3FC619A30000383B for user@example.com; Fri, 28 Nov 2003 01:25:17 +0100 if (/^from (\S+) \((${IP_ADDRESS})\) by (\S+) \(\S+\) id \S+ for \S+/) { $helo = $1; $ip = $2; $by = $3; goto enough; } As far as I can tell this works. I don't know if you would also want to narrow down the regular expression for the Content Technologies received header. Best regards Niels Teglsbo
Created attachment 1585 [details] Insert my received rule above the other This patch inserts my rule above the other rule in Received.pm
Created attachment 1603 [details] replacement patch Good point, it does need that. However, we already have a regexp to match it. oops! So this (replacement) patch basically moves that existing line above the Content Technologies line.
+1 makes sense
+1 sure. merry christmas. :)
applied -- merry christmas yerself! ;) (hope you're not too snowed in over there)
I am sorry to report that I have problems with SA 2.61 and received lines from the same mail server: Received: from dyn-81-166-39-132.ppp.tiscali.fr (81.166.39.132) by cpmail.dk. tiscali.com (6.7.018) id 3FE6899B004FE7A4; Thu, 1 Jan 2004 05:28:49 +0100 The line is not caught by the replacement patch's regex: /^from (\S+) \((${IP_ADDRESS})\) by (\S+) with / because there is no "with" in the line. Also my own patch would not have caught it either because there is no "for user@example.com" in this instance of the header. So I will propose a path similar to my first one, that will make a special rule for that kind of mail server. Comments on Received.pm in general: The Content Technologies rule is very broad and doesn't even assign $ip. When control reaches the enough label the parser will just return because there is no $ip defined - without any debugging information. Also the tests for defined $ip above the call to extract_ipv4_addr_from_string are bogus, the $ip variable is initialized with my $ip = ''; So it is always defined below that, the tests should be $ip eq '' And you might want to add tests just after the enough label, maybe something like: if ($ip eq '') { dbg ("received-header: known format, unknown IP address: $_"); } At least that would have made it easier for me to track down the bug. Another nice thing would have been easy identification of the rule that matched the header, that might be done simply by a dbg call for every rule. Also some regression testing for header lines would be nice, it's pretty hard to tell if you are ruining some other rule when you add or modify a rule.
Created attachment 1648 [details] Insert my new rule above the Content Technologies rule
moving to 2.62, haven't looked into the issue yet.
Created attachment 1663 [details] implementation take 2 Argh. Thanks for spotting that -- again :( The attached patch fixes the previous regexp to support that mailserver style (correctly this time!) -- and adds debugging along the lines of what you suggest, because it makes sense.
+1 looks like it's ok.
It should be noted that "implementation take 2" is incompatible with the patch proposed for bug #2896 (which I think was made in response to my comments in this bug). The patch for #2896 makes $ip undefined, and "implementation take 2" tests if $ip eq ''. So only one of them should be applied, or one of them should be changed.
With the suggested regex: /^from ([^\d]\S+) \((${IP_ADDRESS})\) by (\S+) / You won't catch this actual (but anonymized) header: Received: from 2mail.example.org (180.136.3.16) by cpmail.dk.tiscali.com (6.7. 018) id 3FE6899B00861222 for example@example.dk; Thu, 8 Jan 2004 18:37:02 +0100 Domains may start with a number, and the kind of mail server in question accepts a helo with such a domain name.
Created attachment 1680 [details] impl take 3 ok, here's another go ;) This also takes bug 2896 into account.
Created attachment 1681 [details] impl 4 I'm losing it; that one doesn't even compile.
+1 1618: It looks good, 'make' and 'make test' succeed. (That's all I tested though.)
impl 4 seems to hit the received headers in question. The regex seems awfully general, I can't tell if it hits some received headers it shouldn't hit. Also the Content Technologies regex is still very general.
OK, applied. it's general, but that far down the list I think it should work quite well.