Bug 1455 - Parse IP addresses correctly
Summary: Parse IP addresses correctly
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
: P2 normal
Target Milestone: 2.60
Assignee: Justin Mason
URL:
Whiteboard:
Keywords: backport
Depends on:
Blocks: 1403 1740
  Show dependency tree
 
Reported: 2003-02-07 14:23 UTC by Theo Van Dinter
Modified: 2003-04-07 08:13 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Start on FORGED_RCVD_TRAIL modified version with better Received-matching regexps - very rough! text/plain None Allen Smith [NoCLA]
just a better IP-address regexp patch None Justin Mason [HasCLA]
apply on top of prev patch; fix as suggested by Theo patch None Justin Mason [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Theo Van Dinter 2003-02-07 14:23:49 UTC
I mentioned this on sa-dev a while ago while fixing a related bug, but I don't remember if there's a tracking bug for it, so ...

Basically, in EvalTests, the $IP_ADDRESS RE does match IP addresses, but ($IP_ADDRESS) will capture the leading '[' (ie: "[1.2.3.4").  This may or may not make things work incorrectly.

Either the RE needs to be fixed, or what I'd like is a function to get the IPs that makes sure what is returned are _just_ the IPs.
Comment 1 Daniel Quinlan 2003-02-07 19:52:51 UTC
I'll take this bug.

I'm going to revise the entire Received: header parsing for 2.60, extract the
IP addresses the right way and also identify which headers we should be able
to trust and which are not trustable.  I'm also talking to the bondedsender
guys about a new header to better identify trusted Received headers (their
idea).
Comment 2 Allen Smith 2003-02-08 15:09:13 UTC
Subject: Re: [SAdev]  Parse IP addresses correctly

>quinlan@pathname.com changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>                 CC|                            |spamassassin-
>                   |                            |devel@lists.sourceforge.net
>         AssignedTo|spamassassin-               |quinlan@pathname.com
>                   |devel@lists.sourceforge.net |
>
>
>
>------- Additional Comments From quinlan@pathname.com  2003-02-07 19:52
>I'll take this bug.

Thank you.

>I'm going to revise the entire Received: header parsing for 2.60,

I'll attach the beginning of some code for this, which I was testing out in
regard to FORGED_RCVD_TRAIL.

>extract the IP addresses the right way

There are some portions of my DNS work that was attempting to do this, both
via improved regexes and via a subroutine (to account for that one can have
more than one IP address - or at least more than one thing that looks like
an IP address - per Received line, which the current DNS code doesn't, BTW):

$IP_ADDRESS = qr/(?:\b|\D)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:\b|\D)/;
$OCT1 = '(?:2[0-4]\d|25[0-5]|[0-1]?\d\d|\d)';
$IP_ADDRESS2 = '(?<![0-9\.])' . join('\.',$OCT1,$OCT1,$OCT1,$OCT1) .
 '(?![0-9\.])';
$IP_ADDRESS2 = qr/$IP_ADDRESS2/;
$IP_ADDRESS3 = '(?<![-\w:\.])' .join('\.',$OCT1,$OCT1,$OCT1,$OCT1) .
 '(?![-\w:\.])';
$IP_ADDRESS3 = qr/$IP_ADDRESS3/;
$OCT2 = '(2[0-4]\d|25[0-5]|[0-1]?\d\d|\d)';
$IP_ADDRESS4 = '(?<![0-9\.])' . join('\.',$OCT2,$OCT2,$OCT2,$OCT2) .
 '(?![0-9\.])';

The subroutine is _get_rcvd_ips in EvalTests.pm.

>and also identify which headers we should be able to trust and which are
>not trustable.

Given that this will be involving database work, and (as per the Win32 DB
stuff) you're more experienced at that than I am (especially in interacting
with spamd et al), good. (It also looks like that I'll be doing more
concentration on the GA.) BTW, that speed problem with my DNS code? Mainly
due to that I had a whole bunch of IPs being tested that wouldn't normally
be, specifically for seeing what the optimal IP addresses to test were. With
a few other revisions, it's currently down to 1.4* the CPU usage of the old
code (run sans Razor and with DNS servers being used other than
dogberry.rutgers.edu). I'll try to work on some code cleanup and post the
result.

>I'm also talking to the bondedsender guys about a new header to better
>identify trusted Received headers (their idea).

Interesting! Making sure that RCVD_IN_BONDEDSENDER is _not_ allowed to
trigger for FORGED_RCVD_TRAIL (if made adequately unlikely to FP), anything
with an open proxy spotted in the Received headers (or, say, with
RECEIVED_IDENT_SQUID), etcetera, would also improve accuracy versus spammer
forgeries.

	-Allen

Comment 3 Allen Smith 2003-02-08 15:13:16 UTC
Created attachment 642 [details]
Start on FORGED_RCVD_TRAIL modified version with better Received-matching regexps - very rough!
Comment 4 Justin Mason 2003-04-05 13:31:51 UTC
OK, I've taken this and added a new module, Received.pm, in charge of correct
rcvd-hdr parsing.  Now in HEAD.
Comment 5 Justin Mason 2003-04-05 14:58:20 UTC
Created attachment 865 [details]
just a better IP-address regexp
Comment 6 Justin Mason 2003-04-05 15:00:21 UTC
ok, I *just* fixed the regexp.  This is very simple and should be backported to
2.53, then we can close this bug; the rest of the stuff should move to a new bug
to track the Received.pm DNS revisions.
Comment 7 Theo Van Dinter 2003-04-06 13:03:33 UTC
ISSUE: if we're going to do this, we may as well do the 2xx ones correctly. 
instead of '2\d\d' we should do '2[0-4]\d|25[0-5]'.
Comment 8 Justin Mason 2003-04-07 16:09:59 UTC
Created attachment 870 [details]
apply on top of prev patch; fix as suggested by Theo
Comment 9 Justin Mason 2003-04-07 16:13:44 UTC
moved patches to 1455; everything else in this bug is fixed. closing