SA Bugzilla – Bug 1455
Parse IP addresses correctly
Last modified: 2003-04-07 08:13:44 UTC
I mentioned this on sa-dev a while ago while fixing a related bug, but I don't remember if there's a tracking bug for it, so ... Basically, in EvalTests, the $IP_ADDRESS RE does match IP addresses, but ($IP_ADDRESS) will capture the leading '[' (ie: "[1.2.3.4"). This may or may not make things work incorrectly. Either the RE needs to be fixed, or what I'd like is a function to get the IPs that makes sure what is returned are _just_ the IPs.
I'll take this bug. I'm going to revise the entire Received: header parsing for 2.60, extract the IP addresses the right way and also identify which headers we should be able to trust and which are not trustable. I'm also talking to the bondedsender guys about a new header to better identify trusted Received headers (their idea).
Subject: Re: [SAdev] Parse IP addresses correctly >quinlan@pathname.com changed: > > What |Removed |Added >---------------------------------------------------------------------------- > CC| |spamassassin- > | |devel@lists.sourceforge.net > AssignedTo|spamassassin- |quinlan@pathname.com > |devel@lists.sourceforge.net | > > > >------- Additional Comments From quinlan@pathname.com 2003-02-07 19:52 >I'll take this bug. Thank you. >I'm going to revise the entire Received: header parsing for 2.60, I'll attach the beginning of some code for this, which I was testing out in regard to FORGED_RCVD_TRAIL. >extract the IP addresses the right way There are some portions of my DNS work that was attempting to do this, both via improved regexes and via a subroutine (to account for that one can have more than one IP address - or at least more than one thing that looks like an IP address - per Received line, which the current DNS code doesn't, BTW): $IP_ADDRESS = qr/(?:\b|\D)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:\b|\D)/; $OCT1 = '(?:2[0-4]\d|25[0-5]|[0-1]?\d\d|\d)'; $IP_ADDRESS2 = '(?<![0-9\.])' . join('\.',$OCT1,$OCT1,$OCT1,$OCT1) . '(?![0-9\.])'; $IP_ADDRESS2 = qr/$IP_ADDRESS2/; $IP_ADDRESS3 = '(?<![-\w:\.])' .join('\.',$OCT1,$OCT1,$OCT1,$OCT1) . '(?![-\w:\.])'; $IP_ADDRESS3 = qr/$IP_ADDRESS3/; $OCT2 = '(2[0-4]\d|25[0-5]|[0-1]?\d\d|\d)'; $IP_ADDRESS4 = '(?<![0-9\.])' . join('\.',$OCT2,$OCT2,$OCT2,$OCT2) . '(?![0-9\.])'; The subroutine is _get_rcvd_ips in EvalTests.pm. >and also identify which headers we should be able to trust and which are >not trustable. Given that this will be involving database work, and (as per the Win32 DB stuff) you're more experienced at that than I am (especially in interacting with spamd et al), good. (It also looks like that I'll be doing more concentration on the GA.) BTW, that speed problem with my DNS code? Mainly due to that I had a whole bunch of IPs being tested that wouldn't normally be, specifically for seeing what the optimal IP addresses to test were. With a few other revisions, it's currently down to 1.4* the CPU usage of the old code (run sans Razor and with DNS servers being used other than dogberry.rutgers.edu). I'll try to work on some code cleanup and post the result. >I'm also talking to the bondedsender guys about a new header to better >identify trusted Received headers (their idea). Interesting! Making sure that RCVD_IN_BONDEDSENDER is _not_ allowed to trigger for FORGED_RCVD_TRAIL (if made adequately unlikely to FP), anything with an open proxy spotted in the Received headers (or, say, with RECEIVED_IDENT_SQUID), etcetera, would also improve accuracy versus spammer forgeries. -Allen
Created attachment 642 [details] Start on FORGED_RCVD_TRAIL modified version with better Received-matching regexps - very rough!
OK, I've taken this and added a new module, Received.pm, in charge of correct rcvd-hdr parsing. Now in HEAD.
Created attachment 865 [details] just a better IP-address regexp
ok, I *just* fixed the regexp. This is very simple and should be backported to 2.53, then we can close this bug; the rest of the stuff should move to a new bug to track the Received.pm DNS revisions.
ISSUE: if we're going to do this, we may as well do the 2xx ones correctly. instead of '2\d\d' we should do '2[0-4]\d|25[0-5]'.
Created attachment 870 [details] apply on top of prev patch; fix as suggested by Theo
moved patches to 1455; everything else in this bug is fixed. closing