Bug 6299 - [review] Update, enhance, and expand RCVD_ILLEGAL_IP
Summary: [review] Update, enhance, and expand RCVD_ILLEGAL_IP
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests) (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other All
: P2 normal
Target Milestone: 3.3.2
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard: ready to commit
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-21 15:56 UTC by Adam Katz
Modified: 2010-05-25 20:09 UTC (History)
4 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Katz 2010-01-21 15:56:23 UTC
Forking bug 6295 as per Henrik's request.

The bogon database populating RCVD_ILLEGAL_IP is 6+ years out of date, with entries added and (more importantly) removed.

As noted in bug 6295#c4, Cymru has a nice blow-by-blow of the updates at http://www.cymru.com/Documents/bogon-list.html

For example, as I noted in bug 6295#c29, 223/8 has returned to bogon state as well.

A more dangerous example: the Cymru update list states "Removed 7/8 from bogon list due to dispute in allocation status" while the authoritative document we're using at http://www.iana.org/assignments/ipv4-address-space/ states that 007/8 is "Administered by ARIN" with legacy status.  This means that mail from the 7/8 block will be wrongly marked as illegal.

To start, I ripped this:

wget -q -O - 'http://www.iana.org/assignments/ipv4-address-space/' |perl -ne 'next unless (/^\d/ && ! /LEGACY| ALLOCATED| IANA - /); chomp; s:/8.*::; print $_ . "|"'

Then, I used its output with Mark's(?) version to hand-modify the rule:

header RCVD_ILLEGAL_IP	X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?:[05]|14|23|3[1679]|4[29]|50|1(?:0[1234567]|7[679]|8[15])|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+ /

(Note that Henrik noted he has a cidr-to-regex script that might make this easier to maintain in the future.  I might argue that would be useful to incorporate as an eval rule ... though can't SA already handle CIDRs?)

I know we want to avoid local and private net blocks, but should we include 0/8?

I'm not versed in IPv6 or SA's handling of it, so I assume that despite the dots (rather than colons), the last two checks for too-large numbers are bounded like that for IPv6 concerns.  Otherwise, the second-to-last should merely be \d{4,} and the last should be [3-9]\d\d

(Leading zeros are truncated, right?)

Note this only observes the CIDR/8's and not the IANA page's footnoted smaller blocks:

header RCVD_TEST_NET	X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?:192\.0\.2|198\.51\.100|203\.0\.113)\.\d+ /
describe RCVD_TEST_NET	Received: uses test IP address, violating RFC5737

header RCVD_LINK_LOCAL	X-Spam-Relays-Untrusted =~ / (?:by|ip)=169\.254\.d+\.\d+ /
describe RCVD_LINK_LOCAL	Received: uses link-local IP, violating RFC 3927

My understanding of the link-local block is that it is used by DHCP-driven clients that cannot find a DHCP server, allowing local-only communications (like 127/8) plus anything *directly* connected to it (that means no routable addresses are available, e.g. in an ad hoc network).

Also perhaps of interest, a rule that checks for valid octets (RCVD_ILLEGAL_IP does this on only the first octet, and we'd be able to shorten it by using this rule):

# NOTE, THIS NEEDS IPv6 HELP
header RCVD_INVALID_IP	X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?!(?:(?:\.(?:1?\d?\d|2(?[0-4]\d|5[0-4])))\.(?:1?\d?\d|2(?[0-4]\d|5[0-4]))){3} )/
describe T_RCVD_INVALID_IP	Received: contains an invalidly formatted IP

In the light of adding new similar rules, I'd also like to better clarify the original rule's description:

- describe RCVD_ILLEGAL_IP Received: contains illegal IP address
+ describe RCVD_ILLEGAL_IP Received: contains reserved or unallocated IP

I'm checking the above rules (with RCVD_ILLEGAL_IP as T_KHOP_RCVD_ILLEGAL_IP) as soon as I have a bug number to name the file with.  You'll find it in my sandbox.
Comment 1 Adam Katz 2010-01-21 16:04:23 UTC
Checked into trunk/rulesrc/sandbox/khopesh/20_bug_6299.cf at r901932

Considering priority bump to address the pending FPs in the newly allocated 7.0.0.0/8 space (et al).
Comment 2 Adam Katz 2010-01-21 16:05:52 UTC
(In reply to comment #1)
> Checked into trunk/rulesrc/sandbox/khopesh/20_bug_6299.cf at r901932
> 
> Considering priority bump to address the pending FPs in the newly allocated
> 7.0.0.0/8 space (et al).

uh.  that's r901931
Comment 3 Justin Mason 2010-01-21 16:35:02 UTC
that fails lint:

lint: config: invalid regexp for rule T_RCVD_INVALID_IP: / (?:by|ip)=(?!(?:(?:\.(?:1?\d?\d|2(?[0-4]\d|5[0-4])))\.(?:1?\d?\d|2(?[0-4]\d|5[0-4]))){3} )/: Sequence (?[...) not recognized in regex; marked by <-- HERE in m/ (?:by|ip)=(?!(?:(?:\.(?:1?\d?\d|2(?[ <-- HERE 0-4]\d|5[0-4])))\.(?:1?\d?\d|2(?[0-4]\d|5[0-4]))){3} )/


commenting for now, feel free to uncomment when it's fixed ;)

: 32...; svn commit -m "comment out rule which fails lint" rulesrc/sandbox/khopesh/20_bug_6299.cf
Sending        rulesrc/sandbox/khopesh/20_bug_6299.cf
Transmitting file data .
Committed revision 901948.
Comment 4 Mark Martinec 2010-01-21 16:38:29 UTC
(In reply to comment #0)
> Then, I used its output with Mark's(?) version to hand-modify the rule:

Not mine, I just fixed it for Bug 6237 (removing 2.0.0.0/8 and 223.0.0.0/8).
Comment 5 Mark Martinec 2010-01-21 16:45:01 UTC
Btw, the newly allocated 1.0.0.0/8 and 2.0.0.0/8 will start causing
false positives for SA 3.2.* as they come into more widespread use:
  score RCVD_ILLEGAL_IP 3.199 3.196 2.902 1.908

Perhaps the new rule should be pushed into 3.2 updates.
Comment 6 John Hardin 2010-01-21 20:13:40 UTC
(In reply to comment #0)

> I know we want to avoid local and private net blocks,

We should also avoid the edge case where somebody is using a bogon subnet for their private network, and only test the last external hop.

header RCVD_ILLEGAL_IP X-Spam-Relays-Untrusted =~ /^[^\]]+ (?:by|ip)=...etc/
Comment 7 Adam Katz 2010-01-21 21:06:15 UTC
(In reply to comment #3)
> that fails lint:

oops, missed the colons.  fixed in r901999
any comment on how that rule can work with IPv6?

(In reply to comment #6)
> > I know we want to avoid local and private net blocks,
> 
> We should also avoid the edge case where somebody is using a bogon subnet for
> their private network, and only test the last external hop.
> 
> header RCVD_ILLEGAL_IP X-Spam-Relays-Untrusted =~ /^[^\]]+ (?:by|ip)=...etc/

I'm under the impression that spammers often forge extra headers to look more authentic.  Sometimes those are in private blocks, sometimes they are in arbitrarily-chosen blocks.

We can test this theory, but my inclination is to leave it stand as is and merely update the IP list.
Comment 8 Mark Martinec 2010-01-22 04:09:04 UTC
> I'm under the impression that spammers often forge extra headers to look more
> authentic.  Sometimes those are in private blocks, sometimes they are in
> arbitrarily-chosen blocks.

Yes, this is my impression too, examining a bunch of messages during recent
days that hit the rule.
Comment 9 Adam Katz 2010-01-26 15:57:08 UTC
(In reply to comment #5)
> Btw, the newly allocated 1.0.0.0/8 and 2.0.0.0/8 will start causing
> false positives for SA 3.2.* as they come into more widespread use:
>   score RCVD_ILLEGAL_IP 3.199 3.196 2.902 1.908

It also FPs on 7.0.0.0/8, as does the replacement that went into 3.3..

Looks like it's already happening.  Hege's two comparative rules (T_BUG_6295*) show the newer version (_1) has far more FPs:
http://ruleqa.spamassassin.org/?rule=/RCVD_ILLEGAL_IP

Strangely enough, my version, which is more up-to-date, has slightly more FPs than the version we're shipping.  In triple-checking my regex, I actually found 100/8 was missing from my regex.  The extremely mild increase in ham% was offset by almost double the spam% and the highest S/O and GA rank of all the variants.

> Perhaps the new rule should be pushed into 3.2 updates.

I agree.

(In reply to comment #6)
> We should also avoid the edge case where somebody is using a bogon subnet for
> their private network, and only test the last external hop.

I've added this to my sandbox to see what it does.  As per my comment #7  I suspect it reduces both ham and spam too much to be worthwhile.
Comment 10 Mark Martinec 2010-01-29 11:39:04 UTC
Reassigning to 3.3.1 before it is forgotten and gone for good.
Comment 11 Adam Katz 2010-01-29 12:42:40 UTC
(In reply to comment #8)
> > I'm under the impression that spammers often forge extra headers to look more
> > authentic.  Sometimes those are in private blocks, sometimes they are in
> > arbitrarily-chosen blocks.
> 
> Yes, this is my impression too, examining a bunch of messages during recent
> days that hit the rule.

Verified.  Restricting this rule to last-external actually resulted in zero hits in the masscheck, as evidenced by comparing T_KHOP_RCVD_ILLEGAL_IP_LE, which is an exact copy of T_KHOP_RCVD_ILLEGAL_IP with the limitation suggested by comment #6


Today's numbers are more of a mixed bag; the spam% of the updated rule is almost exactly double the current rule, but the ham hits moved from 3 (0.0010%) to 34 (0.0111%), as contrasted to the 20100126 results which showed only ONE more ham hit on the updated rule.

Clearly, there are internal networks that allocate (ex-)bogon spaces rather than using the reserved private network allocations of.  Maybe hitting this rule will help network administrators realize this.

I still think this is worth pushing forward, both on trunk and on each supported branch.
Comment 12 Adam Katz 2010-02-05 17:48:07 UTC
I am folding RCVD_TEST_NET and RCVD_LINK_LOCAL into the proposed RCVD_ILLEGAL_IP despite their having no hits since they are fully reserved IPs; they're only not on the IANA list as anything but footnotes because the list contains only /8 networks.

I finally did the research on how SA handles IPv6 for this.  No conflict and no reason for checking things like 299.1.2.3 or 300.1.2.3 or 1234.5.6.7 as the parser will refuse to put those into the pseudo-header.  I removed those aspects of the expression as well as the trailing octets (which made it far simpler to add the non-/8 reserved networks).

In a few days, I'll push this bug to review.  Just want to let that merging get tested a little more and for more opportunity for people to voice opinions.
Comment 13 Adam Katz 2010-03-03 17:44:30 UTC
some results:

http://ruleqa.spamassassin.org/20100302-r917919-n/%2FRCVD_ILLEGAL

  SPAM%     HAM%     S/O    RANK   SCORE  NAME
 0.2855   0.0004   0.999    0.66    3.40  RCVD_ILLEGAL_IP  
 0.6228   0.0838   0.881    0.63    0.01  T_KHOP_RCVD_ILLEGAL_IP  
      0        0   0.500    0.49    0.01  T_KHOP_RCVD_ILLEGAL_IP_LE  

What do we think?

I think this paints a picture that lots of us wanted to ignore; the reserved space is being used as a private space more commonly than we'd like for this rule.  I still think it's worth updating, perhaps even for 3.3.1.

Shall I put this up for review, or is that too much clutter during the release process? (I've been out of contact for a while, not quite sure how far along we are.)
Comment 14 Justin Mason 2010-03-23 16:33:37 UTC
moving all open 3.3.1 bugs to 3.3.2
Comment 15 Karsten Bräckelmann 2010-03-23 17:42:43 UTC
Moving back off of Security, which got changed by accident during the mass Target Milestone move.
Comment 16 Justin Mason 2010-03-24 11:46:51 UTC
btw, I just came across this message:

'The numerous Team Cymru bogon projects have been updated as of 12 FEB
2010 to reflect the following IANA allocation made on 11 FEB 2010:

50/8 ARIN 2010-02 whois.arin.net ALLOCATED 

107/8 ARIN 2009-04 whois.arin.net ALLOCATED'

I think this is newer than the last update....
Comment 17 Adam Katz 2010-03-27 00:20:30 UTC
(In reply to comment #16)
> btw, I just came across this message:
> 
> 'The numerous Team Cymru bogon projects have been updated as of 12 FEB
> 2010 to reflect the following IANA allocation made on 11 FEB 2010:
> 
> 50/8 ARIN 2010-02 whois.arin.net ALLOCATED 
> 
> 107/8 ARIN 2009-04 whois.arin.net ALLOCATED'
> 
> I think this is newer than the last update....

Thanks Justin.  This is fixed in r928025
pushing to review.
+1, two more needed.
Comment 18 Justin Mason 2010-04-21 11:48:05 UTC
+1
Comment 19 Kevin A. McGrail 2010-05-25 17:54:11 UTC
(In reply to comment #18)
> +1

+1
Comment 20 Adam Katz 2010-05-25 19:53:21 UTC
% svn commit -m "update, enhance, and expand RCVD_ILLEGAL_IP from bug 6299 and rulesrc/sandbox/khopesh/20_bug_6299.cf" rulesrc/sandbox/khopesh/20_bug_6299.cf rules/20_head_tests.cf
Sending        rules/20_head_tests.cf
Deleting       rulesrc/sandbox/khopesh/20_bug_6299.cf
Transmitting file data .
Committed revision 948259.

Done, closing.
Comment 21 Adam Katz 2010-05-25 20:09:03 UTC
Also committed on branch:

% svn commit -m "(3.3 branch) update, enhance, and expand RCVD_ILLEGAL_IP from bug 6299" rules/20_head_tests.cf
Sending        rules/20_head_tests.cf
Transmitting file data .
Committed revision 948261.
%