SA Bugzilla – Bug 6299
[review] Update, enhance, and expand RCVD_ILLEGAL_IP
Last modified: 2010-05-25 20:09:03 UTC
Forking bug 6295 as per Henrik's request. The bogon database populating RCVD_ILLEGAL_IP is 6+ years out of date, with entries added and (more importantly) removed. As noted in bug 6295#c4, Cymru has a nice blow-by-blow of the updates at http://www.cymru.com/Documents/bogon-list.html For example, as I noted in bug 6295#c29, 223/8 has returned to bogon state as well. A more dangerous example: the Cymru update list states "Removed 7/8 from bogon list due to dispute in allocation status" while the authoritative document we're using at http://www.iana.org/assignments/ipv4-address-space/ states that 007/8 is "Administered by ARIN" with legacy status. This means that mail from the 7/8 block will be wrongly marked as illegal. To start, I ripped this: wget -q -O - 'http://www.iana.org/assignments/ipv4-address-space/' |perl -ne 'next unless (/^\d/ && ! /LEGACY| ALLOCATED| IANA - /); chomp; s:/8.*::; print $_ . "|"' Then, I used its output with Mark's(?) version to hand-modify the rule: header RCVD_ILLEGAL_IP X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?:[05]|14|23|3[1679]|4[29]|50|1(?:0[1234567]|7[679]|8[15])|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+ / (Note that Henrik noted he has a cidr-to-regex script that might make this easier to maintain in the future. I might argue that would be useful to incorporate as an eval rule ... though can't SA already handle CIDRs?) I know we want to avoid local and private net blocks, but should we include 0/8? I'm not versed in IPv6 or SA's handling of it, so I assume that despite the dots (rather than colons), the last two checks for too-large numbers are bounded like that for IPv6 concerns. Otherwise, the second-to-last should merely be \d{4,} and the last should be [3-9]\d\d (Leading zeros are truncated, right?) Note this only observes the CIDR/8's and not the IANA page's footnoted smaller blocks: header RCVD_TEST_NET X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?:192\.0\.2|198\.51\.100|203\.0\.113)\.\d+ / describe RCVD_TEST_NET Received: uses test IP address, violating RFC5737 header RCVD_LINK_LOCAL X-Spam-Relays-Untrusted =~ / (?:by|ip)=169\.254\.d+\.\d+ / describe RCVD_LINK_LOCAL Received: uses link-local IP, violating RFC 3927 My understanding of the link-local block is that it is used by DHCP-driven clients that cannot find a DHCP server, allowing local-only communications (like 127/8) plus anything *directly* connected to it (that means no routable addresses are available, e.g. in an ad hoc network). Also perhaps of interest, a rule that checks for valid octets (RCVD_ILLEGAL_IP does this on only the first octet, and we'd be able to shorten it by using this rule): # NOTE, THIS NEEDS IPv6 HELP header RCVD_INVALID_IP X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?!(?:(?:\.(?:1?\d?\d|2(?[0-4]\d|5[0-4])))\.(?:1?\d?\d|2(?[0-4]\d|5[0-4]))){3} )/ describe T_RCVD_INVALID_IP Received: contains an invalidly formatted IP In the light of adding new similar rules, I'd also like to better clarify the original rule's description: - describe RCVD_ILLEGAL_IP Received: contains illegal IP address + describe RCVD_ILLEGAL_IP Received: contains reserved or unallocated IP I'm checking the above rules (with RCVD_ILLEGAL_IP as T_KHOP_RCVD_ILLEGAL_IP) as soon as I have a bug number to name the file with. You'll find it in my sandbox.
Checked into trunk/rulesrc/sandbox/khopesh/20_bug_6299.cf at r901932 Considering priority bump to address the pending FPs in the newly allocated 7.0.0.0/8 space (et al).
(In reply to comment #1) > Checked into trunk/rulesrc/sandbox/khopesh/20_bug_6299.cf at r901932 > > Considering priority bump to address the pending FPs in the newly allocated > 7.0.0.0/8 space (et al). uh. that's r901931
that fails lint: lint: config: invalid regexp for rule T_RCVD_INVALID_IP: / (?:by|ip)=(?!(?:(?:\.(?:1?\d?\d|2(?[0-4]\d|5[0-4])))\.(?:1?\d?\d|2(?[0-4]\d|5[0-4]))){3} )/: Sequence (?[...) not recognized in regex; marked by <-- HERE in m/ (?:by|ip)=(?!(?:(?:\.(?:1?\d?\d|2(?[ <-- HERE 0-4]\d|5[0-4])))\.(?:1?\d?\d|2(?[0-4]\d|5[0-4]))){3} )/ commenting for now, feel free to uncomment when it's fixed ;) : 32...; svn commit -m "comment out rule which fails lint" rulesrc/sandbox/khopesh/20_bug_6299.cf Sending rulesrc/sandbox/khopesh/20_bug_6299.cf Transmitting file data . Committed revision 901948.
(In reply to comment #0) > Then, I used its output with Mark's(?) version to hand-modify the rule: Not mine, I just fixed it for Bug 6237 (removing 2.0.0.0/8 and 223.0.0.0/8).
Btw, the newly allocated 1.0.0.0/8 and 2.0.0.0/8 will start causing false positives for SA 3.2.* as they come into more widespread use: score RCVD_ILLEGAL_IP 3.199 3.196 2.902 1.908 Perhaps the new rule should be pushed into 3.2 updates.
(In reply to comment #0) > I know we want to avoid local and private net blocks, We should also avoid the edge case where somebody is using a bogon subnet for their private network, and only test the last external hop. header RCVD_ILLEGAL_IP X-Spam-Relays-Untrusted =~ /^[^\]]+ (?:by|ip)=...etc/
(In reply to comment #3) > that fails lint: oops, missed the colons. fixed in r901999 any comment on how that rule can work with IPv6? (In reply to comment #6) > > I know we want to avoid local and private net blocks, > > We should also avoid the edge case where somebody is using a bogon subnet for > their private network, and only test the last external hop. > > header RCVD_ILLEGAL_IP X-Spam-Relays-Untrusted =~ /^[^\]]+ (?:by|ip)=...etc/ I'm under the impression that spammers often forge extra headers to look more authentic. Sometimes those are in private blocks, sometimes they are in arbitrarily-chosen blocks. We can test this theory, but my inclination is to leave it stand as is and merely update the IP list.
> I'm under the impression that spammers often forge extra headers to look more > authentic. Sometimes those are in private blocks, sometimes they are in > arbitrarily-chosen blocks. Yes, this is my impression too, examining a bunch of messages during recent days that hit the rule.
(In reply to comment #5) > Btw, the newly allocated 1.0.0.0/8 and 2.0.0.0/8 will start causing > false positives for SA 3.2.* as they come into more widespread use: > score RCVD_ILLEGAL_IP 3.199 3.196 2.902 1.908 It also FPs on 7.0.0.0/8, as does the replacement that went into 3.3.. Looks like it's already happening. Hege's two comparative rules (T_BUG_6295*) show the newer version (_1) has far more FPs: http://ruleqa.spamassassin.org/?rule=/RCVD_ILLEGAL_IP Strangely enough, my version, which is more up-to-date, has slightly more FPs than the version we're shipping. In triple-checking my regex, I actually found 100/8 was missing from my regex. The extremely mild increase in ham% was offset by almost double the spam% and the highest S/O and GA rank of all the variants. > Perhaps the new rule should be pushed into 3.2 updates. I agree. (In reply to comment #6) > We should also avoid the edge case where somebody is using a bogon subnet for > their private network, and only test the last external hop. I've added this to my sandbox to see what it does. As per my comment #7 I suspect it reduces both ham and spam too much to be worthwhile.
Reassigning to 3.3.1 before it is forgotten and gone for good.
(In reply to comment #8) > > I'm under the impression that spammers often forge extra headers to look more > > authentic. Sometimes those are in private blocks, sometimes they are in > > arbitrarily-chosen blocks. > > Yes, this is my impression too, examining a bunch of messages during recent > days that hit the rule. Verified. Restricting this rule to last-external actually resulted in zero hits in the masscheck, as evidenced by comparing T_KHOP_RCVD_ILLEGAL_IP_LE, which is an exact copy of T_KHOP_RCVD_ILLEGAL_IP with the limitation suggested by comment #6 Today's numbers are more of a mixed bag; the spam% of the updated rule is almost exactly double the current rule, but the ham hits moved from 3 (0.0010%) to 34 (0.0111%), as contrasted to the 20100126 results which showed only ONE more ham hit on the updated rule. Clearly, there are internal networks that allocate (ex-)bogon spaces rather than using the reserved private network allocations of. Maybe hitting this rule will help network administrators realize this. I still think this is worth pushing forward, both on trunk and on each supported branch.
I am folding RCVD_TEST_NET and RCVD_LINK_LOCAL into the proposed RCVD_ILLEGAL_IP despite their having no hits since they are fully reserved IPs; they're only not on the IANA list as anything but footnotes because the list contains only /8 networks. I finally did the research on how SA handles IPv6 for this. No conflict and no reason for checking things like 299.1.2.3 or 300.1.2.3 or 1234.5.6.7 as the parser will refuse to put those into the pseudo-header. I removed those aspects of the expression as well as the trailing octets (which made it far simpler to add the non-/8 reserved networks). In a few days, I'll push this bug to review. Just want to let that merging get tested a little more and for more opportunity for people to voice opinions.
some results: http://ruleqa.spamassassin.org/20100302-r917919-n/%2FRCVD_ILLEGAL SPAM% HAM% S/O RANK SCORE NAME 0.2855 0.0004 0.999 0.66 3.40 RCVD_ILLEGAL_IP 0.6228 0.0838 0.881 0.63 0.01 T_KHOP_RCVD_ILLEGAL_IP 0 0 0.500 0.49 0.01 T_KHOP_RCVD_ILLEGAL_IP_LE What do we think? I think this paints a picture that lots of us wanted to ignore; the reserved space is being used as a private space more commonly than we'd like for this rule. I still think it's worth updating, perhaps even for 3.3.1. Shall I put this up for review, or is that too much clutter during the release process? (I've been out of contact for a while, not quite sure how far along we are.)
moving all open 3.3.1 bugs to 3.3.2
Moving back off of Security, which got changed by accident during the mass Target Milestone move.
btw, I just came across this message: 'The numerous Team Cymru bogon projects have been updated as of 12 FEB 2010 to reflect the following IANA allocation made on 11 FEB 2010: 50/8 ARIN 2010-02 whois.arin.net ALLOCATED 107/8 ARIN 2009-04 whois.arin.net ALLOCATED' I think this is newer than the last update....
(In reply to comment #16) > btw, I just came across this message: > > 'The numerous Team Cymru bogon projects have been updated as of 12 FEB > 2010 to reflect the following IANA allocation made on 11 FEB 2010: > > 50/8 ARIN 2010-02 whois.arin.net ALLOCATED > > 107/8 ARIN 2009-04 whois.arin.net ALLOCATED' > > I think this is newer than the last update.... Thanks Justin. This is fixed in r928025 pushing to review. +1, two more needed.
+1
(In reply to comment #18) > +1 +1
% svn commit -m "update, enhance, and expand RCVD_ILLEGAL_IP from bug 6299 and rulesrc/sandbox/khopesh/20_bug_6299.cf" rulesrc/sandbox/khopesh/20_bug_6299.cf rules/20_head_tests.cf Sending rules/20_head_tests.cf Deleting rulesrc/sandbox/khopesh/20_bug_6299.cf Transmitting file data . Committed revision 948259. Done, closing.
Also committed on branch: % svn commit -m "(3.3 branch) update, enhance, and expand RCVD_ILLEGAL_IP from bug 6299" rules/20_head_tests.cf Sending rules/20_head_tests.cf Transmitting file data . Committed revision 948261. %