SA Bugzilla – Bug 3216
Received header rules, multiple IP
Last modified: 2004-03-26 04:38:12 UTC
Rule offered to me by someone not on the SA lists:

> The rule below was sent to me by Regis Wilson. He offered me the rule,
> validating whether it's generally useful, and asking me to post it if it
> works.
>
> It works! Results of my mass-check here:
>
> Section 3 -- Frequencies Log
> (First numeric frequencies, followed by percentage frequencies)
>
> OVERALL   SPAM    HAM    S/O  RANK  SCORE  NAME
>  119325  98981  20344  0.830  0.00   0.00  (all messages)
>    9199   9198      1  0.999  0.00   3.00  SUSP_IP_RECEIVED
>
> OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
>   119325    98981    20344  0.830  0.00   0.00  (all messages)
>  100.000  82.9508  17.0492  0.830  0.00   0.00  (all messages as %)
>    7.709   9.2927   0.0049  0.999  0.00   3.00  SUSP_IP_RECEIVED
>
> Matched 9.3% of all spam in my corpus, and matched only 1 ham.
>
> So I responded back to him, asking
> RM> I'd like to not only post it, but submit it to the SpamAssassin Devs
> RM> for consideration in their next release. Do you give your permission
> RM> for them to include and distribute the rule with no conditions?
>
> His response to me, Thu, 25 Mar 2004 07:43:24 -0800 (PST), message id
> <200403251543.i2PFhOsc067145@wmgnp.tempdomainname.com>, was:
>
> Yes, absolutely. Rule as follows:

header SUSP_IP_RECEIVED Received =~ /from\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])\s+by\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])/i
describe SUSP_IP_RECEIVED Received line is suspicious (from IP by IP)
score SUSP_IP_RECEIVED 3.0
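To make it concrete what "from IP by IP" catches, here is a minimal sketch that applies the posted pattern to a few single Received values. It assumes Python's `re` as a stand-in for SpamAssassin's Perl matching (`re.search` is unanchored like Perl's `=~`, and `re.IGNORECASE` plays the role of `/i`), and it checks one header value at a time, which is a simplification of how SpamAssassin handles messages with several Received headers. The helper `is_suspicious` and the sample headers are hypothetical; the addresses come from the documentation ranges.

```python
import re

# SUSP_IP_RECEIVED pattern as posted, with the stray line-wrap spaces removed.
# Each octet alternative covers 0-199 (1?\d\d?), 200-249 (2[0-4]\d), 250-254 (25[0-4]).
SUSP_IP_RECEIVED = re.compile(
    r"from\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])"
    r"\s+by\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])",
    re.IGNORECASE,  # equivalent of the rule's trailing /i
)


def is_suspicious(received: str) -> bool:
    """Hypothetical helper: True if one Received value contains 'from <IP> by <IP>'."""
    # re.search scans the whole string, like an unanchored Perl =~ match.
    return SUSP_IP_RECEIVED.search(received) is not None


# Made-up Received values (documentation address ranges only).
samples = [
    "from 192.0.2.10 by 198.51.100.7 with SMTP",
    "FROM 192.0.2.10 BY 198.51.100.7",
    "from mail.example.com (mail.example.com [192.0.2.10]) by mx.example.org",
    "from 192.0.2.10 (HELO x) by 198.51.100.7",
    "from 192.0.2.255 by 198.51.100.7",
]

for s in samples:
    print(is_suspicious(s), s)
```

Only the first two samples match. Anything between the first IP and `by` (such as a parenthesized HELO) prevents a match, and because the octet alternation stops at 254, an address ending in `.255` never fires the rule.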
Bob -- I was right. ;) I think this is a duplicate of some rules Dan's been working on for the last while... but yes, it's *very* accurate.
With your earlier warning, I tried to search for Dan's work along these lines, but couldn't find it. If this is a duplicate, it can be closed as such. If not, then hopefully it'll help.
Hi Bob, yes, that's a great rule.

I worked on this for a while last month. The original rule came from Martin Radford in bug 2992 (L_SPAMMY_RCVD), but it evolved quite a bit after that. In SVN, the rules are now:

RCVD_DOUBLE_IP_SPAM
RCVD_DOUBLE_IP_LOOSE

(The two don't overlap at all due to some meta usage.)

Results for everyone in the nightly corpus:

OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
  26.009  31.5196   0.0033  1.000  0.98   1.00  RCVD_DOUBLE_IP_SPAM
   5.464   6.6032   0.0868  0.987  0.91   1.00  RCVD_DOUBLE_IP_LOOSE

My results vs. T_SUSP_IP_RECEIVED:

OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
   29409    14430    14979  0.491  0.00   0.00  (all messages)
 100.000  49.0666  50.9334  0.491  0.00   0.00  (all messages as %)
   8.066  16.4380   0.0000  1.000  1.00   1.00  RCVD_DOUBLE_IP_SPAM
   5.192  10.5475   0.0334  0.997  0.98   0.01  T_SUSP_IP_RECEIVED
   2.149   4.3590   0.0200  0.995  0.96   1.00  RCVD_DOUBLE_IP_LOOSE

Looking at the T_SUSP_IP_RECEIVED spam hits, there were 1522 spam hits, which hit these related and semi-related rules (count, rule):

1522 T_SUSP_IP_RECEIVED
1522 RCVD_BY_IP
1520 RCVD_DOUBLE_IP_SPAM
 232 T_RCVD_NUMERIC_HELO
 231 RCVD_HELO_IP_MISMATCH
 212 RCVD_NUMERIC_HELO
   2 RCVD_DOUBLE_IP_LOOSE

So it looks like we're pretty well covered, and I'm closing this as a duplicate. It might be worth trying a variation of the RCVD_DOUBLE_IP rules that looks only for actual IP addresses instead of using \d{1,3}, but I doubt that would remove a significant number of false positives.

*** This bug has been marked as a duplicate of 2992 ***
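The closing suggestion, matching only real IP addresses rather than `\d{1,3}`, can be sketched as below. This bug does not quote the RCVD_DOUBLE_IP_* rule text, so `LOOSE_IP` is a hypothetical `\d{1,3}` stand-in, not the actual rule, and `STRICT_IP`, `double_ip` and the samples are likewise illustrative assumptions. `STRICT_IP` limits each octet to 0-255 (the posted SUSP_IP_RECEIVED stops at 254), and a trailing word boundary keeps the second address from matching only a prefix such as `25` out of `256`. Python's `re` again stands in for Perl.

```python
import re

# Hypothetical loose form: any 1-3 digits per octet.
# NOT the actual RCVD_DOUBLE_IP_* rule text, which this bug does not quote.
LOOSE_IP = r"\d{1,3}(?:\.\d{1,3}){3}"

# Stricter form: each octet must be 0-255, with no leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
STRICT_IP = OCTET + r"(?:\." + OCTET + r"){3}"


def double_ip(ip_pattern: str) -> re.Pattern:
    """Build a hypothetical case-insensitive 'from <ip> by <ip>' matcher."""
    # The final word boundary stops e.g. '25' in '256' from counting as an octet.
    return re.compile(
        r"from\s+" + ip_pattern + r"\s+by\s+" + ip_pattern + r"\b",
        re.IGNORECASE,
    )


loose = double_ip(LOOSE_IP)
strict = double_ip(STRICT_IP)

samples = [
    "from 192.0.2.10 by 198.51.100.7",        # both forms match
    "from 999.300.1.2 by 198.51.100.7",       # loose only: 999 and 300 are not octets
    "from 192.0.2.1 by 198.51.100.256",       # loose only: 256 is out of range
    "from mail.example.com by 198.51.100.7",  # neither form matches
]

for s in samples:
    print(bool(loose.search(s)), bool(strict.search(s)), s)
```

The two forms disagree only on digit groups that cannot be real addresses. As noted above, such Received lines are unlikely to account for many of the false positives, so the stricter form mainly buys precision in the pattern rather than a measurable change in hit rates.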