Bug 3216 - Received header rules, multiple IP
Summary: Received header rules, multiple IP
Status: RESOLVED DUPLICATE of bug 2992
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 2.63
Hardware: Other other
Importance: P5 normal
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-03-25 18:18 UTC by Bob Menschel
Modified: 2004-03-26 04:38 UTC

Description Bob Menschel 2004-03-25 18:18:50 UTC
A rule was offered to me by someone not on the SA lists:

> The rule below was sent to me by Regis Wilson. He offered me the rule,
> asking me to validate whether it's generally useful and to post it if it
> works.
> 
> It works!  Results of my mass-check here:
> 
> Section 3 -- Frequencies Log
> (First numeric frequencies, followed by percentage frequencies)
> 
> OVERALL     SPAM      HAM     S/O    RANK   SCORE  NAME
>  119325    98981    20344    0.830   0.00    0.00  (all messages)
>    9199     9198        1    0.999   0.00   3.00  SUSP_IP_RECEIVED
> 
> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>  119325    98981    20344    0.830   0.00    0.00  (all messages)
> 100.000  82.9508  17.0492    0.830   0.00    0.00  (all messages as %)
>   7.709   9.2927   0.0049    0.999   0.00    3.00  SUSP_IP_RECEIVED
> 
> Matched 9.3% of all spam in my corpus, and matched only 1 ham.
> 
> So I responded back to him, asking
> RM> I'd like to not only post it, but submit it to the SpamAssassin Devs
> RM> for consideration in their next release. Do you give your permission
> RM> for them to include and distribute the rule with no conditions?
> 
> His response to me, Thu, 25 Mar 2004 07:43:24 -0800 (PST), message id
> <200403251543.i2PFhOsc067145@wmgnp.tempdomainname.com>, was:
> > Yes, absolutely.
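As a cross-check of the mass-check figures quoted above, here is a minimal Python sketch that recomputes the SUSP_IP_RECEIVED percentage and S/O columns from the raw counts. It assumes S/O = spam% / (spam% + ham%); that formula reproduces the reported 0.999, but it is inferred from the numbers in this report rather than taken from the hit-frequencies script itself, and the helper name hit_stats is hypothetical.

```python
def hit_stats(spam_hits, ham_hits, total_spam, total_ham):
    """Return (overall%, spam%, ham%, S/O) for one rule.

    Assumption: S/O = spam% / (spam% + ham%), inferred from the
    mass-check figures above, not from hit-frequencies itself.
    """
    spam_pct = 100.0 * spam_hits / total_spam
    ham_pct = 100.0 * ham_hits / total_ham
    overall_pct = 100.0 * (spam_hits + ham_hits) / (total_spam + total_ham)
    s_o = spam_pct / (spam_pct + ham_pct)
    return overall_pct, spam_pct, ham_pct, s_o


# SUSP_IP_RECEIVED counts from the mass-check output quoted above:
# 9198 of 98981 spam and 1 of 20344 ham.
overall, spam, ham, s_o = hit_stats(9198, 1, 98981, 20344)
print(f"{overall:.3f} {spam:.4f} {ham:.4f} {s_o:.3f}")
```

This prints 7.709 9.2927 0.0049 0.999, matching the SUSP_IP_RECEIVED line of the percentage table.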

Rule as follows: 

header   SUSP_IP_RECEIVED  Received =~ /from\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])\s+by\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])/i
describe SUSP_IP_RECEIVED  Received line is suspicious (from IP by IP)
score    SUSP_IP_RECEIVED  3.0
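To make the rule's behaviour concrete, here is a small Python sketch that applies the same regular expression to a few made-up Received header values. The constructs the rule uses (\s, \d, non-capturing groups, {3}, case-insensitive matching) behave the same in Perl and in Python's re module for ASCII input. The sketch checks one header string at a time, which simplifies how SpamAssassin actually tests Received headers; the function name and sample headers are hypothetical.

```python
import re

# The SUSP_IP_RECEIVED pattern, split over two string literals only for
# line length. Each octet alternation accepts values 0-254 (not 255).
SUSP_IP_RECEIVED = re.compile(
    r"from\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])"
    r"\s+by\s+((?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])",
    re.IGNORECASE,
)


def is_suspicious(received: str) -> bool:
    """True if a single Received header value has the 'from IP by IP' shape."""
    return SUSP_IP_RECEIVED.search(received) is not None


# Hypothetical header values using documentation address ranges.
samples = [
    "from 192.0.2.10 by 198.51.100.7 with SMTP id abc123",
    "from mail.example.com (mail.example.com [192.0.2.10]) by mx.example.org",
    "from 192.0.2.10 by mx.example.org",
    "from 192.0.2.255 by 198.51.100.7",
]
for header in samples:
    print(is_suspicious(header), header)
```

Only the first sample matches. Note the last case: because the octet alternation stops at 25[0-4], an address ending in .255 does not match.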
Comment 1 Justin Mason 2004-03-25 19:06:25 UTC
Bob -- I was right. ;)  I think this is a duplicate of some rules Dan's been
working on for the last while... but yes, it's *very* accurate.
Comment 2 Bob Menschel 2004-03-25 19:11:37 UTC
After your earlier warning, I tried to search for Dan's work along these lines,
but couldn't find it. If this is a duplicate, it can be closed as such. If not,
then hopefully it'll help.
Comment 3 Daniel Quinlan 2004-03-26 13:38:12 UTC
Hi Bob, yes, that's a great rule. I worked on this for a while last month;
the original rule came from Martin Radford in bug 2992 (L_SPAMMY_RCVD), but
it evolved quite a bit after that.

In SVN, the rules are now:

  RCVD_DOUBLE_IP_SPAM
  RCVD_DOUBLE_IP_LOOSE

(the two don't overlap at all due to some meta usage)

results for everyone in the nightly corpus:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 26.009  31.5196   0.0033    1.000   0.98    1.00  RCVD_DOUBLE_IP_SPAM
  5.464   6.6032   0.0868    0.987   0.91    1.00  RCVD_DOUBLE_IP_LOOSE

my results vs. T_SUSP_IP_RECEIVED:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  29409    14430    14979    0.491   0.00    0.00  (all messages)
100.000  49.0666  50.9334    0.491   0.00    0.00  (all messages as %)
  8.066  16.4380   0.0000    1.000   1.00    1.00  RCVD_DOUBLE_IP_SPAM
  5.192  10.5475   0.0334    0.997   0.98    0.01  T_SUSP_IP_RECEIVED
  2.149   4.3590   0.0200    0.995   0.96    1.00  RCVD_DOUBLE_IP_LOOSE

Looking at the 1522 spam messages hit by T_SUSP_IP_RECEIVED, these are the
related and semi-related rules they also hit (count, rule):

  1522    T_SUSP_IP_RECEIVED
  1522    RCVD_BY_IP
  1520    RCVD_DOUBLE_IP_SPAM
  232     T_RCVD_NUMERIC_HELO
  231     RCVD_HELO_IP_MISMATCH
  212     RCVD_NUMERIC_HELO
  2       RCVD_DOUBLE_IP_LOOSE
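A (count, rule) table like the one above can be produced by counting, over the messages a target rule hits, how often every other rule fires on those same messages. A minimal sketch, assuming the per-message rule hits have already been parsed into sets (the mass-check log format is not shown in this report, and the three messages and the helper name co_hits are invented for illustration):

```python
from collections import Counter


def co_hits(messages, target):
    """For messages where `target` fired, count how often each rule fired."""
    counts = Counter()
    for hits in messages:
        if target in hits:
            counts.update(hits)  # each rule counted once per message
    return counts


# Invented per-message rule hits (sets, so a rule counts once per message).
corpus = [
    {"T_SUSP_IP_RECEIVED", "RCVD_BY_IP", "RCVD_DOUBLE_IP_SPAM"},
    {"T_SUSP_IP_RECEIVED", "RCVD_BY_IP", "RCVD_DOUBLE_IP_SPAM", "RCVD_NUMERIC_HELO"},
    {"RCVD_BY_IP", "RCVD_DOUBLE_IP_LOOSE"},
]

for rule, n in co_hits(corpus, "T_SUSP_IP_RECEIVED").most_common():
    print(f"{n:<8}{rule}")
```

The third message is ignored because T_SUSP_IP_RECEIVED did not hit it, so RCVD_DOUBLE_IP_LOOSE gets a count of zero here.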

so it looks like we're pretty well covered, and I'm closing this as a
duplicate.

It might be worth trying a variation of the RCVD_DOUBLE_IP rules that only
matches valid IP addresses instead of accepting any \d{1,3} octet, but I doubt
that would remove a significant number of false positives.
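The suggested variation can be sketched by comparing a \d{1,3} octet pattern with one that only accepts octet values 0-255. Both patterns and function names below are illustrative assumptions, not the actual RCVD_DOUBLE_IP regexes. Because the stricter octet pattern accepts every 1-3 digit string whose value is at most 255, the two only disagree on dotted quads containing an octet above 255, so the change would only affect headers that contain such strings.

```python
import re

# Loose form: any 1-3 digits per octet, as in the \d{1,3} approach.
LOOSE_IP = r"\d{1,3}(?:\.\d{1,3}){3}"

# Stricter form (illustrative): each octet must have a value of 0-255;
# zero-padded octets such as "010" are still accepted.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"
STRICT_IP = rf"{OCTET}(?:\.{OCTET}){{3}}"


def is_loose_ip(token: str) -> bool:
    return re.fullmatch(LOOSE_IP, token) is not None


def is_strict_ip(token: str) -> bool:
    return re.fullmatch(STRICT_IP, token) is not None


for token in ["192.0.2.10", "255.255.255.255", "256.1.1.1", "999.12.3.4"]:
    print(f"{token:<16} loose={is_loose_ip(token)} strict={is_strict_ip(token)}")
```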


*** This bug has been marked as a duplicate of 2992 ***