Bug 370 - Better RBL handling
Summary: Better RBL handling
Status: RESOLVED DUPLICATE of bug 399
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
Reported: 2002-05-31 00:28 UTC by Marc MERLIN
Modified: 2002-06-09 15:34 UTC (History)
Attachment: patch file (type: patch, status: None, submitter: Marc MERLIN [NoCLA])

Description Marc MERLIN 2002-05-31 00:28:21 UTC
As promised, here's the patch in Bugzilla, along with the announcement email.

So, why did I have to write this patch?

1) I use MAPS DUL and relays.osirusoft.com, which also has a DUL section.
   The problem is that I had machines that were being penalized twice for
   being on more than one DUL.
   This actually led to some non-spam mail being flagged as spam more than
   once (and my threshold is 7, not even 5).
                                                                                
2) Furthermore, we shouldn't overly penalize people for sending mail from a
   dialup IP if they properly relayed it through their ISP.
                                                                                
3) There was no support for querying multi-RBL zones like relays.osirusoft.com.
   Let's say an IP is flagged with a score of 2.0 as an open relay in
   ORBS. Because there is already a match for set relay, osirusoft checks
   won't run against it, even if you had a match (2.0) plus a return code of
   127.0.0.6, which would have given you another 3.0, so you lose a perfect
   5.0 score.
                                                                                
4) Probably a few other problems of that sort                                   
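
To make point 3 concrete, here is a minimal sketch in Python (not
SpamAssassin's actual Perl code) of how a "first match per set wins" policy
drops a score. The Rule record, the rule names, and score_with_sets are
hypothetical; the 2.0 and 3.0 scores are the ones from the example, read as
2.0 from ORBS plus 3.0 from the osirusoft 127.0.0.6 result.

```python
from collections import namedtuple

# Hypothetical rule record: name, the RBL "set" it belongs to, and its score.
Rule = namedtuple("Rule", ["name", "rbl_set", "score"])

def score_with_sets(rules, hits):
    """Sum scores, skipping every rule whose set already produced a match."""
    matched_sets = set()
    total = 0.0
    for rule in rules:
        if rule.rbl_set in matched_sets:
            continue  # set already matched: this RBL is never consulted
        if rule.name in hits:
            matched_sets.add(rule.rbl_set)
            total += rule.score
    return total

# Both lists would report the IP if they were asked.
hits = {"ORBS_RELAY", "OSIRUSOFT_SPAM_SRC"}

# osirusoft filed under set "relay": its 127.0.0.6 result is never seen.
same_set = [Rule("ORBS_RELAY", "relay", 2.0),
            Rule("OSIRUSOFT_SPAM_SRC", "relay", 3.0)]

# osirusoft in its own set: both results count.
own_set = [Rule("ORBS_RELAY", "relay", 2.0),
           Rule("OSIRUSOFT_SPAM_SRC", "osirusoft", 3.0)]

print(score_with_sets(same_set, hits))  # 2.0
print(score_with_sets(own_set, hits))   # 5.0
```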
                                                                                
                                                                                
Over the last 7-10 days, I tried different ways to fix this, some of them
rather misguided: trying not to run tests if other ones had run, and adding
overrides to ignore the first IP for DUL, which gets interesting when you
compare checks in set dialup with checks in set relay that can also return a
dialup match.
Needless to say, this went nowhere; before long I couldn't understand my own
code.
                                                                                
The next idea, changing the score of some rules to 0 if other ones had already
matched, seemed misguided too, especially since I wasn't sure whether it would
cause problems with spamd.
                                                                                
I eventually came up with this: adding rules to counter other ones, and
using a function called check_two_rbl_results to add a negative score if
two RBLs matched on the same thing.
Putting osirusoft in set relay was also a mistake; I've put it in its own
osirusoft set since its results can have many different meanings.
Last but not least, an RBL rule whose name ends with -firsthop is magic: it
only matches on the originating IP, provided there is a relay in the middle.
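
For readers who don't want to open the diff, a rough Python sketch of the
counter-rule idea follows. Only the name check_two_rbl_results comes from the
patch; its signature here, the rule names, the scores, and the firsthop_ip
helper are assumptions made for illustration, not the patch's real code.

```python
def check_two_rbl_results(hits, first, second):
    """Return True when both named RBL rules matched (assumed semantics)."""
    return first in hits and second in hits

# Hypothetical scores: each DUL hit is worth 2, and the counter rule takes 2
# back, so being on two DULs costs the same as being on one.
SCORES = {
    "RCVD_IN_MAPS_DUL": 2.0,
    "RCVD_IN_OSIRUSOFT_DUL": 2.0,
    "X_MAPS_DUL_OSIRUSOFT_DUL": -2.0,  # counter rule (hypothetical name)
}

def total_score(hits):
    """Add up the matched rules, then apply the counter rule if both DULs hit."""
    total = sum(SCORES[name] for name in hits)
    if check_two_rbl_results(hits, "RCVD_IN_MAPS_DUL", "RCVD_IN_OSIRUSOFT_DUL"):
        total += SCORES["X_MAPS_DUL_OSIRUSOFT_DUL"]
    return total

def firsthop_ip(received_ips):
    """IP a -firsthop rule would test, or None when there is no relay.

    Assumes received_ips is ordered from the originating host to the last
    relay; with a single entry the mail was not relayed, so nothing is tested.
    """
    return received_ips[0] if len(received_ips) > 1 else None

print(total_score({"RCVD_IN_MAPS_DUL"}))                           # 2.0
print(total_score({"RCVD_IN_MAPS_DUL", "RCVD_IN_OSIRUSOFT_DUL"}))  # 2.0
```

Setting the counter score to -1 instead of -2 would penalize the second
listing a little without doubling the total.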
                                                                                
The rest should make sense if you look at the diff and the example RBL rules
in the docs.
Comment 1 Marc MERLIN 2002-05-31 00:29:30 UTC
Created attachment 129
patch file
Comment 2 Daniel Quinlan 2002-05-31 12:52:56 UTC
I don't really like "counteract" type rules.  I don't know what works best
for the GA, but what I've done for some other rules (future and past dates,
relay tests, etc.) is make similar rules not overlap with the expectation
that the spammier versions will get higher scores.

Also, the intent is for machines to be penalized more than once!  The more
places a machine is reported, the more we can believe that the RBL is correct.

Would it be feasible to separate the DUL tests as follows?

  # rules for mail sent through DUL and not relayed through ISP
  RCVD_IN_DUL_1 - machine appears in one DUL list
  RCVD_IN_DUL_2 - machine appears in two DUL lists
  RCVD_IN_DUL_3 - machine appears in three DUL lists
  RCVD_IN_DUL_4_MORE - machine appears in four or more DUL lists

  RCVD_IN_DUL_ISP_1 - machine appears in one DUL list and is relayed through ISP
  RCVD_IN_DUL_ISP_2 - machine appears in two DUL lists and is relayed through ISP
  ...

Machines only test positive for one of the above rules or none.  Not both.
Very similar to MSG_ID_ADDED_BY_MTA_2 and MSG_ID_ADDED_BY_MTA_3.  Assign
differing scores to each.

It may be feasible to stop testing DUL lists after the first three positive
results, so you could just have the last test be "three or more".
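
A small Python sketch of the non-overlapping selection described above, under
the assumption that the number of DUL lists that matched is already known. The
dul_rule helper is hypothetical, and the ISP rule names beyond
RCVD_IN_DUL_ISP_2 are extrapolated from the pattern rather than taken from
the list.

```python
def dul_rule(dul_hits, relayed_through_isp):
    """Pick the single DUL rule that fires, or None when no list matched."""
    if dul_hits <= 0:
        return None
    prefix = "RCVD_IN_DUL_ISP_" if relayed_through_isp else "RCVD_IN_DUL_"
    # Suffixes 1, 2, 3 and 4_MORE follow the proposal for the non-ISP rules;
    # reusing them for the ISP variants past ISP_2 is an assumption.
    suffix = str(dul_hits) if dul_hits < 4 else "4_MORE"
    return prefix + suffix

print(dul_rule(2, False))  # RCVD_IN_DUL_2
print(dul_rule(5, False))  # RCVD_IN_DUL_4_MORE
print(dul_rule(1, True))   # RCVD_IN_DUL_ISP_1
```

Because exactly one name is returned, each rule can carry its own score
without any of them stacking.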
Comment 3 Marc MERLIN 2002-05-31 14:51:55 UTC
You talk about the GA, but it's not relevant here: the GA doesn't run against
RBLs.

If you look at my code and rules closer, you'll notice that you don't get
penalized twice for being an open relay or a dialup IP (although you can
actually have each give you a score of 2, and counteract with just -1 to
penalize a bit more).

You should be penalized for being a confirmed spammer (127.0.0.6 or 8 on
osirusoft) or being on the MAPS RBL (the confirmed spammer list).

However, if you query 3 RBLs and they all tell you that:
- it's a dialup IP (case #1)
- it's an open relay (case #2)

Do you give a score of 6 right away?

My scheme lets you choose:
In case #1, you probably only penalize with 2, no matter how many DULs you're on.
In case #2, you can do the same, or give a slightly higher score if you're on
2 or 3 open relay lists, but again, do you want to outright mark as spam a mail
just because its sender is on 3 open relay RBLs?

My patch is pretty small, and yet it took me more than a week to come up with it.
It's not because I can't code, it's because I tried different approaches and
gave this a lot of thought.

Mind you, what I propose is not infinitely flexible, but it takes care of most
cases a lot better than the current code (which is inconsistent).

What you propose with RCVD_IN_DUL_ISP_1 and RCVD_IN_DUL_4_MORE is one of the
things I tried to do initially. You'll notice, however, that there are *many*
combinations, and it gets nontrivial once you deal with multiple-blacklist
RBLs like relays.osirusoft.com or RBL+.

Are we going to have:
RCVD_IN_2DUL_1RSS_1SPAMIP
RCVD_IN_2DUL_1RSS_2SPAMIP
RCVD_IN_2DUL_2RSS_1SPAMIP
...

If you want to be thorough, it just gets very complex.
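
To put a number on "very complex", this Python sketch enumerates one rule name
per combination of hit counts. The three categories come from the names above;
the cap of 3 hits per category and the combined_rule_names helper are
assumptions, and combinations where a category has no hit at all are left out,
so the real count would be higher.

```python
from itertools import product

def combined_rule_names(categories, max_count):
    """List one rule name per combination of per-category hit counts."""
    counts = range(1, max_count + 1)
    return [
        "RCVD_IN_" + "_".join(f"{n}{cat}" for n, cat in zip(combo, categories))
        for combo in product(counts, repeat=len(categories))
    ]

names = combined_rule_names(["DUL", "RSS", "SPAMIP"], 3)
print(len(names))  # 27
print(names[0])    # RCVD_IN_1DUL_1RSS_1SPAMIP
```

Each extra category multiplies the count again, and every one of those rules
would need its own score.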

I think my scheme offers reasonable flexibility while not introducing lots of
new complex code.
Comment 4 Marc MERLIN 2002-05-31 14:53:24 UTC
Oh, another thing I forgot:
Another reason I went with my scheme is that you don't get:
Is on 2 DULs and 1 RSS
You get:
Is on MAPS DUL, OSIRUSOFT DUL, and ORBS RSS

You know exactly which RBLs matched, and with which return IP.
Comment 5 Craig Hughes 2002-06-09 23:34:16 UTC
Has been merged into #399

*** This bug has been marked as a duplicate of 399 ***