Bug 399 - Fixed RBLs / RBL timeouts / Time logging / Debug framework
Summary: Fixed RBLs / RBL timeouts / Time logging / Debug framework
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
: P2 normal
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
URL: http://marc.merlins.org/linux/SA/sa_r...
: 175 268 298 370 393 (view as bug list)
Depends on:
Reported: 2002-06-05 01:14 UTC by Marc MERLIN
Modified: 2002-06-18 15:10 UTC (History)
4 users (show)

Attachment Type Modified Status Actions Submitter/CLA Status
patch file :) patch None Marc MERLIN [NoCLA]
Minor bugfix to previous patch patch None Marc MERLIN [NoCLA]
rules BL patch patch None Marc MERLIN [NoCLA]
DNS rules patch + timelog fixes for spamd patch None Marc MERLIN [NoCLA]
Yet another patch, superseedes the two previous ones :-) patch None Marc MERLIN [NoCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Marc MERLIN 2002-06-05 01:14:45 UTC
This supersedes bug 370, but it refers to it (I'm not re-explaining the RBL part)

I know this sucks for whoever is going to deal with this, it's a big patch and
it does 4 different things. Splitting it is work because the diffs would conflict.
If for some reason you don't want to include some portion of this right now, but
you're willing to include everything else, you can split the patch, or ask me to
do it for you.

Here's the announcement I sent:
First, I need to apologize for sending all this mixed in one patch:             
- I did send two of these separately earlier here                               
- The patches are somewhat interleaved, so splitting them would have meant      
  writing multiple versions                                                     
- I didn't have time to refeed each piece slowly and hoping/waiting for it      
  to be included, I need the whole thing for sourceforge.net for last week :)   
My RBL rehaul to stop the problems I explained earlier hasn't changed:          
I rewrote  my patch  to log  how long  it took for  SA to  run and  in which    
portions of  the code the clock  (not CPU) time was  actually spent. This is    
quite  useful  if you  care  about  how long  SA  took  to scan  a  message,    
especially if you do it at SMTP time (which I do with SA-Exim)                  
This is obviously a user option                                                 
To give you an idea, that's what the log looks like:                            
0.000: Starting SpamAssassin Check                                              
0.268: Init completed                                                           
0.269: Created message object, checking message                                 
0.270: Launching RBL queries in the background                                  
0.388: Finished launching RBL queries in the background                         
0.388: Starting head tests                                                      
0.494: Finished head tests (Delta: 0.106s)                                      
0.494: Starting body tests                                                      
0.870: Finished body tests (Delta: 0.376s)                                      
0.870: Starting raw body tests                                                  
0.983: Finished raw body tests (Delta: 0.113s)                                  
0.983: Starting full message tests                                              
1.097: Razor -> Starting razor test (10 secs max)                               
1.241: Razor -> Finished razor test: confirmed spam (Delta: 0.145s)             
1.242: Finished full message tests (Delta: 0.259s)                              
1.242: Starting head eval tests                                                 
1.308: Finished head eval tests (Delta: 0.065s)                                 
1.308: Starting RBL tests (will wait up to  secs before giving up)              
1.309: RBL -> Waiting for result on              
11.411: RBL -> Timeout on (Delta: 10.101s)       
11.413: RBL -> Waiting for result on       
11.415: RBL -> No match on (Delta: 0.002s) 
11.416: RBL -> Waiting for result on       
11.418: RBL -> No match on (Delta: 0.002s) 
11.422: RBL -> Waiting for result on            
11.424: RBL -> No match on (Delta: 0.002s)      
11.426: RBL -> Waiting for result on   
11.474: RBL -> match on (Delta: 0.048s)
11.476: RBL -> Waiting for result on     
11.520: RBL -> match on (Delta: 0.043s)  
11.523: RBL -> Waiting for result on  
11.525: RBL -> No match on (Delta:    
11.527: RBL -> Waiting for result on      
11.529: RBL -> No match on (Delta: 0.002s)
11.531: RBL -> Waiting for result on            
11.533: RBL -> No match on (Delta: 0.002s)      
11.538: Finished all RBL tests (Delta: 10.230s)                                 
11.545: Done checking message (Delta: 11.276s)                                  
11.545: Done running SpamAssassin (Delta: 11.545s)                              
Then, my big goal was to put a bound on SA.                                     
I didn't touch the MX test mainly  because there's only one that's done, and    
SA  scores you  negatively  if  it didn't  succeed. Also,  the current  code    
actually runs the check  3 times (as an option), so that  didn't fit with my    
scheme of starting all the DNS queries at the beginning.                        
Basically, if you look at PerMsgStatus, I did the following:                    
    timelog("Launching RBL queries in the background", "rblbg", 1);             
    # Here, we launch all the DNS RBL queries and let them run while we         
    # inspect the message -- Marc                                               
    timelog("Finished launching RBL queries in the background", "rblbg", 22);   
    timelog("Starting head tests", "headtest", 1);                              
    timelog("Finished head tests", "headtest", 2);                              
    timelog("Starting head eval tests", "headevaltest", 1);                     
    timelog("Finished head eval tests", "headevaltest", 2);                     
    timelog("Starting RBL tests (will wait up to $self->{conf}->{dns_timeout}   
secs before giving up)", "rblblock", 1);                                        
    # This time we want to harvest the DNS results -- Marc                      
    # And now we can compute rules that depend on those results                 
    timelog("Finished all RBL tests", "rblblock", 2);
RBL rules now get prefixed with rbleval or rblreseval so that you can easily    
put them in a different set, and run them in the order intended.                
All the DNS queries are launched with bgsend before everything else happens,    
and we harvest the results at the end (waiting up to rbl_timeout)               
Note that while I put in some support for old RBL rules defined as:             
header RCVD_IN_OSIRUSOFT_COM    eval:check_rbl('osirusoft',                     
You're supposed to define them as such:                                         
header RCVD_IN_OSIRUSOFT_COM    rbleval:check_rbl('osirusoft',                  
My last change was debugging.                                                   
I noticed it was kind of a mess, everyone put his/her own debug messages, and   
left them behind or commented them out before the commit.                       
I   implemented  debug   sets   which   are  turned   on   and  off   inside    
SpamAssassin.pm's new  function right now,  but they  could be moved  to the    
command line. At least they can be turned on and off easilhy in one place.      
I'll quote the code:                                                            
  # This should be moved elsewhere, I know, but SA really needs debug sets      
  # I'm putting the intialization here for now, move it if you want             
  # For each part of the code, you can set debug levels. If the level is        
  # progressive, use negative numbers (the more negative, the move debug info   
  # is put out), and if you want to use bit fields, use positive numbers        
  # All code path debug codes should be listed here with a value of 0 if you    
  # want them disabled -- Marc                                                  
  # Bitfield:                                                                   
  # header regex: 1 | body-text: 2 | uri tests: 4 | raw-body-text: 8            
  # full-text regexp: 16 | run_eval_tests: 32 | run_rbl_eval_tests: 64          
The dbg function now looks like this:                                           
# Only the first argument is needed, and it can be a reference to a list if     
# you want                                                                      
sub dbg {                                                                       
  my ($msg, $codepath, $level) = @_;                                            
  my $dbg=$Mail::SpamAssassin::DEBUG;                                           
  $msg=join('',@{$msg}) if (ref $msg);                                          
  if (defined $codepath) {                                                      
    if (not defined $dbg->{$codepath}) {                                        
      warn("dbg called with codepath $codepath, but it's not defined, skipping  
(message was \"$msg\"\n");                                                      
      return 0;                                                                 
    } elsif (not defined $level) {                                              
      warn("dbg called with codepath $codepath, but no level threshold (message 
was \"$msg\"\n");                                                               
  return if (not $dbg->{enabled});                                              
  # Negative levels are just level numbers, the more negative, the more debug   
  return if (defined $level and $level<0 and not $dbg->{$codepath} <= $level);  
  # Positive levels are bit fields                                              
  return if (defined $level and $level>0 and not $dbg->{$codepath} & $level);   
  warn "debug: $msg\n";                                                         
It used  to accept  a list  but no  one used  that anywhere  in the  code (I    
checked), so the  fact that it only  accepts a scalar for  the debug message    
should be ok.                                                                   
I made it accept an array ref just in case someone wanted to use the feature    
That's about it.
Comment 1 Marc MERLIN 2002-06-05 01:15:23 UTC
Created attachment 146 [details]
patch file :)
Comment 2 Marc MERLIN 2002-06-05 01:38:37 UTC
Created attachment 147 [details]
Minor bugfix to previous patch
Comment 3 Craig Hughes 2002-06-09 23:34:17 UTC
*** Bug 370 has been marked as a duplicate of this bug. ***
Comment 4 Craig Hughes 2002-06-10 00:15:25 UTC
Ok, hacked this up a little to get it to commit against current CVS, plus fixed a couple problems 
which needed addressing following the patch (some old dbg() calls which relied on arguments being 
concatenated downstream).  Checking in now.
Comment 5 Craig Hughes 2002-06-10 00:18:28 UTC
*** Bug 175 has been marked as a duplicate of this bug. ***
Comment 6 Craig Hughes 2002-06-10 01:11:32 UTC
*** Bug 268 has been marked as a duplicate of this bug. ***
Comment 7 Craig Hughes 2002-06-10 02:01:01 UTC
*** Bug 298 has been marked as a duplicate of this bug. ***
Comment 8 Craig Hughes 2002-06-10 03:25:11 UTC
*** Bug 393 has been marked as a duplicate of this bug. ***
Comment 9 Marc MERLIN 2002-06-11 23:49:12 UTC
Hi Craig.

My patch gave an example of how the rules should now be to actually fix the double
scoring and the scoring on properly relayed DUL, but it didn't actually include
the patch the the rules themselves.

I'm including this patch here.

You'll see I re-organized the BL rules a bit, I just moved them around so that
they are ordered by type, and I changed the newly added dsbl so that its relay
test is in the relay set, like the other ones, otherwise you'd get a 5.0 right
away if a message were on the dsbl and some other relay BL.

I also fixed a timelog typo
Comment 10 Marc MERLIN 2002-06-11 23:50:17 UTC
Created attachment 166 [details]
rules BL patch
Comment 11 Marc MERLIN 2002-06-12 09:10:00 UTC
After updating my tree against CVS, I realized that TimeLog didn't deal with
some changes in spamd, and didn't close/reopen its log file for each new message
I've fixed that and I also
- cleaned up mesgid in case it contains a '/'
- changed the init testmsg so that it contains a recognizable mesgid
- fixed a warn

This patch supersedes attachment 166 [details]
Comment 12 Marc MERLIN 2002-06-12 09:11:19 UTC
Created attachment 169 [details]
DNS rules patch + timelog fixes for spamd
Comment 13 Marc MERLIN 2002-06-12 19:31:53 UTC
Sorry, yet another update.
After running CVS with the recent dsbl addition, I noticed some problems with
double scoring, so I added a fudge rule to compensate for that, as well as some
comments about dsbl
Comment 14 Marc MERLIN 2002-06-12 19:32:37 UTC
Created attachment 171 [details]
Yet another patch, superseedes the two previous ones :-)
Comment 15 Marc MERLIN 2002-06-18 23:10:07 UTC
Craig merged that in at the last minute in 2.30, closing...