SA Bugzilla – Bug 399
Fixed RBLs / RBL timeouts / Time logging / Debug framework
Last modified: 2002-06-18 15:10:07 UTC
This supersedes bug 370, but it refers to it (I'm not re-explaining the RBL part) I know this sucks for whoever is going to deal with this, it's a big patch and it does 4 different things. Splitting it is work because the diffs would conflict. If for some reason you don't want to include some portion of this right now, but you're willing to include everything else, you can split the patch, or ask me to do it for you. Here's the announcement I sent: --------------------------------------------------------------------------------- First, I need to apologize for sending all this mixed in one patch: - I did send two of these separately earlier here - The patches are somewhat interleaved, so splitting them would have meant writing multiple versions - I didn't have time to refeed each piece slowly and hoping/waiting for it to be included, I need the whole thing for sourceforge.net for last week :) My RBL rehaul to stop the problems I explained earlier hasn't changed: http://bugzilla.spamassassin.org/show_bug.cgi?id=370 I rewrote my patch to log how long it took for SA to run and in which portions of the code the clock (not CPU) time was actually spent. This is quite useful if you care about how long SA took to scan a message, especially if you do it at SMTP time (which I do with SA-Exim) This is obviously a user option To give you an idea, that's what the log looks like: 0.000: Starting SpamAssassin Check 0.268: Init completed 0.269: Created message object, checking message 0.270: Launching RBL queries in the background 0.388: Finished launching RBL queries in the background 0.388: Starting head tests 0.494: Finished head tests (Delta: 0.106s) 0.494: Starting body tests 0.870: Finished body tests (Delta: 0.376s) 0.870: Starting raw body tests 0.983: Finished raw body tests (Delta: 0.113s) 0.983: Starting full message tests 1.097: Razor -> Starting razor test (10 secs max) 1.241: Razor -> Finished razor test: confirmed spam (Delta: 0.145s) 1.242: Finished full message tests (Delta: 0.259s) 1.242: Starting head eval tests 1.308: Finished head eval tests (Delta: 0.065s) 1.308: Starting RBL tests (will wait up to secs before giving up) 1.309: RBL -> Waiting for result on 161.71.244.211.bl.spamcop.net. 11.411: RBL -> Timeout on 161.71.244.211.bl.spamcop.net. (Delta: 10.101s) 11.413: RBL -> Waiting for result on 161.71.244.211.orbs.dorkslayers.com. 11.415: RBL -> No match on 161.71.244.211.orbs.dorkslayers.com. (Delta: 0.002s) 11.416: RBL -> Waiting for result on 161.71.244.211.relays.osirusoft.com. 11.418: RBL -> No match on 161.71.244.211.relays.osirusoft.com. (Delta: 0.002s) 11.422: RBL -> Waiting for result on 161.71.244.211.relays.ordb.org. 11.424: RBL -> No match on 161.71.244.211.relays.ordb.org. (Delta: 0.002s) 11.426: RBL -> Waiting for result on 161.71.244.211.ipwhois.rfc-ignorant.org. 11.474: RBL -> match on 161.71.244.211.ipwhois.rfc-ignorant.org. (Delta: 0.048s) 11.476: RBL -> Waiting for result on 161.71.244.211.dialups.mail-abuse.org. 11.520: RBL -> match on 161.71.244.211.dialups.mail-abuse.org. (Delta: 0.043s) 11.523: RBL -> Waiting for result on 161.71.244.211.blackholes.mail-abuse.org. 11.525: RBL -> No match on 161.71.244.211.blackholes.mail-abuse.org. (Delta: 0.002s) 11.527: RBL -> Waiting for result on 161.71.244.211.relays.mail-abuse.org. 11.529: RBL -> No match on 161.71.244.211.relays.mail-abuse.org. (Delta: 0.002s) 11.531: RBL -> Waiting for result on 161.71.244.211.relays.visi.com. 11.533: RBL -> No match on 161.71.244.211.relays.visi.com. (Delta: 0.002s) 11.538: Finished all RBL tests (Delta: 10.230s) 11.545: Done checking message (Delta: 11.276s) 11.545: Done running SpamAssassin (Delta: 11.545s) Then, my big goal was to put a bound on SA. I didn't touch the MX test mainly because there's only one that's done, and SA scores you negatively if it didn't succeed. Also, the current code actually runs the check 3 times (as an option), so that didn't fit with my scheme of starting all the DNS queries at the beginning. Basically, if you look at PerMsgStatus, I did the following: timelog("Launching RBL queries in the background", "rblbg", 1); # Here, we launch all the DNS RBL queries and let them run while we # inspect the message -- Marc $self->do_rbl_eval_tests(0); timelog("Finished launching RBL queries in the background", "rblbg", 22); timelog("Starting head tests", "headtest", 1); $self->do_head_tests(); timelog("Finished head tests", "headtest", 2); (...) timelog("Starting head eval tests", "headevaltest", 1); $self->do_head_eval_tests(); timelog("Finished head eval tests", "headevaltest", 2); timelog("Starting RBL tests (will wait up to $self->{conf}->{dns_timeout} secs before giving up)", "rblblock", 1); # This time we want to harvest the DNS results -- Marc $self->do_rbl_eval_tests(1); # And now we can compute rules that depend on those results $self->do_rbl_res_eval_tests(); timelog("Finished all RBL tests", "rblblock", 2); RBL rules now get prefixed with rbleval or rblreseval so that you can easily put them in a different set, and run them in the order intended. All the DNS queries are launched with bgsend before everything else happens, and we harvest the results at the end (waiting up to rbl_timeout) Note that while I put in some support for old RBL rules defined as: header RCVD_IN_OSIRUSOFT_COM eval:check_rbl('osirusoft', 'relays.osirusoft.com.') You're supposed to define them as such: header RCVD_IN_OSIRUSOFT_COM rbleval:check_rbl('osirusoft', 'relays.osirusoft.com.') My last change was debugging. I noticed it was kind of a mess, everyone put his/her own debug messages, and left them behind or commented them out before the commit. I implemented debug sets which are turned on and off inside SpamAssassin.pm's new function right now, but they could be moved to the command line. At least they can be turned on and off easilhy in one place. I'll quote the code: # This should be moved elsewhere, I know, but SA really needs debug sets # I'm putting the intialization here for now, move it if you want # For each part of the code, you can set debug levels. If the level is # progressive, use negative numbers (the more negative, the move debug info # is put out), and if you want to use bit fields, use positive numbers # All code path debug codes should be listed here with a value of 0 if you # want them disabled -- Marc #$DEBUG->{datediff}=-1; #$DEBUG->{razor}=-3; $DEBUG->{rbl}=0; $DEBUG->{timelog}=0; # Bitfield: # header regex: 1 | body-text: 2 | uri tests: 4 | raw-body-text: 8 # full-text regexp: 16 | run_eval_tests: 32 | run_rbl_eval_tests: 64 $DEBUG->{rulesrun}=64; The dbg function now looks like this: # Only the first argument is needed, and it can be a reference to a list if # you want sub dbg { my ($msg, $codepath, $level) = @_; my $dbg=$Mail::SpamAssassin::DEBUG; $msg=join('',@{$msg}) if (ref $msg); if (defined $codepath) { if (not defined $dbg->{$codepath}) { warn("dbg called with codepath $codepath, but it's not defined, skipping (message was \"$msg\"\n"); return 0; } elsif (not defined $level) { warn("dbg called with codepath $codepath, but no level threshold (message was \"$msg\"\n"); } } return if (not $dbg->{enabled}); # Negative levels are just level numbers, the more negative, the more debug return if (defined $level and $level<0 and not $dbg->{$codepath} <= $level); # Positive levels are bit fields return if (defined $level and $level>0 and not $dbg->{$codepath} & $level); warn "debug: $msg\n"; } It used to accept a list but no one used that anywhere in the code (I checked), so the fact that it only accepts a scalar for the debug message should be ok. I made it accept an array ref just in case someone wanted to use the feature though. That's about it.
Created attachment 146 [details] patch file :)
Created attachment 147 [details] Minor bugfix to previous patch
*** Bug 370 has been marked as a duplicate of this bug. ***
Ok, hacked this up a little to get it to commit against current CVS, plus fixed a couple problems which needed addressing following the patch (some old dbg() calls which relied on arguments being concatenated downstream). Checking in now.
*** Bug 175 has been marked as a duplicate of this bug. ***
*** Bug 268 has been marked as a duplicate of this bug. ***
*** Bug 298 has been marked as a duplicate of this bug. ***
*** Bug 393 has been marked as a duplicate of this bug. ***
Hi Craig. My patch gave an example of how the rules should now be to actually fix the double scoring and the scoring on properly relayed DUL, but it didn't actually include the patch the the rules themselves. I'm including this patch here. You'll see I re-organized the BL rules a bit, I just moved them around so that they are ordered by type, and I changed the newly added dsbl so that its relay test is in the relay set, like the other ones, otherwise you'd get a 5.0 right away if a message were on the dsbl and some other relay BL. I also fixed a timelog typo
Created attachment 166 [details] rules BL patch
After updating my tree against CVS, I realized that TimeLog didn't deal with some changes in spamd, and didn't close/reopen its log file for each new message processed. I've fixed that and I also - cleaned up mesgid in case it contains a '/' - changed the init testmsg so that it contains a recognizable mesgid - fixed a warn This patch supersedes attachment 166 [details]
Created attachment 169 [details] DNS rules patch + timelog fixes for spamd
Sorry, yet another update. After running CVS with the recent dsbl addition, I noticed some problems with double scoring, so I added a fudge rule to compensate for that, as well as some comments about dsbl
Created attachment 171 [details] Yet another patch, superseedes the two previous ones :-)
Craig merged that in at the last minute in 2.30, closing...