SA Bugzilla – Bug 3538
[review] DNSBL results not harvested due to timing bug
Last modified: 2004-07-20 02:09:06 UTC
I just spotted this while testing some DNSBL rules with no other rules active; if you don't have enough rules, the DNSBL tests will never be "harvested", it seems. (this may only apply if the only rules are urirhsbl/urirhssub rules, which is how I got it to happen). if you *do* have enough rules, as in the default ruleset, you'll get some of the hits OK, because the rules themselves will take long enough for at least some of the DNSBL hits to arrive before PerMsgStatus leaves the check loop. but this can be triggered by removing the default ruleset. the bug's in the priority code, AFAICS. to reproduce: - cd masses - mkdir testrules - create a testrules/01_test.cf file containing something like this: urirhsbl SURBL_AB ab.surbl.org. A header SURBL_AB eval:check_uridnsbl('SURBL_AB') tflags SURBL_AB net - run "./mass-check --net -j=1 -c=testrules -f targets" - no SURBL_AB hits will be seen. this fixes it, but is a horrific kludge workaround, so I won't dignify it with a real patch: --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 22093) +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) @@ -169,7 +169,7 @@ # only harvest the dnsbl queries once priority HARVEST_DNSBL_PRIORITY # has been reached and then only run once - if ($priority >= HARVEST_DNSBL_PRIORITY && $needs_dnsbl_harvest_p) { + if ($needs_dnsbl_harvest_p) { # harvest the DNS results $self->harvest_dnsbl_queries(); $needs_dnsbl_harvest_p = 0; The real fix will, doubtless, be quite different ;) HARVEST_DNSBL_PRIORITY looks like it's not used in Conf.pm or Conf/Parser.pm -- did we lose it after it was added?
pointing at 3.0.0
I suppose this has no major effect on performance. Semi-related: I've been thinking we should truncate the network timeouts if our score is already twice the theshold (or something like that). I'm not as familiar with the URIBL code, but for the DNSBL code, it would be pretty straightforward to not sleep the last second and increase the timeout scaling. Of course, verify results with some mass-checks with an emptied local DNS cache.
lowering severity to normal I tested 500 spam and 500 ham using the current tree as a control and the one line workaround included above. Each was tested with all tests (no Bayes or AWL, but with DCC, Pyzor, and Razor plus DNSBL tests) as a baseline and with only URIBL tests as a worst case for the timing bug. DNS cache was cleared before each run and -j 8 was used. Three runs of each: count of messages with one or more URIBL hits: ham-old-ALL 4 4 4 ham-old-URIBL 2 3 4 ham-new-ALL 4 4 4 ham-new-URIBL 4 4 4 spam-old-ALL- 433 434 432 spam-old-URIBL 375 402 407 spam-new-ALL 433 433 430 spam-new-URIBL 402 402 405 The lack of a patch was only significant in one of three runs (the 375). Total URIBL hits across all three runs: ham-old-ALL 30 ham-old-URIBL 16 ham-new-ALL 30 ham-new-URIBL 25 spam-old-ALL 3527 spam-old-URIBL 2971 spam-new-ALL 3523 spam-new-URIBL 2834 So, yes, it helps, but it's only an increase of about 5% in the hit rate.
This is caused by the fact that there are no rules with a priority >= 500, so the harvest dsnbl code is never run. In the normal case, folks will have at least 1 meta rule, and since meta rules are forced to be >= 500 priority the harvest code would be run. In the case where you severely restrict the rules that are run it can cause a problem. I believe the proper fix is to add a check after all rules have been run, and if the harvest dnsbl code still hasn't been run, go ahead and let it run. We should be ok with mass-checks since there will be at least one meta rule in the mass-checks.
So, was someone going to try for a correct fix for 3.0.0-pre2? Or, was the ugly fix workable enough?
Subject: Re: DNSBL results not harvested due to timing bug On Thu, Jul 08, 2004 at 06:51:21PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > So, was someone going to try for a correct fix for 3.0.0-pre2? No. > Or, was the ugly fix workable enough? > Ugly fix is incorrect, so it's not an option. I've got an idea, just need time to implement. It's not important for mass checks since it only happens if you have no rules with >= 500 priority.
not a 3.0.0-pre2 blocker
Created attachment 2142 [details] Patch File
Adds a little sanity check block after rule processing, if the rbl harvest code hasn't been run yet, it runs it.
ok, looks fine -- +1
+1
Committed revision 23085.