Bug 5589

Summary: [review] Refined async events handling and DNS lookup completions
Product: Spamassassin Reporter: Mark Martinec <Mark.Martinec>
Component: LibrariesAssignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: giampaolo
Priority: P5    
Version: 3.2.3   
Target Milestone: 3.2.4   
Hardware: All   
OS: All   
Whiteboard: can be commited
Attachments: The Big Combo Patch
replaces the previous patch to 3.2.3 with an enhanced and cleaned one
updated and cleaned patch
replacement patch - with partial backport of r585292
extend the rbl_timeout syntax to allow per-zone timeout specification

Description Mark Martinec 2007-08-10 08:30:24 UTC
Copied from bug 5511, but opening a new ticket for discussions:

To remind ourselves what is the core problem here: some asynchronous
DNS lookup requests carry a completion routine which then spawns
additional DNS queries. This happens currently in Plugin::URIDNSBL,
but could in principle happen in any completion routine. For example,
a lookup on a NS record spawns an additional lookup for an A record,
but not before harvest_dnsbl_queries happens, which is quite late,
so not much time remains for collecting responses to these new queries.
Same goes for URIDNS lookups such as sbl.spamhaus.org, cn-kr.blackholes.us

What happened before the patch (bug 5511) is that most of these
additional lookup results were simply discarded if other rules
took long time to complete (longer than rbl_timeout), which was
a real waste of good information.

A solution is in processing completion routines (which may spawn
new queries) much sooner and gradually, as responses trickle in,
and not wait until priority 500, which is almost at the end of
mail processing.

I have a patch which accomplishes this. It is rather substantial (but
compatible with the rest of the code), touching AsyncLoop.pm mostly,
but also Dns.pm, DnsResolver.pm, Plugin/Check.pm and Plugin/URIDNSBL.pm.

From Justin:
> This change sounds pretty big, but probably quite uncontroversial
> (assuming it doesn't change any of the public POD-documented APIs),
> so you may not need to open a bug... see what you think applies.

Ok, this is what I'm going to do:
- attach a complete combo patch that applies to 3.2.3 or 3.2.2;
- commit it to trunk;
- add some explanations/comments below.
Comment 1 Mark Martinec 2007-08-10 08:50:56 UTC
Created attachment 4081 [details]
The Big Combo Patch

The patch has been thoroughly tested and polished with 3.2.2 and
3.2.3 on a production server, and is expected to be compatible
with the rest of the code. Nevertheless, as it touches 8 modules
and slightly modifies some of the internal interfaces to modules
AsyncLoop and Dns, a bit of caution is warranted. When new uses
of AsyncLoop are contemplated (e.g. new plugins), please mind
that some additional arguments (although optional, with sensible
defaults), and extended results are being passed around.
Comment 2 Mark Martinec 2007-08-10 08:55:19 UTC
$ svn ci -m 'Refined async events handling and DNS lookup completions
(bug 5589)'
Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/Dns.pm
Sending        lib/Mail/SpamAssassin/DnsResolver.pm
Sending        lib/Mail/SpamAssassin/Logger.pm
Sending        lib/Mail/SpamAssassin/Plugin/ASN.pm
Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Transmitting file data ........
Committed revision 564636.
Comment 3 Mark Martinec 2007-08-10 10:01:17 UTC
And now for some remarks on changes:

- Perl module Time::HiRes is now used in AsyncLoop and DnsResolver.
  Although it is still optional (when loading fails the built-in function
  time() is used), it brings much benefits to the overall operations,
  and improves usefulness of reported log, facilitating troubleshooting.
  I'd suggest that Time::HiRes becomes a required module with 3.3.

- entries submitted to AsyncLoop now carry individual timing information:
  . start of a query timestamp
  . timestamp of the received response
  . individual timeout values (default to rbl_timeout)
  . display_id (only serves logging purposes)
  This now permits correct determination when some request is overdue
  and needs to be aborted - previously elapsed times were computed from
  the time of the first request, which led to grossly incorrect decisions
  on aborting of late-starting requests, which was the main reason for
  all this substantial rework. Additional benefits are more informative
  debug logging - look for '...: timing:' and similar.

- since both completion routines (set_response_packet and report_id_complete)
  now need access to AsyncLoop entries ($ent), but DNS responses only carry
  an 'id' and not a 'key', I faced two choices: either maintain an id-to-key
  mapping hash, or pass around the 'key' in calls to bgsend etc.
  I chose the later.

- shrinking of timeout as more and more responses trickle-in
  is now handled in AsyncLoop, and no longer in Dns harvest_* .
  It now yields correct decisions even for late-starting queries;

- Dns::harvest_until_rule_completes and harvest_dnsbl_queries redone,
  these are now much simplified because dynamic timeout handling
  is now done in one place in AsyncLoop::complete_lookups;

- a new and lightweight subroutine Dns::harvest_completed_queries
  should now be called as often as comfortable. It collects available
  DNS responses and invokes their completion routines, but never
  causes any delays (never sleeps, waiting for responses).
  Currently I added calls to this subroutine at a dozen places in
  Plugin::Check, which is somewhat unsightly, but does its job well.
  Improvements welcome.

- DnsResolver now adds a timestamp to a completed request, in order
  to let it receive as accurate measurement as convenient;

- DnsResolver::poll_responses now collects ALL available responses
  not just one, to match previous documentation and intention,
  and to suit our needs better;

- Plugin::URIDNSBL and Plugin::ASN modified to pass the 'key' around;

- Plugin::ASN was not calling $scanner->{async}->set_response_packet,
  which led module Dns.pm to think the responses have not been collected,
  and letting it wait unnecessarily until timeout. I believe this is
  the cause of the apparent slowness when Plugin::ASN is enabled,
  as reported recently on a sa-users mailing list;  Fixed.

- routines Logger::dbg and Logger::info (actually the Logger::_log() )
  now take optional arguments. If any optional argument is present
  AND the first argument contains a '%', then sprintf formatting is
  invoked, i.e. the first argument becomes a format string, and remaining
  arguments are passed to sprintf. Benefits are:
  . numerous splicing of data into dbg strings is becoming unsightly,
    especially when more than simple scalar need to be included in a
    log entry, or when formatting of floating point data is needed;
  . it is a waste to first splice together a string to be logged,
    only to find out by Logger that it is not needed. It is cheaper
    to pass additional data as arguments, and let a string be assembled
    only when needed.
  The change is fully compatible with previous use. I modified calls
  to dbg() in modules I touched, other modules may be modified (if at all)
  when convenient.

- to avoid the problem with Perl eval {}, where an error within eval
  causes eval to be completed and Perl regaining control at its end
  BUT LEAVING $@ EMPTY(!) (according to some comments in code the
  SpamAssassin has already been bitten by this, and so have been
  my amavisd-new), I started using the following idiom (or some variation):
  eval {
    ...
    1;
  } or do {
    $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
    dbg("error during ...: %s", $eval_stat);
  };

  The idea is to ONLY rely on a final value of eval, which is guaranteed
  to be false on failure, and not depend on a value of $@.
  The problem of empty $@ seems to happen on certain I/O error conditions.
  I suggest other uses of eval {} are modified as convenient.
Comment 4 Justin Mason 2007-08-12 06:33:45 UTC
+1 -- all looks good to me!
Comment 5 Mark Martinec 2007-08-22 11:32:28 UTC
Now that async DNS completions seem to be more responsive and
well behaved, I'm thinking on how to take a better advantage
of it. I see two issues here, one appears more straightforward
(but I don't know how to handle it), the other needs more
thought and some rework:

1.
The DNS queries as fired off by the URIDNSBL plugin are starting
quite late, after most time-consuming base rules are already over.
If these queries were fired off earlier, it would give them
extended time for completion and lessen the need to idly wait for
responses by the end. Perhaps lowering priority of rules calling
URIDNSBL plugin would do the job, but manually changing priority
on all such rules is tedious. Any cleaner/automatic solutions?

2.
Some plugins such as SPF, Razor2, DKIM, DomainKeys, Botnet
do their own DNS or similar queries. It seems it would be beneficial
to offer a mechanism to plugins in which a plugin would be offered
a possibility to be executed in two (or more) parts, early to do
some preparation work (e.g. parse mail header and fire off their
queries), and then again later on, to collect responses and finish
with their work, e.g. verifying mail signature against collected
public key in DKIM.

I see some hooks for plugins that could perhaps be used to accomplish
the job, but I'm not sure they are the right approach (not being
too familiar with these yet). I also see that SPF is given access
to SpamAssassin's DNS querying tools, but the job goes half way,
SPF plugin still insists on waiting for its own query on its own.

I shared some thoughts with Jason Long, the author or Mail::DKIM,
on what would be the best approach for that module's API to allow it
to proceed on its task in two or three steps.

Suggestions or code example welcome. Perhaps letting a plugin
insert new rules to be executed at a later priority, simulating
submitting new tasks to a scheduler.
Comment 6 Theo Van Dinter 2007-08-22 11:57:32 UTC
(In reply to comment #5)
> Some plugins such as SPF, Razor2, DKIM, DomainKeys, Botnet
> do their own DNS or similar queries. It seems it would be beneficial
> to offer a mechanism to plugins in which a plugin would be offered
> a possibility to be executed in two (or more) parts, early to do
> some preparation work (e.g. parse mail header and fire off their
> queries), and then again later on, to collect responses and finish
> with their work, e.g. verifying mail signature against collected
> public key in DKIM.

FWIW, I floated this idea a while ago but never had time to look into making it
happen.  So +1 and many kudos if you can make it so. :)
Comment 7 Justin Mason 2007-08-23 02:56:38 UTC
(In reply to comment #5)
> Now that async DNS completions seem to be more responsive and
> well behaved, I'm thinking on how to take a better advantage
> of it. I see two issues here, one appears more straightforward
> (but I don't know how to handle it), the other needs more
> thought and some rework:
> 
> 1.
> The DNS queries as fired off by the URIDNSBL plugin are starting
> quite late, after most time-consuming base rules are already over.
> If these queries were fired off earlier, it would give them
> extended time for completion and lessen the need to idly wait for
> responses by the end. Perhaps lowering priority of rules calling
> URIDNSBL plugin would do the job, but manually changing priority
> on all such rules is tedious. Any cleaner/automatic solutions?

An easy way would be to modify the check_main() method to simply
call the do_body_tests() methods last in the list, ie. instead of

   $self->do_head_tests($pms, $priority);
    $pms->harvest_completed_queries();
    $self->do_head_eval_tests($pms, $priority);
    $pms->harvest_completed_queries();

    $self->do_body_tests($pms, $priority, $decoded);
    $pms->harvest_completed_queries();
    $self->do_uri_tests($pms, $priority, @uris);
    $pms->harvest_completed_queries();
    $self->do_body_eval_tests($pms, $priority, $decoded);
    $pms->harvest_completed_queries();

    $self->do_rawbody_tests($pms, $priority, $bodytext);
    $pms->harvest_completed_queries();
    $self->do_rawbody_eval_tests($pms, $priority, $bodytext);
    $pms->harvest_completed_queries();

    $self->do_full_tests($pms, $priority, \$fulltext);
    $pms->harvest_completed_queries();
    $self->do_full_eval_tests($pms, $priority, \$fulltext);
    $pms->harvest_completed_queries();

    $self->do_meta_tests($pms, $priority);
    $pms->harvest_completed_queries();

call

    $self->do_head_tests($pms, $priority);
    $pms->harvest_completed_queries();
    $self->do_head_eval_tests($pms, $priority);
    $pms->harvest_completed_queries();

    $self->do_rawbody_tests($pms, $priority, $bodytext);
    $pms->harvest_completed_queries();
    $self->do_rawbody_eval_tests($pms, $priority, $bodytext);
    $pms->harvest_completed_queries();

    $self->do_full_tests($pms, $priority, \$fulltext);
    $pms->harvest_completed_queries();
    $self->do_full_eval_tests($pms, $priority, \$fulltext);
    $pms->harvest_completed_queries();

    $self->do_body_tests($pms, $priority, $decoded);
    $pms->harvest_completed_queries();
    $self->do_uri_tests($pms, $priority, @uris);
    $pms->harvest_completed_queries();
    $self->do_body_eval_tests($pms, $priority, $decoded);
    $pms->harvest_completed_queries();

    $self->do_meta_tests($pms, $priority);
    $pms->harvest_completed_queries();


or some other way to ensure the URIDNSBL stuff is kicked off before
the body rules, within the same priority level.  There's no defined
ordering of rules within the same priority level, except that meta rules
must run in some way that they can use subrules from the same level
or below.

looking at the code, though, it appears they should be starting before
the body rules anyway -- from the "parsed_metadata" hook.  is that
not what you're seeing?


> 2.
> Some plugins such as SPF, Razor2, DKIM, DomainKeys, Botnet
> do their own DNS or similar queries. It seems it would be beneficial
> to offer a mechanism to plugins in which a plugin would be offered
> a possibility to be executed in two (or more) parts, early to do
> some preparation work (e.g. parse mail header and fire off their
> queries), and then again later on, to collect responses and finish
> with their work, e.g. verifying mail signature against collected
> public key in DKIM.
>
> I see some hooks for plugins that could perhaps be used to accomplish
> the job, but I'm not sure they are the right approach (not being
> too familiar with these yet). 

yeah, as I noted above URIDNSBL uses "parsed_metadata", and that's
typically the place to start them.... maybe that should be documented
in the Mail::SA::Plugin POD.

> I also see that SPF is given access
> to SpamAssassin's DNS querying tools, but the job goes half way,
> SPF plugin still insists on waiting for its own query on its own.
>
> I shared some thoughts with Jason Long, the author or Mail::DKIM,
> on what would be the best approach for that module's API to allow it
> to proceed on its task in two or three steps.

yeah, I've mentioned this to Meng about Mail::SPF::Query a long
time ago.  it's hard work for their modules, though -- I remember
when we did this in the Mail::SA::Dns module a while back,
it was painful. :)

it may require us "adopting" some of the logic from their modules,
taking over the actual "issuing queries" part, and just calling
into hooks in those modules to interpret the results.

> Suggestions or code example welcome. Perhaps letting a plugin
> insert new rules to be executed at a later priority, simulating
> submitting new tasks to a scheduler.

hmm... interesting idea...  my thoughts would be that it'd probably
be easier for the plugin to hook into one of the later hooks like
"check_post_dnsbl" or "check_tick"...
Comment 8 Mark Martinec 2007-08-24 10:55:29 UTC
> An easy way would be to modify the check_main() method to simply
> call the do_body_tests() methods last in the list, ie. instead of
>   ...
> or some other way to ensure the URIDNSBL stuff is kicked off before
> the body rules, within the same priority level.  There's no defined
> ordering of rules within the same priority level, except that meta rules
> must run in some way that they can use subrules from the same level
> or below.
> 
> looking at the code, though, it appears they should be starting before
> the body rules anyway -- from the "parsed_metadata" hook.  is that
> not what you're seeing?

Haven't tried reordering yet, but just to document what I'm seeing,
here is a trimmed debug output from a command-line spamassassin,
with a first column showing elapsed time from the process start:

0.757 generic: SpamAssassin version 3.3.0-r569445
...
5.621 check: running tests for priority: -1000
...
5.800 check: running tests for priority: 0
6.029 rules: compiled head tests
...
6.058 rules: ran header rule __HAS_MIMEOLE ======> got hit: "P"
6.060 rules: running head_eval tests; score so far=-5
6.070 spf: ...
6.387 dkim: ...
6.391 spf: ...
6.393 dkim: ...
6.393 Botnet: ...
6.398 spf: ...
6.400 rules: running body tests; score so far=-5
7.096 rules: compiled body tests
7.417 rules: ran body rule __DOS_BODY_THU ======> got hit:...
8.805 rules: ran body rule __DOS_BODY_WED ======> got hit:...
11.687 rules: ran body rule __HAS_ANY_EMAIL ======> got hit:...
11.702 rules: ran body rule __SARE_PHONE_NUM ======> got hit:...
11.820 rules: ran body rule __NONEMPTY_BODY ======> got hit: "R"
13.153 async: select found 1 responses ready
13.153 async: query completed in 0.686 s: URI-DNSBL, DNSBL:dob...
13.153 async: query completed in 0.540 s: URI-A, A:deneb.dfn.de.
13.154 async: starting: URI-DNSBL, DNSBL:sbl.spamhaus.org....
13.155 async: starting: URI-DNSBL, DNSBL:cn-kr.blackholes.us...
13.155 async: starting: URI-DNSBL, DNSBL:l2.spews.dnsbl.sorbs.net.
13.156 async: starting: URI-DNSBL, DNSBL...
...
13.188 async: starting: URI-DNSBL, DNSBL:sbl.spamhaus.org....
13.189 async: starting: URI-DNSBL, DNSBL:cn-kr.blackholes.us...
13.189 async: starting: URI-DNSBL, DNSBL:l2.spews.dnsbl.sorbs.net.
13.190 async: starting: URI-DNSBL, DNSBL:l1.spews.dnsbl.sorbs.net.
13.190 async: query completed in 0.657 s: DNSBL-TXT, dns:TXT:...
...
13.191 async: queries completed: 32, started: 40
13.191 async: queries active: DNSBL-A=11 URI-DNSBL=40 ...
13.191 dns: harvested completed queries
13.191 rules: running uri tests; score so far=-5
13.231 rules: compiled uri tests
13.253 rules: ran uri rule __DOS_HAS_ANY_URI ======> got hit: "m"
13.257 rules: ran uri rule __LOCAL_PP_NONPPURL ======> got hit: ...
13.269 rules: running body_eval tests; score so far=-5
13.295 pdfinfo: ...
13.305 eval: stock info total: 0
13.306 rules: ran eval rule __TAG_EXISTS_BODY ======> got hit (1)
13.310 eval: text words: 288, html words: 287
13.311 eval: madiff: left: 1, orig: 287, max-difference: 0.35%
13.314 rules: ran eval rule __COMMENT_EXISTS ======> got hit (1)
...
13.332 rules: running rawbody tests; score so far=-8.3
13.396 rules: compiled rawbody tests
13.446 rules: ran rawbody rule __TVD_BODY ======> got hit: "Tom,"
13.448 rules: ran rawbody rule __SARE_HAS_FG_COLOR ======> got hit:.
13.467 rules: running rawbody_eval tests; score so far=-8.3
13.469 rules: ran eval rule __SARE_HTML_HAS_BR ======> got hit (1)
...
13.470 rules: running full tests; score so far=-8.3
13.473 rules: compiled full tests
13.476 rules: running full_eval tests; score so far=-8.3
13.477 info: entering helper-app run mode
14.081 razor2: ...
14.082 pyzor: ...
14.347 dcc: ...
14.347 info: leaving helper-app run mode
14.363 async: select found 34 responses ready
14.363 async: query completed in 1.199 s: URI-DNSBL, DNSBL:sbl...
14.369 async: queries active: DNSBL-A=11 URI-DNSBL=6
14.369 dns: harvested completed queries
14.369 rules: running meta tests; score so far=-8.3
14.370 rules: compiled meta tests
14.370 check: running tests for priority: 100
...

Note the late-starting queries at a 13-th second,
at priority 0, which seem to come from body tests,
apparently mixed with other time-consuming body tests
at the same priority.
Comment 9 Justin Mason 2007-09-13 06:57:25 UTC
do we want to backport this to 3.2.4?
Comment 10 Mark Martinec 2007-09-13 07:34:59 UTC
> do we want to backport this to 3.2.4?

Don't know, you tell me, I don't mind.
Probably the r570127 should then be backported too.

It would be nice to also move the URI lookups
earlier (comment #8) to nail the entire issue,
but I still don't know how to do it nicely.

But then it seems a lot of change altogether
for a minor maintenance version.
How far is 3.3.0 ? It works just fine here.
Comment 11 Giampaolo Tomassoni 2007-09-27 05:46:57 UTC
I would suggest few further enhancement to the async core.

The async API seems not define any way for a plugin to get a reliable 
indication of some async request to have timeout.

In case of a request timeout, I would find usefull to have the request's 
callback invoked with some timeout flag raised in $ent. Also, I would find 
very usefull to have a plugin-wide entry called when all the outstanding 
queries are completed (successfully as well as due to a timeout), thereby 
allowing the plugin to "draw conclusions" about the gathered data as a whole.

In summary:
 - in case of timeout, callback invoked with a "timeout flag" raised;
 - right before exiting the async look, invoke a specific sub in the plugin,
   if the latter defines it.

I guess these enhancements should be pretty backward-compatible, isn't it?
Comment 12 Mark Martinec 2007-09-27 08:09:21 UTC
> I would suggest few further enhancement to the async core. 
> The async API seems not define any way for a plugin to get a reliable 
> indication of some async request to have timeout.
> In case of a request timeout, I would find usefull to have the
> request's callback invoked with some timeout flag raised in $ent.

I'd find it useful too. A minimal change is required, committed to trunk:

n$ svn -m 'let completed_callback be called even on aborts, see bug 5589#11' ci
Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Committed revision 580055.

> Also, I would find very usefull to have a plugin-wide entry called when
> all the outstanding queries are completed (successfully as well as due
> to a timeout), thereby allowing the plugin to "draw conclusions" about
> the gathered data as a whole.

Perhaps the "check_post_dnsbl" plugins hook already fits the needs.
Comment 13 Mark Martinec 2007-09-27 08:17:24 UTC
Created attachment 4138 [details]
replaces the previous patch to 3.2.3 with an enhanced and cleaned one

Here is a merge of:
- a cleaned 'Big Combo Patch', with two small bugs fixed
- a backport of r570127 to 3.2.3
- a backport of r580055 to 3.2.3

It passes self-tests, appears to work fine, although it is
less thoroughly tested than the equivalent code in trunk.
Comment 14 Mark Martinec 2007-09-27 08:20:45 UTC
> svn -m 'let completed_callback be called even on aborts, see bug 5589#11' ci
> Committed revision 580055.

Here is the essence of the change, described by the updated comment:


  =item completed_callback (optional)

A code reference which will be called when an asynchronous task (e.g. a
DNS lookup) is completed, either normally, or aborted, e.g. by a timeout.

When a task has been reported as completed via C<set_response_packet()>
the response (as provided to C<set_response_packet()>) is stored in
$ent->{response_packet} (possibly undef, its semantics is defined by the
caller). When completion is reported via C<report_id_complete()> or a
task was aborted, the $ent->{response_packet} is guaranteed to be undef.
If it is necessary to distinguish between the last two cases, the
$ent->{status} may be examined for a string 'ABORTING' or 'FINISHED'.

The code reference will be called with one argument, the C<$ent> object.
Comment 15 Justin Mason 2007-09-27 10:09:32 UTC
btw, I'd be happy to see this applied in 3.2.4, if Mark feels like it (I don't
know if he wants to or not ;). +1
Comment 16 Giampaolo Tomassoni 2007-09-27 10:14:04 UTC
I don't know if I'm entitled to vote. In case I am, I would vote +1 for this 
in 3.2.4.
Comment 17 Mark Martinec 2007-09-27 17:51:13 UTC
(In reply to comment #15)
> btw, I'd be happy to see this applied in 3.2.4, if Mark feels like it
> (I don't know if he wants to or not ;). +1

I'm running the 3.2.3 with the patch now for the last couple of hours
(downgraded from HEAD) and observing a scattergram of elapsed times
for a SA scan with the rbl_timeout setting at a default, and at 5.5 s
(my usual setting). There is a significant improvement in typical
elapsed times with the patch, by about a factor of 2 at a 5.5 s timeout,
and by about a factor of 4 at a default 15 s timeout setting, mostly
do to some slow-responding DNS servers. Non-responsive RBL/DNS servers
much less drastically affect wait times now, even at 15 s timeout.

This is probably the key reason why such a large patch might be acceptable
as a late minor maintenance release. It would probably take too long to
wait for 3.3.0 to settle into a shape ready for a release, let alone
for people to be persuaded to switch to it.

I suggest that few more people try it out, then we might be ready
for a vote in few days. If someone feels like cleaning the patch
further, he is welcome.
Comment 18 Yusuf Goolamabbas 2007-09-27 19:13:28 UTC
I have been running 3.2.3 + the first Big Combo patch for the past few days in a
test environment on Centos 4.5/x86-64 and have seen no issues
Comment 19 Mark Martinec 2007-10-10 11:12:46 UTC
Created attachment 4151 [details]
updated and cleaned patch

- removed unnecessary changes from a previous version of the patch
  to keep its size down;
- fix compatibility with third-party plugins (r582702, r583255);
- harden plugins URIDNSBL and ASN to empty DNS responses (r582903);
- rename async parameter "timeout" to "timeout_initial", add "timeout_min"
  (the 1s minimum after shrinkage is too small for some 3rd party
  plugins) (r583551 backported)

I expect this version of the patch to be the last one as a candidate
for 3.2.4. Please try it out.
Comment 20 Mark Martinec 2007-10-16 17:36:18 UTC
checked in to trunk:
  AsyncLoop: replaces sharp clipping on a lower timeout value with a
  smooth quadratic transition curve from an initial to a minimal timeout
  value; updated documentation accordingly. Adjust actual timeout value
  to timer resolution (ceiling) if Time::HiRes module is not available.
  Conf+AsyncLoop: to a configuration variable rbl_timeout add two optional
  parameters: a minimal timeout value, and a zone name if a timeout
  specification should only apply to the specified zone or its subdomains;
  update documentation accordingly.
Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/Dns.pm
Sending        lib/Mail/SpamAssassin/Plugin/ASN.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Committed revision 585292.

I'd like to backport at least the first half of this diff
to the patch attached on this PR, hopefully on its way to 3.2.4.

Would anybody mind me backporting this complete diff,
which also extends the syntax of a rbl_timeout setting?
(an alternative would be to provide a separate syntax for
per-zone settings, which could be useful for example to
disable all lookups on some zone in one go).

Example use (made possible by the r585292)
(a made-up example, not necessarily useful):
  rbl_timeout 5   2
  rbl_timeout 7.7 5  open-whois.org
  rbl_timeout 4      121.combined.njabl.org
  rbl_timeout 2   2  uribl.com

The updated man page on Mail::SpamAssassin::Conf now says:

  rbl_timeout  t  [t_min]  [zone]         (default: 15 3)

All DNS queries are made at the beginning of a check and we try to
read the results at the end.  This value specifies the maximum
period of time (in seconds) to wait for an DNS query.  If most of
the DNS queries have succeeded for a particular message, then Spa-
mAssassin will not wait for the full period to avoid wasting time
on unresponsive server(s), but will shrink the timeout according to
a percentage of queries already completed.  As the number of
queries remaining approaches 0, the timeout value will gradually
approach a t_min value, which is an optional second parameter and
defaults to 0.2 * t.  If t is smaller than t_min, the initial time-
out is set to t_min.  Here is a chart of queries remaining versus
the timeout in seconds, for the default 15 second / 3 second time-
out setting:

  queries left  100%  90%  80%  70%  60%  50%  40%  30%  20%  10%   0%
  timeout        15   14.9 14.5 13.9 13.1 12.0 10.7  9.1  7.3  5.3  3

For example, if 20 queries are made at the beginning of a message
check and 16 queries have returned (leaving 20%), the remaining 4
queries should finish within 7.3 seconds since their query started
or they will be timed out.  Note that timed out queries are only
aborted when there is nothing else left for SpamAssassin to do -
long evaluation of other rules may grant queries additional time.

If a parameter 'zone' is specified (it must end with a letter,
which distinguishes it from other numeric parametrs), then the set-
ting only applies to DNS queries against the specified DNS domain
(host, domain or RBL (sub)zone).  Matching is case-insensitive, the
actual domain may be a subdomain of the specified zone.

Comment 21 Mark Martinec 2007-10-17 08:24:07 UTC
Created attachment 4163 [details]
replacement patch - with partial backport of r585292

I took the nonproblematic change from r585292 and merged
it with previously proposed patch to 3.2.4, but leaving out
the possibly disputable extension of a 'rbl_timeout' syntax
(to allow timeouts to be specified per-zone), and I'll
present that as a separate small patch.
Comment 22 Mark Martinec 2007-10-17 08:30:41 UTC
Created attachment 4164 [details]
extend the rbl_timeout syntax to allow per-zone timeout specification

This small patch is to be applied on top of the
big combo patch, and extends the syntax of 'rbl_timeout'
from: rbl_timeout t [t_min]
to:   rbl_timeout t [t_min] [zone]

This may or may not be included with 3.2.4.
As the patch is small, it may be left up to each administrator
to download it and apply to 3.2.4, if he needs the functionality.
Comment 23 Mark Martinec 2007-10-17 08:40:28 UTC
Perhaps it is now the time to decide what to do with this patch for 3.2.4.

The next-to-last version was tested by several sites and I received favourable
feedback (including the compatibility problem report with third party plugins
which has by now been fixed). The size of the diff may be controversial,
but the gain is substantial, so I'm inclined to let it go into 3.2.4, as
it seems it will take months before 3.3.0 hits the streets.

- please vote on committing the combo patch to 3.2.4

- please say if you'd want to see the small new-functionality patch
  included too or not  (the per-zone rbl_timeout settings)
Comment 24 Justin Mason 2007-10-17 08:50:20 UTC
as long as the syntax of existing "rbl_timeout" lines still works ok (which
afaics it does), I'm fine with it.  +1 on both.

I'd say 3.3.0 is a few months off at least, for sure...
Comment 25 Doc Schneider 2007-10-17 08:56:10 UTC
(In reply to comment #24)
> as long as the syntax of existing "rbl_timeout" lines still works ok (which
> afaics it does), I'm fine with it.  +1 on both.
> 
> I'd say 3.3.0 is a few months off at least, for sure...

Ditto for me +1 on both
Comment 26 Mark Martinec 2007-10-17 10:17:42 UTC
spamassassin-3.2$ svn -m 'Bug 5589: Refined async events handling
                          and DNS lookup completions' ci
Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/Dns.pm
Sending        lib/Mail/SpamAssassin/DnsResolver.pm
Sending        lib/Mail/SpamAssassin/Logger.pm
Sending        lib/Mail/SpamAssassin/Plugin/ASN.pm
Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Transmitting file data .........
Committed revision 585580.