5930 – Shortcircuiting before DNS lookups

Bug 5930 - Shortcircuiting before DNS lookups

Summary: Shortcircuiting before DNS lookups

Status:	RESOLVED FIXED

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Plugins (show other bugs)
Version:	SVN Trunk (Latest Devel Version)
Hardware:	Other All

Importance:	P4 enhancement
Target Milestone:	Undefined
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2008-06-23 08:50 UTC by Henrik Krohns
Modified:	2019-03-02 17:44 UTC (History)
CC List:	3 users (show)

Attachment	Type	Actions	Submitter/CLA Status
Hacky test patch	patch	None	Henrik Krohns
Patch to run rbls after priority -10000	patch	None	Henrik Krohns
Patch to launch uribl lookups only after check_uridnsbl is actually called	patch	None	Henrik Krohns
Show Obsolete (1) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Henrik Krohns 2008-06-23 08:50:53 UTC

Created attachment 4342 [details]
Hacky test patch

Shortcircuiting ALL_TRUSTED, USER_IN_WHITELIST etc could use tiny improvement. Currently DNS queries are sent and message is decoded before these simple rules are evaluated, wasting bandwidth and CPU.

I made a quick hack for myself. I set these rules at priority -10000 and run them manually before run_rbl_eval_tests.

I couldn't come up with any elegant solution for general use. Perhaps we could allocate some priority-range that are run before DNS? But that would mean users could accidently prioritize rules that need decoded message, and decoding  should be done only after DNS launching. Ideas?

Comment 1 Justin Mason 2008-06-23 09:31:15 UTC

(In reply to comment #0)
> I couldn't come up with any elegant solution for general use. Perhaps we could
> allocate some priority-range that are run before DNS? 

That would work for me, I think.

> But that would mean users
> could accidently prioritize rules that need decoded message, and decoding 
> should be done only after DNS launching. Ideas?

If message decoding is "lazily evaluated" -- which it is -- then that wouldn't really matter...

Comment 2 Michael Parker 2008-06-23 09:39:05 UTC

You could create a new Check plugin, subclass really, and override the check_main to do what you want.

Comment 3 AXB 2014-09-22 11:50:38 UTC

while testing Henrik's TLD patches with blacklist_from, etc, I noticed that net lookups were still happening - not really necessary.
Could we reactivate this "discussion"

I see quite a bit of potential to save many cycles and network traffic if we could implement this idea.

Comment 4 Henrik Krohns 2014-09-22 11:55:27 UTC

Created attachment 5239 [details]
Patch to run rbls after priority -10000

Comment 5 Henrik Krohns 2014-09-22 11:56:29 UTC

Created attachment 5240 [details]
Patch to launch uribl lookups only after check_uridnsbl is actually called

Comment 6 Henrik Krohns 2014-09-22 11:58:13 UTC

Here you go, I've been running these patches for years without hiccups.

Then you just need something like:

priority USER_IN_BLACKLIST -10050

Anything <= -10000 is shortcircuited before lookups are launched.

Comment 7 Henrik Krohns 2014-09-22 11:59:15 UTC

For quick reference:

===================================================================
--- Check.pm    (revision 1626743)
+++ Check.pm    (working copy)
@@ -77,10 +77,8 @@
   # rbl calls.
   $pms->extract_message_metadata();

-  # Here, we launch all the DNS RBL queries and let them run while we
-  # inspect the message
-  $self->run_rbl_eval_tests($pms);
   my $needs_dnsbl_harvest_p = 1; # harvest needs to be run
+  my $rbls_running = 0;

   my $decoded = $pms->get_decoded_stripped_body_text_array();
   my $bodytext = $pms->get_decoded_body_text_array();
@@ -108,6 +106,13 @@
       last;
     }

+    # Here, we launch all the DNS RBL queries and let them run while we
+    # inspect the message
+    if (!$rbls_running && $priority > -10000) {
+      $rbls_running = 1;
+      $self->run_rbl_eval_tests($pms);
+    }
+
     my $timer = $self->{main}->time_method("tests_pri_".$priority);
     dbg("check: running tests for priority: $priority");


--- URIDNSBL.pm (revision 1626743)
+++ URIDNSBL.pm (working copy)
@@ -327,18 +327,21 @@
   return $self;
 }

-# this is just a placeholder; in fact the results are dealt with later
-sub check_uridnsbl {
-  return 0;
-}
-
 # ---------------------------------------------------------------------------

 # once the metadata is parsed, we can access the URI list.  So start off
 # the lookups here!
-sub parsed_metadata {
-  my ($self, $opts) = @_;
-  my $pms = $opts->{permsgstatus};
+
+#
+# only parse and run after first check_uridnsbl call
+#
+sub check_uridnsbl {
+  my ($self, $pms, @args) = @_;
+
+  # only parse once
+  return 0  if $pms->{uridnsbl_done};
+  $pms->{uridnsbl_done} = 1;
+
   my $conf = $pms->{conf};

   return 0  if $conf->{skip_uribl_checks};
@@ -494,7 +497,7 @@
   # and query
   $self->query_hosts_or_domains($pms, \%hostlist);

-  return 1;
+  return 0;
 }

 # Accepts argument in one of the following forms: m, n1-n2, or n/m,

Comment 8 Kevin A. McGrail 2014-09-22 11:59:54 UTC

I don't use short-circuiting nor does the priority really have good documentation at this time as shown by discussion with Philip Prindeville re: bug 7060.

What's the path forward you recommend?

Comment 9 Kevin A. McGrail 2014-09-22 12:01:24 UTC

Following my own comment: with years of testing sounds like we get it committed, get it documented and get a few rules that use it.  If you can do that, it'll make 3.4.1rc1 for sure.

Comment 10 Henrik Krohns 2014-09-22 12:07:00 UTC

Given these defaults already exist, we could just come up with arbitrary priority like -100 after which RBLs launch..

60_shortcircuit.cf:priority USER_IN_WHITELIST     -1000
60_shortcircuit.cf:priority USER_IN_DEF_WHITELIST -1000
60_shortcircuit.cf:priority USER_IN_ALL_SPAM_TO   -1000
60_shortcircuit.cf:priority SUBJECT_IN_WHITELIST  -1000
60_shortcircuit.cf:priority ALL_TRUSTED            -950
60_shortcircuit.cf:priority SUBJECT_IN_BLACKLIST   -900
60_shortcircuit.cf:priority USER_IN_BLACKLIST_TO   -900
60_shortcircuit.cf:priority USER_IN_BLACKLIST      -900
60_shortcircuit.cf:priority BAYES_99               -400

Comment 11 RW 2014-09-22 13:24:19 UTC

The gratuitous DNS requests are not necessarily wasted because they bring the results into cache. A relatively small number of slow lookups taken off the critical path would make this a win in terms of throughput.

Comment 12 AXB 2014-09-22 13:40:36 UTC

depending on how many thousand q/s you make it can make a massive difference.

the concept of shortcircuiting is to avoid any other rule apply a score.

if network lookups are NOT bypassed, network hits with high custom scores or metas with lookup rules could "break" for example, a whitelist_from_rcvd.

Comment 13 Henrik Krohns 2018-09-24 06:07:20 UTC

- Modified Check.pm to run run_rbl_eval_tests at priority -100
- Adjusted priorities of DNS launching rules to -100 (DKIM_* SPF_* etc)
- Fixed URIDNSBL.pm to launch only after first eval run (and all uri* rules prio -100)
- Adjusted DCC/RAZOR2/PYZOR to priority 10, since they don't run async. Atleast normal rules are run before they possibly timeout or something.
- Added t/shortcircuit_before_dns.t to check everything works as designed
- Tried to document in few places

Sending        UPGRADE
Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        rules/23_bayes.cf
Sending        rules/25_dcc.cf
Sending        rules/25_dkim.cf
Sending        rules/25_pyzor.cf
Sending        rules/25_razor2.cf
Sending        rules/25_spf.cf
Sending        rules/25_uribl.cf
Sending        rules/60_shortcircuit.cf
Sending        rules/local.cf
Sending        rulesrc/sandbox/davej/20_dkimwl.cf
Sending        rulesrc/sandbox/jm/20_dob.cf
Sending        t/dnsbl_sc_meta.t
Adding         t/shortcircuit_before_dns.t
Transmitting file data ................done
Committing transaction...
Committed revision 1841798.

Comment 14 Henrik Krohns 2018-10-05 12:10:47 UTC

Removed all "priority XXX -100" clauses, not needed anymore. Check.pm and bgsend_and_start_lookup work together now, queueing any lookups made before priority -100 and launching them then.

Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        rules/20_dnsbl_tests.cf
Sending        rules/25_dkim.cf
Sending        rules/25_spf.cf
Sending        rules/25_uribl.cf
Sending        rules/60_whitelist_dkim.cf
Sending        rulesrc/sandbox/davej/20_dkimwl.cf
Sending        rulesrc/sandbox/jm/20_dob.cf
Transmitting file data ..........done
Committing transaction...
Committed revision 1842899.

Comment 15 RW 2018-10-17 14:18:48 UTC

I think it would be nice to make some of these priorities dependent on whether  the shortcircuit plugin is actually loaded.

Comment 16 Henrik Krohns 2018-10-17 14:22:13 UTC

(In reply to RW from comment #15)
> I think it would be nice to make some of these priorities dependent on
> whether  the shortcircuit plugin is actually loaded.

Do you have any example why you think this would be useful?

Comment 17 RW 2018-10-18 22:18:08 UTC

SPF, DKIM and Bayes have been made to run before blocklist requests are sent at priority -100, which means that they always add to the duration of the scan. When they run after priority -100, a slow DNS response or timeout can take them off the critical path.

Comment 18 Henrik Krohns 2018-10-19 05:24:50 UTC

(In reply to RW from comment #17)
> SPF, DKIM and Bayes have been made to run before blocklist requests are sent
> at priority -100, which means that they always add to the duration of the
> scan. When they run after priority -100, a slow DNS response or timeout can
> take them off the critical path.

SPF and DKIM _are_ async "blocklist requests". DNS and any other async network requests should be launched at -100.

But I see your point about Bayes. I think it should be started at around say -90, after async requests are launched. But before 0 so it has a chance to run before normal rules timing out. TODO: Make all those SQL/Redis calls async..

Comment 19 Henrik Krohns 2018-10-19 08:46:30 UTC

(In reply to Henrik Krohns from comment #18)
>
> SPF and DKIM _are_ async "blocklist requests".

... so they are not async after all?

I see the deprecated mess that is Mail::SPF is already known (bug 7261). Looks like there is no way to make those lookups async.

For Mail::DKIM some async possibility is mentioned in docs.. one more thing to look into *groan*.

Comment 20 Henrik Krohns 2019-03-02 17:44:48 UTC

(In reply to Henrik Krohns from comment #19)
> (In reply to Henrik Krohns from comment #18)
> >
> > SPF and DKIM _are_ async "blocklist requests".
> 
> ... so they are not async after all?
> 
> I see the deprecated mess that is Mail::SPF is already known (bug 7261).
> Looks like there is no way to make those lookups async.
> 
> For Mail::DKIM some async possibility is mentioned in docs.. one more thing
> to look into *groan*.

Made new bugs for looking into DKIM and SPF:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7695
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7696

Resolving this bug since shortcircuit stuff seems to work fine now.