Bug 7987 - DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready
Summary: DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins (show other bugs)
Version: 4.0.0
Hardware: All All
: P2 blocker
Target Milestone: 4.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on: 7735
Blocks:
  Show dependency tree
 
Reported: 2022-05-07 17:23 UTC by Michael Storz
Modified: 2022-05-30 08:59 UTC (History)
3 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
deletion of rule_pending, rule_ready patch None Michael Storz [NoCLA]
Eval functions returning undef for async patch None Henrik Krohns [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Storz 2022-05-07 17:23:40 UTC
Rules that perform asynchronous DNS lookups via bgsend_and_start_lookup do not need to be manually flagged via rule_pending and rule_read, as everything is done automatically via additions and deletions to pending_rules. The end game, the call to finish_meta_tests, is prepared by the call to harvest_dnsbl_queries, which in turn calls abort_remaining_lookups. Therefore the calls of rule_pending and rule_ready can and should be deleted.

If rule_pending/rule_ready would be needed, then all registered eval functions of DNSEval.pm would need these calls, which is at the moment not the case.
Comment 1 Michael Storz 2022-05-07 17:25:31 UTC
Created attachment 5779 [details]
deletion of rule_pending, rule_ready
Comment 2 Henrik Krohns 2022-05-07 18:33:58 UTC
I'll look into this again, but note that I spent considerable effort and testing to make sure things worked when I implemented these meta changes. Not everything is always as it seems at first glance and reading the sparse documentation I made might not reveal all the intricacies. So great care should be taken to remove something, especially if it just for "cosmetic reasons", if they really are redundant calls, they shouldn't have ill effects..
Comment 3 Michael Storz 2022-05-07 20:45:49 UTC
Well, it took me more than a day of carefully reviewing the call diagram and implementation of bgsend_and_start_lookup to come to this conclusion. My first thought was that maybe some tests are not working properly and therefore overlooked that some calls to rule_pending are missing. Only afterwards did I realize that it was the other way around.

We both know how complicated all the rule processing is. Therefore, from my point of view, it is extremely important to remove everything superfluous so that we have a chance to understand how the different algorithms work. This has nothing to do with cosmetic details, but is essential.

Next, I'm going to look at asynchronous tag processing. After your second change to FromNameSpoof.pm, I'm pretty sure there's a general processing error here. But I'll have to check that out in more detail.

The use of the rule_pending and rule_ready routines you introduced should be clearly described and also implemented accordingly. From the current description I understand that the routines are to be used in principle with all asynchronous eval functions, so that the evaluation of the meta routines works. I am of the opinion that this description is wrong and does not correspond to your implementation. 

If the processing of the asynchronous calls, i.e. the queueing, polling and dequeueing is done centrally, then the handling must be done automatically. The end handling should then also be done centrally. This applies to both the central mechanism via bgsend_and_start_lookup and via action_depends_on_tags. For bgsend_and_start_lookup, I'm pretty sure that's how you implemented it. And that's a big step forward.

rule_pending/rule_ready should only become necessary if the processing is decentralized and the periodic queries are each triggered via check_tick. This is the case with DCC.pm, Pyzor.pm and Razor2.pm. Here it also makes sense to have the end handling done by the respective plugin and triggered by check_cleanup. And also here, I think, you have implemented it exactly like that.
Comment 4 Henrik Krohns 2022-05-08 04:43:19 UTC
I mostly agree with everything, great to have extra eyeballs. Can you comment if my previous list comment and Revision 1900667 additions cleared any things up for you and changes anything?

As a developer I can have hard time figuring out what to document about things, so they are understandable for any third party plugin developers. I even think SA originally went too far to "pluginize" every little detail, with it creating it's own problems.. there's just way too many public hooks and tidbits to understand..
Comment 5 Kevin A. McGrail 2022-05-08 05:29:21 UTC
(In reply to Henrik Krohns from comment #4)
> I
> even think SA originally went too far to "pluginize" every little detail,
> with it creating it's own problems.. there's just way too many public hooks
> and tidbits to understand..

Very fair comment.  The idea was good though because it gave us different standards for code quality: Core code > Plugin Enabled by Default > Plugin not Enabled by Default

And it gave the ability to create proprietary plugins which was good because an important part of the ASF is the pro-business ASLv2 with no copyleft licensing or common clause concepts.

Plus, I think sometimes when you are writing code, you think, this will only get used for a year.  Not 20 and we're getting close to that.

And there are definitely things I've wanted to fix for eons. I remember after Vixie created RBLs and we implemented them, I made a statement like DNS is a brilliant hack but we need to replace using DNS in a year, lol.  DNS was distributed and resilient but you've seen all the blocking code in trying to handle lookups on DNS and timeouts.  Spa.ghet.ti. And don't get me started on Net::DNS.

Anyway, just thought it might be good to have some of the thought process from the olden days.
Comment 6 Henrik Krohns 2022-05-08 08:00:19 UTC
Added mention of bgsend to rule_pending() docs, Revision 1900680

And then I found some bugs in the logic..

For example when URIDNSBL is launching and finishing a rule fast, and only after that eval check_uridnsbl is called, it marks the rule again as rule_pending(), and the rule will end up considered unrun.. duh..
Comment 7 Henrik Krohns 2022-05-10 15:24:47 UTC
Setting as blocker for 4.0.0.

There's some remaining bugs, especially the way check_rbl_sub is handled, which makes some rules considered unrun occasionally.

Already made some progress, but I really need a small break and a breather, will work on in the coming days..
Comment 8 Henrik Krohns 2022-05-13 06:07:45 UTC
Should be a bit better now.

- fix body rules considered unrun when using sa-compile
- fix check_rbl_sub rules considered unrun and other DNSEval cleanups
- improve rule_pending/rule_ready/got_hit() logic
- rename $pms->get_pending_lookups to get_async_pending_rules
- other minor async cleanups
- test and documentation improvements

Committed revision 1900849.
Comment 9 Henrik Krohns 2022-05-13 06:37:20 UTC
Feel free to discuss here or on dev-list if something is still unclear.
Comment 10 Henrik Krohns 2022-05-15 19:26:16 UTC
Created attachment 5784 [details]
Eval functions returning undef for async

As discussed on list, here's a patch to change all plugins async eval functions to return undef.

Care needs to be taken to check all bgsend* return values, so there is no unnecessary async marking.
Comment 11 Henrik Krohns 2022-05-15 19:26:32 UTC
Reopening for review
Comment 12 Henrik Krohns 2022-05-15 19:45:43 UTC
Of course we can vote if got_miss() etc would be better name for rule_ready(). And documentation needs some more polishing.
Comment 13 Henrik Krohns 2022-05-15 19:52:13 UTC
What comes to: "a better name for tests_already_hit would be tests_already_finished or tests_finished"

.. I would say renaming ancient stuff for cosmetic purposes only is always a bit sus. But it is more or less internal variable, atleast Google doesn't find any third part plugins using it ..
Comment 14 Michael Storz 2022-05-15 20:09:37 UTC
I remember someone renamed get_pending_lookups to get_async_pending_rules, which is the name I would have chosen too ;-)
Comment 15 Henrik Krohns 2022-05-15 20:13:37 UTC
(In reply to Henrik Krohns from comment #12)
> Of course we can vote if got_miss() etc would be better name for
> rule_ready(). And documentation needs some more polishing.

What I don't like about got_miss(), is that it doesn't necessarily mark a "miss". It's possible to call got_hit() after it, especially in multiple-lookups-for-one-rule scenario, where it would be complex for the plugin itself to track all lookups. Much simpler to call rule_ready on every callback. Though xurrent logic is not perfect either.. if the last lookup timeouts, it ends up as unrun rule.
Comment 16 Henrik Krohns 2022-05-15 20:21:36 UTC
(In reply to Michael Storz from comment #14)
> I remember someone renamed get_pending_lookups to get_async_pending_rules,
> which is the name I would have chosen too ;-)

Well that function never even existed outside trunk.. but I forgot alias for it just in case..
Comment 17 Michael Storz 2022-05-15 20:33:51 UTC
(In reply to Henrik Krohns from comment #15)
> (In reply to Henrik Krohns from comment #12)
> > Of course we can vote if got_miss() etc would be better name for
> > rule_ready(). And documentation needs some more polishing.
> 
> What I don't like about got_miss(), is that it doesn't necessarily mark a
> "miss". It's possible to call got_hit() after it, especially in
> multiple-lookups-for-one-rule scenario, where it would be complex for the
> plugin itself to track all lookups. Much simpler to call rule_ready on every
> callback. Though xurrent logic is not perfect either.. if the last lookup
> timeouts, it ends up as unrun rule.

I think it is the responsibility of the plugin to decide if a rule is a hit or a miss. It would be really bad, if it called got_miss first, which means I'm done and it is a miss, and then later to decide to call got_hit. If that could happen the logic of the eval rule is too complex and should be changed.

However, if you think rule_ready is a tool which could support the decision making of the rule, then it is ok. I haven't thought much about that scenario.
Comment 18 Henrik Krohns 2022-05-16 04:25:58 UTC
(In reply to Michael Storz from comment #17)
> If that
> could happen the logic of the eval rule is too complex and should be changed.

A DNSBL can do this:
- It parses for example 4 IP addresses from Received
- For each of the IP it starts a separate bgsend_and_start_lookup, expecting 4 separate callbacks

Obviously if got_hit() is called, we know rule is ready regardless of any pending lookups.

What things should the callback do if there is no hit?

Current logic:

sub callback {
  rule_ready(thisrule)
  foreach (answer) {
    if (match) got_hit(thisrule)
  }
}

Using got_miss() would probably be:

sub callback {
  foreach (answer) {
    if (match) got_hit(thisrule)
  }
  if (!tests_already_hit(thisrule) && !get_async_pending_rules(thisrule))
    got_miss(thisrule)
}

I guess we could brind back the deprecated Dns/is_rule_complete function, which can do the checks. So would this be the most elegant flow?

sub callback {
  foreach (answer) {
    if (match) got_hit(thisrule)
  }
  if (is_rule_complete(thisrule)) got_miss(thisrule)
}
Comment 19 Henrik Krohns 2022-05-16 04:32:32 UTC
(In reply to Henrik Krohns from comment #18)
>
> I guess we could brind back the deprecated Dns/is_rule_complete function,
> which can do the checks. So would this be the most elegant flow?
> 
> sub callback {
>   foreach (answer) {
>     if (match) got_hit(thisrule)
>   }
>   if (is_rule_complete(thisrule)) got_miss(thisrule)
> }

Even still, that is confusing naming to me, since is_rule_complete() or got_miss() must ignore any previous got_hit(). You can't be sure that all developers make 100% correct logic in plugins. In that sense rule_ready() atleast doesn't convey anything about rule hitting or missing. Rule missing should be assumed as a default.
Comment 20 Michael Storz 2022-05-16 14:21:32 UTC
(In reply to Henrik Krohns from comment #18)
> (In reply to Michael Storz from comment #17)
> > If that
> > could happen the logic of the eval rule is too complex and should be changed.
> 
> A DNSBL can do this:
> - It parses for example 4 IP addresses from Received
> - For each of the IP it starts a separate bgsend_and_start_lookup, expecting
> 4 separate callbacks
> 
> Obviously if got_hit() is called, we know rule is ready regardless of any
> pending lookups.
> 
> What things should the callback do if there is no hit?
> 
> Current logic:
> 
> sub callback {
>   rule_ready(thisrule)
>   foreach (answer) {
>     if (match) got_hit(thisrule)
>   }
> }
> 
> Using got_miss() would probably be:
> 
> sub callback {
>   foreach (answer) {
>     if (match) got_hit(thisrule)
>   }
>   if (!tests_already_hit(thisrule) && !get_async_pending_rules(thisrule))
>     got_miss(thisrule)
> }
> 
> I guess we could brind back the deprecated Dns/is_rule_complete function,
> which can do the checks. So would this be the most elegant flow?
> 
> sub callback {
>   foreach (answer) {
>     if (match) got_hit(thisrule)
>   }
>   if (is_rule_complete(thisrule)) got_miss(thisrule)
> }

short executive answer for Henrik: 

Since the decision logic of how to process the different asynchronous calls is completely contained in the plugin and therefore cannot be detected from the outside, I don't know how a support system could look like. For this we probably need first several examples from which we could take an idea for the realization.

long version:

If we take the standard case of one rule one result, then we can represent the decision logic for your example of a DNSBL with 4 lookups l1 to l4 like this:

hit = l1 || l2 || l3 || l4
miss = !hit = !(l1 || l2 || l3 || l4) = !l1 && !l2 && !l3 && !l4

With short-circuit evaluation we can evaluate a hit already with the result of one lookup hit. But for a miss we need the evaluation of all lookups.

The logic could be the other way around in another case:

hit = l1 && l2 && l3 && l4

Now we would need all lookup results for the evaluation of a hit and only one for a miss.

In principle, the logic can take any form of a boolean expression:

hit = (l1 && l2) || (l3 && l4)

Here, too, there are combinations where we can only decide whether it is a hit or a miss after evaluating all lookups.

I.e. logically we always need three levels

- eval function
- asynchronous lookups
- decision function

i.e. the callbacks of the lookups must always call a decision function, which keeps track of the status of all lookups and makes a decision about hit or miss based on this data.

Besides the standard case, there are also the cases with one rule many results. In the extreme case, there could also be four hit/miss results associated with the four lookups, i.e. the lookups are completely independent of each other.

Which logic is implemented in the plugin cannot be determined from the outside. At the moment I don't know how to support the implementation of the logic. For this we need first of all examples from which we can then perhaps derive something.

I would therefore implement nothing for the support of the decision logic for the time being.
Comment 21 Henrik Krohns 2022-05-16 14:53:17 UTC
(In reply to Michael Storz from comment #20)
>
> I would therefore implement nothing for the support of the decision logic
> for the time being.

Then I'll just make a decision and go with rule_ready() in it's current state. You'll see it in use for the whole 4.0 lifetime, which can be long. All fine to me.
Comment 22 Henrik Krohns 2022-05-16 15:53:06 UTC
(In reply to Henrik Krohns from comment #10)
> Created attachment 5784 [details]
> Eval functions returning undef for async
> 
> As discussed on list, here's a patch to change all plugins async eval
> functions to return undef.
> 
> Care needs to be taken to check all bgsend* return values, so there is no
> unnecessary async marking.

Committed revision 1900961.

Need to clean ups some docs later. There's also some suspicious unrun rules/metas popping up randomly, still investigating if there's something to fix.
Comment 23 Michael Storz 2022-05-16 16:54:59 UTC
Great work! Are you now ready to switch from the brute force algorithm to a deterministic algorithm? I am not sure which of the two algorithms would be faster. That needs to be tested.
Comment 24 Henrik Krohns 2022-05-16 17:28:03 UTC
(In reply to Michael Storz from comment #23)
> Great work! Are you now ready to switch from the brute force algorithm to a
> deterministic algorithm? I am not sure which of the two algorithms would be
> faster. That needs to be tested.

It should not matter if I'm ready or not. If you have something, then share it for everyone. :-)
Comment 25 Henrik Krohns 2022-05-17 04:03:33 UTC
Looking at my old plugin and popular things like SH.pm, it's clear the latest change will heavily break backwards compatibility.

So here's a small fix that mostly mitigates things:

- Use rule_ready() in run_eval_tests to allow async even for "return 0"
- Bring back async pending check in do_meta_tests

Committed revision 1900974.
Comment 26 Henrik Krohns 2022-05-17 07:53:08 UTC
(In reply to Henrik Krohns from comment #25)
> - Bring back async pending check in do_meta_tests

My brain is starting to be exhausted. It's not really needed.

Committed revision 1900984.
Comment 27 Michael Storz 2022-05-17 15:09:27 UTC
(In reply to Henrik Krohns from comment #24)
> (In reply to Michael Storz from comment #23)
> > Great work! Are you now ready to switch from the brute force algorithm to a
> > deterministic algorithm? I am not sure which of the two algorithms would be
> > faster. That needs to be tested.
> 
> It should not matter if I'm ready or not. If you have something, then share
> it for everyone. :-)

Ok, let's see if the change leads to better performance or not. The idea is really simple, no big surprise, no fancy algorithm :-)

Each meta rule needs a counter with the number of dependent rules. For each rule, an array is created with all meta rules that depend on that rule. When a rule has finished, got_hit/got_mis/rule_ready, it decrements the counter of each dependent meta rule. If the counter is 0, then the meta rule is moved from meta_pending to meta_ready queue. The meta_ready queue is then processed by do_meta_tests.

Alternative: Instead of moving the meta rule, it is executed immediately. The do_meta_tests subroutine then becomes redundant (what to do with the reuse part then, I have no idea).

In the end, the leftover meta-rules must then be handled with finish_meta_tests.

A few code fragments for clarification:

sub compile_meta_rules

  foreach my $name (keys %{$conf->{tests}}) {
    ...
    $conf->{meta_dep_count}->{$name} = scalar @{$rule_deps{$name}};
    foreach my $rulename (@{$rule_deps{$name}}) {
      push @{$conf->{rule_dependencies}->{$rulename}}, $name;
    }
  }

---------------------------------------

got_hit/got_miss/rule_ready

foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
  move_meta_p2r($meta_rule) unless --$conf->{meta_dep_count}->{$meta_rule};
}

foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
  run_meta_rule($meta_rule) unless --$conf->{meta_dep_count}->{$meta_rule};
}

run_meta_rule like do_meta_tests + delete $pms->{meta_pending}->{$meta_rule}

sub do_meta_tests {
  my ($self, $pms, $priority) = @_;
  ...
  return if $self->{am_compiling}; # nothing to compile here

  my $mr = $pms->{meta_ready};
  my $mt = $pms->{conf}->{meta_tests};
  my $h = $pms->{tests_already_hit};

  while (my $rulename = shift @$mr) {
    # Metasubs look like ($_[1]->{$rulename}||($_[2]->{$rulename}?1:0)) ...
    my $result = $mt->{$rulename}->($pms, $h, {});
    if ($result) {
      dbg("rules: ran meta rule $rulename ======> got hit ($result)");
      $pms->got_hit($rulename, '', ruletype => 'meta', value => $result);
    } else {
      dbg("rules-all: ran meta rule $rulename, no hit") if $would_log_rules_all;
      $pms->got_miss($rulename); # mark meta done
    }
  }
}
Comment 28 Henrik Krohns 2022-05-19 09:53:42 UTC
A quick cleanup to tidy things:

- Use rule_ready() everywhere instead of direct tests_already_hit modify
- Simple tracking of meta dependency hits, run do_meta_tests only when needed
- Do not run do_meta_tests on last priority, as finish_meta_tests will run anyway

Committed revision 1901060.

It already reduces unnecessary do_meta_tests calls a lot.

Before:

do_meta_tests -1000 ready 2
do_meta_tests -950 ready 0
do_meta_tests -900 ready 0
do_meta_tests -100 ready 5
do_meta_tests -90 ready 0
do_meta_tests 0 ready 1234
do_meta_tests 500 ready 0

After:

do_meta_tests -950 ready 2
do_meta_tests -100 ready 5
do_meta_tests 0 ready 1234

Though in the grand total runtimes, it makes very little difference. It's hard to optimize things, as most metas will end up ready at priority 0 anyway.

Will look if there's further to improve, but I'm quite sceptical about the --$conf->{meta_dep_count} stuff, as it's really hard to guarantee that someone doesn't call rule_ready() multiple times, obviously then the counts will end up wrong and metas could be run prematurely. It's possible to track readiness perfectly with a hash from which you delete dependencies as they occur. I already tried something like that, but it gets so complex that it actually increases runtimes.
Comment 29 Henrik Krohns 2022-05-27 04:27:08 UTC
No objections to the combo of "return undef" for eval and $pms->rule_ready() usage have been seen.

Any bugs or theoretical performance improvements can be discussed here or in a new bug, but the current state "works as intended" and should not hold up 4.0.0, so I'm closing this.
Comment 30 Henrik Krohns 2022-05-30 08:59:13 UTC
- Fix eval functions returning unintended "undef"

Committed revision 1901403.