Bug 6728 - DNSBLs need a way to turn off queries based on BLOCKED rules triggering
Summary: DNSBLs need a way to turn off queries based on BLOCKED rules triggering
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Windows 7
: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-15 12:56 UTC by Kevin A. McGrail
Modified: 2018-09-28 07:21 UTC (History)
7 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin A. McGrail 2011-12-15 12:56:34 UTC
Need a way to make spamd recognize that a BLOCKED rule such as URIBL_BLOCKED is triggered and hold off on subsequent DNSBL queries for 1 hour.

The 1 hour part should be configurable.

Subsequent delays will need a multiplier that is configurable and will be set at 1.1 for initial tests. 

This means:

Query returns answer indicating IP is blocked for DNS queries.
BLOCKED rule is triggered
SpamD recognizes the block and turns off queries for 1 hour for that RBL
1 hour elapses
Query returns answer indicating IP is blocked for DNS queries.
BLOCKED rule is triggered
SpamD recognizes the block and turns off queries for 1.1 hours for that RBL
...

We likely need something to define in a rules config such as:

block_disable   List of Rules separated by commas,semicolons or whitespace

block_disable  URIBL_BLACK,URIBL_GREY,URIBL_RED

Then if spamd picks up a BLOCKED rule and block_disable is defined, it can disable those rules.

However, if we need an easier hack, we can modify the description of the rule for BLOCKED to including something that is parse-able. such as adding Temporarily Blocking Queries: URIBL_BLACK,URIBL_GREY,URIBL_RED to the description.

This idea was split from bug 6724.
Comment 1 Darxus 2011-12-15 15:14:47 UTC
You're assuming that people exceeding free usage limits will always be using spamc/spamd and never the non-client/server form of spamassassin?
Comment 2 Kevin A. McGrail 2011-12-15 15:17:11 UTC
(In reply to comment #1)
> You're assuming that people exceeding free usage limits will always be using
> spamc/spamd and never the non-client/server form of spamassassin?

I am not assuming that.  I'm hitting the most likely scenario since we need to keep stateful information.  I didn't think creating lock files was going to be efficient but that was how I would do it for spamassassin sans spamd.
Comment 3 Michael Parker 2011-12-15 15:55:08 UTC
I'm -1 on this idea unless you can figure out how to make it a plugin that is disabled by default.  You're solving a problem for a VERY SMALL percentage of users and most likely introducing a performance penalty for everyone.

Persue the "BLOCKED" rule set idea and don't try to get fancy.  FPs and FNs will get examined and if you make the "BLOCKED" rule descriptions scary enough they will take action.
Comment 4 Kevin A. McGrail 2011-12-15 15:57:09 UTC
(In reply to comment #3)
> I'm -1 on this idea unless you can figure out how to make it a plugin that is
> disabled by default.  You're solving a problem for a VERY SMALL percentage of
> users and most likely introducing a performance penalty for everyone.
> 
> Persue the "BLOCKED" rule set idea and don't try to get fancy.  FPs and FNs
> will get examined and if you make the "BLOCKED" rule descriptions scary enough
> they will take action.

I intended to have it on by default but have it be either a plugin or configuration option to disable the behavior.  Is that acceptable to bypass your -1?
Comment 5 Henrik Krohns 2011-12-15 17:18:36 UTC
-1 for spamd specific
-1 for any locking whatsoever
-1 for multiplier (what's the point? 1 query per 1 hour or 2 hours makes no difference, but would need more complex state processing)

Possibly the simplest and cheapest solution would be using filenames for keeping state.

When block is hit: create file LOCAL_STATE_DIR/dnsblock.<identifier>

The identifier should come from the rbl identifier/name, in this case "dnswl": eval:check_rbl_sub('dnswl-firsttrusted', '^127\.0\.\d+\.255$')

Then rbl function can simply stat() if LOCAL_STATE_DIR/dnsblock.<identifier> exists. If mtime is > 1 hour, just unlink() the file.

My slow linux VPS benchmarks at 715000 stat calls per second testing for a set of 10 different non-existing filenames, so performance is a non-issue.

Proposed config "block_disable" would need to refer to identifier instead of a rule name.
Comment 6 Henrik Krohns 2011-12-15 17:22:17 UTC
(In reply to comment #5)
> Proposed config "block_disable" would need to refer to identifier instead of a
> rule name.

Correction.. since uribl etc doesn't have an eval "identifier", we should use the zone name as such.
Comment 7 D. Stussy 2011-12-15 23:21:55 UTC
In my opinion, "blocked" should ALSO trigger upon the affirmative receiving of an A record OUTSIDE of 127.0.0.0/8, regardless of ruleset processing.  This part of the recognition would be performed in the routine which processes DNS-list results.  "Blocked" detection for this purpose should be a boolean flag - to handle a case where more than one offending address is received, and handled after such processing.  That way, we won't accidentally bump the timeout counter more than once when retrying.

Functional DNS lists should explicitly return "0.0.0.0" (i.e. no rule is necessary to detect them).  Non-functional lists may return any [unicast] address as they may be "parked" at a registration service for sale and SA got the A record (via a DNS wildcard entry) meant for HTTP redirection to a "domain for sale" web server page.

This is in addition to the rule detection proposed in "comment 0."
Comment 8 Kevin A. McGrail 2011-12-15 23:28:17 UTC
(In reply to comment #7)
> In my opinion, "blocked" should ALSO trigger upon the affirmative receiving of
> an A record OUTSIDE of 127.0.0.0/8, regardless of ruleset processing.  This
> part of the recognition would be performed in the routine which processes
> DNS-list results.  "Blocked" detection for this purpose should be a boolean
> flag - to handle a case where more than one offending address is received, and
> handled after such processing.  That way, we won't accidentally bump the
> timeout counter more than once when retrying.

I get this part.  We should make URIBL.pm and EvalDNS.pm flag ignore responses outside of 127.0.0.1 and possibly even trigger BLOCKED.

> Functional DNS lists should explicitly return "0.0.0.0" (i.e. no rule is
> necessary to detect them).  

Here is where I get confused.  Functional lists should explicitly return 0.0.0.0 to what query?

> Non-functional lists may return any [unicast]
> address as they may be "parked" at a registration service for sale and SA got
> the A record (via a DNS wildcard entry) meant for HTTP redirection to a "domain
> for sale" web server page.

Definitely a good case to fix.  This issue has bitten us on other RBLs in the past.
Comment 9 D. Stussy 2011-12-16 00:41:23 UTC
"I get this part.  We should make URIBL.pm and EvalDNS.pm flag ignore responses
outside of 127.0.0.1[sic] and possibly even trigger BLOCKED."

YES!  Definently trigger "blocked." ;-)  Takes care of the registration problem too.

"Here is where I get confused.  Functional lists should explicitly return
0.0.0.0 to what query?"

...To "query refused"  (abuse/excessive traffic), if the proper DNS RC of refused with 0 answers is not performed.  I suggest "0.0.0.0" because it is outside of 127/8 (see processing above) and "all zeros" means no information.  Returning "127.0.0.255" as some do is irresponsible.  If they returned a code outside of 127/8, all we'd need is the range checking code above.
Comment 10 Kevin A. McGrail 2011-12-16 00:52:01 UTC
(In reply to comment #9)
> "I get this part.  We should make URIBL.pm and EvalDNS.pm flag ignore responses
> outside of 127.0.0.1[sic] and possibly even trigger BLOCKED."
> 
> YES!  Definently trigger "blocked." ;-)  Takes care of the registration problem
> too.

So you want to trigger blocked for anything outside of 127.0.0/24 or 127/8?  

 
> "Here is where I get confused.  Functional lists should explicitly return
> 0.0.0.0 to what query?"
> 
> ...To "query refused"  (abuse/excessive traffic), if the proper DNS RC of
> refused with 0 answers is not performed.  I suggest "0.0.0.0" because it is
> outside of 127/8 (see processing above) and "all zeros" means no information. 
> Returning "127.0.0.255" as some do is irresponsible.  If they returned a code
> outside of 127/8, all we'd need is the range checking code above.

We're outside of my comfort zone with the standard-tracks for DNSBL but I am unsure why 127.0.0.255 is irresponsible in DNSWL's case because they don't use bitwise logic for their list.  

Anyway, why 0.0.0.0 as opposed to an explicitly valid answer defined as a Blocked answer?  What's the benefit to the change?
Comment 11 D. Stussy 2011-12-16 01:51:18 UTC
"So you want to trigger blocked for anything outside of 127.0.0/24 or 127/8?"

Yes - to 127/8  (127.0.0.0/255.0.0.0).  RFC 5782 permits anything in 127/8, so there is no reason to restrict it to the /24.  Furthermore, there are some lists (e.g. hostkarma.junkemailfilter.com) which do return codes within the /8 but outside the /24 (for various experimental things like "does the server issue QUIT?" = 127.0.1.[0-2]).


A am against the use of "127.0.0.255" to mean "query refused due to abuse and/or excessive traffic" because it is within the valid range of 127/8, yet yields no information as to actual information answering the query.  It can be easily mistaken for a valid answer.  "Not available/go away" is not the same as "listed" nor "unlisted."  The fact that it is an answer in the valid range is the very reason why we have the FP/FN problem in the first place -- we considered it a valid answer.  As "0.0.0.0" is outside the valid range for an informational answer AND also not a valid unicast address, that's why I suggested it for a "null answer."

Another suitable value of all one's (255.255.255.255) as a refusal indicator was considered and rejected.  A single-bit error in the MSB which gets past any application layer error detection could be confused with a valid answer.

"0.0.0.0" triggering a block implies an explicit block from an active list, vs. a "random" unicast address triggering a block, implying a decommissioned list.  The software could but need not distinguish between the cases.  If it were to distinguish, then the non-all-zero address would permanently block (until manual intervention).

Although I agree with an initial one-hour delay, the TTL amount on the record in question when one is returned could also, if higher (than 3600), be the initial count for the recheck timer (when the A-RR value is outside 127/8).
Comment 12 Darxus 2011-12-18 17:00:38 UTC
> The reason for the special result code, as indicated in the posting referenced
> above, is that REFUSED rcode will result in triple the amount of queries in
> most cases. 
- Why DNSWL started returning "listed, high trust"

Might have been obvious, but while you're disabling queries when receiving an IP response indicating you're blocked, it should probably also disable queries when receiving the REFUSED rcode.  


Anybody feel like doing some testing to figure out how and why DNS queries are tripled when REFUSED is returned by the DNS server?  Possibly not even related to spamassassin (*might* be entirely other software) but weird.
Comment 13 Matthias Leisi 2011-12-22 10:32:43 UTC
I did some additional tests on how best to block abusive query sources. "Best" is defined as three goals: 

1) Reduce the overall traffic on parent (dnswl.org) and data (list.dnswl.org) zone
2) Avoid or minimize collateral damage on root and gTLD servers
3) Make it easy for operators of abusive query sources to find out what is happening

We have built the mechanism to redirect defined IPs to a special view of the dnswl.org zone as part of bug 6724 (using BIND views). I wanted to do actual tests to base at least the decision on the first goal on hard facts. We tested three combinations:

A. Explicit nameserver in "nowhere land"
| list.dnswl.org.        21600 IN NS blockedview.dnswl.org.
| blockedview.dnswl.org. 21600 IN A  127.0.0.255

B. Explicit nameserver for data zone in .invalid
| list.dnswl.org.        21600 IN NS  _  
|   you.are.blocked.from.using.dnswl.org.thorugh.public.nameservers.invalid.

C. No zone apex
(no NS records for list.dnswl.org)

In all cases, we returned 127.0.0.255 for *.list.dnswl.org in this view. Also in all cases, we return 127.0.0.255 for the nameservers of the original data zone (a through l.ns.dnswl.org), which affected clients should not actually ever have  seen. Also, if an affected client would ask a through l.ns.dnswl.org they would always receive 127.0.0.255 as an answer. 

A. and B. showed no measurable difference in traffic levels on the parent and the data zone. 

With C., the traffic on the parent zone nameservers grew by about 30%; traffic on the data zone did only shrink by about half the amount that was added on the parent zone.

This rules out C. as a viable option and makes the choice depend only on goals 2 and 3 above: minimize collateral damage (on root servers) and maximize identifiability for operators. 

It can be expected that some resolvers will ask the roots for invalid., and it can also be expected that not all resolvers will do proper negative caching for B. 

This leaves A. as the most efficient option with the least collateral damage (except for the timeouts on the affected DNS resolver / forwarder when trying to reach 127.0.0.255). 

It should be remembered that this only applies to query sources who generate excessive amounts of traffic over some period of time, and who do not react to reasonable attempts at communication. 

The first line of defense would be to return 127.0.0.255 (or other BLOCKED triggering value, to be defined) from the regular data zone nameservers, as discussed in this bug.
Comment 14 D. Stussy 2012-01-19 19:30:18 UTC
RFC 6471 was published recently.  It has some things we may want to consider in determining the status of a DNS based list:

Section 3.3:
Listing 127.0.0.2 => DNSBL is operational (a must list condition).
A response outside of 127/8 => DNSBL is NOT operational.

Section 3.5:
Listing 127.0.0.1 => DNSBL is NOT operational.

My comment (to the authors when it was a draft RFC) about returning 0.0.0.0 for queries refused (when a DNS RC of REFUSED isn't implemented) apparently fell on deaf ears, beyond it being outside of 127/8 and thus indicating that the DNSBL is "not operational" for the querying client.  However, that does not mean that we can't consider it a special case as previously proposed (as the all-zeroes address isn't a routable unicast address).  I do think it makes it clear that returning 127.0.0.255 (or any other value in 127/8) is INCORRECT when a query is "refused."
Comment 15 Kevin A. McGrail 2012-06-08 12:53:47 UTC
*** Bug 6803 has been marked as a duplicate of this bug. ***
Comment 16 Henrik Krohns 2018-09-27 14:48:00 UTC
Well it took a while to come up with an "optimal" solution. Hopefully you will like this.

Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
Transmitting file data ...done
Committing transaction...
Committed revision 1842098.

    dns_block_rule RULE domain
        If rule named RULE is hit, DNS queries to specified domain are
        temporarily blocked. Intended to be used with rules that check RBL
        return codes for specific blocked status. For example:

          urirhssub URIBL_BLOCKED multi.uribl.com. A 1
          dns_block_rule URIBL_BLOCKED multi.uribl.com

        Block status is maintained across all processes by empty statefile in
        userstate directory (~/.spamassassin/dnsblock_multi.uribl.com).

    dns_block_time (default: 300)
        dns_block_rule query blockage will last this many seconds.

It's extremely simple, works also with spamd, and takes no resources at all, one stat() call per dns_block_rule per message is not even worth mentioning nor debating.

I figured 5 minutes would be a better value than hour, in case something goes wrong (bad dns response of such?), then it's not blocked for too long. Imo, someone making max dozens of requests per 5 minutes or 1 hour makes zero difference. Also there will be more warns in log this way.

Must be enclosed with feature_dns_block_rule

if can(Mail::SpamAssassin::Conf::feature_dns_block_rule)
dns_block_rule SURBL_BLOCKED multi.surbl.org
dns_block_rule URIBL_BLOCKED multi.uribl.com
dns_block_rule RCVD_IN_DNSWL_BLOCKED list.dnswl.org
endif

Waiting for approval to commit that :-)
Comment 17 AXB 2018-09-27 15:47:17 UTC
(In reply to Henrik Krohns from comment #16)
> Well it took a while to come up with an "optimal" solution. Hopefully you
> will like this.
> 
> Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
> Sending        lib/Mail/SpamAssassin/Conf.pm
> Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
> Transmitting file data ...done
> Committing transaction...
> Committed revision 1842098.
> 
>     dns_block_rule RULE domain
>         If rule named RULE is hit, DNS queries to specified domain are
>         temporarily blocked. Intended to be used with rules that check RBL
>         return codes for specific blocked status. For example:
> 
>           urirhssub URIBL_BLOCKED multi.uribl.com. A 1
>           dns_block_rule URIBL_BLOCKED multi.uribl.com
> 
>         Block status is maintained across all processes by empty statefile in
>         userstate directory (~/.spamassassin/dnsblock_multi.uribl.com).
> 
>     dns_block_time (default: 300)
>         dns_block_rule query blockage will last this many seconds.
> 
> It's extremely simple, works also with spamd, and takes no resources at all,
> one stat() call per dns_block_rule per message is not even worth mentioning
> nor debating.
> 
> I figured 5 minutes would be a better value than hour, in case something
> goes wrong (bad dns response of such?), then it's not blocked for too long.
> Imo, someone making max dozens of requests per 5 minutes or 1 hour makes
> zero difference. Also there will be more warns in log this way.
> 
> Must be enclosed with feature_dns_block_rule
> 
> if can(Mail::SpamAssassin::Conf::feature_dns_block_rule)
> dns_block_rule SURBL_BLOCKED multi.surbl.org
> dns_block_rule URIBL_BLOCKED multi.uribl.com
> dns_block_rule RCVD_IN_DNSWL_BLOCKED list.dnswl.org
> endif
> 
> Waiting for approval to commit that :-)

I like it - you have my +1
Comment 18 RW 2018-09-27 16:26:37 UTC
(In reply to Henrik Krohns from comment #16)
>         Block status is maintained across all processes by empty statefile in
>         userstate directory (~/.spamassassin/dnsblock_multi.uribl.com).

This isn't going to work well for UNIX users unless that can be configured to be a shared location.
Comment 19 Henrik Krohns 2018-09-27 16:42:41 UTC
(In reply to RW from comment #18)
> (In reply to Henrik Krohns from comment #16)
> >         Block status is maintained across all processes by empty statefile in
> >         userstate directory (~/.spamassassin/dnsblock_multi.uribl.com).
> 
> This isn't going to work well for UNIX users unless that can be configured
> to be a shared location.

Well the example should be actually __userstate__/dnsblock_$domain. But I think it's clearer this way yes?

It works by default for everyone. Unless spamd/spamassassin is run on account that has zero write permissions of something funny is made. Obviously one can't put the default on some public /tmp for security reasons and configurable location would be silly for this kind of option imo. The userstate directory is there for a reason.

I think this is fine for 99% of the users. Well I'd still have to check Windows if it needs some File::Spec magic on the path.
Comment 20 Henrik Krohns 2018-09-27 16:52:20 UTC
Btw it's configurable..

  userstate_dir

  The new user's storage directory. (equivalent to C<~/.spamassassin>.)

There some virtual user stuff that I have no idea about. Maybe have to check that too. I doubt it's a problem, unless someone has thousand users having their own statedirs.
Comment 21 Henrik Krohns 2018-09-27 16:59:42 UTC
Actually there's the __local_state_dir__ (/var/lib/spamassassin)

__local_state_dir__/__version__/updates_spamassassin_org

I kind of glanced at it, but though it might not always be writable. Spamd might not be running as the same user that runs sa-update etc.

Maybe I'll do a small write-check on initialization to figure out what is the best state directory?
Comment 22 RW 2018-09-27 17:08:16 UTC
(In reply to Henrik Krohns from comment #20)
 
> unless someone has thousand users having
> their own statedirs.

That was my point.
Comment 23 AXB 2018-09-27 17:12:23 UTC
why not place it where the install places local.cf ?
Comment 24 Henrik Krohns 2018-09-27 17:37:17 UTC
(In reply to RW from comment #22)
> (In reply to Henrik Krohns from comment #20)
>  
> > unless someone has thousand users having
> > their own statedirs.
> 
> That was my point.

It would be more helpful it you expand this "point"?

Are you suggesting that there are installations where users are not using spamc/spamd, but instead use standalone /usr/bin/spamassassin for every message? I don't know what kind of else setup would make it possible, unless we are talking about the virtual-user spamd thingy that I haven't looked at.

In the case is as above, it makes absolutely no sense to share a "common" directory. It only takes one user to stop everyones DNSBL lookups by touching some files.

I think any spamd usage scenario is fixable by trying localstate directory first.
Comment 25 RW 2018-09-27 18:50:37 UTC
(In reply to Henrik Krohns from comment #24)

> Are you suggesting that there are installations where users are not using
> spamc/spamd, but instead use standalone /usr/bin/spamassassin for every
> message?

No, this is what happens when spamd works with unix users, as it might on a login server. 
 
>  I don't know what kind of else setup would make it possible, unless
> we are talking about the virtual-user spamd thingy that I haven't looked at.

I guess it would apply to  --virtual-config-dir as well. I've no idea about other types of virtual user.

> In the case is as above, it makes absolutely no sense to share a "common"
> directory. It only takes one user to stop everyones DNSBL lookups by
> touching some files.

In the case of a shared directory between unix users it would be necessary to defers setting the flag until the spamd child  regains root privileges, which may be more trouble than it's worth.

For virtual users a shared directory would be owned by the unprivileged user.
Comment 26 Henrik Krohns 2018-09-27 19:03:46 UTC
Add __writable_state_dir__ to find the best place to share global state data

Sending        SpamAssassin/AsyncLoop.pm
Sending        SpamAssassin/Conf.pm
Sending        SpamAssassin/PerMsgStatus.pm
Sending        SpamAssassin.pm
Transmitting file data ....done
Committing transaction...
Committed revision 1842141.
Comment 27 Henrik Krohns 2018-09-27 19:11:17 UTC
(In reply to RW from comment #25)
ssage?
> 
> No, this is what happens when spamd works with unix users, as it might on a
> login server. 

Ok I see what my problem was. Of course my spamd was running as the same user I tried spamc..

I see I have to also hold off testing writing only after dropping root privs.. I will also add the spamd-running users $HOME/.spamassassin as one state dir.
Comment 28 Henrik Krohns 2018-09-28 07:21:10 UTC
Takes ages to follow and understand all the code and processing paths. :-(

The concept is now __global_state_dir__/dnsblock_$domain

I have no idea why t/cross_user_config_leak.t break. Things are sliced slashed copied and duplicated here and there, who knows. I moved check from got_hit() to Check.pm and it works. I guess that's a bit more logical.

Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
Sending        lib/Mail/SpamAssassin/Plugin.pm
Sending        lib/Mail/SpamAssassin.pm
Sending        spamd/spamd.raw
Transmitting file data .......done
Committing transaction...
Committed revision 1842212.