Bug 6728 - DNSBLs need a way to turn off queries based on BLOCKED rules triggering
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Windows 7
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-15 12:56 UTC by Kevin A. McGrail
Modified: 2012-06-08 12:53 UTC
Description Kevin A. McGrail 2011-12-15 12:56:34 UTC
Need a way to make spamd recognize that a BLOCKED rule such as URIBL_BLOCKED is triggered and hold off on subsequent DNSBL queries for 1 hour.

The 1 hour part should be configurable.

Subsequent delays will need a multiplier that is configurable and will be set at 1.1 for initial tests. 

This means:

Query returns answer indicating IP is blocked for DNS queries.
BLOCKED rule is triggered
SpamD recognizes the block and turns off queries for 1 hour for that RBL
1 hour elapses
Query returns answer indicating IP is blocked for DNS queries.
BLOCKED rule is triggered
SpamD recognizes the block and turns off queries for 1.1 hours for that RBL
...
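
The schedule above can be sketched as follows. This is only an illustration, not SpamAssassin code: the function name and parameters are hypothetical, and it assumes the multiplier compounds on each repeated block (the description only spells out the first two delays).

```python
def block_delays(initial_hours=1.0, multiplier=1.1, count=5):
    """Return the first `count` query-suspension delays, in hours.

    Assumption: each repeated block multiplies the previous delay
    by `multiplier` (1 h, 1.1 h, then 1.1 * 1.1 h, and so on).
    """
    delays = []
    delay = initial_hours
    for _ in range(count):
        delays.append(delay)
        delay *= multiplier
    return delays


if __name__ == "__main__":
    for attempt, hours in enumerate(block_delays(), start=1):
        print(f"block #{attempt}: suspend queries for {hours:.3f} h")
```

Both the initial delay and the multiplier are plain arguments here, matching the request that both be configurable.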

We likely need something to define in a rules config such as:

block_disable   List of rules separated by commas, semicolons, or whitespace

block_disable  URIBL_BLACK,URIBL_GREY,URIBL_RED

Then if spamd picks up a BLOCKED rule and block_disable is defined, it can disable those rules.
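
As a rough sketch of how such a value could be split into rule names (Python is used purely for illustration; `parse_block_disable` is a hypothetical helper, not an existing SpamAssassin function):

```python
import re


def parse_block_disable(value):
    """Split a block_disable value on commas, semicolons, or whitespace."""
    # Runs of mixed separators (e.g. ", ") count as one separator;
    # empty fragments are dropped.
    return [name for name in re.split(r"[,;\s]+", value.strip()) if name]


if __name__ == "__main__":
    print(parse_block_disable("URIBL_BLACK,URIBL_GREY,URIBL_RED"))
    print(parse_block_disable("URIBL_BLACK; URIBL_GREY   URIBL_RED"))
```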

However, if we need an easier hack, we can modify the description of the BLOCKED rule to include something parseable, such as adding "Temporarily Blocking Queries: URIBL_BLACK,URIBL_GREY,URIBL_RED" to the description.

This idea was split from bug 6724.
Comment 1 Darxus 2011-12-15 15:14:47 UTC
You're assuming that people exceeding free usage limits will always be using spamc/spamd and never the non-client/server form of spamassassin?
Comment 2 Kevin A. McGrail 2011-12-15 15:17:11 UTC
(In reply to comment #1)
> You're assuming that people exceeding free usage limits will always be using
> spamc/spamd and never the non-client/server form of spamassassin?

I am not assuming that.  I'm hitting the most likely scenario since we need to keep stateful information.  I didn't think creating lock files was going to be efficient but that was how I would do it for spamassassin sans spamd.
Comment 3 Michael Parker 2011-12-15 15:55:08 UTC
I'm -1 on this idea unless you can figure out how to make it a plugin that is disabled by default.  You're solving a problem for a VERY SMALL percentage of users and most likely introducing a performance penalty for everyone.

Pursue the "BLOCKED" rule set idea and don't try to get fancy.  FPs and FNs will get examined and if you make the "BLOCKED" rule descriptions scary enough they will take action.
Comment 4 Kevin A. McGrail 2011-12-15 15:57:09 UTC
(In reply to comment #3)
> I'm -1 on this idea unless you can figure out how to make it a plugin that is
> disabled by default.  You're solving a problem for a VERY SMALL percentage of
> users and most likely introducing a performance penalty for everyone.
> 
> Pursue the "BLOCKED" rule set idea and don't try to get fancy.  FPs and FNs
> will get examined and if you make the "BLOCKED" rule descriptions scary enough
> they will take action.

I intended to have it on by default but have it be either a plugin or configuration option to disable the behavior.  Is that acceptable to bypass your -1?
Comment 5 Henrik Krohns 2011-12-15 17:18:36 UTC
-1 for spamd specific
-1 for any locking whatsoever
-1 for multiplier (what's the point? 1 query per 1 hour or 2 hours makes no difference, but would need more complex state processing)

Possibly the simplest and cheapest solution would be using filenames for keeping state.

When block is hit: create file LOCAL_STATE_DIR/dnsblock.<identifier>

The identifier should come from the rbl identifier/name, in this case "dnswl": eval:check_rbl_sub('dnswl-firsttrusted', '^127\.0\.\d+\.255$')

Then the rbl function can simply stat() LOCAL_STATE_DIR/dnsblock.<identifier> to see whether it exists. If it exists and its mtime is more than 1 hour old, just unlink() the file.

My slow linux VPS benchmarks at 715000 stat calls per second testing for a set of 10 different non-existing filenames, so performance is a non-issue.
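
A minimal sketch of the filename-based state described above, in Python for illustration only. The helper names, the `state_dir` argument, and the one-hour default are assumptions; the real thing would live in the Perl DNS eval code and use LOCAL_STATE_DIR.

```python
import os
import time
from pathlib import Path

BLOCK_SECONDS = 3600  # assumed default: one hour


def state_path(state_dir, identifier):
    """Path of the marker file, e.g. <state_dir>/dnsblock.dnswl."""
    return os.path.join(state_dir, "dnsblock." + identifier)


def mark_blocked(state_dir, identifier):
    """Create the marker file, or refresh its mtime if it already exists."""
    Path(state_path(state_dir, identifier)).touch()


def is_blocked(state_dir, identifier, now=None, block_seconds=BLOCK_SECONDS):
    """Return True while a marker younger than block_seconds exists."""
    path = state_path(state_dir, identifier)
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False  # no marker, queries are allowed
    now = time.time() if now is None else now
    if now - mtime > block_seconds:
        try:
            os.unlink(path)  # marker expired, remove it
        except FileNotFoundError:
            pass  # another process removed it first
        return False
    return True
```

No locking is involved: a race between two processes at worst costs one extra stat() or unlink() call.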

Proposed config "block_disable" would need to refer to identifier instead of a rule name.
Comment 6 Henrik Krohns 2011-12-15 17:22:17 UTC
(In reply to comment #5)
> Proposed config "block_disable" would need to refer to identifier instead of a
> rule name.

Correction: since uribl etc. don't have an eval "identifier", we should use the zone name instead.
Comment 7 D. Stussy 2011-12-15 23:21:55 UTC
In my opinion, "blocked" should ALSO trigger upon the affirmative receiving of an A record OUTSIDE of 127.0.0.0/8, regardless of ruleset processing.  This part of the recognition would be performed in the routine which processes DNS-list results.  "Blocked" detection for this purpose should be a boolean flag - to handle a case where more than one offending address is received, and handled after such processing.  That way, we won't accidentally bump the timeout counter more than once when retrying.

Functional DNS lists should explicitly return "0.0.0.0" (i.e. no rule is necessary to detect them).  Non-functional lists may return any [unicast] address as they may be "parked" at a registration service for sale and SA got the A record (via a DNS wildcard entry) meant for HTTP redirection to a "domain for sale" web server page.

This is in addition to the rule detection proposed in "comment 0."
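
To illustrate the boolean-flag idea, here is a small Python sketch (not SpamAssassin code; the function name is hypothetical). It returns a single flag for a whole response, so several offending addresses in one answer still count as one block event.

```python
import ipaddress

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def response_indicates_block(a_records):
    """True if any returned A record lies outside 127.0.0.0/8."""
    return any(
        ipaddress.ip_address(addr) not in LOOPBACK_NET for addr in a_records
    )


if __name__ == "__main__":
    print(response_indicates_block(["127.0.0.2"]))
    print(response_indicates_block(["192.0.2.10", "192.0.2.11"]))
```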
Comment 8 Kevin A. McGrail 2011-12-15 23:28:17 UTC
(In reply to comment #7)
> In my opinion, "blocked" should ALSO trigger upon the affirmative receiving of
> an A record OUTSIDE of 127.0.0.0/8, regardless of ruleset processing.  This
> part of the recognition would be performed in the routine which processes
> DNS-list results.  "Blocked" detection for this purpose should be a boolean
> flag - to handle a case where more than one offending address is received, and
> handled after such processing.  That way, we won't accidentally bump the
> timeout counter more than once when retrying.

I get this part.  We should make URIBL.pm and EvalDNS.pm flag ignore responses outside of 127.0.0.1 and possibly even trigger BLOCKED.

> Functional DNS lists should explicitly return "0.0.0.0" (i.e. no rule is
> necessary to detect them).  

Here is where I get confused.  Functional lists should explicitly return 0.0.0.0 to what query?

> Non-functional lists may return any [unicast]
> address as they may be "parked" at a registration service for sale and SA got
> the A record (via a DNS wildcard entry) meant for HTTP redirection to a "domain
> for sale" web server page.

Definitely a good case to fix.  This issue has bitten us on other RBLs in the past.
Comment 9 D. Stussy 2011-12-16 00:41:23 UTC
"I get this part.  We should make URIBL.pm and EvalDNS.pm flag ignore responses
outside of 127.0.0.1[sic] and possibly even trigger BLOCKED."

YES!  Definitely trigger "blocked." ;-)  Takes care of the registration problem too.

"Here is where I get confused.  Functional lists should explicitly return
0.0.0.0 to what query?"

...To "query refused"  (abuse/excessive traffic), if the proper DNS RC of refused with 0 answers is not performed.  I suggest "0.0.0.0" because it is outside of 127/8 (see processing above) and "all zeros" means no information.  Returning "127.0.0.255" as some do is irresponsible.  If they returned a code outside of 127/8, all we'd need is the range checking code above.
Comment 10 Kevin A. McGrail 2011-12-16 00:52:01 UTC
(In reply to comment #9)
> "I get this part.  We should make URIBL.pm and EvalDNS.pm flag ignore responses
> outside of 127.0.0.1[sic] and possibly even trigger BLOCKED."
> 
> YES!  Definitely trigger "blocked." ;-)  Takes care of the registration problem
> too.

So you want to trigger blocked for anything outside of 127.0.0/24 or 127/8?  

 
> "Here is where I get confused.  Functional lists should explicitly return
> 0.0.0.0 to what query?"
> 
> ...To "query refused"  (abuse/excessive traffic), if the proper DNS RC of
> refused with 0 answers is not performed.  I suggest "0.0.0.0" because it is
> outside of 127/8 (see processing above) and "all zeros" means no information. 
> Returning "127.0.0.255" as some do is irresponsible.  If they returned a code
> outside of 127/8, all we'd need is the range checking code above.

We're outside of my comfort zone with the standard-tracks for DNSBL but I am unsure why 127.0.0.255 is irresponsible in DNSWL's case because they don't use bitwise logic for their list.  

Anyway, why 0.0.0.0 as opposed to an explicitly valid answer defined as a Blocked answer?  What's the benefit to the change?
Comment 11 D. Stussy 2011-12-16 01:51:18 UTC
"So you want to trigger blocked for anything outside of 127.0.0/24 or 127/8?"

Yes - to 127/8  (127.0.0.0/255.0.0.0).  RFC 5782 permits anything in 127/8, so there is no reason to restrict it to the /24.  Furthermore, there are some lists (e.g. hostkarma.junkemailfilter.com) which do return codes within the /8 but outside the /24 (for various experimental things like "does the server issue QUIT?" = 127.0.1.[0-2]).


I am against the use of "127.0.0.255" to mean "query refused due to abuse and/or excessive traffic" because it is within the valid range of 127/8, yet conveys nothing that actually answers the query.  It can be easily mistaken for a valid answer.  "Not available/go away" is not the same as "listed" nor "unlisted."  The fact that it is an answer in the valid range is the very reason why we have the FP/FN problem in the first place -- we considered it a valid answer.  Because "0.0.0.0" is outside the valid range for an informational answer AND is not a valid unicast address, I suggested it for a "null answer."

Another suitable value of all ones (255.255.255.255) as a refusal indicator was considered and rejected.  A single-bit error in the MSB which gets past any application layer error detection could be confused with a valid answer.

"0.0.0.0" triggering a block implies an explicit block from an active list, vs. a "random" unicast address triggering a block, implying a decommissioned list.  The software could but need not distinguish between the cases.  If it were to distinguish, then the non-all-zero address would permanently block (until manual intervention).

Although I agree with an initial one-hour delay, the TTL amount on the record in question when one is returned could also, if higher (than 3600), be the initial count for the recheck timer (when the A-RR value is outside 127/8).
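
The distinction and the TTL rule proposed in this comment could look roughly like this. It is a Python sketch under stated assumptions: the labels, function names, and the 3600-second default are illustrative, and the optional "permanent block" for non-zero addresses is not modeled.

```python
import ipaddress

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")
ALL_ZEROS = ipaddress.ip_address("0.0.0.0")
DEFAULT_RECHECK_SECONDS = 3600  # the proposed initial one-hour delay


def classify_answer(address):
    """Classify one A record returned by a DNS list."""
    ip = ipaddress.ip_address(address)
    if ip in LOOPBACK_NET:
        return "informational"          # valid range for list answers
    if ip == ALL_ZEROS:
        return "explicit_block"         # active list refusing the query
    return "list_not_operational"       # e.g. parked or wildcarded domain


def initial_recheck_seconds(address, ttl):
    """Initial recheck timer for an out-of-range answer, else None.

    The record's TTL is used when it is higher than the default.
    """
    if classify_answer(address) == "informational":
        return None
    return max(DEFAULT_RECHECK_SECONDS, ttl)
```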
Comment 12 Darxus 2011-12-18 17:00:38 UTC
> The reason for the special result code, as indicated in the posting referenced
> above, is that REFUSED rcode will result in triple the amount of queries in
> most cases. 
- Why DNSWL started returning "listed, high trust"

Might have been obvious, but while you're disabling queries when receiving an IP response indicating you're blocked, it should probably also disable queries when receiving the REFUSED rcode.  
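
A sketch of that combined check, in Python for illustration only. The function name and the `blocked_answers` default are assumptions (127.0.0.255 is the value discussed in this bug); RCODE 5 is the standard DNS "Refused" code.

```python
REFUSED = 5  # DNS RCODE 5, "Refused" (RFC 1035)


def should_suspend_queries(rcode, a_records, blocked_answers=("127.0.0.255",)):
    """True if the response says we are blocked, by rcode or by answer."""
    if rcode == REFUSED:
        return True
    return any(addr in blocked_answers for addr in a_records)
```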


Anybody feel like doing some testing to figure out how and why DNS queries are tripled when REFUSED is returned by the DNS server?  Possibly not even related to spamassassin (*might* be entirely other software) but weird.
Comment 13 Matthias Leisi 2011-12-22 10:32:43 UTC
I did some additional tests on how best to block abusive query sources. "Best" is defined as three goals: 

1) Reduce the overall traffic on parent (dnswl.org) and data (list.dnswl.org) zone
2) Avoid or minimize collateral damage on root and gTLD servers
3) Make it easy for operators of abusive query sources to find out what is happening

We have built the mechanism to redirect defined IPs to a special view of the dnswl.org zone as part of bug 6724 (using BIND views). I wanted to do actual tests to base at least the decision on the first goal on hard facts. We tested three combinations:

A. Explicit nameserver in "nowhere land"
| list.dnswl.org.        21600 IN NS blockedview.dnswl.org.
| blockedview.dnswl.org. 21600 IN A  127.0.0.255

B. Explicit nameserver for data zone in .invalid
| list.dnswl.org.        21600 IN NS  _  
|   you.are.blocked.from.using.dnswl.org.thorugh.public.nameservers.invalid.

C. No zone apex
(no NS records for list.dnswl.org)

In all cases, we returned 127.0.0.255 for *.list.dnswl.org in this view. Also in all cases, we returned 127.0.0.255 for the nameservers of the original data zone (a through l.ns.dnswl.org), which affected clients should never actually have seen. Likewise, if an affected client asked a through l.ns.dnswl.org, it would always receive 127.0.0.255 as an answer.

A. and B. showed no measurable difference in traffic levels on the parent and the data zone. 

With C., the traffic on the parent zone nameservers grew by about 30%; traffic on the data zone shrank by only about half the amount that was added on the parent zone.

This rules out C. as a viable option and makes the choice depend only on goals 2 and 3 above: minimize collateral damage (on root servers) and maximize identifiability for operators. 

It can be expected that some resolvers will ask the roots for invalid., and it can also be expected that not all resolvers will do proper negative caching for B. 

This leaves A. as the most efficient option with the least collateral damage (except for the timeouts on the affected DNS resolver / forwarder when trying to reach 127.0.0.255). 

It should be remembered that this only applies to query sources who generate excessive amounts of traffic over some period of time, and who do not react to reasonable attempts at communication. 

The first line of defense would be to return 127.0.0.255 (or other BLOCKED triggering value, to be defined) from the regular data zone nameservers, as discussed in this bug.
Comment 14 D. Stussy 2012-01-19 19:30:18 UTC
RFC 6471 was published recently.  It has some things we may want to consider in determining the status of a DNS based list:

Section 3.3:
Listing 127.0.0.2 => DNSBL is operational (a must list condition).
A response outside of 127/8 => DNSBL is NOT operational.

Section 3.5:
Listing 127.0.0.1 => DNSBL is NOT operational.

My comment (to the authors when it was a draft RFC) about returning 0.0.0.0 for queries refused (when a DNS RC of REFUSED isn't implemented) apparently fell on deaf ears, beyond it being outside of 127/8 and thus indicating that the DNSBL is "not operational" for the querying client.  However, that does not mean that we can't consider it a special case as previously proposed (as the all-zeroes address isn't a routable unicast address).  I do think it makes it clear that returning 127.0.0.255 (or any other value in 127/8) is INCORRECT when a query is "refused."
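
Taking the three conditions exactly as summarized in this comment, an operational check could be sketched like this. It is Python for illustration; the function and parameter names are hypothetical, and each argument is the list of A records returned when looking up the 127.0.0.2 and 127.0.0.1 entries on the list (an empty list meaning "not listed").

```python
import ipaddress

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def dnsbl_is_operational(answers_for_127_0_0_2, answers_for_127_0_0_1):
    """Apply the checks summarized above to two test lookups."""
    every_answer = list(answers_for_127_0_0_2) + list(answers_for_127_0_0_1)
    # Any response outside 127/8 means the list is not operational.
    if any(ipaddress.ip_address(a) not in LOOPBACK_NET for a in every_answer):
        return False
    # A list that lists 127.0.0.1 is not operational.
    if answers_for_127_0_0_1:
        return False
    # An operational list must list 127.0.0.2.
    return bool(answers_for_127_0_0_2)
```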
Comment 15 Kevin A. McGrail 2012-06-08 12:53:47 UTC
*** Bug 6803 has been marked as a duplicate of this bug. ***