Bug 7477

Summary: Direct DNS Querying Per DNSBL Zone
Product: SpamAssassin    Reporter: Karsten Bräckelmann <guenther>
Component: Libraries    Assignee: SpamAssassin Developer Mailing List <dev>
Status: NEW ---    
Severity: enhancement CC: apache, KlausRusch, kmcgrail, me
Priority: P2    
Version: unspecified   
Target Milestone: Future   
Hardware: All   
OS: All   
Whiteboard:
Attachments: per-zone resolver

Description Karsten Bräckelmann 2017-10-13 21:55:10 UTC
Feature request for a configuration option to set nameservers (similar to the existing dns_servers setting) on a per-DNSBL zone basis.
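To make the per-zone idea concrete, here is a minimal Python sketch of how a resolver might pick nameservers for a given DNSBL query name. The most specific configured zone wins, and any name without an override falls back to the global servers, as the existing dns_servers setting does today. The zone names, addresses, and the `ZONE_SERVERS`/`servers_for` names are hypothetical; this is not SpamAssassin's configuration syntax or the attached patch's code.

```python
from typing import Dict, List

# Hypothetical global fallback, analogous to the existing dns_servers setting.
DEFAULT_SERVERS: List[str] = ["127.0.0.1"]

# Hypothetical per-zone overrides: DNSBL zone -> nameserver addresses.
ZONE_SERVERS: Dict[str, List[str]] = {
    "dnsbl.example.net": ["192.0.2.53"],
    "local.example.org": ["198.51.100.10", "198.51.100.11"],
}


def servers_for(qname: str) -> List[str]:
    """Return the servers for the longest configured zone that qname falls under."""
    labels = qname.rstrip(".").lower().split(".")
    # Try the full name first, then drop one leading label at a time,
    # so the most specific (longest) matching zone wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in ZONE_SERVERS:
            return ZONE_SERVERS[candidate]
    # No override: behave exactly as before, using the default resolvers.
    return DEFAULT_SERVERS


if __name__ == "__main__":
    print(servers_for("2.0.0.127.dnsbl.example.net"))  # ['192.0.2.53']
    print(servers_for("2.0.0.127.zen.example.com"))     # ['127.0.0.1']
```

Zones without an entry never see a behaviour change, which matches the requirement that all other DNSBLs keep using the general configuration.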

This feature would be useful for local and custom third-party DNSBLs, but also in certain environments where admins don't have full control over the nameservers used (see the in-depth discussion that follows). Besides directly specifying a nameserver, this feature can also query for the DNSBL's authoritative nameservers on its own.

This would of course be a non-default option, and should come with a clear warning not to use it unless the admin knows why and what he is doing. For that reason, the setting's documentation is not located in the main Conf docs.


I'd like to thank Rob McEwen of Invaluement, who made it possible for me to spend time on this by funding my work on this project as part of the PCCC team, and who permitted me to publish it under the Apache License, Version 2.0.
Comment 1 Karsten Bräckelmann 2017-10-13 22:00:31 UTC
Created attachment 5464
per-zone resolver

Current state. It is not fully operational yet, so I'm not committing it to SVN but placing it here.

- Direct querying is implemented. Future improvements should include periodic refreshing of the authoritative nameservers and possibly rotation of the server used.

- Caching of the results is a work in progress and is commented out, since it is not yet in a usable state.
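The periodic refresh and rotation improvement mentioned above could look roughly like the following Python sketch. It assumes a hypothetical `lookup` callable that resolves the zone's NS records to IP addresses via the default resolver and returns them together with a TTL; the `NameserverPool` class is illustrative only and is not part of the attached patch.

```python
import itertools
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical resolver hook: zone -> (authoritative server IPs, NS TTL in seconds).
Lookup = Callable[[str], Tuple[List[str], int]]


class NameserverPool:
    """Round-robin over a zone's authoritative servers, refreshed on TTL expiry."""

    def __init__(self, zone: str, lookup: Lookup,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.zone = zone
        self._lookup = lookup
        self._clock = clock  # injectable so tests need not wait in real time
        self._expires = 0.0
        self._cycle: Optional[Iterator[str]] = None  # None forces a first lookup

    def _refresh(self) -> None:
        ips, ttl = self._lookup(self.zone)
        if not ips:
            raise LookupError(f"no authoritative servers found for {self.zone}")
        self._cycle = itertools.cycle(ips)
        self._expires = self._clock() + ttl

    def next_server(self) -> str:
        # Re-fetch the server list once the cached NS data has expired.
        if self._cycle is None or self._clock() >= self._expires:
            self._refresh()
        assert self._cycle is not None
        return next(self._cycle)
```

Round-robin is only the simplest rotation policy; preferring the fastest-responding server would be an equally plausible choice.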

Comments and feedback highly welcome.
Comment 2 Karsten Bräckelmann 2017-10-13 22:02:14 UTC
In-depth discussion by Rob McEwen. Included in the patch, pasted here for convenience.


Direct DNS Query Per DNSBL

This is an optional (non-default!) feature in SpamAssassin. It is enabled on a DNSBL-by-DNSBL basis. Therefore, if this feature is assigned to one particular DNSBL, all of the other DNSBLs continue to operate normally, WITHOUT using or being impacted by this feature.

This feature provides two main functions: 

(1) The queries for a particular DNSBL are sent DIRECTLY to a specified IP (or hostname), which can be either an authoritative DNS server or a caching DNS server (whatever IP or hostname is specified in the setting for the DNSBL that makes use of this feature).

-OR-

(2) SpamAssassin can look up the NS records for a particular DNSBL (using the default DNS server for that lookup), resolve those hostnames to IP addresses, cache that DNS information as instructed by the authoritative TTL values, and then send the actual DNSBL lookups DIRECTLY to one of those IPs, thus bypassing the default caching DNS server and going DIRECTLY to the authoritative DNS servers for that particular DNSBL. The NS records for the DNSBL and the actual A-record answers to the queries are BOTH cached internally by SA, according to the TTL values in the answers from the DNSBL's authoritative DNS servers. When those expire, that information is re-fetched, and an IP to query against is selected again. Typically, the A record of each DNSBL lookup has a very short TTL. But these answers are still cached since, in real-world spam filtering environments, MANY lookups of the same item often occur within a few seconds. Therefore, they are also cached per the DNSBL's TTL value for these A-record lookups.
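The final step of both modes, sending a DNSBL lookup straight to a chosen server instead of the system resolver, is sketched below in Python. It builds the usual DNSBL query name (reversed IPv4 octets plus the zone) and sends a single A-record query over UDP. It assumes IPv4 only and skips retries, TCP fallback, and answer/TTL parsing; the function names are hypothetical, and a real implementation would rely on a proper DNS library (SpamAssassin itself uses Perl's Net::DNS).

```python
import ipaddress
import secrets
import socket
import struct


def dnsbl_qname(client_ip: str, zone: str) -> str:
    # DNSBL convention: reverse the IPv4 octets and append the zone.
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def build_query(qname: str, qid: int) -> bytes:
    # Header: ID, flags with RD=1 (a caching server will recurse; an
    # authoritative server answers for its own zone either way),
    # QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".")
    )
    # QNAME ends with a zero-length label, then QTYPE=A (1), QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def query_direct(client_ip: str, zone: str, server: str,
                 timeout: float = 2.0) -> bytes:
    qid = secrets.randbelow(0x10000)
    packet = build_query(dnsbl_qname(client_ip, zone), qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(4096)
    # Only the transaction ID is checked; parsing the answer and its TTL
    # is left out of this sketch.
    if struct.unpack("!H", reply[:2])[0] != qid:
        raise ValueError("mismatched DNS transaction ID")
    return reply
```

In mode (1) `server` would be the address from the per-DNSBL setting; in mode (2) it would be one of the cached authoritative server IPs.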

WHY IS THIS NEEDED? WHEN WOULD THIS BE USED?

Sometimes, SpamAssassin is installed in environments where the administrator does not have 100% control over the DNS server, or needs to use a DNS server other than the server's default one for a particular DNSBL. It is not uncommon for custom global DNS settings on such a server to be overwritten by a hosting or service provider. Many hosting companies default DNS settings to Google's DNS servers instead of a locally-hosted caching DNS server. (This is VERY common!) And some SpamAssassin installations are in environments where the SA administrator simply lacks the privileges to install their own locally-hosted caching DNS server. Other similar problems are common.

At the same time, some DNSBLs require that they be accessed from the user's own IP space and/or don't allow queries to come from large providers' DNS servers. For example, many DNSBLs block queries that come from Google's and OpenDNS's DNS servers. Yet, again, it is extremely common for hosting companies to default to the Google and OpenDNS DNS servers. Likewise, many subscription-based DNSBLs require that the queries come DIRECTLY from the user's IP space, and NOT be routed through Google's or their ISP's DNS servers (this includes both subscription-only DNSBLs, such as invaluement.com, -AND- direct query subscriptions for more public DNSBLs like Spamhaus, which often require a subscription for various usage levels and scenarios).

This feature therefore enables a SpamAssassin administrator to ensure that the DNSBL lookups come from the SA installation's own IP space, and are NOT subject to undesirable global DNS changes that would disrupt access to that DNSBL. This also gives the DNSBL provider granular control, by allowing this feature to be used ONLY for particular DNSBLs, with specific custom settings per DNSBL. In other words, each DNSBL this is applied to can have its own custom setting! And, again, DNSBLs for which this is not enabled continue to operate in the SAME manner as before this feature was introduced into SpamAssassin.

WHEN NOT TO USE THIS (AND LIMITATIONS):

For SpamAssassin installations where the administrator has a great deal of control over the server and can install their own locally-hosted caching DNS server for use by SA, the administrator should continue to do that and should NOT use this feature. Also, this solution may not be appropriate for extremely high-volume processing. However, anyone running a high-volume environment normally would (and should!) have the resources, permissions, and level of control needed to manage their own caching DNS server, without fear of having their settings overwritten or running into the other problems described earlier! The main target of this feature is SpamAssassin installations where the administrator doesn't have as much control; such installations are typically smaller (for example, providing spam filtering for fewer than 1,000 mailboxes). This feature probably shouldn't be considered superior to using your own locally-hosted caching DNS server when the SA admin does have that option (with a reliable implementation that isn't subject to unwanted alterations).

TECHNOLOGY NOTES:

One challenge for this feature is the fact that SA child processes are stateless. Therefore, it is implemented using the Cache::FastMmap shared-memory cache for storing the DNSBL answers. Each SA child process spawned by spamd reads values from and writes values to Cache::FastMmap. Cache::FastMmap is Unix-specific, so the Windows version of SpamAssassin might not have this feature unless its maintainers can program equivalent functionality. That might possibly be achieved using a Windows port of Redis (since Redis is similar to Cache::FastMmap and has been ported to Windows), but this would need to be researched further, along with the licensing requirements of Redis or any other third-party programs used to implement this functionality on a Windows port of SpamAssassin.
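The shared-cache idea can be illustrated with a small Python sketch. Cache::FastMmap is a Perl module, so this swaps in Python's standard sqlite3 on a shared file purely as a stand-in: separate worker processes each open the same store, check for an answer before querying DNS, and write answers with an expiry derived from the DNS TTL. The class name, table layout, and file path are hypothetical; this is not how the patch implements it.

```python
import os
import sqlite3
import tempfile
import time
from typing import Optional


class SharedDnsCache:
    """TTL-respecting answer cache that several processes can open at once."""

    def __init__(self, path: str) -> None:
        # Each child process opens its own connection to the same file.
        self._db = sqlite3.connect(path, timeout=5.0)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS answers ("
            " key TEXT PRIMARY KEY, value TEXT NOT NULL, expires REAL NOT NULL)"
        )
        self._db.commit()

    def put(self, key: str, value: str, ttl: int,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self._db:  # commits the write so other processes can see it
            self._db.execute(
                "INSERT OR REPLACE INTO answers (key, value, expires)"
                " VALUES (?, ?, ?)",
                (key, value, now + ttl),
            )

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        row = self._db.execute(
            "SELECT value, expires FROM answers WHERE key = ?", (key,)
        ).fetchone()
        if row is None or row[1] <= now:
            return None  # missing or expired: the caller re-queries DNS
        return row[0]

    def close(self) -> None:
        self._db.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dnsbl-cache.sqlite")
    writer, reader = SharedDnsCache(path), SharedDnsCache(path)
    writer.put("2.0.0.127.dnsbl.example.net/A", "127.0.0.2", ttl=30)
    print(reader.get("2.0.0.127.dnsbl.example.net/A"))  # 127.0.0.2
    writer.close()
    reader.close()
```

Wall-clock time is used for expiry because the timestamps are compared across different processes.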
Comment 3 Benny Pedersen 2017-10-13 22:44:51 UTC
Could we change to SQLite?

So instead of per DNSBL, have a per-SQLite-DB cache? Why keep it in DNS when nearly everyone dislikes having to sync DNS data?

Is DNS really so unstable that it needs fixing in SpamAssassin?

Could we possibly begin to use enlists with IP listings?

Why do I bring this up again? I find enlist very useful for having local URLs listed, and it does not need DNS to do this. If the enlist data is in SQLite, memory use stays low even if there are millions of random domains to be blacklisted.

Is SQLite too slow for this?

IMHO we could distribute data to be compiled into SQLite, to keep memory use low, so update channels can provide data to be added to SQLite, and then have the rule set loaded from SQLite into memory if it's a small amount of data.

Just my own feedback, good or bad, I don't know.
Comment 4 Karsten Bräckelmann 2017-10-15 19:15:02 UTC
(In reply to Benny Pedersen from comment #3)
> Could we change to SQLite?
> 
> So instead of per DNSBL, have a per-SQLite-DB cache? Why keep it in DNS
> when nearly everyone dislikes having to sync DNS data?

This feature is about specific configuration for a(ny given) DNSBL, while still using the general configuration for all other DNSBLs.

Nothing about the DNS part of this is new, so using SQLite instead of DNS simply is not an option.

> Is DNS really so unstable that it needs fixing in SpamAssassin?

This is not about fixing unstable DNS.

> Could we possibly begin to use enlists with IP listings?
> 
> Why do I bring this up again? I find enlist very useful for having local URLs listed,
That has nothing to do with this feature request.
Comment 5 Henrik Krohns 2019-07-30 07:59:43 UTC
*** Bug 3500 has been marked as a duplicate of this bug. ***
Comment 6 Henrik Krohns 2022-04-11 13:18:03 UTC
Would need some cleaning up, but there's no time to look at this for 4.0.0. Is there _really_ demand for this feature, when resolvers are configurable pretty much everywhere? Postponing.