Bug 7835

Summary: Domain blacklists domain wildcarding
Product: Spamassassin Reporter: Raymond Dijkxhoorn <raymond>
Component: Libraries Assignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: enhancement CC: apache, raymond, riccardo.alfieri
Priority: P2    
Version: unspecified   
Target Milestone: Undefined   
Hardware: All   
OS: All   
Whiteboard:

Description Raymond Dijkxhoorn 2020-07-07 10:23:01 UTC
The current SA libraries don't take into account that both DBL and SURBL provide wildcarded lists. They strip the domain down to the base level, where this isn't needed. Because of this, the community is missing many listings that are in both of those lists.

Now we can submit requests to add domains to the util_rb_2tld files, but that doesn't really scale and it's also too slow.

For example: 

page.link isn't listed in SURBL, but <abused-subdomain>.page.link is.
And this is just one example to outline the problem.

We see that many of the bad actors are abusing free services, cloud platforms and such. Adding domains to the 2/3tld files could work, but again that is way too slow. If you want to take full advantage of the capabilities that SURBL hands to the community, it would be far better not to strip the domains down to the base level all the time. The same applies to the DBL list, which is also wildcarded. URIBL isn't wildcarded as far as I know, but Alex could comment on that.

If you need more info, don't hesitate to mail me.
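
To make the difference concrete, here is a minimal Python sketch of a trimmed versus an untrimmed lookup. It is not SpamAssassin's code: the names `TWO_LEVEL_TLDS`, `LISTED`, `trim_to_base_domain` and `is_listed` are hypothetical, the list entry `abused-example.page.link` is invented for illustration, and "wildcarded" is assumed to mean that a listed name also covers every hostname beneath it.

```python
# Hypothetical data: none of these entries come from a real list.
TWO_LEVEL_TLDS = {"co.uk", "com.au"}       # stand-in for util_rb_2tld entries
LISTED = {"abused-example.page.link"}      # stand-in for a wildcarded list entry


def trim_to_base_domain(host: str) -> str:
    """Strip a hostname down to its base domain."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known two-level TLD.
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_TLDS else 2
    return ".".join(labels[-keep:])


def is_listed(name: str) -> bool:
    """Assumed wildcard match: a listed name covers every name beneath it."""
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in LISTED for i in range(len(labels)))


host = "login.abused-example.page.link"
trimmed_hit = is_listed(trim_to_base_domain(host))  # looks up page.link only
untrimmed_hit = is_listed(host)                      # looks up the full hostname
print(trimmed_hit, untrimmed_hit)
```

In this toy setup the trimmed lookup misses because page.link is neither listed nor in the two-level table, while the untrimmed lookup hits the subdomain entry.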

With kind regards, Raymond Dijkxhoorn - SURBL
Comment 1 Henrik Krohns 2020-07-09 08:19:06 UTC
Some related talk also found in Bug 7165.

Yes, it should be feasible to use a flag, for example "tflags SURBL_FOO notrim".

And this could be enabled for all multi.surbl.org queries?
Comment 2 Riccardo Alfieri 2020-07-09 08:32:23 UTC
FWIW, we at Spamhaus support Raymond's request.

Using untrimmed hostnames would certainly catch more spam with both the SURBL and Spamhaus lists.
Comment 3 Raymond Dijkxhoorn 2020-07-09 08:37:54 UTC
(In reply to Henrik Krohns from comment #1)
> Some related talk also found in Bug 7165.
> 
> Yes, it should be feasible to use a flag, for example "tflags SURBL_FOO
> notrim".
> 
> And this could be enabled for all multi.surbl.org queries?

Yes. All of the multi lookups are wildcarded. 

So it applies to SURBL ABUSE, PH, CR and MW lookups. 

I saw Spamhaus was also added as a watcher.

I am sure Riccardo can comment on which Spamhaus zones should be changed.

Thanks! Raymond
Comment 4 Riccardo Alfieri 2020-07-09 08:49:13 UTC
All lookups to DBL should have the "notrim" flag set. 

ZRD supports them too, but since it's only for DQS customers, I'll take care of adding the necessary changes in our plugin when/if (I really hope this will happen!) there is support in SA.
Comment 5 Henrik Krohns 2021-04-08 08:36:27 UTC
tflags "notrim" is now implemented for urirhsbl/urirhssub.

Sending        trunk/UPGRADE
Sending        trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        trunk/t/uribl.t
Transmitting file data ...done
Committing transaction...
Committed revision 1888502.

Will test locally for a bit, then add to stock rules.
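
As a rough illustration of what a per-rule flag changes, the following Python sketch parses a tflags line and picks the name to query. It is only a model of the idea, not the URIDNSBL.pm implementation: `parse_tflags` and `query_name` are hypothetical helpers, the rule name SURBL_FOO is the placeholder used earlier in this bug, and trimming is simplified to "keep the last two labels" with no 2tld/3tld tables.

```python
def parse_tflags(line: str) -> tuple[str, set[str]]:
    """Split a 'tflags RULE flag1 flag2 ...' line into rule name and flags."""
    keyword, rule, *flags = line.split()
    if keyword != "tflags":
        raise ValueError(f"not a tflags line: {line!r}")
    return rule, set(flags)


def query_name(host: str, zone: str, flags: set[str]) -> str:
    """Build the DNS name to query for one host against one zone."""
    labels = host.lower().rstrip(".").split(".")
    # Simplification: real trimming would also consult 2tld/3tld tables.
    kept = labels if "notrim" in flags else labels[-2:]
    return ".".join(kept) + "." + zone.strip(".")


rule, flags = parse_tflags("tflags SURBL_FOO net notrim")
print(rule, query_name("a.b.example.com", "multi.surbl.org", flags))
print(rule, query_name("a.b.example.com", "multi.surbl.org", {"net"}))
```

With the flag set, the full hostname is prepended to the zone; without it, only the trimmed domain is.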
Comment 6 Riccardo Alfieri 2021-04-08 10:29:08 UTC
Very cool. Will this be included in 3.4.6 or will it be part of 4.x only?
Comment 7 Henrik Krohns 2021-04-08 10:33:48 UTC
(In reply to Riccardo Alfieri from comment #6)
> Very cool. Will this be included in 3.4.6 or will it be part of 4.x only?

We have tried to drop 3.4 maintenance several times already, so new features are for 4.x only.
Comment 8 Henrik Krohns 2021-04-08 10:36:23 UTC
PS. Please try running the trunk version. It's been "production quality" for a long time already. We need testers to prepare the 4.0 release. :-)
Comment 9 Henrik Krohns 2021-04-12 11:15:47 UTC
Committed to stock rules. As there is no syntax validation on tflags, it's safe to use notrim anywhere needed.

Sending        rules/25_uribl.cf
Transmitting file data .done
Committing transaction...
Committed revision 1888663.
Comment 10 Henrik Krohns 2021-04-22 05:57:06 UTC
Some get_uri_detail_list statistics from my corpus. Enabling notrim adds one or two more queries per message on average. So there is very little effect DNS-usage-wise, and caching will probably reduce it a lot too.

HAM DOMAINS
Range:  0.000 - 80.000; Mean:  2.114; Median:  2.000; Stddev:  2.641
Percentiles:  90th:  4.000; 95th:  5.000; 99th:  8.000

HAM HOSTS
Range:  0.000 - 81.000; Mean:  3.065; Median:  3.000; Stddev:  3.126
Percentiles:  90th:  6.000; 95th:  7.000; 99th: 11.000

SPAM DOMAINS
Range:  0.000 - 26.000; Mean:  1.444; Median:  1.000; Stddev:  1.218
Percentiles:  90th:  3.000; 95th:  4.000; 99th:  6.000

SPAM HOSTS
Range:  0.000 - 26.000; Mean:  1.637; Median:  1.000; Stddev:  1.503
Percentiles:  90th:  3.000; 95th:  4.000; 99th:  7.000
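
For a quick read of the means above, this small Python sketch computes the gap between hosts and domains per message. It assumes, as an interpretation and not something stated in the figures themselves, that a notrim rule would query each distinct host where a trimmed rule queries each distinct domain, so the gap approximates the extra names per message for one such list. The dictionary layout is hypothetical; the numbers are the reported means.

```python
# Mean counts per message, copied from the statistics above.
means = {
    "ham": {"domains": 2.114, "hosts": 3.065},
    "spam": {"domains": 1.444, "hosts": 1.637},
}

# Assumed model: extra names to query = mean hosts - mean domains.
extra = {kind: round(m["hosts"] - m["domains"], 3) for kind, m in means.items()}

for kind, value in extra.items():
    print(f"{kind}: about {value} more names per message")
```

Under that assumption the difference is a little under one name per ham message and about a fifth of a name per spam message, before any DNS caching.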