Bug 7060 - Allow local criteria for blocking <A HREF> URLs based on the host's ISP, Country, or CIDR block - URILocalBL.pm
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-22 21:42 UTC by Philip Prindeville
Modified: 2014-09-02 17:04 UTC
CC List: 3 users



Attachments (all text/x-perl-script, submitted by Philip Prindeville [HasCLA]):
5207  First cut at module
5208  Updated to explain configuring exclusions
5226  Allow excluding domains instead of individual hosts

Description Philip Prindeville 2014-06-22 21:42:13 UTC
Created attachment 5207
First cut at module

Setting up rules to use URIDNSBL.pm can be tricky for the uninitiated, and debugging issues requires specific knowledge of how the RRs are encoded for that DNSBL.

New entries sometimes take a while to propagate.

And one has to use the blacklist in its entirety: there is no way to configure the use of only selected parts of the DNSBL.

This plugin allows a site administrator to make trivial, instantaneous blacklisting entries based on country code (ISO 3166-1 two-letter codes), ISP name (as per the OrgName in the regional registry's database), or explicit CIDR blocks.

The first two functions require Geo::IP to be installed. ISP blocking also requires a license for the GeoIPISP database.

See the attachment for documentation on usage and dependencies.

One known issue at this time is that the plugin performs synchronous name-to-address lookups via gethostbyname().

If a look-aside cache of mappings were available, such as the one proposed in bug #7054, then both execution time and the number of DNS requests could be reduced.

Until then, the synchronous name lookups make this plugin unsuitable for high-volume sites.
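The Python sketch below shows how a per-scan look-aside cache could work. It is not the design from bug #7054 and not the plugin's Perl code. The class name `LookasideResolver` and its injectable `resolve` callable are hypothetical. The point is that once lookups are memoized, a host appearing in many URIs of one message is resolved only once. The sketch has no TTL handling, so it assumes the cache lives for a single message only.

```python
import socket
from typing import Callable, Dict, List, Optional


class LookasideResolver:
    """Memoize host -> address lookups for the life of one scan (hypothetical sketch)."""

    def __init__(self, resolve: Optional[Callable[[str], List[str]]] = None) -> None:
        # Default: the same kind of synchronous lookup the plugin does today,
        # returning every IPv4 address listed for the name.
        self._resolve = resolve or (lambda host: socket.gethostbyname_ex(host)[2])
        self._cache: Dict[str, List[str]] = {}
        self.lookups = 0  # how many real resolver calls were made

    def addresses(self, host: str) -> List[str]:
        # Normalize so "WWW.Foo.bar." and "www.foo.bar" share one cache entry.
        key = host.lower().rstrip(".")
        if key not in self._cache:
            self.lookups += 1
            try:
                self._cache[key] = list(self._resolve(key))
            except OSError:
                # socket.gaierror and socket.herror subclass OSError; caching the
                # miss means a dead name repeated in many URIs costs one query.
                self._cache[key] = []
        return self._cache[key]
```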
Comment 1 Kevin A. McGrail 2014-06-23 16:58:31 UTC
I've added the first version of the URILocalBL.pm plugin to trunk. It's disabled by default and, because it requires Geo::IP, will probably never be suitable for enabling by default.

 svn commit -m 'Added URILocalBL.pm plugin to trunk for testing, updating MANIFEST and v341.pre file as well as optional dependencies with Net::CIDR::Lite and Geo::IP'
Sending        MANIFEST
Adding         lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
Sending        lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Sending        rules/v341.pre
Transmitting file data ....
Committed revision 1604881.


Also noting this from the mailing list: why this is good to implement, what type of spam you use it to block, etc. 


As to your last questions: for someone who doesn’t need the complexity of a DNSBL, doesn’t want the wide scope of a DNSBL, doesn’t want to have to configure one, or perhaps just wants a significantly more precise tool to solve a very limited problem, local blacklisting lets you do this.

As an example, we were recently hit by a volley of SPAM from a variety of mail relays, but they all had something in common. All of them contained HTML with URLs pointing to websites hosted by “Solar VPS”, and in particular on the subnet 65.181.64.0/18 (in some cases, the web hosts had additional A records on the subnet 192.99.0.0/16).

It took a couple of hours to get URIDNSBL configured, scored appropriately, and working… and verifying that the ill-behaved hosts had corresponding entries in multi.uribl.com, without prior understanding of the record encoding, also took some time (since this use of DNS RRs overloads their intended purpose, it’s less than intuitive).

When it was all over, it occurred to me that a trivial configuration like:

uri_block_cidr L_BLOCK_CIDR     65.181.64.0/18 192.99.0.0/16
body L_BLOCK_CIDR               eval:check_uri_local_bl("L_BLOCK_CIDR")
describe L_BLOCK_CIDR           Block URI's pointing to bad CIDR's
score L_BLOCK_CIDR              5.20

would be much more of a pinpoint fix for my issue than the overly generalized approach of using multi.uribl.com. And I didn’t want to score everything that was in that DNSBL, just particular subnets.
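The Python sketch below shows roughly what a rule like `L_BLOCK_CIDR` has to check once a URI host has been resolved: does any of its addresses fall inside one of the configured blocks? The plugin itself is Perl and lists Net::CIDR::Lite as a dependency. Here, Python's standard `ipaddress` module stands in for it, and the helper names `parse_cidr_list` and `first_blocked` are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_cidr_list(spec: str) -> List[Network]:
    """Parse the blocks that follow the rule name, e.g. '65.181.64.0/18 192.99.0.0/16'."""
    # strict=False tolerates host bits being set, e.g. 65.181.70.1/18.
    return [ipaddress.ip_network(tok, strict=False) for tok in spec.split()]


def first_blocked(addresses: Iterable[str], networks: List[Network]) -> Optional[str]:
    """Return the first resolved address of a URI host that lies in a blocked network."""
    for text in addresses:
        addr = ipaddress.ip_address(text)
        # Membership across IP versions is simply False, so mixed lists are safe.
        if any(addr in net for net in networks):
            return text
    return None
```

With the blocks from the configuration above, 65.181.100.7 matches, but 65.181.0.1 does not, because 65.181.64.0/18 only spans 65.181.64.0 to 65.181.127.255.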

After that, it occurred to me that I had never seen a legitimate email with a URL pointing to Vietnam or Nigeria in my life, and it would be nice to restrict those as well.  So the plugin later evolved to:

uri_block_cc L_BLOCK_CC         cn vn ro bg ru ng eg
body L_BLOCK_CC                 eval:check_uri_local_bl("L_BLOCK_CC")
describe L_BLOCK_CC             Block URI's pointing to countries with no CERT or anti-SPAM laws
score L_BLOCK_CC                5.65
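The country variant has the same shape, with a GeoIP lookup between the address and the match. In this Python sketch, `country_of` is a hypothetical callable standing in for Geo::IP's country lookup. It can be any function that returns an ISO 3166-1 alpha-2 code, or `None` for unknown addresses. In practice it would wrap a GeoIP database; for testing, a plain dict's `.get` is enough.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple


def parse_country_list(spec: str) -> FrozenSet[str]:
    """Parse 'cn vn ro bg ru ng eg' into upper-case ISO 3166-1 alpha-2 codes."""
    return frozenset(code.upper() for code in spec.split())


def blocked_country(
    addresses: Iterable[str],
    blocked: FrozenSet[str],
    country_of: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (address, country) for the first address in a blocked country, else None.

    country_of is a hypothetical stand-in for a GeoIP country lookup.
    """
    for addr in addresses:
        code = country_of(addr)
        # Compare case-insensitively; unknown addresses (None) never match.
        if code is not None and code.upper() in blocked:
            return addr, code.upper()
    return None
```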

In the case of the 65.181.0.0/16 SPAM which provided this call to action, here are some subject lines you might recognize:

News alert: you could apply for a CNA education program
Wireless Internet plans online
You've Been Accepted into the Who's Who
Don't overpay for a phone. Try a free* one today
Is your home missing something? How about custom blinds?
Could you study at a CNA education program?
cable service is a possibility

etc. All within a six-hour spam run.

Looking at some recent traffic on the SpamAssassin users mailing list, it seemed that other people had had a similar idea at the same time to provide surgical blacklisting locally.

At this point, I’m thinking of adding whitelisting support to the country, ISP, and CIDR blacklists. For example, we’ve had issues with ServerBeach not being proactive about spam, or even acknowledging complaints in a timely fashion. That said, we get legitimate traffic with URLs pointing to a Fedora Project resource hosted on one of their networks. So we couldn’t blacklist that entire ISP without “punching a hole” for Fedora build reports.

The whitelisting would take individual IP addresses and/or host names as they appear in the URLs.
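Here is a Python sketch of one way the proposed hole-punching could work. It assumes that exclusions are exact matches, either on the host name as written in the URL or on one of its literal IP addresses, and that exclusions are checked before any block list. `parse_exclusions`, `should_flag`, and the `matches_block` callback are hypothetical names, not the plugin's configuration syntax.

```python
from typing import Callable, Iterable, Set


def parse_exclusions(spec: str) -> Set[str]:
    """Host names and literal IP addresses, normalized for exact comparison."""
    return {tok.lower().rstrip(".") for tok in spec.split()}


def should_flag(
    host: str,
    addresses: Iterable[str],
    exclusions: Set[str],
    matches_block: Callable[[str], bool],
) -> bool:
    """Flag a URI only if it hits a block list and no exclusion punches a hole for it."""
    addrs = list(addresses)  # iterated more than once below
    if host.lower().rstrip(".") in exclusions:
        return False
    # IP exclusions are compared as exact strings, not as networks.
    if any(a in exclusions for a in addrs):
        return False
    return any(matches_block(a) for a in addrs)
```

Because exclusions are checked first, one excluded address is enough to exempt the URI, even if another address of the same host is blocked. A stricter policy could exempt only the excluded address itself.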
Comment 2 AXB 2014-06-23 17:53:08 UTC
Because it cannot be widely tested, and given the licensing of GeoIP data and the limited scalability, IMO this should not be part of the SA code but treated as a third-party plugin.
Comment 3 Kevin A. McGrail 2014-06-23 18:22:52 UTC
(In reply to AXB from comment #2)
> By the fact that it cannot be widely tested, the licensing of GeoIP data and
> the limited scalability, imo, this should not be part of SA code but treated
> as a third party plugin.

I considered that but since RelayCountry is in the same boat and he has a CLA on file, adding it to the code was best.
Comment 4 Philip Prindeville 2014-06-23 19:23:52 UTC
(In reply to AXB from comment #2)
> By the fact that it cannot be widely tested, the licensing of GeoIP data and
> the limited scalability, imo, this should not be part of SA code but treated
> as a third party plugin.

Sorry, this detail might have gotten lost along the way: the GeoLiteCountry.dat file is distributed with the Perl packaging (at least for Fedora/RHEL/CentOS), and it is a dumbed-down freemium version of the GeoIP country database.

Without paying a dime, or piaster, or whatever you can use this plugin and have the CIDR and Country functionality, even if the Country database is of reduced accuracy.

# rpmls -l GeoIP
-rw-r--r--  root     root     /etc/GeoIP.conf
-rw-r--r--  root     root     /etc/GeoIP.conf.default
-rwxr-xr-x  root     root     /usr/bin/geoiplookup
-rwxr-xr-x  root     root     /usr/bin/geoiplookup6
-rwxr-xr-x  root     root     /usr/bin/geoipupdate
lrwxrwxrwx  root     root     /usr/lib64/libGeoIP.so.1
-rwxr-xr-x  root     root     /usr/lib64/libGeoIP.so.1.5.1
lrwxrwxrwx  root     root     /usr/lib64/libGeoIPUpdate.so.0
-rwxr-xr-x  root     root     /usr/lib64/libGeoIPUpdate.so.0.0.0
drwxr-xr-x  root     root     /usr/share/GeoIP
lrwxrwxrwx  root     root     /usr/share/GeoIP/GeoIP.dat
lrwxrwxrwx  root     root     /usr/share/GeoIP/GeoIPASNum.dat
-rw-r--r--  root     root     /usr/share/GeoIP/GeoIPASNumv6.dat
-rw-r--r--  root     root     /usr/share/GeoIP/GeoIPv6.dat
-rw-r--r--  root     root     /usr/share/GeoIP/GeoLiteASNum.dat
lrwxrwxrwx  root     root     /usr/share/GeoIP/GeoLiteASNumv6.dat
-rw-r--r--  root     root     /usr/share/GeoIP/GeoLiteCity.dat
-rw-r--r--  root     root     /usr/share/GeoIP/GeoLiteCityv6.dat
-rw-r--r--  root     root     /usr/share/GeoIP/GeoLiteCountry.dat
drwxr-xr-x  root     root     /usr/share/doc/GeoIP
-rw-r--r--  root     root     /usr/share/doc/GeoIP/AUTHORS
-rw-r--r--  root     root     /usr/share/doc/GeoIP/COPYING
-rw-r--r--  root     root     /usr/share/doc/GeoIP/ChangeLog
-rw-r--r--  root     root     /usr/share/doc/GeoIP/README
-rw-r--r--  root     root     /usr/share/doc/GeoIP/TODO
-rw-r--r--  root     root     /usr/share/doc/GeoIP/fetch-geoipdata-city.pl
-rw-r--r--  root     root     /usr/share/doc/GeoIP/fetch-geoipdata.pl
-rw-r--r--  root     root     /usr/share/man/man1/geoiplookup.1.gz
-rw-r--r--  root     root     /usr/share/man/man1/geoiplookup6.1.gz
-rw-r--r--  root     root     /usr/share/man/man1/geoipupdate.1.gz
# 

By default, the symlinks point to the "Lite" or freemium files: "GeoIP.dat" points to "GeoLiteCountry.dat", for instance, and "GeoIPASNum.dat" points to "GeoLiteASNum.dat", etc.

Only the GeoIPISP.dat file, which provides the ISP-based lookups, would need to be licensed.
Comment 5 AXB 2014-06-23 19:46:48 UTC
on Centos 6.x core repositories 
yum search GeoIP
 
Warning: No matches found for: GeoIP
No Matches found


yum search GeoLiteCountry

Warning: No matches found for: GeoLiteCountry
No Matches found


Please include download links in the pod
Comment 6 AXB 2014-06-23 19:47:51 UTC
which uses either the fremium GeoLiteCountry..

fremium?
Comment 7 Kevin A. McGrail 2014-06-23 19:52:32 UTC
(In reply to AXB from comment #6)
> which uses either the fremium GeoLiteCountry..
> 
> fremium?

I think he means freemium

Here's a good description I found:

http://www.tecmint.com/install-mod_geoip-for-apache-in-rhelcentos-6-35-8/

Mod_GeoIP has two different versions, one Free and one Paid, and uses the MaxMind GeoIP / GeoCity databases.

    Free Version : In the Free version the Geo City and Country databases are available with 99.5% accuracy.
    Paid Version : In the Paid version you get both databases with 99.8% accuracy, plus some more advanced details about the IP address.

I don't think a stock CentOS includes the Geo::IP module based on http://mirror.centos.org/centos/6.5/os/x86_64/Packages/

cpan Geo::IP should work.

http://search.cpan.org/~borisz/Geo-IP-1.43/lib/Geo/IP.pm
Comment 8 Philip Prindeville 2014-06-23 23:20:35 UTC
(In reply to AXB from comment #5)
> on Centos 6.x core repositories 
> yum search GeoIP
>  
> Warning: No matches found for: GeoIP
> No Matches found
> 
> 
> yum search GeoLiteCountry
> 
> Warning: No matches found for: GeoLiteCountry
> No Matches found
> 
> 
> Please include download links in the pod

Sorry, for RHEL and CentOS you'll need to use the EPEL EL6 repository.
Comment 9 Philip Prindeville 2014-06-25 01:13:48 UTC
Created attachment 5208
Updated to explain configuring exclusions

The previous version supported exclusions but this was omitted in the POD section; this adds POD documentation for this capability.
Comment 10 AXB 2014-06-25 10:36:50 UTC
Is there a reason not to implement RegistrarBoundaries.pm and SA's uri detection?

also missing priority
Comment 11 Philip Prindeville 2014-06-25 20:46:43 UTC
(In reply to AXB from comment #10)
> Is there a reason not to implement RegistrarBoundaries.pm and SA's uri
> detection?

You lost me. You mean for the exclude rules, be able to specify just the domain and not the FQDN hostname?  Because a lot of snowshoe spam will use www.foo.bar on one ISP for the website, and foo.bar as the mail relay on another ISP.

> also missing priority

Priority for... what exactly?  The importance setting in Bugzilla or what?
Comment 12 AXB 2014-06-25 21:08:46 UTC
(In reply to Philip Prindeville from comment #11)
> (In reply to AXB from comment #10)
> > Is there a reason not to implement RegistrarBoundaries.pm and SA's uri
> > detection?
> 
> You lost me. You mean for the exclude rules, be able to specify just the
> domain and not the FQDN hostname?  Because a lot of snowshoe spam will use
> www.foo.bar on one ISP for the website, and foo.bar as the mail relay on
> another ISP.

go thru RegistrarBoundaries.pm & Uribl.pm and you'll see what RegistrarBoundaries.pm does & how they play together. It's a pretty critical source of headaches.

> > also missing priority
> 
> Priority for... what exactly?  The importance setting in Bugzilla or what?
not Bugzilla ...

Different evals have different priorities in the processing chain

grep -R -H priority /trunk/lib
you'll get a ton of results which will point you in the right direction
Comment 13 Philip Prindeville 2014-06-27 00:53:34 UTC
(In reply to AXB from comment #12)

> go thru RegistrarBoundaries.pm & Uribl.pm and you'll see what
> RegistrarBoundaries.pm does &e how they play together. It's a pretty
> critical point of headaches.

Okay, I found RegistrarBoundaries.pm in spamassassin/trunk, but not Uribl.pm ...

> > Priority for... what exactly?  The importance setting in Bugzilla or what?
> not Bugzilla ...
> 
> Different evals have different priorities in the processing chain
> 
> grep -R -H priority thru /trunk/lib
> you'll get a ton of results which will point you in the right direction

Are you talking about $conf->{$rulename}->{priority} or something else?
Comment 14 AXB 2014-06-27 05:36:02 UTC
(In reply to Philip Prindeville from comment #13)
> (In reply to AXB from comment #12)
> 
> > go thru RegistrarBoundaries.pm & Uribl.pm and you'll see what
> > RegistrarBoundaries.pm does &e how they play together. It's a pretty
> > critical point of headaches.
> 
> Okay, I found RegistrarBoundaries.pm in spamassassin/trunk, but not Uribl.pm

it's URIDNSBL.pm

my bad...
Comment 15 Philip Prindeville 2014-07-25 00:52:55 UTC
(In reply to AXB from comment #12)

> Different evals have different priorities in the processing chain
> 
> grep -R -H priority thru /trunk/lib
> you'll get a ton of results which will point you in the right direction

Okay, so what does the priority do?  And what's a good example of it in a module?
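For reference, here is my understanding of the Mail::SpamAssassin::Conf documentation. The `priority SYMBOLIC_TEST_NAME n` setting gives a test a priority, which defaults to 0. Tests other than DNS and meta tests run in increasing priority order, so negative values run first. The Python sketch below models only that ordering. `run_order` is hypothetical, and the alphabetical tie-break within a priority is an assumption made for deterministic output, not documented behavior.

```python
from itertools import groupby
from typing import Dict, List


def run_order(priorities: Dict[str, int]) -> List[List[str]]:
    """Group rule names into passes, lowest priority value first."""
    # Tie-break by name only to make the output deterministic; this is an
    # assumption of the sketch, not documented SpamAssassin behavior.
    ordered = sorted(priorities.items(), key=lambda kv: (kv[1], kv[0]))
    return [[name for name, _ in grp] for _, grp in groupby(ordered, key=lambda kv: kv[1])]
```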

(In reply to AXB from comment #14)

> it's URIDNSBL.pm

Okay, looking at that module I see:

    # take the usable domains and add them to the ordered list
    while (my($host,$domain) = each( %{$info->{hosts}} )) {
      if ($skip_domains->{$domain}) {
        dbg("uridnsbl: domain $domain in skip list, host $host");

so I can do something similar and process exclusions based on the domain instead of the host's FQDN.

I see RegistrarBoundaries::trim_domain() being called, but from the method complete_ns_lookup().  Since I don't handle my own asynchronous DNS lookups, I'm not sure how I would use this function directly.
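The domain-based exclusion can be sketched in two steps: trim the URI host to its registrable domain, then look that domain up in a skip set, much as the URIDNSBL excerpt above tests `$skip_domains->{$domain}`. This Python version is a deliberately naive stand-in for RegistrarBoundaries::trim_domain(). It keeps the last two labels, or the last three when the final two form a listed multi-label suffix. `MULTI_LABEL_SUFFIXES` is a tiny hypothetical sample; the real module maintains a far longer list, which is why reusing it is preferable to hand-rolled trimming.

```python
import ipaddress
from typing import Set

# Tiny illustrative subset of multi-label suffixes (hypothetical, not exhaustive).
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def registrable_domain(host: str) -> str:
    """Naively trim a host name to its registrable domain."""
    name = host.lower().rstrip(".")
    try:
        ipaddress.ip_address(name)
        return name  # IP literals have no domain to trim
    except ValueError:
        pass
    labels = name.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def excluded_by_domain(host: str, skip_domains: Set[str]) -> bool:
    """Analogue of URIDNSBL's $skip_domains->{$domain} test."""
    return registrable_domain(host) in skip_domains
```

With this approach, `www.foo.bar` and `foo.bar` from the snowshoe example both reduce to `foo.bar`, so a single exclusion covers both.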
Comment 16 Philip Prindeville 2014-08-07 17:52:39 UTC
Created attachment 5226
Allow excluding domains instead of individual hosts

Please apply this patch to SVN.
Comment 17 Kevin A. McGrail 2014-08-08 16:43:50 UTC
(In reply to Philip Prindeville from comment #16)
> Created attachment 5226 [details]
> Allow excluding domains instead of individual hosts
> 
> Please apply this patch to SVN.

Will do. One minor thing to consider: next time, create a new bug and reference this one, since these are additional features and this bug is closed.

svn commit -m 'allow excluding domains instead of individual hosts - bug 7060'  
Sending        lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
Transmitting file data .
Committed revision 1616826.
Comment 18 Benny Pedersen 2014-08-29 19:42:09 UTC
(In reply to Philip Prindeville from comment #4)
> (In reply to AXB from comment #2)

> It's only the GeoIPISP.dat file which would need to be licensed which would
> provide the ISP-based lookups.

This is a showstopper.
Comment 19 Philip Prindeville 2014-08-29 20:23:20 UTC
(In reply to Benny Pedersen from comment #18)
> (In reply to Philip Prindeville from comment #4)
> > (In reply to AXB from comment #2)
> 
> > It's only the GeoIPISP.dat file which would need to be licensed which would
> > provide the ISP-based lookups.
> 
> This is a show stopper

Sorry, remind me why this is a showstopper?

You don't need GeoIPISP.dat to use this plugin.  You just need it for "uri_block_isp" to work.

If it fails to open this database, then "uri_block_isp" is never exposed as a rule type.
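The Python sketch below illustrates the behavior just described, where `uri_block_isp` is registered only if the ISP database opens, as a probe at load time. Everything here is an assumption for illustration. `available_directives` is hypothetical. Gating `uri_block_cc` on the country database mirrors the Geo::IP requirement from the description. Missing databases are logged at debug level rather than as warnings. A call might look like `available_directives("/usr/share/GeoIP/GeoIP.dat", "/usr/share/GeoIP/GeoIPISP.dat")`, where the ISP file path is hypothetical.

```python
import logging
import os
from typing import Callable, List

log = logging.getLogger("urilocalbl_sketch")


def available_directives(
    country_db: str,
    isp_db: str,
    can_open: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Return the directive names a GeoIP-aware plugin could expose (hypothetical)."""
    directives = ["uri_block_cidr"]  # CIDR matching needs no GeoIP data at all
    if can_open(country_db):
        directives.append("uri_block_cc")
    else:
        log.debug("no country database at %s; uri_block_cc not registered", country_db)
    if can_open(isp_db):
        directives.append("uri_block_isp")
    else:
        # The ISP database is optional and licensed, so its absence is logged
        # at debug level rather than as a warning.
        log.debug("no ISP database at %s; uri_block_isp not registered", isp_db)
    return directives
```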

What is it you're looking for as a 'fix' to this situation?
Comment 20 Benny Pedersen 2014-08-29 22:24:33 UTC
The plugin tries to load this .dat file, and the failure is logged as a warning.

I'd be happier if it were not forced to load, i.e. so I don't have to pay just to remove the warning.
Comment 21 Philip Prindeville 2014-08-29 23:14:38 UTC
(In reply to Benny Pedersen from comment #20)
> Plugin try to load this dat file and its logged as warn
> 
> Would be more happy it was not forced to be loaded, eg dont have to pay for
> remove warn

This is a known issue with libGeoIP (the library provided by MaxMind.com which the Perl Geo::IP module links to) and a defect exists:

https://maxmind.zendesk.com/hc/requests/35945

I'm attaching a workaround for now until MaxMind.com updates the library sources.
Comment 22 Philip Prindeville 2014-09-02 17:04:25 UTC
(In reply to Benny Pedersen from comment #20)
> Plugin try to load this dat file and its logged as warn
> 
> Would be more happy it was not forced to be loaded, eg dont have to pay for
> remove warn

This was fixed with:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7079#c1