Bug 7529 - GeoLite Legacy databases will be discontinued in 04/01/2018
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: unspecified
Hardware: PC OpenBSD
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2018-01-07 16:12 UTC by Giovanni Bechis
Modified: 2018-08-05 13:45 UTC

Attachment                                       Type   Status  Submitter/CLA Status
RelayCountry api rewrite                         patch  None    Giovanni Bechis [HasCLA]
URILocalBL api rewrite                           patch  None    Giovanni Bechis [HasCLA]
Warnings fix                                     patch  None    Giovanni Bechis [HasCLA]
RelayCountry rewrite using IP::Country::DB_File  patch  None    Giovanni Bechis [HasCLA]
RelayCountry dbtype option                       patch  None    Giovanni Bechis [HasCLA]
Add GeoIP2 to RelayCountry.pm                    patch  None    Kent Oyer [HasCLA]
RelayCountry country_db_type option              patch  None    Giovanni Bechis [HasCLA]

Description Giovanni Bechis 2018-01-07 16:12:53 UTC
MaxMind announced that it will discontinue support for the GeoIP legacy[¹] databases.

At the moment the legacy databases are used by the RelayCountry and URILocalBL plugins; both should move to the GeoIP2[²] API so that they can read the new database format.

[¹] http://dev.maxmind.com/geoip/legacy/geolite/
[²] https://metacpan.org/pod/GeoIP2
Comment 1 Giovanni Bechis 2018-02-06 12:21:14 UTC
Created attachment 5523
RelayCountry api rewrite
Comment 2 Giovanni Bechis 2018-02-06 12:21:45 UTC
Created attachment 5524
URILocalBL api rewrite
Comment 3 Giovanni Bechis 2018-02-06 12:26:04 UTC
Rewrite of plugins using Geo::IP legacy databases to new v2 API.
GeoIP2 isp database is not free, so it's untested.
Some operating systems do not have the GeoIP2 Perl modules in their tree yet; maybe we should wait a bit or add a fallback for that?
The RelayCountry plugin uses IP::Country::Fast as a fallback, but upstream does not update its databases frequently (I do it on my own every now and then).
Comment 4 John Hardin 2018-02-06 17:22:19 UTC
(In reply to Giovanni Bechis from comment #3)
> GeoIP2 isp database is not free, so it's untested.

Not according to MaxMind at the link above:

>  GeoLite Legacy users will need to update their integrations
>  in order to switch to the free GeoLite2 or commercial GeoIP
>  databases by April 2018.

GeoLite2 is free. Do the changes support that database? Or only the GeoIP2 commercial database?
Comment 5 Giovanni Bechis 2018-02-06 18:03:53 UTC
The new databases are split into a "lite" version [¹] and a paid version [²].
The ISP database is only available under the commercial license; the City, Country and ASN databases are still free.
Even for the legacy databases that SA uses at the moment, I cannot find a way to download a free version of the ISP database.


[¹] https://dev.maxmind.com/geoip/geoip2/geolite2/
[²] https://www.maxmind.com/en/geoip2-isp-database
Comment 6 Giovanni Bechis 2018-02-06 19:29:57 UTC
Created attachment 5525
Warnings fix
Comment 7 John Hardin 2018-02-07 02:52:18 UTC
(In reply to Giovanni Bechis from comment #5)
> The new databases are split into a "lite" version [¹] and a paid version [²].
> The ISP database is only available under the commercial license; the City,
> Country and ASN databases are still free.
> Even for the legacy databases that SA uses at the moment, I cannot find a
> way to download a free version of the ISP database.
> 
> 
> [¹] https://dev.maxmind.com/geoip/geoip2/geolite2/
> [²] https://www.maxmind.com/en/geoip2-isp-database

The current tool uses the Lite database as far as I can tell - at least, GeoLite is what my CentOS install is using...

Is the ISP database relevant? Does SA RelayCountry care about anything beyond the country?
Comment 8 John Hardin 2018-02-07 03:01:42 UTC
Argh, I think I may have misread your original comment - "GeoIP2 isp database is not free, so it's untested" may not imply that the free Lite country database doesn't work and can't be used...

Is that what happened? If so, my sincere apologies!
Comment 9 Giovanni Bechis 2018-02-07 07:35:42 UTC
SA RelayCountry reads the Country databases; URILocalBL can read both the Country (free) and ISP (paid) databases.

I took a look at RHEL7 and could not find an official RPM of MaxMind::DB::Reader or of GeoIP2 (the web API that can be used instead of downloading the database files).
I don't have any Debian/Ubuntu machine at hand, but I doubt that any big distribution is ready for this diff.
Comment 10 Bill Cole 2018-02-07 14:19:50 UTC
A potential alternative data source: https://pwhois.org/
I have no idea if their infrastructure is capable of handling the load from everyone who can switch it on in SpamAssassin, but they do distribute a Milter and a MacOSX 'widget' so they don't seem afraid of load. 

Pros: 
   Not dependent on the largesse of one company with an interest in free data being low-quality. 
   Derived from operational routing data (from route-views servers) updated 
   Includes Lat/Lon location (NOT in GeoLite2) 
   
Cons:
   Would need completely new code.  
   Location is strictly AS-based, so it is unlikely to be as accurate as MaxMind data. In some regions (e.g. EU) it may not even be the correct country.
Comment 11 Giovanni Bechis 2018-02-07 14:48:37 UTC
That could be the way to go in the long term. In the meantime I was thinking about porting URILocalBL to IP::Country::Fast and adding a configuration parameter to switch between Geo::IP and IP::Country::Fast at will.
The IP::Country::Fast database is not updated upstream, but we can provide an updated database every now and then.
I will double check the accuracy of the IP::Country::Fast database against Geo::IP and report back in a few days.
Comment 12 Benny Pedersen 2018-02-07 15:20:24 UTC
(In reply to Giovanni Bechis from comment #11)
.
> I will double check the accuracy of the IP::Country::Fast database against
> Geo::IP and report back in a few days.

the only problem with this is that it does not support ipv6

at least not when i did my own db update

building requires around 3GB ram, not a problem on my own vps to do this

if it now supports ipv6 it would be very cool imho
Comment 13 Giovanni Bechis 2018-02-07 17:24:37 UTC
IP::Country does not support ipv6, but IP::Country::DB_File does.
I will take a look at it; IMHO not depending on a company is a great thing.
Comment 14 Giovanni Bechis 2018-02-08 07:56:27 UTC
Created attachment 5526
RelayCountry rewrite using IP::Country::DB_File

RelayCountry plugin rewrite using IP::Country::DB_File, with IP::Country::Fast as a fallback; no functionality is lost.
Pros:
- gets rid of MaxMind and uses only official data from RIPE
- you can update your database whenever you want with build_ipcc.pl(1)

Cons:
- IP::Country::DB_File is not included in any distribution at the moment

Regarding URILocalBL.pm, I can provide a similar diff, but one feature will be missing: uri_block_isp will not be available because the data is missing.
As far as I know, this feature currently needs a paid subscription to the MaxMind ISP databases; I cannot find a free version on the MaxMind web site.
Comment 15 Giovanni Bechis 2018-02-27 10:57:38 UTC
In addition, MaxMind::DB::Reader is pure Perl and very slow;
MaxMind::DB::Reader::XS is faster but depends on Math::Int128, which is 64-bit only.
Comment 16 Giovanni Bechis 2018-03-19 22:41:53 UTC
Created attachment 5561
RelayCountry dbtype option

Another round of diffs:
This diff does not change anything by default and uses the Geo::IP legacy API as the current code does.
In addition, IP::Country::Fast or IP::Country::DB_File can be chosen by changing the new country_db_type option.
This way there is no strict requirement on any new dependencies, but we still have a way to get rid of the Geo::IP legacy databases when they become too outdated.
Any cons to this approach?
Otherwise I would like to fix M::S::P::URILocalBL in a similar way.
Comment 17 Michael C 2018-03-20 04:33:28 UTC
If MaxMind is removing the coordinates in GeoLite2, I suggest using the free IP2Location LITE database https://lite.ip2location.com which has country, state, city, latitude, longitude, ZIP code and time zone.

The Perl library https://github.com/ip2location/ip2location-perl is fast.
Comment 18 Giovanni Bechis 2018-03-20 07:45:59 UTC
I think the country is enough information;
the IP::Country::DB_File database has 3.690.830.136 ipv4 addresses, while the ip2location free database has only 1M ip addresses.
Comment 19 Michael C 2018-03-20 10:34:39 UTC
How did you get the number of IP addresses in IP2Location LITE?

I've found 3,658,409,728 IP addresses in the latest DB1 LITE database.

Please note that the data is compressed by range to speed up queries. One row in the database may cover more than one IP address.
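
As a rough illustration of why a row count and an address count differ, the following Python sketch sums the addresses covered by inclusive IPv4 ranges. The two ranges are made-up documentation prefixes, not rows from any real database, and the function name `addresses_covered` is hypothetical.

```python
import ipaddress


def addresses_covered(ranges):
    """Sum the number of addresses in inclusive (first, last) IPv4 ranges."""
    total = 0
    for first, last in ranges:
        lo = int(ipaddress.IPv4Address(first))
        hi = int(ipaddress.IPv4Address(last))
        total += hi - lo + 1  # inclusive range, so add one
    return total


# Hypothetical rows, one per range (not taken from any real database).
example_rows = [
    ("192.0.2.0", "192.0.2.255"),        # 256 addresses
    ("198.51.100.0", "198.51.100.127"),  # 128 addresses
]

print(len(example_rows), "rows cover", addresses_covered(example_rows), "addresses")
```

A database with about a million range rows can therefore still cover billions of individual addresses.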
Comment 20 Giovanni Bechis 2018-03-20 11:00:40 UTC
https://lite.ip2location.com/edition-comparison
I haven't tried to code anything, I would like to have a plan about which api we should use for short and long term.
Comment 21 Kevin A. McGrail 2018-03-20 12:12:31 UTC
Is there any concern with a release?  Is anything enabled by default with this functionality that maxmind is changing?
Comment 22 Michael C 2018-03-20 12:52:26 UTC
(In reply to Giovanni Bechis from comment #20)
> https://lite.ip2location.com/edition-comparison
> I haven't tried to code anything, I would like to have a plan about which
> api we should use for short and long term.

That should be the number of rows. The total number of IP addresses should be more than 3 million.
Comment 23 Giovanni Bechis 2018-03-20 14:07:57 UTC
(In reply to Kevin A. McGrail from comment #21)
> Is there any concern with a release?  Is anything enabled by default with
> this functionality that maxmind is changing?

The affected plugins are RelayCountry and URILocalBL; neither of them is enabled by default.
MaxMind won't update the legacy databases starting from 04/18/2018; the old databases will still work, but with stale data.
I do not know whether MaxMind will still allow downloading the latest legacy database; if not, new installations won't have any database available and RelayCountry and URILocalBL won't work at all.
Comment 24 Kent Oyer 2018-04-26 04:41:21 UTC
Created attachment 5566
Add GeoIP2 to RelayCountry.pm

I created this patch to add GeoIP2 support to the RelayCountry plugin. The only problem is that I had to hardcode the location of the database file; I'm not sure how to get around that. This works with the free GeoLite2 database that can be downloaded with geoipupdate [https://dev.maxmind.com/geoip/geoipupdate/]
Comment 25 Giovanni Bechis 2018-04-26 10:42:03 UTC
(In reply to Kent Oyer from comment #24)
> Created attachment 5566
> Add GeoIP2 to RelayCountry.pm
> 
> I created this patch to add GeoIP2 support to the RelayCountry plugin. The
> only problem is that I had to hardcode the location of the database file;
> I'm not sure how to get around that.

You can use a config option as I have done in https://bz.apache.org/SpamAssassin/attachment.cgi?id=5523&action=view

Anyway IMHO the "problems" with using GeoIP2 api are:
- GeoIP2 is not packaged in many operating systems
- GeoIP2 xs code is 64bit only
Comment 26 Kent Oyer 2018-04-26 14:29:01 UTC
> 
> You can use a config option as I have done in
> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5523&action=view
> 
> Anyway IMHO the "problems" with using GeoIP2 api are:
> - GeoIP2 is not packaged in many operating systems
> - GeoIP2 xs code is 64bit only

In my patch, if GeoIP2 is not installed I fall back to the legacy Geo::IP first and then to IP::Country::Fast. This method does not break existing installations but still allows people to use GeoIP2 if desired. Why not do that?

Thanks for the tip on the config option. I thought about that but by moving the code into extract_metadata we have to instantiate a new Reader object with every message. I guess it's a tradeoff.
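
The fallback order described in this comment (GeoIP2 first, then legacy Geo::IP, then IP::Country::Fast) can be sketched generically. This is a Python illustration of the selection logic only, not the Perl code from the patch; `pick_backend` and the three loader functions are hypothetical stand-ins for loading each module and opening its database.

```python
def pick_backend(candidates):
    """Return (name, handle) for the first backend whose loader succeeds.

    candidates is an ordered list of (name, loader) pairs; a loader raises
    an exception when its module or database file is unavailable.
    """
    for name, loader in candidates:
        try:
            return name, loader()
        except Exception:
            continue  # backend unavailable, try the next one
    return None, None


# Hypothetical loaders standing in for the three Perl modules.
def load_geoip2():
    raise ImportError("GeoIP2 not installed")  # simulate a missing module


def load_geoip_legacy():
    return "legacy Geo::IP handle"


def load_ip_country_fast():
    return "IP::Country::Fast handle"


name, handle = pick_backend([
    ("GeoIP2", load_geoip2),
    ("GeoIP", load_geoip_legacy),
    ("Fast", load_ip_country_fast),
])
print("selected backend:", name)
```

If the selection runs once and the handle is cached, no reader has to be constructed per message; whether that fits the plugin's configuration lifecycle is a separate question.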
Comment 27 Kent Oyer 2018-04-26 19:15:27 UTC
Comment on attachment 5561
RelayCountry dbtype option

Your patch 5561 works well except for two things:

1. On line 157, the assignment needs to be moved outside the if statement otherwise $db_info could be undefined in some cases

2. This could break existing installations because the default db_type is "GeoIP" and there's no automatic fallback to IP::Country::Fast
Comment 28 Giovanni Bechis 2018-06-12 21:08:16 UTC
Created attachment 5575
RelayCountry country_db_type option

I would like to move on with this diff.
It adds a country_db_type option to let the user choose between the Geo::IP and IP::Country::Fast modules.
If no option is specified, the current behaviour is preserved (Geo::IP as default and IP::Country::Fast as a fallback).
URILocalBL will remain as is; IP::Country::Fast has fewer features than Geo::IP and it is not worth porting it.

Next we could add more dbtypes to choose from.
Comment 29 Giovanni Bechis 2018-06-17 09:42:56 UTC
Committed in r1833660.
The fix is partial because IP::Country::Fast has no ipv6 support.
I will keep the bz open, a new api should be adopted sooner or later.
Comment 30 hal415 2018-06-18 16:44:07 UTC
Looking at the commits, there's, then, no GeoIP2 support?

> MaxMind announced they will discontinue support for GeoIP legacy[¹] databases...

It's when, not if, given that the legacy MaxMind DBs are *already* deprecated (https://blog.maxmind.com/2018/01/02/discontinuation-of-the-geolite-legacy-databases/):

No longer updated as of April 11, 2018, and to be completely removed in Jan 2019.

IIUC, for those that depend on regularly updated MaxMind DBs -- some free, some paid -- and have already switched to the v2 sources, this renders RelayCountry non-functional currently.

> - GeoIP2 is not packaged in many operating systems

Not everyone depends on distro packages -- particularly when it comes to security.

In the same manner that we track SA's 3.4 branch, and build locally, we do the same for the MaxMind Geo DBs.

Fwiw, DL'ing/building/using GeoLite2 DBs is trivial/straightforward.
Comment 31 John Hardin 2018-06-18 17:45:06 UTC
(In reply to hal415 from comment #30)
> > - GeoIP2 is not packaged in many operating systems
> 
> Not everyone depends on distro packages -- particularly when it comes to
> security.
> 
> In the same manner that we track SA's 3.4 branch, and build locally, we do
> the same for the MaxMind Geo DBs.
> 
> Fwiw, DL'ing/building/using GeoLite2 DBs is trivial/straightforward.

It strikes me that it would be very welcome to have a "Migrating RelayCountry to GeoLite v2" SA wiki page with step-by-step instructions - I know I would like to have that available.

If someone who knows how to do this and is willing to write it up does not have wiki access, I'm sure one of the devs who does have such access would be happy to post it to the wiki.
Comment 32 hal415 2018-06-18 17:51:02 UTC
(In reply to John Hardin from comment #31)
> > Fwiw, DL'ing/building/using GeoLite2 DBs is trivial/straightforward.
> 
> It strikes me that it would be very welcome to have a "Migrating RelayCountry
> to GeoLite v2" SA wiki page with step-by-step instructions - I know I would
> like to have that available.

AFAICT, there's no migration of RelayCountry -- or accompanying 'step-by-step' -- *until* the RelayCountry API supports it ...
Comment 33 Giovanni Bechis 2018-06-18 20:30:24 UTC
I have a working GeoIP2 API implementation in my tree and I can commit it iff there is general consensus.

The biggest showstopper for me is that
GeoIP2 depends on MaxMind::DB::Reader, which needs Scalar::Util>=1.45 to run;
RHEL/CentOS 7 (as an example) ships only 1.27 by default; perl-5.26 has the required version.
Comment 34 hal415 2018-06-18 20:54:31 UTC
Access to working RelayCountry, with GeoIP2, is the issue now.  If you've a working tree, that'd be helpful.

I'd add the fact of the prereq to the appropriate README's 'prereqs' section.

An upgrade with (m)cpan(p) to, or a DIY build of, a newer Scalar::Util is simple enough.  If someone's already paying attention to the GeoIP v2 DBs, it's not a stretch that they're capable.
Comment 35 Giovanni Bechis 2018-07-02 07:34:58 UTC
If there are no objections I would like to add code to support both GeoIP2 and IP::Country::DB_File.
The former is MaxMind's new format (lots of dependencies, but the data is accurate and there is an ISP database (paid version) that is used by URILocalBL).
The latter is free, has an AS-based database that supports ipv6 as well (IP::Country::Fast does not support it), and updating the database is very fast.

I will then contact some SA maintainers to try to push the needed new dependencies into their respective Linux distros, so that we can deprecate GeoIP legacy and IP::Country::Fast sooner or later.
Comment 36 Giovanni Bechis 2018-08-05 13:45:16 UTC
Added support for the new GeoIP2 databases to RelayCountry and URILocalBL with commits #1837465 and #1837466, for both trunk and 3.4.

To install the optional packages you can use cpan[m] if your preferred distribution
doesn't offer packages.
By default the legacy databases are still used and there is no intended behaviour change; check the man pages to enable GeoIP2 or DB_File support.
GeoIP2 is 64-bit only (there is a pure-Perl implementation, but it is very slow); IP::Country::DB_File has few dependencies and runs on non-Intel platforms as well.

To install dependencies:
# cpan -i Math::Int64
# cpan -i GeoIP2

# cpan -i IP::Country::DB_File