Bug 6372 - LashBack: Unsubscribe Data
Summary: LashBack: Unsubscribe Data
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: unspecified
Hardware: Other All
Importance: P5 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 6167
Depends on:
Blocks:
 
Reported: 2010-03-10 23:43 UTC by Brandon Phillips
Modified: 2019-08-20 16:58 UTC



Description Brandon Phillips 2010-03-10 23:43:22 UTC
I'm with an email compliance monitoring company called LashBack, located in St. Louis, USA. I was hoping I could get some guidance from this group on how we might start a discussion with the SpamAssassin team about incorporating our "Unsubscribe Blacklist" (UBL) into your service.

The UBL is a list of IP addresses that are sending email to addresses which have been harvested from unsubscribe/suppression files. It's a unique list and I think it can add value to your project.

http://www.lashback.com/reputation/BlacklistResources.aspx

Thank you in advance.

Brandon Phillips
ceo, lashback llc
314.398.9900 direct
Comment 1 Warren Togami 2010-03-11 02:59:17 UTC
There are a number of requirements (that are not documented anywhere...) for a new blacklist to be added to spamassassin.  But first the blacklist must be measured to be useful in spam detection while also guarding against false positives.  Examples:

* Bug #6156 PSBL for example had over 2 months of testing in weekly masschecks before it was enabled by default in spamassassin-3.3.0.  PSBL's hit rate is a relatively low 10-15% which is normally too small to be worthwhile as a default network test in spamassassin.  But PSBL was found to be consistently among the SAFEST blacklists with almost zero false positives.
* The Anubis blacklists have been in testing for many months now.  They are catching roughly 33% of spam with a typically low false positive rate.  33% is marginally good enough for inclusion in spamassassin, but overlap analysis shows differences between Anubis and the typical top blacklists like Spamhaus XBL, which is a good thing.  Anubis, being based in Europe, uses very different data sources than most other blacklists, making it a valuable addition.  Anubis is currently working on establishing a network of public global mirrors in order to become suitable for spamassassin.

I have tested Lashback UBL during October 2009.  The test is still in the source sandbox.

spamassassin/trunk/rulesrc/sandbox/wtogami

# UBL testing disabled 20091019
# http://ruleqa.spamassassin.org/20091017-r826198-n/T_RCVD_IN_UBL/detail
# Saturday masscheck revealed 7.9% spam and 2.3% ham hit rate

#header   RCVD_IN_UBL eval:check_rbl('ubl-lastexternal', 'ubl.unsubscore.com')
#describe RCVD_IN_UBL Relay listed in UBL http://www.lashback.com/support/UnsubscribeBlacklistSupport.aspx
#tflags   RCVD_IN_UBL net nopublish

Unfortunately the results at that time were abysmally bad.  It caught far less spam than the other blacklists, while the false positive rate was unacceptably high.  We can certainly test your blacklist again.

Do you wish us to enable UBL in our weekly Saturday masschecks?  It won't be a rule pushed to any spamassassin clients.  It will be a rule in Saturday masschecks, where you will see a burst of up to a million queries coming from a small number of servers on the Internet.

Anyone else object to me enabling this test for weekly masscheck?
Comment 2 Kevin A. McGrail 2012-01-18 23:41:21 UTC
Brandon, 

The SA Policy for DNSBL inclusion has been formalized at http://wiki.apache.org/spamassassin/DnsBlocklistsInclusionPolicy

Would you like us to continue this ticket?

Regards,
KAM
Comment 3 Dave Jones 2017-06-06 21:19:41 UTC
Lashback has become one of the "major RBLs" out there known to be used by many major mail providers like Comcast.net.  Can we revive this again and get ubl.unsubscore.com into the default rules?  I would like to get it into the masscheck processing to see how effective it would be assuming Lashback is Ok with the inclusion policy.

Dave
Comment 4 Kevin A. McGrail 2017-06-06 21:40:06 UTC
(In reply to Dave Jones from comment #3)
> Lashback has become one of the "major RBLs" out there known to be used by
> many major mail providers like Comcast.net.  Can we revive this again and
> get ubl.unsubscore.com into the default rules?  I would like to get it into
> the masscheck processing to see how effective it would be assuming Lashback
> is Ok with the inclusion policy.
> 
> Dave

Dave, see comment #2: this is in LashBack's court.  Brandon left LashBack a few years ago, but we could definitely try it out.  Do you have a contact there who might want to talk about inclusion by default under the free-for-some model?

Best,
KAM
Comment 5 Dave Jones 2017-06-06 21:52:52 UTC
I tried sending an email to Brandon and got a bounce so I have sent it to sales at lashback.com address.  We will see if they are interested in reviving this.

This page indicates it's free to use for both commercial and non-commercial purposes:

https://blacklist.lashback.com/

I am sure they want to grow their user base to increase their listing fees.

I received a spam message today that went into my ham masscheck corpus which would have been blocked if we had this BL in the default SA rules:

X-Spam-Status: No, score=4.575 tagged_above=-999 required=5
    tests=[BAYES_50=0.8, CANT_SEE_AD=1,
    HEADER_FROM_DIFFERENT_DOMAINS=0.001, HK_RANDOM_ENVFROM=0.001,
    HTML_IMAGE_ONLY_12=2.059, HTML_MESSAGE=0.001,
    HTML_SHORT_LINK_IMG_1=0.001, MIME_HTML_ONLY=0.723,
    SPF_HELO_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01]
Received: from hotshope.com (hotshope.com [192.227.65.79])
    by smtp4i.ena.net (Postfix) with ESMTP id DF99E1480D36
    for <redacted@example.com>; Tue, 6 Jun 2017 11:21:19 -0500 (CDT)
Subject: Cannabis Oil Without a Prescription in All 50 States

Dave
Comment 6 Kevin A. McGrail 2017-06-06 23:24:48 UTC
(In reply to Dave Jones from comment #5)
> I tried sending an email to Brandon and got a bounce so I have sent it to
> sales at lashback.com address.  We will see if they are interested in
> reviving this.

I am open to trying it again.  It was pretty bad 8 or so years ago when I last tested it but many RBLs have improved their techniques.
Comment 7 Mike Augustine 2017-06-07 16:06:27 UTC
We’d love to be considered for inclusion in the SpamAssassin DNSBL. Let me know how I may help with the required testing.
Comment 8 Dave Jones 2017-06-07 17:28:45 UTC
Mike,
Can you review this page and provide any necessary responses to the bullet points?

http://wiki.apache.org/spamassassin/DnsBlocklistsInclusionPolicy

Do you have any SpamAssassin example rules that we could start with for testing purposes?  I have some I use in my day job if needed, but I think it would be better for them to come from your side if possible, to make sure SA would be using your list in the best possible way.

Dave
Comment 9 Mike Augustine 2017-06-12 18:24:31 UTC
Just a quick update: I've received some helpful guidance from Dave Jones about tweaks to reduce false positives, etc. in our blacklist. I'm meeting with teammates this week to discuss feasibility and implementation. I'll post a status soon.

Mike
Comment 10 Dave Jones 2017-09-13 20:37:20 UTC
I have done some testing on my mail filtering platform lately after Lashback has cleaned up many large hosting providers from their list.  The results look very promising.

What are the next steps to get this added into the default SA rules for testing with a low value?

I have confirmed with Michael Augustine (maugustine@lashback.com) that their terms and conditions align with our requirements:

http://blacklist.lashback.com/

P.S. I understand the rule updates are currently still on hold so I would have to get that part band-aid'd together again before updated rulesets will go out again.

Thanks,
Dave
Comment 11 Kevin A. McGrail 2017-09-13 22:08:17 UTC
Excellent.  I'll put it into testing and give some feedback!
Comment 12 Kevin A. McGrail 2017-10-11 22:20:50 UTC
So in order to test the RBL's effectiveness, I used approximately one month's worth of data for:

Number of hits on the rule(s) in my SPAM2LEARN hand sorted spam folder (this is mail that slipped through)  /  Number of emails in the folder

44 hits / 1915 total

Number of hits on the rule(s) in my RECEIVED hand sorted ham folder / Number of emails in the folder

6 hits / 7231 total


Also, in 5 days of spam, it marked 79 messages out of 150, with no FPs found when spot-checking 10.

Also, in 2 days of highly marked spam, it marked 2068 messages out of 2813, with no FPs found, again by spot-checking.

Looking at why things hit or didn't hit when they shouldn't or should:

In the received folder, it hit on Uber Receipt, UberEATS ad, Uber invitation.

In the spam2learn folder, it hit all appropriate items and would have helped.

Beyond that, I did not do any overlap analysis to see if this duplicated other RBLs.
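
For reference, the counts reported in comment 12 can be turned into percentages as sketched below. This is only illustrative arithmetic on the figures quoted above; the helper name `hit_rate` and the sample labels are made up for the example. Note that the SPAM2LEARN folder holds spam that slipped through existing filtering, so its rate is not an overall spam hit rate.

```python
def hit_rate(hits, total):
    """Return hits as a percentage of total (0.0 when total is zero)."""
    return 100.0 * hits / total if total else 0.0

# (hits, total) pairs copied from comment 12; the labels are illustrative.
samples = {
    "SPAM2LEARN (missed spam)": (44, 1915),
    "RECEIVED (ham)": (6, 7231),
    "5 days of spam": (79, 150),
    "2 days of highly marked spam": (2068, 2813),
}

for label, (hits, total) in samples.items():
    print(f"{label}: {hits}/{total} = {hit_rate(hits, total):.2f}%")
```

On these numbers the ham hit rate is below 0.1%, while the hit rate on the missed-spam folder is a little over 2%.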
Comment 13 Dave Jones 2017-10-17 12:42:04 UTC
I am seeing very good results in my analysis over the past few days.  Is the next step to get it committed to the default SA ruleset with a very low score?  Here is what I have been using for over a month with much higher scores on my production mail filtering platform:

ifplugin Mail::SpamAssassin::Plugin::DNSEval

header		__RCVD_IN_LASHBACK	eval:check_rbl('lashback', 'ubl.unsubscore.com.')
describe	__RCVD_IN_LASHBACK	Received is listed in Lashback ubl.unsubscore.com
tflags		__RCVD_IN_LASHBACK	net

header		RCVD_IN_LASHBACK	eval:check_rbl_sub('lashback', '127.0.0.2')
describe	RCVD_IN_LASHBACK	Received is listed in Lashback ubl.unsubscore.com
score		RCVD_IN_LASHBACK	0.1
tflags		RCVD_IN_LASHBACK	net

header		RCVD_IN_LASHBACK_LASTEXT	eval:check_rbl('lashback-lastexternal', 'ubl.unsubscore.com.')
describe 	RCVD_IN_LASHBACK_LASTEXT	Last external is listed in Lashback ubl.unsubscore.com
score		RCVD_IN_LASHBACK_LASTEXT	0.2
tflags		RCVD_IN_LASHBACK_LASTEXT	net

endif
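
As background for the rules above, a conventional IPv4 DNSBL lookup reverses the octets of the relay IP, appends the list zone, and checks the returned A record. The sketch below only builds the query name and compares a supplied answer; it does no DNS resolution. The function names are hypothetical, the IP is the relay from comment 5, and treating 127.0.0.2 as the "listed" answer is an assumption taken from the check_rbl_sub line above.

```python
import ipaddress
from typing import Iterable

def dnsbl_query_name(ip: str, zone: str = "ubl.unsubscore.com.") -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"

def is_listed(answers: Iterable[str], listed_code: str = "127.0.0.2") -> bool:
    """Return True if any returned A record equals the assumed 'listed' code."""
    return listed_code in set(answers)

# Relay IP taken from the Received header quoted in comment 5.
print(dnsbl_query_name("192.227.65.79"))  # 79.65.227.192.ubl.unsubscore.com
```

An empty answer (NXDOMAIN) would correspond to `is_listed([])`, which returns False.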
Comment 14 Dave Jones 2017-10-19 00:12:21 UTC
Anyone else testing out these rules?  Can we get some feedback and add them to the default SA ruleset with low scores for testing?

The results on my production and spamtrap instances are good.  RCVD_IN_LASHBACK hits are lining up very well with spam and blacklisted messages.
Comment 15 Dave Jones 2017-10-19 00:13:52 UTC
*** Bug 6167 has been marked as a duplicate of this bug. ***
Comment 16 Dave Jones 2017-11-19 21:06:45 UTC
Now that we have the ruleset updates rolling again, I would like to put these rules in with low scores to start testing.  This could cause a high volume of DNS queries to ubl.unsubscore.com.

Mike Augustine,
Are you ready for this new DNS load on ubl.unsubscore.com?  I am only seeing a single DNS server ns1.unsubscore.com hosting the ubl subdomain.  That's a little odd since the parent unsubscore.com has both ns1 and ns2.  Usually there would be the same NS records or more (not less) on the subdomain -- especially not a single DNS server.  Even if 64.38.116.15 is BGP-backed by a number of DNS servers, there should be at least 2 NS records following best practices.

Are ns1.unsubscore.com and ns2.unsubscore.com BGP-backed by multiple DNS servers around the world?  I have no way to estimate the DNS volume, but I know it's going to be significant.  Once it's enabled, it could take 24 hours to disable if there is a problem.  I don't want this to DoS your DNS servers.

I guess we could put a version check around the new rules and start out with 3.4.1 to limit the DNS queries to those running the latest version of SA.  Then if that is OK we would lower the version number until all "modern" versions are covered then remove the version check completely.  This is the only way I know of to ease it in slowly.
Comment 17 Kevin A. McGrail 2017-11-20 13:43:52 UTC
I have it in testing and believe it is worthy of inclusion in the default rules but the key things are agreement with the inclusion policy and making sure that we won't overwhelm their infrastructure.
Comment 18 Mike Augustine 2017-11-20 19:10:49 UTC
I am doing the necessary vetting to ensure that we can handle the additional requests and am looking into the NS record issue.
Comment 19 Dave Jones 2017-11-22 16:13:58 UTC
Just an FYI.  I have seen about 500,000 unique IPs on my SA mirrors running sa-update over the past two days.  It's not an exact count of SA instances, because of NAT and because not everyone may be running sa-update regularly, but it's a rough number to work with.

Due to DNS caching and SA instances pointing to ISP/Google/OpenDNS/etc. DNS servers, this doesn't mean that ubl.unsubscore.com will be hit directly by half a million IPs.

Also, in addition to the single ns1 NS record issue for ubl.unsubscore.com, its NS record TTL is set to 600 seconds.  If this is a static NS record, then it should be at least 3600 or 7200.  Most of the time NS records should be 86400 or higher unless there is some specific reason that NS records need to change quickly for some advanced HA setup.  Check out google.com's NS records, which are set in days, not minutes.

Currently, if ns1.ubl.unsubscore.com went offline for more than 600 seconds for any reason, the whole zone would drop off the Internet -- DNS caches would flush all records and not be able to serve responses to new DNS queries for ubl.unsubscore.com.

https://intodns.com/ubl.unsubscore.com

Another thing, the SOA serial is 16 which is a bit odd too since this zone should be changing every few minutes when records are added/updated/removed.  The SOA serial is really only used in traditional slaving but it's also informative of DNS hosting health.
Comment 20 Kevin A. McGrail 2017-11-22 16:19:21 UTC
I've accidentally DDoSed datacenters by pointing RBLs to the wrong place before.  The amount of traffic, especially in handling states, can be a lot.
Comment 21 Henrik Krohns 2019-08-12 11:18:07 UTC
Doesn't look much of a performer, hits many ham, even outlook.com mx.

# zgrep RCVD_IN_LASHBACK_LASTEXT= mail.log* |grep 'Passed CLEAN' |wc -l
17
# zgrep RCVD_IN_LASHBACK_LASTEXT= mail.log* |grep 'Blocked SPAM' |wc -l
868

# zgrep RCVD_IN_BRBL_LASTEXT= mail.log* |grep 'Passed CLEAN' |wc -l
5
# zgrep RCVD_IN_BRBL_LASTEXT= mail.log* |grep 'Blocked SPAM' |wc -l
3290

# zegrep 'RCVD_IN_[XPS]BL=' mail.log* |grep 'Passed CLEAN' |wc -l
0
# zegrep 'RCVD_IN_[XPS]BL=' mail.log* |grep 'Blocked SPAM' |wc -l
3695
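
To compare the three sets of counts above on the same footing, one option is to compute what share of each rule's hits landed on mail logged as 'Passed CLEAN'. The sketch below is illustrative arithmetic on the numbers in comment 21 only; the helper name `ham_share` is hypothetical, and it assumes the two log verdicts counted above are the only outcomes of interest.

```python
def ham_share(clean_hits, spam_hits):
    """Percentage of a rule's hits that fell on mail logged as clean."""
    total = clean_hits + spam_hits
    return 100.0 * clean_hits / total if total else 0.0

# (Passed CLEAN, Blocked SPAM) counts copied from comment 21.
counts = {
    "RCVD_IN_LASHBACK_LASTEXT": (17, 868),
    "RCVD_IN_BRBL_LASTEXT": (5, 3290),
    "RCVD_IN_[XPS]BL": (0, 3695),
}

for rule, (clean, spam) in counts.items():
    print(f"{rule}: {clean} clean of {clean + spam} hits = {ham_share(clean, spam):.2f}%")
```

On these logs, roughly 1.9% of the LASHBACK_LASTEXT hits were on clean mail, against about 0.15% for BRBL_LASTEXT and none for the XBL/PBL/SBL rules, while LASHBACK_LASTEXT also hit the fewest spam messages of the three.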
Comment 22 AXB 2019-08-20 16:58:21 UTC
(In reply to Henrik Krohns from comment #21)
> Doesn't look much of a performer, hits many ham, even outlook.com mx.
> 
> # zgrep RCVD_IN_LASHBACK_LASTEXT= mail.log* |grep 'Passed CLEAN' |wc -l
> 17
> # zgrep RCVD_IN_LASHBACK_LASTEXT= mail.log* |grep 'Blocked SPAM' |wc -l
> 868
> 
> # zgrep RCVD_IN_BRBL_LASTEXT= mail.log* |grep 'Passed CLEAN' |wc -l
> 5
> # zgrep RCVD_IN_BRBL_LASTEXT= mail.log* |grep 'Blocked SPAM' |wc -l
> 3290
> 
> # zegrep 'RCVD_IN_[XPS]BL=' mail.log* |grep 'Passed CLEAN' |wc -l
> 0
> # zegrep 'RCVD_IN_[XPS]BL=' mail.log* |grep 'Blocked SPAM' |wc -l
> 3695

imo this could be dropped