Bug 6525 - Disable NJABL
Summary: Disable NJABL
Status: RESOLVED LATER
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.3.1
Hardware: PC Windows 7
Importance: P2 major
Target Milestone: 3.3.2
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-26 01:54 UTC by Warren Togami
Modified: 2011-08-11 18:31 UTC
CC List: 6 users




Description Warren Togami 2010-12-26 01:54:09 UTC
http://marc.info/?l=spamassassin-users&m=129333296720946&w=2

It seems that NJABL is long dead.  We continue to do network queries but it catches almost nothing.  It is time to disable it in our default rules.

How do we go about disabling it in the sa-update channels?

Do our current policies require us to vote on this change?
Comment 1 AXB 2010-12-26 02:23:27 UTC
FWIW: my mailflow shows lots of hits.
-1 towards disabling it
Comment 3 AXB 2010-12-26 02:48:37 UTC
(In reply to comment #2)
> What does "a lot" mean?
> 
> My own logs for the past few months show 1.2% hit rate on NJABL*.  The
> masschecks seem to indicate a similarly low hit rate on NJABL* and a relatively
> high FP rate.
> 
> http://www.spamhaus.org/faq/answers.lasso?section=DNSBL%20Usage
> This page seems to indicate that NJABL is now redundant, incorporated as part
> of Spamhaus.  Redundant is bad.

Not all.
NJABL's Dynablock is included. And some of the bot-related data.

> In any case, if we were considering NJABL as a new DNSBL in spamassassin today
> it wouldn't even come close to being acceptable.  Given its very low
> performance, it wouldn't affect us much at all to simply turn it off.

Do you have a new RH/Fedora release coming up?
Comment 4 Warren Togami 2010-12-26 02:57:33 UTC
> Do you have a new RH/Fedora release coming up?

No.  I don't work there anymore.  I'm just trying to improve spamassassin.

In any case, NJABL is by far the worst performing of our default DNSBLs.  While it is catching something, it is not catching anywhere near enough to make it worth keeping enabled.

>> http://www.spamhaus.org/faq/answers.lasso?section=DNSBL%20Usage
>> This page seems to indicate that NJABL is now redundant, incorporated as part
>> of Spamhaus.  Redundant is bad.

> Not all.
> NJABL's Dynablock is included. And some of the bot-related data.

OK, which return code is Dynablock?  Can we hit only on that, if that is the non-redundant portion of NJABL?

Oh wait, even then the hit rate is too insignificant to make keeping NJABL worthwhile.
Comment 5 Warren Togami 2010-12-26 02:59:08 UTC
http://en.wikipedia.org/wiki/Dynablock
"Updates of Dynablock stopped December 2003 but it became the basis for NJABL and SORBS own dynamic IP lists. The dynamic list parts of NJABL and SORBS have been developed independently since then, with NJABL using the 'dynablock' name for their list. In early 2007, NJABL passed their data along to The Spamhaus Project [1], for using in their PBL [2] service."
Comment 6 Warren Togami 2010-12-26 03:02:11 UTC
http://www.njabl.org/dynablock.html
Oh geez.  Read that.
Comment 7 AXB 2010-12-26 03:13:24 UTC
(In reply to comment #6)
> http://www.njabl.org/dynablock.html
> Oh geez.  Read that.

that's oooooold news.
There's more than just Dynablock to NJABL.
For those not using Spamhaus (due to load restrictions/budget) NJABL still offers considerable value.
Comment 8 Warren Togami 2010-12-26 03:19:38 UTC
You are ignoring the raw statistics.  ALMOST NOTHING is hitting on NJABL, and what is hitting is almost always redundant to Spamhaus.  Thus it is not worthwhile to query NJABL by default.  We should turn it off.  The user may enable it manually if they (incorrectly) believe it would help them.
Comment 9 AXB 2010-12-26 03:29:00 UTC
(In reply to comment #8)
> You are ignoring the raw statistics.  ALMOST NOTHING is hitting on NJABL, and
> what is hitting is almost always redundant to Spamhaus.  Thus it is not
> worthwhile to query NJABL by default.  We should turn it off.  The user may
> enable it manually if they (incorrectly) believe it would help them.

Yep. I ignore the SA statistics and base my comments on active mailflow.
Comment 10 Warren Togami 2010-12-26 03:46:22 UTC
What % of spam on your server(s) hit NJABL, and what % of those hits are overlaps with Spamhaus?

Could you please quantify what you are seeing with numbers?
Comment 11 AXB 2010-12-26 04:16:55 UTC
(In reply to comment #10)
> What % of spam on your server(s) hit NJABL, and what % of those hits are
> overlaps with Spamhaus?
> 
> Could you please quantify what you are seeing with numbers?

It would take ages to get more detailed data, but quick & dirty:

After bad_helo / rDNS pattern etc. rejects:
zcat /mnt/maillog/maillog-2010-Dec* | grep 'zen\.spamhaus\.org' | wc -l
12747354

zcat /mnt/maillog/maillog-2010-Dec* | grep 'multi\.uribl\.com' | wc -l
442752


After rejects:
zcat /mnt/maillog/maillog-2010-Dec* | grep 'is spam' | grep NJABL | wc -l
6360

NJABL hits may overlap with SpamCop, which is good for the extra bit of confidence.
Mostly 419 and Asian spam (relays/proxies).

Please note that since the death of DSBL and ORBS, NJABL is the only list still alive that explicitly lists proxies and relays.

It's been well maintained and has been a trustworthy data source since way before SA showed up on the map.
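The quick-and-dirty counts above can be turned into a hit rate with a short sketch like the one below. It assumes, as the grep pipeline does, that each spam verdict produces one log line containing "is spam" together with the names of the rules that hit; the function name `njabl_share` is hypothetical, and the log layout should be checked against the MTA/filter actually in use.

```shell
#!/bin/sh
# Hypothetical helper: share of spam-verdict log lines that mention an NJABL rule.
# ASSUMPTION: one log line per spam message, containing "is spam" and the
# names of the rules that hit (the same layout the grep pipeline relies on).
# Prints: "<njabl_hits> <spam_total> <percent>"
njabl_share() {
    awk '/is spam/ { total++; if (/NJABL/) hits++ }
         END {
             pct = total ? 100 * hits / total : 0
             printf "%d %d %.2f\n", hits, total, pct
         }'
}

# Usage (path taken from the example above):
# zcat /mnt/maillog/maillog-2010-Dec* | njabl_share
```

Because these log lines are counted after the MTA-level rejects, the percentage describes only the mail that reached SpamAssassin, not the whole inbound flow.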
Comment 12 Warren Togami 2010-12-26 04:27:15 UTC
Thank you.

It seems your own numbers agree that it is an extremely low hit-rate.

I don't dispute that it might add "extra confidence" in overlaps, but I still strongly suggest that the negligible benefit of NJABL is not worth the cost.  If we were considering the worthiness of adding NJABL today it wouldn't make it.

NJABL's rare hits on my server this past month overlap other rules to such an extent that the average spam score is 30+.  Meanwhile, I see a relatively large number of FPs compared to this tiny number of spam hits.
Comment 13 AXB 2010-12-26 04:30:47 UTC
(In reply to comment #12)
> Thank you.
> 
> It seems your own numbers agree that it is an extremely low hit-rate.
> 
> I don't dispute that it might add "extra confidence" in overlaps, but I still
> strongly suggest that the negligible benefit of NJABL is not worth the cost. 
> If we were considering the worthiness of adding NJABL today it wouldn't make
> it.
> NJABL's rare hits on my server this past month overlap other rules to such an
> extent that the average spam score is 30+.  Meanwhile, I see a relatively
> large number of FPs compared to this tiny number of spam hits.

Let's see what others have to say about this.

As I said, I'm 100% -1 towards removing it.
Comment 14 Kevin A. McGrail 2010-12-26 11:01:32 UTC
> As I said, I'm 100% -1 towards removing it.

I have to also vote -1.  I don't believe any rbl is perfect and the extra confidence is worth it.

FPs are a different story.  Are you seeing FPs? 

Regards,
KAM
Comment 15 Warren Togami 2010-12-26 14:54:44 UTC
Yes, on a regular basis, and the masschecks indicate that I am not alone.

I don't understand why you folks are ignoring the fact that this rule is hitting so incredibly rarely.  AXB's own logs confirm it.  Masschecks confirm it.  It is completely irrational to ignore the statistics, and I disagree that this truly is "extra confidence".
Comment 16 mouss 2010-12-26 16:13:02 UTC
(In reply to comment #15)
> Yes, on a regular basis, and the masschecks indicate that I am not alone.
> 
> I don't understand why you folks are ignoring the fact that this rule is
> hitting so incredibly rarely.  AXB's own logs confirm it.  Masschecks confirm
> it.  It is completely irrational to ignore the statistics, and I disagree that
> this truly is "extra confidence".

+1 for removal. 

After all, zen.spamhaus.org already contains a subset of NJABL.
Comment 17 Henrik Krohns 2010-12-26 17:02:31 UTC
-1 for removing, since there seems to be nothing inherently bad about it compared to other rules SA has.

It's up to the user to remove redundant or badly performing rules, depending on the local mail flow. If we went this route, half of the SA rule base that overlaps and has few hits (on the mass checks, mind you) would be disabled by default, leaving the user guessing what should be manually enabled. The same goes for the proposed removal of DNSWL by default.
Comment 18 Warren Togami 2010-12-29 09:15:44 UTC
http://www.sdsc.edu/~jeff/spam/cbc.html
SDSC's weekly statistics agree with our own.  Both NJABL and rfc-ignorant.org are catching 1% or less of spam.
Comment 19 Henrik Krohns 2010-12-29 10:22:13 UTC
You have nothing but -1 from committers, and lots of reasons why not to remove it. Perhaps you should move on to some more worthwhile cause.
Comment 20 Warren Togami 2010-12-29 13:45:20 UTC
> It's up to the user to remove redundant or badly performing rules,
> depending on the local mail flow. If we went this route, half of the SA rule
> base that overlaps and has few hits (on the mass checks, mind you) would
> be disabled by default, leaving the user guessing what should be manually

Where do we draw the line of acceptability?

It already catches only about 1% of spam.  Would 0.5% or 0% be obviously bad enough to warrant removal?

What does this say about the level of acceptability for adding new DNSBLs?

> enabled. The same goes for the proposed removal of DNSWL by default.

Who proposed this?  I didn't.

> You have nothing but -1 from committers, and lots of reasons why not to remove
> it. Perhaps you should move on to some more worthwhile cause.

All the committers are against this, while on users@ not a single person is in favor of keeping it after being shown the raw numbers.  Is the real issue here reluctance to change something of this nature outside of a major release?  If that is the real issue, then I can understand, given that this is only temporary.

Would committers in general be more open to consider removal of these two network queries at the next rescore masscheck?
Comment 21 Warren Togami 2010-12-29 13:53:11 UTC
> I have to also vote -1.  I don't believe any rbl is perfect and the extra
> confidence is worth it.

> FPs are a different story.  Are you seeing FPs? 

> Comment 15 Warren Togami 2010-12-26 14:54:44 UTC
> Yes, on a regular basis, and the masschecks indicate that I am not alone.

Hi.  No response to this regarding FPs?

I'm sorry, but the raw numbers do not support the idea of "extra confidence" when the spam hit rate is this low and relative FPs are this high.  None of the counter-arguments are being argued from the statistics.

Again, the overall impact is so negligible here that it is not worthwhile to argue if this is upsetting you.  If this is really an issue of "we don't make changes like this between major releases" then I'm willing to defer this proposal for now.
Comment 22 Henrik Krohns 2010-12-29 14:18:46 UTC
> Hi.  No response to this regarding FPs?
> 
> I'm sorry, but the raw numbers do not support the idea of "extra confidence"
> when the spam hit rate is this low and relative FPs are this high.  None of
> the counter-arguments are being argued from the statistics.
> 
> Again, the overall impact is so negligible here that it is not worthwhile to
> argue if this is upsetting you.  If this is really an issue of "we don't make
> changes like this between major releases" then I'm willing to defer this
> proposal for now.

What do you mean by relative FPs? I see much higher FP counts with BRBL and friends; that's what counts, relative or not.

NJABL_SPAM seems to be the "worst" of the bunch and is scored lower. NJABL_PROXY seems to be a useful rule without THAT much overlap. What other blacklists scan for open proxies?

I'm open to score tuning, but as already said, how will you tell users to enable a rule that is disabled by default? Yeah, why don't we just enable maybe two of the best BLs to save traffic, and write in the documentation that they can uncomment the others if they want..
Comment 23 Henrik Krohns 2010-12-29 14:44:06 UTC
Btw the njabl rules use deep parsing, might be useful to test with lastexternal too.
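To make the deep parsing vs. lastexternal distinction concrete: deep parsing looks up every untrusted relay IP found in the Received headers, while a lastexternal check looks up only the relay that handed the message to our own boundary MX. The sketch below is a simplification, not SpamAssassin's actual trusted/internal network logic. The function name `relay_ips` is hypothetical; it assumes unfolded headers of the form `Received: from host (... [a.b.c.d]) by ...`, and it takes as a parameter the number of topmost Received headers whose "from" IP belongs to our own internal relays.

```shell
#!/bin/sh
# Hypothetical sketch: list candidate relay IPs from Received headers, topmost first.
# ASSUMPTIONS: headers arrive on stdin already unfolded (one header per line);
# the "from" IP appears in square brackets; $1 is the number of topmost
# Received headers whose "from" IP is one of our own internal relays.
relay_ips() {
    grep '^Received: from ' \
        | sed -n 's/^Received: from [^[]*\[\([0-9.]*\)\].*/\1/p' \
        | tail -n +"$(( $1 + 1 ))"
}

# Deep parsing: every untrusted relay IP would be looked up.
#   relay_ips 1 < headers.txt
# lastexternal: only the first untrusted relay IP.
#   relay_ips 1 < headers.txt | head -n 1
```

Since the lastexternal variant queries a subset of the IPs that deep parsing queries, it can only lower the number of hits per message.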
Comment 24 AXB 2010-12-29 14:51:48 UTC
can we please close this "bug" as it's clearly not a bug.
Comment 25 Warren Togami 2010-12-29 15:02:23 UTC
> Btw the njabl rules use deep parsing, might be useful to test with lastexternal
> too.

This is a good point, but we can't do an apples to apples comparison because the existing rules are "reuse" and based on existing tagged mail.  But we know that this can only reduce the already tiny hit rate even further.

> can we please close this "bug" as it's clearly not a bug.

AXB, I have great respect for you, but I have to strongly disagree with you on this.  Even your own numbers in Comment #11 support my position.

> Regarding FP's

I have to go right now, I'll look deeper into this (if they are largely due to deep parsing or not) when I get back.

Meanwhile does anyone have any comment on the questions in Comment #20?

I am willing to defer this if this is really an issue of "we don't make changes like this between major releases".
Comment 26 AXB 2010-12-29 15:10:33 UTC
(In reply to comment #25)
> > Btw the njabl rules use deep parsing, might be useful to test with lastexternal
> > too.
> 
> This is a good point, but we can't do an apples to apples comparison because
> the existing rules are "reuse" and based on existing tagged mail.  But we know
> that this can only reduce the already tiny hit rate even further.
> 
> > can we please close this "bug" as it's clearly not a bug.
> 
> AXB, I have great respect for you, but I have to strongly disagree with you on
> this.  Even your own numbers in Comment #11 support my position.

You may disagree.
I agree that it has a low hit rate, BUT it's still valuable for the reasons previously mentioned, which is why it should stay, and you should put your creative energy into something innovative & worthwhile.


> > Regarding FP's
> 
> I have to go right now, I'll look deeper into this (if they are largely due to
> deep parsing or not) when I get back.
> 
> Meanwhile does anyone have any comment on the questions in Comment #20?
> 
> I am willing to defer this if this is really an issue of "we don't make changes
> like this between major releases".

Who mentioned a release or anything close to it? 
No need to defer. Bury it. NJABL is not going anywhere.
Comment 27 D. Stussy 2010-12-29 20:47:18 UTC
OK, so it catches only 1%.  What matters is whether that 1% is caught by any other list or mechanism.  If NOT, and if not falsing, then it should remain.  Even so, redundancy is NOT bad, especially when there's a failure.  Zero hits would be a different story.
Comment 28 Warren Togami 2010-12-29 21:22:44 UTC
> I agree that it has a low hit rate, BUT it's still valuable for the reasons
> previously mentioned, which is why it should stay, and you should put your
> creative energy into something innovative & worthwhile.

Please don't take this personally, but I find your tone to be hurtful.  I am indeed putting energy into various other "worthwhile" approaches to help improve spamassassin in different ways.

* Building a better spam trap with 100+ abandoned domain names spread across multiple servers.  Those feeds are going to different DNSBLs for reputation data.
* Our masscheck participation seems to be at an all-time low.  Recruiting a team of nightly masscheck participants who will also help train and recruit others.  Improving the documentation to make it a far less confusing process.
* Bug #6530: Evaluating Tiopan DNSBL
* Bug #6529: Evaluate Nix Spam DNSBL
* Subject: Floating Scores for Local-Only Rules? trying to find a good use for Adam Katz' rules.  Comments needed.
* Bug #6527: mkrules erroneously omits nopublish rules from masscheck when wrapped in ifplugin
* Soon: Analysis: Is deep parsing really a good thing?  (Spamcop mainly.)
* Soon: Discussion about the rule update policies and procedures.  It seems I am not the only one confused about the current status.

This does not change my opinion that NJABL and rfc-ignorant.org are not worthwhile for us to keep.  This new question about deep parsing vs. lastexternal should be investigated.  It is not trivial to test this because our data on deep parsing is based on --reuse, so I need to test lastexternal separately with data going forward.

> Where do we draw the line of acceptability?
> It already catches only about 1% of spam.  Would 0.5% or 0% be obviously bad
> enough to warrant removal?
> What does this say about the level of acceptability for adding new DNSBLs?

Meanwhile, I would really appreciate opinions on this question from committers.
Comment 29 Justin Mason 2010-12-30 05:33:40 UTC
(In reply to comment #27)
> OK, so it catches only 1%.  What matters is whether that 1% is caught by any
> other list or mechanism.  If NOT, and if not falsing, then it should remain. 
> Even so, redundancy is NOT bad, especially when there's a failure.  Zero hits
> would be a different story.

this is a very good point.  The "traditional" way to evaluate rule effectiveness was as a percentage of spam flow, but that's not really relevant in SpamAssassin, since we already have a large ruleset which can match 80-90% of spam.  When evaluating rules, determine how well they work against false positives (for DNSWL rules) or false negatives (for DNSBLs).  if you like, include "borderline" correct diagnoses, e.g. for DNSBLs, spam mails that get 5-7 points as well.

rules that fire only on spams that score over 10 points already are worthless to add (or keep).
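One way to apply this criterion to a single rule is to bucket the spam it hits by final score, separating hits that could have made a difference (low and borderline scores) from hits on mail that was already far above the threshold. The sketch assumes a hypothetical input of one spam message per line as `<score> <comma-separated rule names>`, for example extracted from mass-check spam logs; the function name `rule_buckets` is hypothetical, and the bucket edges (5, 7, 10) follow the figures in this comment and SpamAssassin's default required_score of 5.

```shell
#!/bin/sh
# Hypothetical sketch: how useful is one rule against false negatives?
# ASSUMPTION: stdin holds one spam message per line, formatted as
#   <final score> <comma-separated names of rules that hit>
# Buckets the messages hit by rule $1 by final score:
#   fn         score < 5        still a false negative despite the rule
#   borderline 5 <= score < 7   the rule may be what pushed it over
#   mid        7 <= score <= 10
#   over10     score > 10       far above threshold; the rule adds little
rule_buckets() {
    awk -v rule="$1" '
        {
            hit = 0
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == rule) hit = 1
            if (!hit) next
            s = $1 + 0
            if (s < 5)        fn++
            else if (s < 7)   border++
            else if (s <= 10) mid++
            else              high++
        }
        END { printf "fn=%d borderline=%d mid=%d over10=%d\n", fn, border, mid, high }'
}

# Usage (rule name assumed to be the full SA name of NJABL_PROXY):
# rule_buckets RCVD_IN_NJABL_PROXY < spam-scores.txt
```

By the criterion above, a rule whose hits land mostly in `over10` is not worth keeping, however accurate it is.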
Comment 30 AXB 2010-12-30 06:05:17 UTC
FTR: I'm seeing increasing NJABL hits on 419s and what NJABL states as proxies/relay.
Many of these seem to be hacked/exploited Exchange 2010 boxes.

In overall stats these are few hits, but it's adding the extra nudge to low-scoring spams.
Comment 31 Warren Togami 2011-01-20 04:23:03 UTC
It was impossible to do an apples-to-apples comparison here of deep parsing vs. lastexternal because there isn't any lastexternal data to reuse.  But it does appear that most of the non-overlapping hits and some of the FPs were the result of deep parsing.

An approach that I'd like to analyze later is to measure NJABL && !Spamhaus to concentrate on the non-overlapping portion that folks here seem to want most.

I am closing this with the intent to explore it again later.  We have more important fish to fry elsewhere.
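The NJABL && !Spamhaus measurement proposed above (which would also answer the overlap question from Comment 10) could be approximated from the same kind of log lines used in Comment 11. The sketch assumes one spam-verdict line per message listing the rules that hit, and assumes Spamhaus hits appear under rule names matching RCVD_IN_SBL, RCVD_IN_XBL or RCVD_IN_PBL; the function name `njabl_overlap` is hypothetical, and the rule-name pattern should be checked against the ruleset actually deployed.

```shell
#!/bin/sh
# Hypothetical sketch: how many NJABL spam hits are NOT also Spamhaus hits?
# ASSUMPTIONS: one "is spam" log line per message listing the rules that hit;
# Spamhaus (zen) hits show up as RCVD_IN_SBL, RCVD_IN_XBL or RCVD_IN_PBL.
njabl_overlap() {
    awk '/is spam/ && /NJABL/ {
             total++
             if (/RCVD_IN_(SBL|XBL|PBL)/) overlap++
         }
         END {
             printf "njabl=%d overlap=%d njabl_only=%d\n", total, overlap, total - overlap
         }'
}

# Usage:
# zcat /mnt/maillog/maillog-2010-Dec* | njabl_overlap
```

The `njabl_only` count is the figure that matters for the redundancy argument; checking those messages against the score criterion from Comment 29 would show whether they are the low-scoring spam that needs the extra nudge described in Comment 30.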
Comment 32 Ben Dugdale 2011-08-11 17:55:12 UTC
I'll give a +1 for removing NJABL but for another reason.  

We had a user get an infection on their PC and get our mail server (64.18.62.36) on the dnsbl.njabl.org list.  Rightly so.  The problem is that the list administration is unresponsive to requests to be removed now that we have corrected the problem.  I take it that nobody is monitoring the removal address.  This behavior is going to lead to false positives in the long run.

PS.  I'm sure you will all assume that we run a lousy mail system (it seems most do), and it's true that something slipped past us, but FYI we block direct-mail, only accept mail submissions from authenticated users, do not relay for anyone, publish an SPF record, etc.  We really strive to provide a responsible and RFC-compliant mail system.  

I guess the next step for us is outbound spam filtering and more careful outbound rate limiting.  Even then, the whole system is reactionary and does not help for the first $n entities to get hit with an infection.  Someone needs to responsibly handle RBL removal requests or the system is broken.
Comment 33 D. Stussy 2011-08-11 18:31:39 UTC
RE:  Reply #32.  I looked at what the DNSBL said and attempted removal for you as well.  The system told me you were only listed last week (04 Aug 2011) and that automatic removal requests were denied.  I did find it puzzling that the alleged evidence is a "mailbox full DSN" (over quota), not a virus infection.  However, per your story, the listing reason you gave is valid.

As for your [manual] delisting request and apparent non-response, I note that since this will require manual intervention by a real person, perhaps not enough time has passed for such to be reviewed and happen.  Also, as some mail servers queue messages for up to 2 weeks (15 days), there still could be virus messages originating from your server floating around the Internet.  Therefore, de-listing could be premature.

  cf.  http://dnsbl.njabl.org/cgi-bin/lookup.cgi?query=64.18.62.36
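Besides the web lookup above, a listing can be checked directly over DNS, which is how SpamAssassin and MTAs query a DNSBL: reverse the octets of the IPv4 address, append the list's zone, and look up an A record. An answer (conventionally an address in 127.0.0.0/8) means listed; NXDOMAIN means not listed. This sketch assumes `dig` is installed; the function names `dnsbl_name` and `dnsbl_check` are hypothetical, and the zone is the one named in Comment 32.

```shell
#!/bin/sh
# Hypothetical helpers for a manual DNSBL lookup (IPv4 only).
# dnsbl_name IP ZONE -> prints the query name, e.g. 4.3.2.1.ZONE for 1.2.3.4
dnsbl_name() {
    printf '%s\n' "$1" | awk -F. -v zone="$2" \
        '{ printf "%s.%s.%s.%s.%s\n", $4, $3, $2, $1, zone }'
}

# dnsbl_check IP ZONE -> "listed: <A records>" or "not listed"
# ASSUMPTION: dig (from the BIND utilities) is available. Only answer lines
# that look like IPv4 addresses count, so resolver errors print "not listed".
dnsbl_check() {
    answer=$(dig +short "$(dnsbl_name "$1" "$2")" A | grep -E '^[0-9]+(\.[0-9]+){3}$')
    if [ -n "$answer" ]; then
        echo "listed: $answer"
    else
        echo "not listed"
    fi
}

# Usage (address and zone from Comment 32):
# dnsbl_check 64.18.62.36 dnsbl.njabl.org
```

A lookup only shows the current listing state; it says nothing about how quickly the list handles removal requests, which is the concern raised in Comment 32.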

From the point of view of administering SA, what matters is how many [correct] hits this DNSBL has which are not covered by other DNSBLs.  Although your complaint about an apparent lack of maintenance regarding the list may have merit, it is merely one consideration.  However, a single instance isn't enough; we'd need a pattern of no maintenance or a pattern of false results.

The fact that you were [automatically] added for a correct reason (even though the "evidence" is wrong) tells us that NJABL is not completely dead.  Therefore, let's keep this "bug" as "resolved/later" and therefore closed.

Should there be a periodic (annual?) review of all DNSBLs used by SA, based on masscheck results, looking specifically at hit rates where a single DNSBL hits?