Bug 3234 - RCVD_IN_SBL, RCVD_IN_XBL: missing hits
Summary: RCVD_IN_SBL, RCVD_IN_XBL: missing hits
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 normal
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
Blocks: 3208
Reported: 2004-04-02 18:34 UTC by Justin Mason
Modified: 2004-04-27 15:13 UTC



Description Justin Mason 2004-04-02 18:34:07 UTC
 70.464  85.6872   0.6750    0.992   1.00    0.00  __RCVD_IN_SBL_XBL
  0.000   0.0000   0.0000    0.500   0.11    1.00  RCVD_IN_XBL
  9.139  11.0924   0.1830    0.984   0.88    1.27  RCVD_IN_SBL

now I can believe the RCVD_IN_SBL figures, but _XBL seems broken, and
the subrule is *way* too high AFAICS.
Comment 1 Daniel Quinlan 2004-04-03 00:22:33 UTC
Subject: Re:  New: RCVD_IN_SBL, RCVD_IN_XBL: missing hits

Justin Mason <jm@jmason.org> writes:

> 70.464  85.6872   0.6750    0.992   1.00    0.00  __RCVD_IN_SBL_XBL
>   0.000   0.0000   0.0000    0.500   0.11    1.00  RCVD_IN_XBL
>   9.139  11.0924   0.1830    0.984   0.88    1.27  RCVD_IN_SBL
> 
> now I can believe the RCVD_IN_SBL figures, but _XBL seems broken, and
> the subrule is *way* too high AFAICS.

The __RCVD_IN_SBL_XBL results are actually correct.  The RCVD_IN_XBL is
not being hit because spamhaus.org changed the format of the TXT record
returned by the SBL-XBL blacklist.  It used to include "/xbl", but now
it doesn't:

  63.137.169.67.sbl-xbl.spamhaus.org =>
    "http://www.spamhaus.org/query/bl?ip=67.169.137.63"

Our rule looks for a case-insensitive "/xbl" in the TXT record.  That
let us get the informative URLs from a TXT query *and* use a single
SBL-XBL query, which reduces network traffic and processing.

SBL still does include "/sbl", so that rule continues to work:

  32.55.114.82.sbl-xbl.spamhaus.org =>
    "http://www.spamhaus.org/SBL/sbl.lasso?query=SBL13063"
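
To make the failure mode concrete, here is a minimal Python sketch of a
case-insensitive substring test applied to the two TXT strings quoted
above.  It is not SpamAssassin's actual rule code: the function name
txt_lists and the PATTERNS table are hypothetical and exist only for
illustration.

```python
import re

# Hypothetical patterns mirroring the case-insensitive "/sbl" and "/xbl"
# tests described above; the real rules live in SpamAssassin's rule files.
PATTERNS = {
    "RCVD_IN_SBL": re.compile(r"/sbl", re.IGNORECASE),
    "RCVD_IN_XBL": re.compile(r"/xbl", re.IGNORECASE),
}

def txt_lists(txt):
    """Return the sorted rule names whose pattern occurs in a TXT string."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(txt))

# TXT strings copied from the two lookups quoted above.
new_xbl_txt = "http://www.spamhaus.org/query/bl?ip=67.169.137.63"
sbl_txt = "http://www.spamhaus.org/SBL/sbl.lasso?query=SBL13063"

print(txt_lists(new_xbl_txt))  # no "/xbl" any more, so nothing matches
print(txt_lists(sbl_txt))      # "/SBL" still matches the SBL pattern
```

The generic query URL contains neither marker, which is consistent with
RCVD_IN_XBL showing zero hits while RCVD_IN_SBL keeps working.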

I can think of several possible solutions:

1. See if we can get SpamHaus to include "/xbl" (or something similarly
   definitive) in the TXT result and hope it doesn't change again.
2. Query with type ANY and use TXT if we get it, but use the A for the
   SBL and XBL rules.
3. Revert to using A queries only, no TXT.
4. Do separate TXT queries for SBL and XBL.
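
As a rough illustration of option 3, the sketch below classifies a reply
purely from its A records.  It assumes that 127.0.0.2 marks an SBL
listing and 127.0.0.4 an XBL listing on the combined zone; both values
appear in the "host -a" output further down, but the mapping to rule
names, the table and the function names are assumptions made for this
example, not SpamAssassin code.

```python
# Assumed mapping of combined-zone A results to rules.  Only the two
# values seen in this report are modeled; other return codes are ignored.
A_RESULT_TO_RULE = {
    "127.0.0.2": "RCVD_IN_SBL",
    "127.0.0.4": "RCVD_IN_XBL",
}

def dnsbl_query_name(ip, zone="sbl-xbl.spamhaus.org"):
    """Build the DNSBL lookup name by reversing the IPv4 octets."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone

def rules_from_a_records(addresses):
    """Map the A records of one reply to the set of rules that should fire."""
    return {A_RESULT_TO_RULE[a] for a in addresses if a in A_RESULT_TO_RULE}

print(dnsbl_query_name("218.64.140.122"))
print(sorted(rules_from_a_records(["127.0.0.2", "127.0.0.4"])))
```

Because the A values identify the list directly, this approach does not
depend on the wording of the TXT record at all, at the cost of losing
the informative URLs.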

My inclination was option 2 which I proceeded to implement, but before I
got too far, I received some weird results from SBL-XBL for an IP in
their databases where some of the answers were missing.  Worse, I
ultimately realized that it's impossible to attach the correct TXT
record to the correct A record result so the log entry appears in the
right place.  If we could reliably recognize the TXT record, then there
would be no issue to begin with.  I wish I had figured this out before
coding myself into an inevitable logical corner (*).

So, we're left with options 1, 3, or 4.

Oh, here's the weirdness:

------- start of cut text --------------
$ host -a 122.140.64.218.sbl-xbl.spamhaus.org
Trying "122.140.64.218.sbl-xbl.spamhaus.org"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38331
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 10, ADDITIONAL: 0

;; QUESTION SECTION:
;122.140.64.218.sbl-xbl.spamhaus.org. IN        ANY

;; ANSWER SECTION:
122.140.64.218.sbl-xbl.spamhaus.org. 2406 IN TXT "http://www.spamhaus.org/query/bl?ip=218.64.140.122"
122.140.64.218.sbl-xbl.spamhaus.org. 2406 IN TXT "http://www.spamhaus.org/SBL/sbl.lasso?query=SBL15322"

;; AUTHORITY SECTION:
sbl-xbl.spamhaus.org.   85206   IN      NS      n.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      q.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      t.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      w.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      x.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      y.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      z.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      a.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      c.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   85206   IN      NS      e.ns.spamhaus.org.

Received 344 bytes from 127.0.0.1#53 in 102 ms
------- end ----------------------------

and a few minutes later...

------- start of cut text --------------
$ host -a 122.140.64.218.sbl-xbl.spamhaus.org
Trying "122.140.64.218.sbl-xbl.spamhaus.org"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42032
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 10, ADDITIONAL: 0

;; QUESTION SECTION:
;122.140.64.218.sbl-xbl.spamhaus.org. IN        ANY

;; ANSWER SECTION:
122.140.64.218.sbl-xbl.spamhaus.org. 3555 IN A  127.0.0.2
122.140.64.218.sbl-xbl.spamhaus.org. 3555 IN A  127.0.0.4
122.140.64.218.sbl-xbl.spamhaus.org. 2344 IN TXT "http://www.spamhaus.org/query/bl?ip=218.64.140.122"
122.140.64.218.sbl-xbl.spamhaus.org. 2344 IN TXT "http://www.spamhaus.org/SBL/sbl.lasso?query=SBL15322"

;; AUTHORITY SECTION:
sbl-xbl.spamhaus.org.   86355   IN      NS      n.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      q.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      t.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      w.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      x.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      y.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      z.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      a.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      c.ns.spamhaus.org.
sbl-xbl.spamhaus.org.   86355   IN      NS      e.ns.spamhaus.org.

Received 376 bytes from 127.0.0.1#53 in 91 ms
------- end ----------------------------

(*) We could use a subrule regexp looking for the IP or the "/xbl"
    string and it would "work" if we changed the logging code to ignore
    IPs if there is already a log entry and have TXT logs overwrite IP
    logs, but it's really too horrible to contemplate.
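
For what it's worth, the subrule regexp from the footnote might look
roughly like this in Python.  This is only a sketch of the pattern, with
a hypothetical function name; it does not cover the logging changes that
make the approach unattractive.

```python
import re

def looks_like_xbl_txt(txt, ip):
    """True if a TXT string contains "/xbl" (any case) or the queried IP."""
    # re.escape keeps the dots in the IP from matching arbitrary characters.
    pattern = re.compile(r"/xbl|" + re.escape(ip), re.IGNORECASE)
    return pattern.search(txt) is not None

# Values copied from the "host -a" output above.
ip = "218.64.140.122"
generic = "http://www.spamhaus.org/query/bl?ip=218.64.140.122"
sbl = "http://www.spamhaus.org/SBL/sbl.lasso?query=SBL15322"

print(looks_like_xbl_txt(generic, ip))
print(looks_like_xbl_txt(sbl, ip))
```

The match on the IP is only a heuristic: it identifies the generic URL
by elimination, not by anything that actually says "XBL".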

Comment 2 Justin Mason 2004-04-05 12:30:50 UTC
holy crap, that's a good hit-rate then ;)

I'd suggest #1 preferred, #3 second-best.
Comment 3 Justin Mason 2004-04-27 21:30:46 UTC
hmm.  I think this is fixed, right Dan?
Comment 4 Daniel Quinlan 2004-04-27 23:13:34 UTC
 72.228  82.6655   0.5057    0.994   1.00    0.00  __RCVD_IN_SBL_XBL
 65.624  75.1177   0.3886    0.995   0.99    1.00  RCVD_IN_XBL
  7.920   9.0556   0.1171    0.987   0.88    1.27  RCVD_IN_SBL


quite fixed