6884 – uribl_all_types.t test is intermittently failing

Bug 6884 - uribl_all_types.t test is intermittently failing

Summary: uribl_all_types.t test is intermittently failing

Status:	RESOLVED FIXED

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Regression Tests (show other bugs)
Version:	SVN Trunk (Latest Devel Version)
Hardware:	PC Windows 7

Importance:	P2 blocker
Target Milestone:	3.4.0
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-01-04 17:52 UTC by Kevin A. McGrail
Modified:	2013-01-17 01:52 UTC (History)
CC List:	1 user (show)

Attachment	Type	Modified	Status	Actions	Submitter/CLA Status
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Kevin A. McGrail 2013-01-04 17:52:53 UTC

- We have failed the last 5 buidls 8280 through 8285 such as: 

- Test failure output example: https://builds.apache.org/job/SpamAssassin-trunk/8284/testReport/make_test/t_uribl_all_types_t/test______2/

- make test on my systems works without issue

Thoughts:

- Might be wrong output in the rules report vis a vis a bug similar to freemail where we haveincorrect output going into the report

- Might be DNS related but Mark doesn't think so

- Try and get the log output from Jenkins which KAM doesn't know how to do

Per Michael:
One thing you could do is modify the tests/jenkins to archive the logs so that you can go to the web interface and review them.  I'd actually be surprised if that wasn't already done.

It's been awhile since I messed with the source, but I think just adding a tar czvf log_output.tgz WORKDIR/t/log/* or whatever the path actually is, would get you a .tgz of the logs and then modify the Jenkins config to archive that file along with the results.

Comment 1 Kevin A. McGrail 2013-01-04 20:48:15 UTC

The underlying cause is identified by Mark in Bug 6880.

*** This bug has been marked as a duplicate of bug 6880 ***

Comment 2 Mark Martinec 2013-01-04 21:02:45 UTC

> The underlying cause is identified by Mark in Bug 6880.

Not quite: the Bug 6880 is certainly the final drop that
broke the uribl_all_types test unconditionally,

but the intermittent variability in the sub-description
still remains. On my host it fails in about 50% of runs.

Comment 3 Mark Martinec 2013-01-04 21:04:45 UTC

Not entirely a duplicate, the original failure reason is still there.

Comment 4 Kevin A. McGrail 2013-01-04 21:38:37 UTC

(In reply to comment #3)
> Not entirely a duplicate, the original failure reason is still there.

Once I added a description, my ability to replicate the failure is gone.

Descriptions are stored in a custom Tie Hash of a single string.  I wonder if there is a bug in that storage involved rules with no descriptions?

Anyway, I'll leave this open and see what fails on the next jenkins build.

Comment 5 Kevin A. McGrail 2013-01-05 00:41:04 UTC

(In reply to comment #3)
> Not entirely a duplicate, the original failure reason is still there.

As Jenkin's is now stable, I will close this ticket but if you can find any rhyme or reason or way to duplicate the issue that might lead to the REPORT leakage issue, please feel free to re-open.

Comment 6 Mark Martinec 2013-01-05 01:20:30 UTC

I'm beginning to understand the issue.

These are relevant DNS queries from the failing test:

$ host -t A uribl-example-a.com.dnsbltest.spamassassin.org
uribl-example-a.com.dnsbltest.spamassassin.org has address 127.0.0.2

$ host -t A uribl-example-b.com.dnsbltest.spamassassin.org
uribl-example-b.com.dnsbltest.spamassassin.org has address 127.0.0.4

$ host -t A uribl-example-c.com.dnsbltest.spamassassin.org
uribl-example-c.com.dnsbltest.spamassassin.org has address 127.0.0.6

$ host -t A 98.3.137.144.dnsbltest.spamassassin.org       
98.3.137.144.dnsbltest.spamassassin.org has address 127.0.0.2

and here are the corresponding rules from the test:

  urirhssub  X_URIBL_IPSONLY  dnsbltest.spamassassin.org.    A 2
  body       X_URIBL_IPSONLY  eval:check_uridnsbl('X_URIBL_IPSONLY')
  describe   X_URIBL_IPSONLY  X_URIBL_IPSONLY
  tflags     X_URIBL_IPSONLY  net ips_only

  urirhssub  X_URIBL_DOMSONLY  dnsbltest.spamassassin.org.    A 4
  body       X_URIBL_DOMSONLY  eval:check_uridnsbl('X_URIBL_DOMSONLY')
  describe   X_URIBL_DOMSONLY  X_URIBL_DOMSONLY
  tflags     X_URIBL_DOMSONLY  net domains_only

Note that both the uribl-example-a.com.dnsbltest.spamassassin.org
as well as the 98.3.137.144.dnsbltest.spamassassin.org produce a
reply 127.0.0.2, which matches the '2' subrule filter of X_URIBL_IPSONLY,
so this rule collects both the 'uribl-example-a' and the '98.3.137.144'
as a rule sub-description - but there is room for only one answer,
so whichever comes last(?) stays there. It is a pure luck or a timing
thing (or maybe a Perl version) if Jenkins happens to always return
the same result. In my case I see one or the other result with about
the same probability (perl 5.17.7).

I see two open issues here:

- allow multiple hits to show in the sub-description (this is already
  mentioned in the code as a TODO);

- does the uribl-example-a.com.dnsbltest.spamassassin.org with its
  127.0.0.2 really merit the hit on X_URIBL_IPSONLY (despite a match
  on a filter), considering its tflag ips_only ?

Some determinism needs to be introduced here before this ticket
can be closed, although it probably isn't a blocker, as the above
test case is somewhat synthetic and should rarely occur in practice.

Comment 7 Mark Martinec 2013-01-08 16:17:38 UTC

The reason why it happens to give consistent results on Jenkins,
and why it fails in 2/3 of runs on my host is in a version of perl:


perl5176delta - what is new for perl v5.17.6

Per process hash randomization
  The seed used by Perl's hash function is now random.  This means that
  the order which keys/values will be returned from functions like
  "keys()", "values()", and "each()" will differ from run to run.

  This change was introduced to make Perl's hashes more robust to
  algorithmic complexity attacks, and also because we discovered that it
  exposes hash ordering dependency bugs and makes them easier to track
  down.

  Toolchain maintainers might want to invest in additional infrastructure
  to test for things like this.  Running tests several times in a row and
  then comparing results will make it easier to spot hash order
  dependencies in code.  Authors are strongly encouraged not to expose
  the key order of Perl's hashes to insecure audiences.



Lucily/unluckily this exposes a bug in

  SA::Plugin::URIDNSBL::complete_dnsbl_lookup

where a hit is inappropriately scored for all rules
with the same subrule filter.

Comment 8 Kevin A. McGrail 2013-01-08 22:25:09 UTC

(In reply to comment #7)
> Lucily/unluckily this exposes a bug in
> 
>   SA::Plugin::URIDNSBL::complete_dnsbl_lookup
> 
> where a hit is inappropriately scored for all rules
> with the same subrule filter.

You get bonus points for digging this one up!

Comment 9 Mark Martinec 2013-01-11 18:19:54 UTC

I'm still spinning in circles, thinking how to fix the URIDNSBL.pm.

Meanwhile, here is some cosmetics and a debugging aid check-in,
along with fixes for what appears to be an innocent bug in
PerMsgStatus.pm, namely the:

  $self->{test_log_msgs} = ();

seems a silly way of saying $self->{test_log_msgs} = undef.
What was apparently intended is a %{$self->{test_log_msgs}} = (),
which deletes both keys but keeps a hash cleaned for reuse
by the next rule.


trunk:
  Bug 6884: $pms->{test_log_msgs} is a hashref, not a hash;
  some cosmetics and debugging in Plugin/URIDNSBL.pm'
Sending lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending lib/Mail/SpamAssassin/Plugin/Check.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending t/uribl_all_types.t
Committed revision 1432200.

Comment 10 Mark Martinec 2013-01-12 02:31:09 UTC

> Meanwhile, ...

Another (loosely related) DNS hardening detail:

trunk:
  DnsResolver: prevent a failed or suppressed DNS query packet generation
  from propagating unnecessarily
Sending lib/Mail/SpamAssassin/Dns.pm
Sending lib/Mail/SpamAssassin/DnsResolver.pm
Sending lib/Mail/SpamAssassin/Plugin/ASN.pm
Sending lib/Mail/SpamAssassin/Plugin/AskDNS.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Committed revision 1432359.

Comment 11 Mark Martinec 2013-01-15 19:32:48 UTC

... getting there, looks promising ...

Comment 12 Mark Martinec 2013-01-16 11:55:53 UTC

Found another test case which the current code gets wrong, and
the t/uribl.t (test 5) covers the failure by claiming that
this is an expected result.

The test 5 in t/uribl.t is:

  urifullnsrhssub X_URIBL_FULL_NS  dnsbltest.spamassassin.org.  A 8
  body       X_URIBL_FULL_NS  eval:check_uridnsbl('X_URIBL_FULL_NS')
  tflags     X_URIBL_FULL_NS  net

The sample message is t/data/spam/dnsbl.eml, which contains
an URL http://spamassassin.org/.

The urifullnsrhssub is supposed to collect NS records of spamassassin.org,
which are: a.auth-ns.sonic.net, b.auth-ns.sonic.net, c.auth-ns.sonic.net
and ns.hyperreal.org, then submit these host full host names (not just
their domains) to a DNS BL test zone dnsbltest.spamassassin.org:

  $ host -t a hyperreal.org.dnsbltest.spamassassin.org
  hyperreal.org.dnsbltest.spamassassin.org has address 127.0.0.8

  $ host -t a ns.hyperreal.org.dnsbltest.spamassassin.org
  Host ns.hyperreal.org.dnsbltest.spamassassin.org not found: 3(NXDOMAIN)

Note that ns.hyperreal.org.dnsbltest.spamassassin.org is *not* in the
zone, so rule X_URIBL_FULL_NS should *not* have fired.

The hyperreal.org.dnsbltest.spamassassin.org however *is* in the BL zone,
so the other test rule X_URIBL_NS does fire correctly:

  urinsrhssub X_URIBL_NS  dnsbltest.spamassassin.org.  A 8

... yet because X_URIBL_NS happens to have the same subtest (8) as
X_URIBL_FULL_NS, it incorrectly drags the X_URIBL_FULL_NS with itself.



Btw: seems like subroutines lookup_a, lookup_ptr, lookup_mx, and
lookup_mx_exists in Dns.pm are not used anywhere??? Drop?
(And the lookup_ns is only used during startup by is_dns_available()
in the same module.)

Comment 13 Kevin A. McGrail 2013-01-16 15:45:15 UTC

Good catch on uribl.t as well.  I'm moving this to a blocker status.

BTW, re: "subroutines lookup_a, lookup_ptr, lookup_mx, and
lookup_mx_exists in Dns.pm are not used anywhere??? Drop?
(And the lookup_ns is only used during startup by is_dns_available()
in the same module.)"

I would say not to drop them.  Too likely we will need those functions at some point in my opinion.

Regards,
KAM

Comment 14 Mark Martinec 2013-01-16 20:15:25 UTC

(In reply to comment #13)
> Good catch on uribl.t as well.  I'm moving this to a blocker status.
> 
> BTW, re: "subroutines lookup_a, lookup_ptr, lookup_mx, and
> lookup_mx_exists in Dns.pm are not used anywhere??? Drop?
> (And the lookup_ns is only used during startup by is_dns_available()
> in the same module.)"
> 
> I would say not to drop them.  Too likely we will need those functions at
> some point in my opinion.
> 
> Regards,
> KAM

Brace yourselves, here comes a biggie!

Better not look at the diff :)  there are just too many - following
a couple of days of fixing & rewriting lead one thing to another,
finding new code simplification opportunities and cleanups.
Looks like all the tests are passing now, and our production
mail server seems happy too.

The original approach in URIDNSBL.pm just didn't hold water.
Grouping rules under a zone and subrule hashes and launching a
single DNS query for these rules lost information on which
queried domains belong to each rule and what are their flags,
leading to unrelated rules being hit when another rule triggered
with the same zone and subrule but a different query domain.

The changed approach now pretends to launch a DNS query for each
uri rule, but avoids actual duplicate DNS queries by letting a
new subroutine AsyncLoop::bgsend_and_start_lookup() take care
or grouping equal DNS queries into a single launch, while dealing
with individual callbacks behind the scenes. This has the side
effect of simplifying several code sections in URIDNSBL.pm, ASN.pm,
AskDNS.pm and Dns.pm, without increasing the number of DNS queries
(in some cases even reducing it). It also fixes the problem in
AskDNS.pm, issuing warnings about 'such lookup has already been issued'.
It also revealed several unused code sections, leftovers from the
distant past before URIDNSBL.pm and AsyncLoop came to be.

While at it, it came naturally to also add a long-desired feature:

+=item dns_query_restriction (allow|deny) domain1 domain2 ...
+
+Option allows disabling of rules which would result in a DNS query to one of
+the listed domains. The first argument must be a literal C<allow> or C<deny>,
+remaining arguments are domains names.
+
+Most DNS queries (with some exceptions) are subject to dns_query_restriction.
+A domain to be queried is successively stripped-off of its leading labels
+(thus yielding a series of its parent domains), and on each iterations a
+check is made against an associative array generated by dns_query_restriction
+options. Search stops at the first match (i.e. a tightest match), and the
+matching entry with its C<allow> or C<deny> value then controls if a DNS query
+is allowed to be launched.
+
+If no match is found an implicit default is to allow all queries. The purpose
+of an explicit C<allow> entry is to be able to override a previously configured
+C<deny> on the same domain or to override an entry (possibly still to be
+configured) on one of its parent domains. Thus an 'allow zen.spamhaus.org'
+with a 'deny spamhaus.org' would permit DNS queries on a specific DNS BL zone
+but deny queries to other zones under the same parent domain 'spamhaus.org'.
+
+Domains are matched case-insensitively, no wildcards are recognized,
+there should be no leading or trailing dot.
+
+Specifying blocked domains has a similar effect as setting a score of such
+rules to zero, and can be a handy alternative to hunting for such rules
+when a site policy does not allow certain DNS block lists to be queried.
+
+Example:
+  dns_query_restriction deny  dnswl.org surbl.org
+  dns_query_restriction allow zen.spamhaus.org
+  dns_query_restriction deny  spamhaus.org mailspike.net spamcop.net

+=item clear_dns_query_restriction
+
+The option removes any entries entered by previous 'dns_query_restriction'
+options, leaving the list empty, i.e. allowing DNS queries for any domain
+(including any DNS BL zone).


So here it comes:
- a new subroutine AsyncLoop::bgsend_and_start_lookup() implements a
  commonly used idiom of calling bgsend(), followed by start_lookup(),
  while implementing a mechanism for grouping equal DNS queries,
  simplifying several other code sections;
- rewrite URIDNSBL.pm to put the new mechanism to good use,
  fixing the particular Bug 6884 and test 5 in t/uribl.t;
- fix the t/uribl.t, moving X_URIBL_FULL_NS from %patterns
  to %anti_patterns;
- add a dns_query_restriction configuration option;
- some cosmetic edits, like renaming some variables for consistency.

trunk:
  Sending lib/Mail/SpamAssassin/AsyncLoop.pm
  Sending lib/Mail/SpamAssassin/Conf.pm
  Sending lib/Mail/SpamAssassin/Dns.pm
  Sending lib/Mail/SpamAssassin/DnsResolver.pm
  Sending lib/Mail/SpamAssassin/Plugin/ASN.pm
  Sending lib/Mail/SpamAssassin/Plugin/AskDNS.pm
  Sending lib/Mail/SpamAssassin/Plugin/DNSEval.pm
  Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Committed revision 1434360.


Please pay attention to any anomalies I may have missed ;)

Comment 15 Mark Martinec 2013-01-16 20:22:50 UTC

(In reply to comment #13)
>   BTW, re: "subroutines lookup_a, lookup_ptr, lookup_mx, and
>   lookup_mx_exists in Dns.pm are not used anywhere??? Drop?
> I would say not to drop them.  Too likely we will need those
> functions at some point in my opinion.

Kevin, I ditched them anyway, sorry :)
Seems these subroutines pre-date the bgsend / start_lookup and
callbacks through AsyncLoop. If a need for these would re-appear,
these would need to be rewritten anyway with the new DNS model.
Just left the lookup_ns, which is only needed during startup
when an automatic determination of DNS availability is asked for.

Comment 16 Mark Martinec 2013-01-16 20:27:23 UTC

 forgot to check-in t/uribl.t
Sending t/uribl.t
Committed revision 1434367.

Comment 17 Kevin A. McGrail 2013-01-17 01:18:24 UTC

(In reply to comment #15)
> (In reply to comment #13)
> >   BTW, re: "subroutines lookup_a, lookup_ptr, lookup_mx, and
> >   lookup_mx_exists in Dns.pm are not used anywhere??? Drop?
> > I would say not to drop them.  Too likely we will need those
> > functions at some point in my opinion.
> 
> Kevin, I ditched them anyway, sorry :)

No worries.

In working to mark this resolved and I pass disttest with 5.8.6 but I dit NOT pass with 5.17.8 for t/whitelist_addrs.t

t/whitelist_addrs.t ............... 11/35       Not found: spamyes = X-Spam-Status: Yes at t/whitelist_addrs.t line 65.
# Failed test 11 in t/SATest.pm at line 751
Output can be examined in: log/d.whitelist_addrs/11
t/whitelist_addrs.t ............... 23/35       Not found: spamyes = X-Spam-Status: Yes at t/whitelist_addrs.t line 154.
# Failed test 23 in t/SATest.pm at line 751 fail #2
Output can be examined in: log/d.whitelist_addrs/23
t/whitelist_addrs.t ............... 35/35       Not found: spamyes = X-Spam-Status: Yes at t/whitelist_addrs.t line 245.
# Failed test 35 in t/SATest.pm at line 751 fail #3
Output can be examined in: log/d.whitelist_addrs/35
t/whitelist_addrs.t ............... Failed 3/35 subtests 

Thoughts?

Comment 18 Kevin A. McGrail 2013-01-17 01:23:10 UTC

(In reply to comment #17)
> (In reply to comment #15)
> > (In reply to comment #13)
> > >   BTW, re: "subroutines lookup_a, lookup_ptr, lookup_mx, and
> > >   lookup_mx_exists in Dns.pm are not used anywhere??? Drop?
> > > I would say not to drop them.  Too likely we will need those
> > > functions at some point in my opinion.
> > 
> > Kevin, I ditched them anyway, sorry :)
> 
> No worries.
> 
> In working to mark this resolved and I pass disttest with 5.8.6 but I dit
> NOT pass with 5.17.8 for t/whitelist_addrs.t
> 
> t/whitelist_addrs.t ............... 11/35       Not found: spamyes =
> X-Spam-Status: Yes at t/whitelist_addrs.t line 65.
> # Failed test 11 in t/SATest.pm at line 751
> Output can be examined in: log/d.whitelist_addrs/11
> t/whitelist_addrs.t ............... 23/35       Not found: spamyes =
> X-Spam-Status: Yes at t/whitelist_addrs.t line 154.
> # Failed test 23 in t/SATest.pm at line 751 fail #2
> Output can be examined in: log/d.whitelist_addrs/23
> t/whitelist_addrs.t ............... 35/35       Not found: spamyes =
> X-Spam-Status: Yes at t/whitelist_addrs.t line 245.
> # Failed test 35 in t/SATest.pm at line 751 fail #3
> Output can be examined in: log/d.whitelist_addrs/35
> t/whitelist_addrs.t ............... Failed 3/35 subtests 
> 
> Thoughts?

Found the issue.  Ignore this for a moment...

Comment 19 Kevin A. McGrail 2013-01-17 01:52:05 UTC

(In reply to comment #18)
> > In working to mark this resolved and I pass disttest with 5.8.6 but I dit
> > NOT pass with 5.17.8 for t/whitelist_addrs.t

First, marking this ticket resolved.

Second, the issue with whitelist_addrs.t is that it requires DB_File module for several of the tests.

I added logic to test for DB_File otherwise skip ALL the tests.  

svn commit -m 'Changes for t/whitelist_addrs.t to check for DB_File - bug 6884' t/
Sending        t/whitelist_addrs.t
Transmitting file data .
Committed revision 1434534.