Bug 7265

Summary: DNS resolving breaks with Net::DNS 1.03
Product: Spamassassin Reporter: eserte12
Component: LibrariesAssignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: major CC: apache, kmcgrail, quanah, sidney
Priority: P2    
Version: 3.4.1   
Target Milestone: 3.4.2   
Hardware: PC   
OS: All   
Whiteboard:
Attachments: Proposed patch: use our own bgread(), matching our own existing bgsend()

Description eserte12 2015-11-14 18:22:02 UTC
If Net::DNS 1.03 is installed, then t/dnsbl_subtests.t fails (checked with various perl versions from 5.18.4 to 5.23.4 on a Debian/wheezy system):

	Not found: X_URIBL_Y_2A =  X_URIBL_Y_2A  at t/dnsbl_subtests.t line 303.
# Failed test 2 in t/SATest.pm at line 755
	Not found: X_URIBL_Y_2B =  X_URIBL_Y_2B  at t/dnsbl_subtests.t line 303.
# Failed test 3 in t/SATest.pm at line 755 fail #2
	Not found: X_URIBL_Y_2C =  X_URIBL_Y_2C  at t/dnsbl_subtests.t line 303.
# Failed test 4 in t/SATest.pm at line 755 fail #3
	Not found: X_URIBL_Y_2D =  X_URIBL_Y_2D  at t/dnsbl_subtests.t line 303.
# Failed test 5 in t/SATest.pm at line 755 fail #4
	Not found: X_URIBL_Y_2E =  X_URIBL_Y_2E  at t/dnsbl_subtests.t line 303.
# Failed test 6 in t/SATest.pm at line 755 fail #5
	Not found: X_URIBL_Y_2F =  X_URIBL_Y_2F  at t/dnsbl_subtests.t line 303.
# Failed test 7 in t/SATest.pm at line 755 fail #6
	Not found: X_URIBL_Y_2G =  X_URIBL_Y_2G  at t/dnsbl_subtests.t line 303.
# Failed test 8 in t/SATest.pm at line 755 fail #7
	Not found: X_URIBL_Y_ANY =  X_URIBL_Y_ANY  at t/dnsbl_subtests.t line 303.
# Failed test 9 in t/SATest.pm at line 755 fail #8
	Not found: X_URIBL_Y_2A =  X_URIBL_Y_2A  at t/dnsbl_subtests.t line 303.
# Failed test 17 in t/SATest.pm at line 755 fail #9
	Not found: X_URIBL_Y_2B =  X_URIBL_Y_2B  at t/dnsbl_subtests.t line 303.
# Failed test 18 in t/SATest.pm at line 755 fail #10
	Not found: X_URIBL_Y_2C =  X_URIBL_Y_2C  at t/dnsbl_subtests.t line 303.
# Failed test 19 in t/SATest.pm at line 755 fail #11
	Not found: X_URIBL_Y_2D =  X_URIBL_Y_2D  at t/dnsbl_subtests.t line 303.
# Failed test 20 in t/SATest.pm at line 755 fail #12
	Not found: X_URIBL_Y_2E =  X_URIBL_Y_2E  at t/dnsbl_subtests.t line 303.
# Failed test 21 in t/SATest.pm at line 755 fail #13
	Not found: X_URIBL_Y_2F =  X_URIBL_Y_2F  at t/dnsbl_subtests.t line 303.
# Failed test 22 in t/SATest.pm at line 755 fail #14
	Not found: X_URIBL_Y_2G =  X_URIBL_Y_2G  at t/dnsbl_subtests.t line 303.
# Failed test 23 in t/SATest.pm at line 755 fail #15
	Not found: X_URIBL_Y_3 =  X_URIBL_Y_3  at t/dnsbl_subtests.t line 303.
# Failed test 24 in t/SATest.pm at line 755 fail #16
	Not found: X_URIBL_Y_ANY =  X_URIBL_Y_ANY  at t/dnsbl_subtests.t line 303.
# Failed test 25 in t/SATest.pm at line 755 fail #17
	Not found: X_URIBL_Y_255A =  X_URIBL_Y_255A  at t/dnsbl_subtests.t line 303.
# Failed test 31 in t/SATest.pm at line 755 fail #18
	Not found: X_URIBL_Y_255B =  X_URIBL_Y_255B  at t/dnsbl_subtests.t line 303.
# Failed test 32 in t/SATest.pm at line 755 fail #19
	Not found: X_URIBL_Y_2A =  X_URIBL_Y_2A  at t/dnsbl_subtests.t line 303.
# Failed test 33 in t/SATest.pm at line 755 fail #20
	Not found: X_URIBL_Y_2B =  X_URIBL_Y_2B  at t/dnsbl_subtests.t line 303.
# Failed test 34 in t/SATest.pm at line 755 fail #21
	Not found: X_URIBL_Y_2C =  X_URIBL_Y_2C  at t/dnsbl_subtests.t line 303.
# Failed test 35 in t/SATest.pm at line 755 fail #22
	Not found: X_URIBL_Y_2D =  X_URIBL_Y_2D  at t/dnsbl_subtests.t line 303.
# Failed test 36 in t/SATest.pm at line 755 fail #23
	Not found: X_URIBL_Y_2E =  X_URIBL_Y_2E  at t/dnsbl_subtests.t line 303.
# Failed test 37 in t/SATest.pm at line 755 fail #24
	Not found: X_URIBL_Y_2F =  X_URIBL_Y_2F  at t/dnsbl_subtests.t line 303.
# Failed test 38 in t/SATest.pm at line 755 fail #25
	Not found: X_URIBL_Y_2G =  X_URIBL_Y_2G  at t/dnsbl_subtests.t line 303.
# Failed test 39 in t/SATest.pm at line 755 fail #26
	Not found: X_URIBL_Y_FFA =  X_URIBL_Y_FFA  at t/dnsbl_subtests.t line 303.
# Failed test 40 in t/SATest.pm at line 755 fail #27
	Not found: X_URIBL_Y_FFB =  X_URIBL_Y_FFB  at t/dnsbl_subtests.t line 303.
# Failed test 41 in t/SATest.pm at line 755 fail #28
	Not found: X_URIBL_Y_FFC =  X_URIBL_Y_FFC  at t/dnsbl_subtests.t line 303.
# Failed test 42 in t/SATest.pm at line 755 fail #29
t/dnsbl_subtests.t ................ 
Failed 29/46 subtests 


Statistical analysis suggests that the failure is caused by Net::DNS 1.03:

****************************************************************
Regression 'mod:Net::DNS'
****************************************************************
Name                   Theta          StdErr     T-stat
[0='const']           1.0000          0.0000    81615992991178720.00
[1='eq_0.78']         0.0000          0.0000       3.51
[2='eq_0.79']         0.0000          0.0000       2.62
[3='eq_0.80']         0.0000          0.0000       3.27
[4='eq_0.82']         0.0000          0.0000       1.91
[5='eq_0.83']         0.0000          0.0000       4.31
[6='eq_1.01']         0.0000          0.0000       4.10
[7='eq_1.02']         0.0000          0.0000       7.02
[8='eq_1.03']        -1.0000          0.0000    -68978103730387456.00

R^2= 1.000, N= 48, K= 9
****************************************************************
Comment 1 Quanah Gibson-Mount 2015-11-14 18:53:07 UTC
As already reported on the SA users list, you cannot use Net::DNS 1.03 with SpamAssassin as the Net::DNS authors made a major API change in a point release.  Do not upgrade to Net::DNS 1.03 as it may break anything depending on the old API.

This is generally the fault of the Net::DNS authors as major API changes that are not backwards compatible should never be made in a point release.
Comment 2 Mark Martinec 2015-11-19 15:24:24 UTC
trunk:
  Sending lib/Mail/SpamAssassin/DnsResolver.pm
Committed revision 1715196.
Comment 3 Mark Martinec 2015-11-19 15:26:24 UTC
Created attachment 5348 [details]
Proposed patch: use our own bgread(), matching our own existing bgsend()
Comment 4 Mark Martinec 2015-11-19 15:32:46 UTC
3.4:
  Sending lib/Mail/SpamAssassin/DnsResolver.pm
Committed revision 1715197.
Comment 5 Kevin A. McGrail 2015-11-19 15:55:36 UTC
Thanks for the workaround.  Can't believe they didn't revert the ip change though.
Comment 7 Mark Martinec 2015-11-19 16:35:03 UTC
Actually DNS resolving in a DKIM plugin seems to be still broken.
Working on it...
Comment 8 Mark Martinec 2015-11-19 19:23:44 UTC
> Actually DNS resolving in a DKIM plugin seems to be still broken.
> Working on it...

The DKIM plugin was broken by Net::DNS 1.03 for a different reason:

- Net::DNS 1.03 has a bug in that it sends a barrage of queries
  up to a retry count in one go, with a timeout of 0, so it
  depends on a luck and a DNS resolver's cache whether a reply
  is received in a few milliseconds or not - which is why a dkim
  test fails more likely then not. This bug is still undocumented.

- our fix (in a proposed patch above) would be able to avoid the
  bug in Net::DNS, but due to a bug in Mail/SpamAssassin/Plugin/DKIM.pm
  it failed to provide our own resolver to Mail::DKIM, so it used
  the problematic Net::DNS::Resolver;


This now fixes our bug in MS::Plugin::DKIM:

--- lib/Mail/SpamAssassin/Plugin/DKIM.pm        (revision 1715244)
+++ lib/Mail/SpamAssassin/Plugin/DKIM.pm        (working copy)
@@ -794,7 +794,8 @@
         # Only do so if EDNS0 provides a reasonably-sized UDP payload size,
         # as our interface does not provide a DNS fallback to TCP, unlike
         # the Net::DNS::Resolver::send which does provide it.
-        my $res = $self->{main}->{resolver}->get_resolver;
+        my $res = $self->{main}->{resolver};
+        dbg("dkim: providing our own resolver: %s", ref $res);
         Mail::DKIM::DNS::resolver($res);
       }
     }


trunk:
  Sending lib/Mail/SpamAssassin/Plugin/DKIM.pm
Committed revision 1715245.

3.4:
  Sending lib/Mail/SpamAssassin/Plugin/DKIM.pm
Committed revision 1715248.
Comment 9 Mark Martinec 2015-11-20 01:18:07 UTC
> - Net::DNS 1.03 has a bug in that it sends a barrage of queries
>   up to a retry count in one go, with a timeout of 0, so it
>   depends on a luck and a DNS resolver's cache whether a reply
>   is received in a few milliseconds or not - which is why a dkim
>   test fails more likely then not. This bug is still undocumented.

Actually perhaps not a bug, but a change in semantics
of "retry" and "retrans" options with Net::DNS 1.03,
which caused the failure of DNS lookups by a DKIM plugin. 

Now documented at [rt.cpan.org #109183] :

  https://rt.cpan.org/Ticket/Display.html?id=109183
Comment 10 Quanah Gibson-Mount 2015-12-11 04:28:23 UTC
(In reply to Mark Martinec from comment #8)

> --- lib/Mail/SpamAssassin/Plugin/DKIM.pm        (revision 1715244)
> +++ lib/Mail/SpamAssassin/Plugin/DKIM.pm        (working copy)
> @@ -794,7 +794,8 @@
>          # Only do so if EDNS0 provides a reasonably-sized UDP payload size,
>          # as our interface does not provide a DNS fallback to TCP, unlike
>          # the Net::DNS::Resolver::send which does provide it.
> -        my $res = $self->{main}->{resolver}->get_resolver;
> +        my $res = $self->{main}->{resolver};
> +        dbg("dkim: providing our own resolver: %s", ref $res);
>          Mail::DKIM::DNS::resolver($res);
>        }
>      }

I think this fix is incomplete, as Mail::DKIM also uses the Net::DNS resolver at:

              if (Mail::DKIM::AuthorDomainPolicy->UNIVERSAL::can("fetch")) {
                dbg("dkim: adsp: performing lookup on _adsp._domainkey.%s",
                    $author_domain);
                # get our Net::DNS::Resolver object
                my $res = $self->{main}->{resolver}->get_resolver;
                $practices = Mail::DKIM::AuthorDomainPolicy->fetch(
                               Protocol => "dns", Domain => $author_domain,
                               DnsResolver => $res);
              }
              1;

which is around line 1046 in DKIM.pm.  I would expect we want to use the native resolver in both places?
Comment 11 Kevin A. McGrail 2015-12-11 04:31:07 UTC
Good point.  Though should we consider 1.03 a burned release and require other versions instead of working around it?
Comment 12 Quanah Gibson-Mount 2015-12-11 04:39:35 UTC
(In reply to Kevin A. McGrail from comment #11)
> Good point.  Though should we consider 1.03 a burned release and require
> other versions instead of working around it?

I'm seeing a really odd issue atm with SA 3.4.1 and Net::DNS 1.04.  I've now backed out the Net::DNS 1.03 specific changes to SA to see if that resolves it.

Basically since I deployed Net::DNS 1.04 and SA 3.4.1 patched, I'm seeing a lot of:

Dec 10 20:24:46 zre-ldap002 postfix/amavisd/smtpd[31292]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Dec 10 20:25:31 zre-ldap002 postfix/dkimmilter/smtpd[31801]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]


However, since amavisd *also* uses Net::DNS via Net::DNS::Resolver::Programmable, I don't know if the problem is with Amavis or SA's usage of Net::DNS. ;)  Will know shortly after some testing.
Comment 13 Quanah Gibson-Mount 2015-12-11 05:09:43 UTC
(In reply to Quanah Gibson-Mount from comment #12)

> However, since amavisd *also* uses Net::DNS via
> Net::DNS::Resolver::Programmable, I don't know if the problem is with Amavis
> or SA's usage of Net::DNS. ;)  Will know shortly after some testing.

Yeah, this is a bug in amavis.  I wish it had a bug tracker too.

As to the question here... It may be safer overall, to use the SA specific resolver since it exists now, and Net::DNS is such a crapshoot.  Definitely the DKIM.pm should be consistent on which one it uses, though. :)

--Quanah
Comment 14 Quanah Gibson-Mount 2015-12-11 05:19:25 UTC
(In reply to Quanah Gibson-Mount from comment #13)
> (In reply to Quanah Gibson-Mount from comment #12)
> 
> > However, since amavisd *also* uses Net::DNS via
> > Net::DNS::Resolver::Programmable, I don't know if the problem is with Amavis
> > or SA's usage of Net::DNS. ;)  Will know shortly after some testing.
> 
> Yeah, this is a bug in amavis.  I wish it had a bug tracker too.

Ok, not amavis, either.  There's just some rogue program on the system connecting to localhost and then never sending anything, so this is noise I can ignore. ;)
Comment 15 Quanah Gibson-Mount 2015-12-12 22:12:12 UTC
(In reply to Kevin A. McGrail from comment #11)
> Good point.  Though should we consider 1.03 a burned release and require
> other versions instead of working around it?

Yes, I think 1.03 should be blacklisted, regardless.

Mark's work on using SA's own resolver for Mail::DKIM appears to be working fine in my lab.  It's probably easiest, long term, to make Mail::DKIM use that in both locations.  Net::DNS of course continues to be required for other bits, but this way we are safe in the future if they change the Net::DNS resolver behavior once again.
Comment 16 Mark Martinec 2015-12-14 17:03:03 UTC
(In reply to Quanah Gibson-Mount from comment #10)
> > --- lib/Mail/SpamAssassin/Plugin/DKIM.pm        (revision 1715244)
> > +++ lib/Mail/SpamAssassin/Plugin/DKIM.pm        (working copy)
> > -        my $res = $self->{main}->{resolver}->get_resolver;
> > +        my $res = $self->{main}->{resolver};
> > +        dbg("dkim: providing our own resolver: %s", ref $res); 
> I think this fix is incomplete, as Mail::DKIM also uses the Net::DNS
> resolver at: [...]
> which is around line 1046 in DKIM.pm.  I would expect we want to use the
> native resolver in both places?

You are right, it would make sense to use the same resolver in both
places in the DKIM plugin.

Now that 1.04 reverted the incompatible 'retry' semantics change, we
have again a choice of using the Net::DNS resolver object with our
settings (the one at {resolver}->get_resolver), or to use our private
send/receive methods as other DNS lookups in SA use (i.e. our private
resolver object at $self->{main}->{resolver}).

The benefit of using the Net::DNS native resolver is that it falls back
to a TCP query in case the UDP reply returns a 'truncated' flag (which
can happen with long DKIM public keys and a local recursive DNS server
not supporting EDNS0 long packets, or having some broken firewall in
the path. The benefit of using our own send/receive code is some
independence of future changes in Net::DNS. Haven't decided what would
be the best choice.


> I'm seeing a really odd issue atm with SA 3.4.1 and Net::DNS 1.04.
> I've now backed out the Net::DNS 1.03 specific changes to SA to see
> if that resolves it.
> Basically since I deployed Net::DNS 1.04 and SA 3.4.1 patched,
> I'm seeing a lot of:
> Dec 10 20:24:46 zre-ldap002 postfix/amavisd/smtpd[31292]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
> However, since amavisd *also* uses Net::DNS via Net::DNS::Resolver::Programmable, > I don't know if the problem is with Amavis or SA's usage of Net::DNS. ;)

Amavis does not use Net::DNS::Resolver::Programmable, nor Net::DNS.
All that amavis has to do with Net::DNS is to provide a pre-built
Net::DNS::Resolver object and pass it to Mail::DKIM.
Comment 17 Kevin A. McGrail 2015-12-16 16:29:16 UTC
On a similar note, Mark, based on comment 7, is the SPF plugin also affected by this?
Comment 18 Kevin A. McGrail 2015-12-16 16:49:07 UTC
(In reply to Kevin A. McGrail from comment #17)
> On a similar note, Mark, based on comment 7, is the SPF plugin also affected
> by this?

Mark's comment from the list:

Not sure about SPF. It's supposed to be fixed in the current 3.4 branch
and in trunk, which is why I'm not seeing a problem with Net::DNS 1.03
or Net::DNS 1.04. Will check how the released version of 3.4.1 fares
with Net::DNS 1.04 regarding SPF. The emergency patches were applied
to FreeBSD ports, don't know about other distributions.

Here are the relevant bug entries:
  https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7231
  https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7265

There were three changes in New::DNS that affected SpamAssassin.
The change in 1.01 affected SpamAssassin due to our sloppy
parsing of Net::DNS results. The changes brought by 1.03
affected SpamAssassin on two fronts, both are due to an
incompoatible API change in Net::DNS: different object class
expected by bgread (which affected a handful of other
Perl modules too), and a change in semantics of "retry" and
"retrans" options, which affected DKIM plugin.

  Mark
Comment 19 Kevin A. McGrail 2015-12-16 17:33:42 UTC
More input:

I forgot about the Bug 7223, which also affects Net::DNS 1.01
or later:

    https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223


Tried it now with 3.4.1 and Net::DNS 1.04.

You still need to apply the patch from Bug 7223 (in addition
to a patch from Bug 7231), then it passes all tests with
Net::DNS 1.04 (even without patches from Bug 7265).

Seems easiest to install SpamAssassin from a svn 3.4 branch
( svn checkout http://svn.apache.org/repos/asf/spamassassin/branches/3.4 spamassassin-3.4 )
or downgrade Net::DNS to a pre-1.* version (i.e. 0.83).
Comment 20 Henrik Krohns 2016-01-01 18:37:14 UTC
Revision 1715196 breaks something. Or just prints ugly stuff. Sorry no time to investigate.

Ubuntu 14.04 stock perl etc, with SA installed from trunk:

libnet-dns-perl                     0.68-1.2build1


_WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751, <GEN11> line 9.
_WARN: bgread: received a 116 bytes packet from 127.0.0.1, decoded 0 bytes
_WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751, <GEN11> line 9.
_WARN: bgread: received a 99 bytes packet from 127.0.0.1, decoded 0 bytes
_WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751.
_WARN: bgread: received a 132 bytes packet from 127.0.0.1, decoded 0 bytes
_WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751.
_WARN: bgread: received a 119 bytes packet from 127.0.0.1, decoded 0 bytes
_WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751.
_WARN: bgread: received a 149 bytes packet from 127.0.0.1, decoded 0 bytes
Comment 21 Mark Martinec 2016-06-15 17:18:03 UTC
> Revision 1715196 breaks something. Or just prints ugly stuff. Sorry no time
> to investigate.
> Ubuntu 14.04 stock perl etc, with SA installed from trunk:
> 
> _WARN: Argument "" isn't numeric in sprintf at
> /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751,
> <GEN11> line 9.
> _WARN: bgread: received a 116 bytes packet from 127.0.0.1, decoded 0 bytes

Don't know the reason. If it occurs infrequently this is fine
and just the excessive warning needs fixing (a fix below).
If "decoded 0 bytes" occurs regularly this needs investigation
(please report the Net::DNS version if this is the case).


trunk:

Bug 7265: fix warnings 'Argument "" isn't numeric in sprintf'
as reported in Comment 20, reordered comments for clarity
  Sending lib/Mail/SpamAssassin/DnsResolver.pm
Committed revision 1748609.
Comment 22 Henrik Krohns 2016-07-08 05:48:12 UTC
(In reply to Mark Martinec from comment #21)
> 
> Don't know the reason. If it occurs infrequently this is fine
> and just the excessive warning needs fixing (a fix below).
> If "decoded 0 bytes" occurs regularly this needs investigation
> (please report the Net::DNS version if this is the case).

Using trunk again, getting warnings on everything..

Ubuntu 14.04.1
libnet-dns-perl                     0.68-1.2build1

It seems Net::DNS::Packet doesn't return length!!

return wantarray ? ( $self, $@ ) : $self;

Compared to the latest version..

return wantarray ? ( $self, $offset ) : $self;
Comment 23 Sidney Markowitz 2017-04-16 00:37:52 UTC
I built an Ubuntu 14.04 VM and confirmed that trunk and 3.4 branch get the decode 0 bytes error on various network tests, so no problem reproducing this.

Can we get this fixed for the 3.4.2 release?
Comment 24 Sidney Markowitz 2017-04-16 05:43:13 UTC
Ok, I fixed it in trunk. The length return from Net::DNS::Packet is only used to output a warning so the test for the warning can be skipped if no length value is returned. I tested that it works in Ubuntu 14.04.

Committed revision 1791568
Comment 25 Sidney Markowitz 2017-04-16 06:04:33 UTC
ported to branch 3.4, should be ready for 3.4.2 release now
Committed revision 1791569

Mark, could you check if this bug can be closed?
Comment 26 Kevin A. McGrail 2017-04-16 11:54:26 UTC
(In reply to Sidney Markowitz from comment #25)
> ported to branch 3.4, should be ready for 3.4.2 release now
> Committed revision 1791569
> 
> Mark, could you check if this bug can be closed?

Sidney, did you add a test for this issue or is there a good method to test?
Comment 27 Sidney Markowitz 2017-04-16 12:18:11 UTC
Oh, right, what we were getting were error messages not test failures. I guess there should be a test that makes it fail when it happens. I can look at that tomorrow if nobody comes up with one first. However, I only looked at the error message that was showing up in Ubuntu 11.04. Someone else has to look at this issue itself and decide if it is ready to be closed.
Comment 28 Kevin A. McGrail 2018-08-28 03:13:44 UTC
As Sidney mentions, yes, this can be closed.  While a test case might be nice, the trunk and 3.4 fixes resolve the issue.