|
SA Bugzilla – Full Text Bug Listing |
Summary: | DNS resolving breaks with Net::DNS 1.03 | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | eserte12 |
Component: | Libraries | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | apache, kmcgrail, quanah, sidney |
Priority: | P2 | ||
Version: | 3.4.1 | ||
Target Milestone: | 3.4.2 | ||
Hardware: | PC | ||
OS: | All | ||
Whiteboard: | |||
Attachments: | Proposed patch: use our own bgread(), matching our own existing bgsend() |
Description
eserte12
2015-11-14 18:22:02 UTC
As already reported on the SA users list, you cannot use Net::DNS 1.03 with SpamAssassin as the Net::DNS authors made a major API change in a point release. Do not upgrade to Net::DNS 1.03 as it may break anything depending on the old API. This is generally the fault of the Net::DNS authors as major API changes that are not backwards compatible should never be made in a point release. trunk: Sending lib/Mail/SpamAssassin/DnsResolver.pm Committed revision 1715196. Created attachment 5348 [details]
Proposed patch: use our own bgread(), matching our own existing bgsend()
3.4: Sending lib/Mail/SpamAssassin/DnsResolver.pm Committed revision 1715197. Thanks for the workaround. Can't believe they didn't revert the ip change though. For reference: https://rt.cpan.org/Public/Bug/Display.html?id=108745 http://marc.info/?l=spamassassin-users&m=144737403507883 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204656 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204682 Actually DNS resolving in a DKIM plugin seems to be still broken. Working on it... > Actually DNS resolving in a DKIM plugin seems to be still broken. > Working on it... The DKIM plugin was broken by Net::DNS 1.03 for a different reason: - Net::DNS 1.03 has a bug in that it sends a barrage of queries up to a retry count in one go, with a timeout of 0, so it depends on a luck and a DNS resolver's cache whether a reply is received in a few milliseconds or not - which is why a dkim test fails more likely then not. This bug is still undocumented. - our fix (in a proposed patch above) would be able to avoid the bug in Net::DNS, but due to a bug in Mail/SpamAssassin/Plugin/DKIM.pm it failed to provide our own resolver to Mail::DKIM, so it used the problematic Net::DNS::Resolver; This now fixes our bug in MS::Plugin::DKIM: --- lib/Mail/SpamAssassin/Plugin/DKIM.pm (revision 1715244) +++ lib/Mail/SpamAssassin/Plugin/DKIM.pm (working copy) @@ -794,7 +794,8 @@ # Only do so if EDNS0 provides a reasonably-sized UDP payload size, # as our interface does not provide a DNS fallback to TCP, unlike # the Net::DNS::Resolver::send which does provide it. - my $res = $self->{main}->{resolver}->get_resolver; + my $res = $self->{main}->{resolver}; + dbg("dkim: providing our own resolver: %s", ref $res); Mail::DKIM::DNS::resolver($res); } } trunk: Sending lib/Mail/SpamAssassin/Plugin/DKIM.pm Committed revision 1715245. 3.4: Sending lib/Mail/SpamAssassin/Plugin/DKIM.pm Committed revision 1715248. > - Net::DNS 1.03 has a bug in that it sends a barrage of queries > up to a retry count in one go, with a timeout of 0, so it > depends on a luck and a DNS resolver's cache whether a reply > is received in a few milliseconds or not - which is why a dkim > test fails more likely then not. This bug is still undocumented. Actually perhaps not a bug, but a change in semantics of "retry" and "retrans" options with Net::DNS 1.03, which caused the failure of DNS lookups by a DKIM plugin. Now documented at [rt.cpan.org #109183] : https://rt.cpan.org/Ticket/Display.html?id=109183 (In reply to Mark Martinec from comment #8) > --- lib/Mail/SpamAssassin/Plugin/DKIM.pm (revision 1715244) > +++ lib/Mail/SpamAssassin/Plugin/DKIM.pm (working copy) > @@ -794,7 +794,8 @@ > # Only do so if EDNS0 provides a reasonably-sized UDP payload size, > # as our interface does not provide a DNS fallback to TCP, unlike > # the Net::DNS::Resolver::send which does provide it. > - my $res = $self->{main}->{resolver}->get_resolver; > + my $res = $self->{main}->{resolver}; > + dbg("dkim: providing our own resolver: %s", ref $res); > Mail::DKIM::DNS::resolver($res); > } > } I think this fix is incomplete, as Mail::DKIM also uses the Net::DNS resolver at: if (Mail::DKIM::AuthorDomainPolicy->UNIVERSAL::can("fetch")) { dbg("dkim: adsp: performing lookup on _adsp._domainkey.%s", $author_domain); # get our Net::DNS::Resolver object my $res = $self->{main}->{resolver}->get_resolver; $practices = Mail::DKIM::AuthorDomainPolicy->fetch( Protocol => "dns", Domain => $author_domain, DnsResolver => $res); } 1; which is around line 1046 in DKIM.pm. I would expect we want to use the native resolver in both places? Good point. Though should we consider 1.03 a burned release and require other versions instead of working around it? (In reply to Kevin A. McGrail from comment #11) > Good point. Though should we consider 1.03 a burned release and require > other versions instead of working around it? I'm seeing a really odd issue atm with SA 3.4.1 and Net::DNS 1.04. I've now backed out the Net::DNS 1.03 specific changes to SA to see if that resolves it. Basically since I deployed Net::DNS 1.04 and SA 3.4.1 patched, I'm seeing a lot of: Dec 10 20:24:46 zre-ldap002 postfix/amavisd/smtpd[31292]: timeout after END-OF-MESSAGE from localhost[127.0.0.1] Dec 10 20:25:31 zre-ldap002 postfix/dkimmilter/smtpd[31801]: timeout after END-OF-MESSAGE from localhost[127.0.0.1] However, since amavisd *also* uses Net::DNS via Net::DNS::Resolver::Programmable, I don't know if the problem is with Amavis or SA's usage of Net::DNS. ;) Will know shortly after some testing. (In reply to Quanah Gibson-Mount from comment #12) > However, since amavisd *also* uses Net::DNS via > Net::DNS::Resolver::Programmable, I don't know if the problem is with Amavis > or SA's usage of Net::DNS. ;) Will know shortly after some testing. Yeah, this is a bug in amavis. I wish it had a bug tracker too. As to the question here... It may be safer overall, to use the SA specific resolver since it exists now, and Net::DNS is such a crapshoot. Definitely the DKIM.pm should be consistent on which one it uses, though. :) --Quanah (In reply to Quanah Gibson-Mount from comment #13) > (In reply to Quanah Gibson-Mount from comment #12) > > > However, since amavisd *also* uses Net::DNS via > > Net::DNS::Resolver::Programmable, I don't know if the problem is with Amavis > > or SA's usage of Net::DNS. ;) Will know shortly after some testing. > > Yeah, this is a bug in amavis. I wish it had a bug tracker too. Ok, not amavis, either. There's just some rogue program on the system connecting to localhost and then never sending anything, so this is noise I can ignore. ;) (In reply to Kevin A. McGrail from comment #11) > Good point. Though should we consider 1.03 a burned release and require > other versions instead of working around it? Yes, I think 1.03 should be blacklisted, regardless. Mark's work on using SA's own resolver for Mail::DKIM appears to be working fine in my lab. It's probably easiest, long term, to make Mail::DKIM use that in both locations. Net::DNS of course continues to be required for other bits, but this way we are safe in the future if they change the Net::DNS resolver behavior once again. (In reply to Quanah Gibson-Mount from comment #10) > > --- lib/Mail/SpamAssassin/Plugin/DKIM.pm (revision 1715244) > > +++ lib/Mail/SpamAssassin/Plugin/DKIM.pm (working copy) > > - my $res = $self->{main}->{resolver}->get_resolver; > > + my $res = $self->{main}->{resolver}; > > + dbg("dkim: providing our own resolver: %s", ref $res); > I think this fix is incomplete, as Mail::DKIM also uses the Net::DNS > resolver at: [...] > which is around line 1046 in DKIM.pm. I would expect we want to use the > native resolver in both places? You are right, it would make sense to use the same resolver in both places in the DKIM plugin. Now that 1.04 reverted the incompatible 'retry' semantics change, we have again a choice of using the Net::DNS resolver object with our settings (the one at {resolver}->get_resolver), or to use our private send/receive methods as other DNS lookups in SA use (i.e. our private resolver object at $self->{main}->{resolver}). The benefit of using the Net::DNS native resolver is that it falls back to a TCP query in case the UDP reply returns a 'truncated' flag (which can happen with long DKIM public keys and a local recursive DNS server not supporting EDNS0 long packets, or having some broken firewall in the path. The benefit of using our own send/receive code is some independence of future changes in Net::DNS. Haven't decided what would be the best choice. > I'm seeing a really odd issue atm with SA 3.4.1 and Net::DNS 1.04. > I've now backed out the Net::DNS 1.03 specific changes to SA to see > if that resolves it. > Basically since I deployed Net::DNS 1.04 and SA 3.4.1 patched, > I'm seeing a lot of: > Dec 10 20:24:46 zre-ldap002 postfix/amavisd/smtpd[31292]: timeout after END-OF-MESSAGE from localhost[127.0.0.1] > However, since amavisd *also* uses Net::DNS via Net::DNS::Resolver::Programmable, > I don't know if the problem is with Amavis or SA's usage of Net::DNS. ;) Amavis does not use Net::DNS::Resolver::Programmable, nor Net::DNS. All that amavis has to do with Net::DNS is to provide a pre-built Net::DNS::Resolver object and pass it to Mail::DKIM. On a similar note, Mark, based on comment 7, is the SPF plugin also affected by this? (In reply to Kevin A. McGrail from comment #17) > On a similar note, Mark, based on comment 7, is the SPF plugin also affected > by this? Mark's comment from the list: Not sure about SPF. It's supposed to be fixed in the current 3.4 branch and in trunk, which is why I'm not seeing a problem with Net::DNS 1.03 or Net::DNS 1.04. Will check how the released version of 3.4.1 fares with Net::DNS 1.04 regarding SPF. The emergency patches were applied to FreeBSD ports, don't know about other distributions. Here are the relevant bug entries: https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7231 https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7265 There were three changes in New::DNS that affected SpamAssassin. The change in 1.01 affected SpamAssassin due to our sloppy parsing of Net::DNS results. The changes brought by 1.03 affected SpamAssassin on two fronts, both are due to an incompoatible API change in Net::DNS: different object class expected by bgread (which affected a handful of other Perl modules too), and a change in semantics of "retry" and "retrans" options, which affected DKIM plugin. Mark More input: I forgot about the Bug 7223, which also affects Net::DNS 1.01 or later: https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223 Tried it now with 3.4.1 and Net::DNS 1.04. You still need to apply the patch from Bug 7223 (in addition to a patch from Bug 7231), then it passes all tests with Net::DNS 1.04 (even without patches from Bug 7265). Seems easiest to install SpamAssassin from a svn 3.4 branch ( svn checkout http://svn.apache.org/repos/asf/spamassassin/branches/3.4 spamassassin-3.4 ) or downgrade Net::DNS to a pre-1.* version (i.e. 0.83). Revision 1715196 breaks something. Or just prints ugly stuff. Sorry no time to investigate. Ubuntu 14.04 stock perl etc, with SA installed from trunk: libnet-dns-perl 0.68-1.2build1 _WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751, <GEN11> line 9. _WARN: bgread: received a 116 bytes packet from 127.0.0.1, decoded 0 bytes _WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751, <GEN11> line 9. _WARN: bgread: received a 99 bytes packet from 127.0.0.1, decoded 0 bytes _WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751. _WARN: bgread: received a 132 bytes packet from 127.0.0.1, decoded 0 bytes _WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751. _WARN: bgread: received a 119 bytes packet from 127.0.0.1, decoded 0 bytes _WARN: Argument "" isn't numeric in sprintf at /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751. _WARN: bgread: received a 149 bytes packet from 127.0.0.1, decoded 0 bytes > Revision 1715196 breaks something. Or just prints ugly stuff. Sorry no time > to investigate. > Ubuntu 14.04 stock perl etc, with SA installed from trunk: > > _WARN: Argument "" isn't numeric in sprintf at > /usr/local/share/perl/5.18.2/Mail/SpamAssassin/DnsResolver.pm line 751, > <GEN11> line 9. > _WARN: bgread: received a 116 bytes packet from 127.0.0.1, decoded 0 bytes Don't know the reason. If it occurs infrequently this is fine and just the excessive warning needs fixing (a fix below). If "decoded 0 bytes" occurs regularly this needs investigation (please report the Net::DNS version if this is the case). trunk: Bug 7265: fix warnings 'Argument "" isn't numeric in sprintf' as reported in Comment 20, reordered comments for clarity Sending lib/Mail/SpamAssassin/DnsResolver.pm Committed revision 1748609. (In reply to Mark Martinec from comment #21) > > Don't know the reason. If it occurs infrequently this is fine > and just the excessive warning needs fixing (a fix below). > If "decoded 0 bytes" occurs regularly this needs investigation > (please report the Net::DNS version if this is the case). Using trunk again, getting warnings on everything.. Ubuntu 14.04.1 libnet-dns-perl 0.68-1.2build1 It seems Net::DNS::Packet doesn't return length!! return wantarray ? ( $self, $@ ) : $self; Compared to the latest version.. return wantarray ? ( $self, $offset ) : $self; I built an Ubuntu 14.04 VM and confirmed that trunk and 3.4 branch get the decode 0 bytes error on various network tests, so no problem reproducing this. Can we get this fixed for the 3.4.2 release? Ok, I fixed it in trunk. The length return from Net::DNS::Packet is only used to output a warning so the test for the warning can be skipped if no length value is returned. I tested that it works in Ubuntu 14.04. Committed revision 1791568 ported to branch 3.4, should be ready for 3.4.2 release now Committed revision 1791569 Mark, could you check if this bug can be closed? (In reply to Sidney Markowitz from comment #25) > ported to branch 3.4, should be ready for 3.4.2 release now > Committed revision 1791569 > > Mark, could you check if this bug can be closed? Sidney, did you add a test for this issue or is there a good method to test? Oh, right, what we were getting were error messages not test failures. I guess there should be a test that makes it fail when it happens. I can look at that tomorrow if nobody comes up with one first. However, I only looked at the error message that was showing up in Ubuntu 11.04. Someone else has to look at this issue itself and decide if it is ready to be closed. As Sidney mentions, yes, this can be closed. While a test case might be nice, the trunk and 3.4 fixes resolve the issue. |