SA Bugzilla – Bug 7891
spamassassin expands domain by wrong TLD
Last modified: 2021-04-18 15:13:43 UTC
spamassassin expands some domains by a .com TLD although the domain without .com is perfectly valid (uncommon but valid). To reproduce: add a domain like www.ch into a mail and see spamassassin's debug output ``` spamassassin -D </tmp/redacted 2>&1|egrep 'URIHOSTS|URIDOMAINS' Mar 15 08:43:09.824 [32657] dbg: check: tagrun - tag URIHOSTS is now ready, value: ARY:[www.ch,www.ch.com] Mar 15 08:43:09.825 [32657] dbg: check: tagrun - tag URIDOMAINS is now ready, value: ARY:[ch.com,www.ch] ``` in the message only www.ch is present but SA expands it by .com and then checks domains completely not related to this message (ch.com and www.ch.com) If a scheme is prepended to the uri link (ex http://www.ch) then SA does **mot** append the .com TLD
Yeah Util.pm / uri_list_canonicalize should probably only add .com to unknows TLDs. # http://foobar -> http://www.foobar.com as Firefox does (Bug 6596) # (do this here so we don't trip on those 0x123 IPs etc..) # https://hg.mozilla.org/mozilla-central/file/tip/docshell/base/nsDefaultURIFixup.cpp elsif ($proto eq 'http://' && $auth eq '' && $nhost ne 'localhost' && $port eq '80' && $nhost =~ /^(?:www\.)?([^.]+)$/) { push(@nuris, join('', $proto, 'www.', $1, '.com', $rest)); Schemeless uris are already known as valid, but this function has no knowledge if the uri was originally schemeless or not. This function also does not have any access to is_domain_valid function since Util.pm is standalone module. I guess we could pass $self->{main}->{registryboundaries} as argument to uri_list_canonicalize, so we can use it for this purpose. sub uri_list_canonicalize { my($redirector_patterns, @uris, $rb) = @_;
(In reply to Henrik Krohns from comment #1) > sub uri_list_canonicalize { > my($redirector_patterns, @uris, $rb) = @_; Uh we can't do this since @uris is not passed as reference *facepalm*.. kinda hard to fix this without breaking compatibility (if we care about it in this case).
uri_list_canonicalize now detects if it's called by new usage style and will use $rb to check for valid domain in this case. my @uris; my $rb; if (ref($_[0]) eq 'ARRAY') { # New call style: # - reference to array of redirector_patterns # - reference to array of URIs # - reference to $self->{main}->{registryboundaries} @uris = @{$_[0]}; $rb = $_[1]; } else { # Old call style: # - reference to array of redirector_patterns # - rest of the arguments is list of uris @uris = @_; } Committed to trunk. 3.4 is in RTC mode, needs votes for possible commit. Sending trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm Sending trunk/lib/Mail/SpamAssassin/Util.pm Sending trunk/t/uri.t Transmitting file data ...done Committing transaction... Committed revision 1887669.
(In reply to Henrik Krohns from comment #3) > Committed to trunk. 3.4 is in RTC mode, needs votes for possible commit. As per discussion on dev list, changing milestone from 3.4.6 to 4.0.0. I'm leaving it open until after the 3.4.6 release, as then Henrik, you can decide if you want to commit it to the 3.4 branch and close it with a 3.4.7 milestone, or just close it as Fixed for 4.0.0.
Backported to 3.4 as I feel it's simple and safe. Sending spamassassin-3.4/lib/Mail/SpamAssassin/PerMsgStatus.pm Sending spamassassin-3.4/lib/Mail/SpamAssassin/Util.pm Sending spamassassin-3.4/t/uri.t Transmitting file data ...done Committing transaction... Committed revision 1888907.