SA Bugzilla – Bug 6572
URI parser mishandles parenthesis
Last modified: 2018-08-22 04:29:00 UTC
The current spam run has URIs like this:

  www.(sunkeymaker.com)

This results in "(sunkeymaker.com" as the parsed domain, and SpamAssassin queries that name, parenthesis included, against the URIBLs. If nothing else, it's just wasting DNS lookups.
I'm personally patching it like this. It's certain not to break anything. I'm not sure whether some plugins actually expect behavior like this, and I have no time to find out.

--- /local/src/amavis/spamassassin-trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm Mon Dec 13 22:10:00 2010
+++ URIDNSBL.pm Fri Apr 15 10:40:26 2011
@@ -428,6 +428,8 @@
 
   # take the usable domains and add them to the ordered list
   while (my($host,$domain) = each( %{$info->{hosts}} )) {
+    $host =~ s/\(//g;
+    $domain =~ s/\(//g;
     if ($skip_domains->{$domain}) {
       dbg("uridnsbl: domain $domain in skip list");
     } else {
Henrik, I was thinking that the RBL should just list www.(sunkeymaker.com, www.(sunkeymaker.com) and www.sunkeymaker.com to fix the issue, rather than changing the lookup technology?
(In reply to comment #2)
> Henrik,
>
> I was thinking that the RBL should just list www.(sunkeymaker.com,
> www.(sunkeymaker.com) and www.sunkeymaker.com to fix the issue more so than
> changing the lookup technology?

I was just wondering whether that's even a legal character for DNS or domain names. It seems strange that we would willingly parse the domain like that. Either ignore it or fix it.
Created attachment 5492
check for valid domain names

Parentheses are not allowed in domain names, and the same goes for other characters outside letters, digits and the hyphen.
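For illustration, a validity check along these lines could apply the letters-digits-hyphen (LDH) hostname rule: each label is 1 to 63 characters of letters, digits and hyphens, not starting or ending with a hyphen, and the whole name is at most 253 characters. This is a minimal sketch, not the code in attachment 5492; the helper name `is_valid_hostname` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not the code from attachment 5492): returns true if
# $host is a syntactically valid DNS hostname under the LDH rule.
sub is_valid_hostname {
    my ($host) = @_;
    return 0 unless defined $host && length $host;
    $host =~ s/\.\z//;                 # tolerate one trailing root dot
    return 0 unless length $host;      # "." alone is not a usable hostname
    return 0 if length($host) > 253;   # overall length limit
    # split with -1 keeps empty fields, so "a..com" yields an empty label
    for my $label (split /\./, $host, -1) {
        # 1-63 chars, LDH only, no leading or trailing hyphen
        return 0
          unless $label =~ /\A[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z/;
    }
    return 1;
}

for my $h ('www.sunkeymaker.com', '(sunkeymaker.com', 'www.(sunkeymaker.com)') {
    printf "%-24s %s\n", $h, is_valid_hostname($h) ? 'valid' : 'invalid';
}
```

A check like this only rejects bad candidates; on its own it would drop "(sunkeymaker.com" rather than recover "sunkeymaker.com".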
Not parsing out "(sunkeymaker.com" is a step forward, but really the correct thing to do is to find "sunkeymaker.com". The parsing seems rather fragile. For example,

  printf '\nbuy from canadianpharmacy.com. \n' | spamassassin

works, but

  printf '\nbuy from canadianpharmacy.com, \n' | spamassassin

doesn't. (All I've done there is change a full stop to a comma.)
Created attachment 5494
Remove parenthesis from uri

Removes parentheses from the URI; lightly tested with a few emails.
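As a sketch of the "remove parentheses" idea, assuming the goal is to delete both "(" and ")" from a candidate host (the URIDNSBL.pm patch quoted earlier removes only "("), Perl's `tr///d` does this in one pass. The helper name `strip_parens` is hypothetical and this is not the code in attachment 5494.

```perl
use strict;
use warnings;

# Hypothetical helper (not the code from attachment 5494): delete every
# "(" and ")" from a candidate host, since neither can appear in a hostname.
sub strip_parens {
    my ($host) = @_;
    (my $clean = $host) =~ tr/()//d;   # tr///d deletes the listed characters
    return $clean;
}

print strip_parens($_), "\n" for 'www.(sunkeymaker.com)', '(sunkeymaker.com';
```

Deleting parentheses anywhere in the string can join unrelated text (for example "foo(bar).com" becomes "foobar.com"); trimming them only at the start and end of the candidate would be the more conservative choice.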
(In reply to RW from comment #5)
> not parsing out "(sunkeymaker.com" is a step forward, but really the correct
> thing to do is to find "sunkeymaker.com".
>
> The parsing seems to be rather fragile, for example
>
> printf '\nbuy from canadianpharmacy.com. \n' |spamassassin
>
> works, but
>
> printf '\nbuy from canadianpharmacy.com, \n' |spamassassin
>
> doesn't. (All I've done there is to change a full stop to a comma.)

Can you provide an email message that breaks this way? Thanks.
I don't have an email; I was looking to see what happened with things like

  canadianpharmacy.com!
  !!!canadianpharmacy.com!!!
  --canadianpharmacy.com--

Most of them don't work correctly.

I've always found things like

  printf '\nbuy from canadianpharmacy.com, \n'

to work fine for testing body rules. The newline at the beginning causes SA to treat the text that follows as body text with the default MIME type of text/plain; charset=us-ascii. Sometimes MIME headers are needed, and occasionally a Subject, but not in this case.
With the default rules I cannot spot any differences, and an "eval:check_uridnsbl('URIBL_SBLXBL')" rule is not triggered for me. Note that some rules trigger more reliably when the content is detected as HTML, because a different code path is used.
I wasn't checking for rule hits; I was checking whether a URI was parsed out of the body. I passed it through something like

  spamassassin -D uri 2>&1 1>/dev/null | grep canadian
Created attachment 5496
Add !,? and ) as end delimiter for uris

This should fix your issue. Still, I would change this only if such URIs appear in real emails; IMHO this regexp should be changed only to detect URIs that mail user agents make "clickable".
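To illustrate what treating extra characters as end delimiters means, here is a minimal sketch that trims surrounding punctuation from a bare-domain candidate. It is not SpamAssassin's actual URI regex or the code in attachment 5496; the helper name `trim_candidate` is hypothetical, and the delimiter set (., comma, !, ?, ), and -) is an assumption chosen to cover the examples in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin's URI regex): trim punctuation that
# commonly surrounds a bare domain in prose.
sub trim_candidate {
    my ($text) = @_;
    $text =~ s/\A[^A-Za-z0-9]+//;   # leading punctuation such as "!!!", "--", "("
    $text =~ s/[.,!?)\-]+\z//;      # assumed trailing end delimiters: . , ! ? ) -
    return $text;
}

for my $in ('canadianpharmacy.com.', 'canadianpharmacy.com,', 'canadianpharmacy.com!',
            '!!!canadianpharmacy.com!!!', '--canadianpharmacy.com--', '(sunkeymaker.com)') {
    printf "%-28s -> %s\n", $in, trim_candidate($in);
}
```

Anchoring both substitutions to the ends of the candidate leaves interior characters alone, which matches the concern above about only recognizing what a mail user agent would plausibly make clickable.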
Parentheses are not valid in a URI, so it's not being mishandled.