Bug 6572 - URI parser mishandles parenthesis
Summary: URI parser mishandles parenthesis
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-15 03:33 UTC by Henrik Krohns
Modified: 2018-08-22 04:29 UTC (History)
4 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
check for valid domain names patch None Giovanni Bechis [HasCLA]
Remove parenthesis from uri if present patch None Giovanni Bechis [HasCLA]
Add !,? and ) as end delimiter for uris patch None Giovanni Bechis [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Henrik Krohns 2011-04-15 03:33:44 UTC
Current spam run has URIs like this:

www.(sunkeymaker.com)

This results in "(sunkeymaker.com" as parsed domain and it's querying that from URIBLs, including the parenthesis. If nothing more, it's just wasthing DNS lookups.
Comment 1 Henrik Krohns 2011-04-15 03:52:01 UTC
I'm personally patching this like this. It's certain not to break anything. I'm not sure if some plugins are actually expecting bad behavior like this, no time to find out..

--- /local/src/amavis/spamassassin-trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm       Mon Dec 13 22:10:00 2010
+++ URIDNSBL.pm Fri Apr 15 10:40:26 2011
@@ -428,6 +428,8 @@

     # take the usable domains and add them to the ordered list
     while (my($host,$domain) = each( %{$info->{hosts}} )) {
+      $host =~ s/\(//g;
+      $domain =~ s/\(//g;
       if ($skip_domains->{$domain}) {
         dbg("uridnsbl: domain $domain in skip list");
       } else {
Comment 2 Kevin A. McGrail 2012-01-17 17:39:43 UTC
Henrik, 

I was thinking that the RBL should just list www.(sunkeymaker.com, www.(sunkeymaker.com) and www.sunkeymaker.com to fix the issue more so than changing the lookup technology?
Comment 3 Henrik Krohns 2012-01-17 18:44:20 UTC
(In reply to comment #2)
> Henrik, 
> 
> I was thinking that the RBL should just list www.(sunkeymaker.com,
> www.(sunkeymaker.com) and www.sunkeymaker.com to fix the issue more so than
> changing the lookup technology?

I was just wondering if that's even legal character for DNS or domains. Seems strange that we would willingly parse the domain as such. Either ignore or fix it..
Comment 4 Giovanni Bechis 2017-12-08 00:35:23 UTC
Created attachment 5492 [details]
check for valid domain names

Parenthesis is not allowed in domain names, same for other chars.
Comment 5 RW 2017-12-08 14:28:36 UTC
not parsing out "(sunkeymaker.com" is a step forward, but really the correct thing to do is to find "sunkeymaker.com".  

The parsing seems to be rather fragile, for example  

 printf  '\nbuy from canadianpharmacy.com. \n' |spamasssassin

works, but

 printf  '\nbuy from canadianpharmacy.com, \n' |spamasssassin
 
doesn't. (All I've done there is to change a full stop to a comma.)
Comment 6 Giovanni Bechis 2017-12-11 08:23:38 UTC
Created attachment 5494 [details]
Remove parenthesis from uri if present

Remove parenthesis from uri, slightly tested with few emails.
Comment 7 Giovanni Bechis 2017-12-11 08:24:52 UTC
(In reply to RW from comment #5)
> not parsing out "(sunkeymaker.com" is a step forward, but really the correct
> thing to do is to find "sunkeymaker.com".  
> 
> The parsing seems to be rather fragile, for example  
> 
>  printf  '\nbuy from canadianpharmacy.com. \n' |spamasssassin
> 
> works, but
> 
>  printf  '\nbuy from canadianpharmacy.com, \n' |spamasssassin
>  
> doesn't. (All I've done there is to change a full stop to a comma.)

Can you provide an email message that breaks this way ?
Thanks.
Comment 8 RW 2017-12-11 12:45:54 UTC
I don't have an email, I was looking to see what happened with things like
   canadianpharmacy.com!
!!!canadianpharmacy.com!!! 
--canadianpharmacy.com--

Most them don't work correctly.

I've always found things like printf  '\nbuy from canadianpharmacy.com, \n' to work fine for testing body rules. The newline at the beginning causes SA to treat the text that follows as body text with the default mime type of text/plain; charset=us-ascii. Sometimes mime headers are needed and occasionally a Subject, but not in this case.
Comment 9 Giovanni Bechis 2017-12-12 07:51:23 UTC
With default rules I cannot spot differences and a "eval:check_uridnsbl('URIBL_SBLXBL')" rule is not triggered for me; note that some rules are better triggered iff the content is detected as html because the code path is different.
Comment 10 RW 2017-12-12 12:28:07 UTC
I wasn't checking for rule hits, I was checking for whether a URI was parsed out of the body. I passed it through something like 


  spamassassin -D uri 2>&1 1>/dev/null |grep canadian
Comment 11 Giovanni Bechis 2017-12-12 23:19:44 UTC
Created attachment 5496 [details]
Add !,? and ) as end delimiter for uris

This should fix your issue, anyway I would change this only if such uris are present in real emails; imho this regexp should be changed only to detect uris that are made "clickable" by mail user agents.
Comment 12 Kevin A. McGrail 2018-08-22 04:29:00 UTC
Paranthesis are not valid in a uri so it's not being mishandled