Bug 7365 - URIs containing parts of TLD .net receive URI_OBFU_WWW score
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Score Generation
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
 
Reported: 2016-10-27 07:23 UTC by mail
Modified: 2018-08-26 21:33 UTC

Description mail 2016-10-27 07:23:47 UTC
Whenever an email contains a link with a URI like http://www.sci-net.de (this is our actual domain, where the error occurs), SpamAssassin assigns a URI_OBFU_WWW score of 3.099.

With some testing we found out that the -net part of our domain is the key to this behavior.

For testing purposes, we changed the link in our outgoing email templates to http://www.scinet.de (without the "-").
With that change, the email is no longer flagged by SpamAssassin.

It seems that SpamAssassin is confused by the "-net" in our domain name.
Comment 1 RW 2016-10-27 11:48:53 UTC
What I'm seeing is that http://www.sci-net.de doesn't hit the rule, but https://www.sci-net.de and www.sci-net.de do.

I think a reasonable compromise would be to change the assertion from (?<!http:\/\/) to (?<!:\/\/)
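
To illustrate the difference between the two assertions, here is a rough Python sketch. The CORE pattern is a hypothetical stand-in that merely matches "www.sci-net"; it is not the actual URI_OBFU_WWW regex, which isn't quoted in this report. Only the two look-behind assertions are taken from the text above.

```python
import re

# Hypothetical stand-in for the host-matching part of the rule.
# The real URI_OBFU_WWW pattern is NOT quoted in this bug report.
CORE = r"www\.[a-z0-9]+-(?:com|net|org)\b"

# The existing assertion only excludes text preceded by "http://".
OLD = re.compile(r"(?<!http://)" + CORE)
# The proposed assertion excludes text preceded by any "scheme://".
NEW = re.compile(r"(?<!://)" + CORE)

SAMPLES = [
    "http://www.sci-net.de",
    "https://www.sci-net.de",
    "www.sci-net.de",
]


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Map each sample to (hit with old assertion, hit with new assertion).
results = {s: (hits(OLD, s), hits(NEW, s)) for s in SAMPLES}

for sample, (old, new) in results.items():
    print(f"{sample:26} old={old} new={new}")
```

Under these assumptions, the https form matches only with the old assertion, the http form matches with neither, and the bare www.sci-net.de form matches with both.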
Comment 2 John Hardin 2016-10-28 01:26:17 UTC
How is that getting 3+ points? The score limit has been 2.000 since last March...
Comment 3 John Hardin 2016-10-28 01:52:20 UTC
The masscheck S/O is abysmal, it's not respecting the score limit (possibly due to the abysmal S/O) and I can't repro this behavior in my test environment - it should not even hit .de at all, because that's not one of the TLDs it's looking for.

Disabling.

$ svn commit 20_uri_obfu_ws.cf
Sending        svn/trunk/rulesrc/sandbox/jhardin/20_uri_obfu_ws.cf
Transmitting file data .done
Committing transaction...
Committed revision 1766914.
Comment 4 RW 2016-10-28 14:52:36 UTC
(In reply to John Hardin from comment #3)
> The masscheck S/O is abysmal, it's not respecting the score limit (possibly
> due to the abysmal S/O) and I can't repro this behavior in my test
> environment - it should not even hit .de at all, because that's not one of
> the TLDs it's looking for.

It's finding www.sci-net, or would do if you used https://www.sci-net.de or just www.sci-net.de. 

It's possible that the poor S/O is caused by the general switch from http to https, so that the look-behind assertion isn't avoiding the FPs any more. Once it's generalized to include https, the rule makes sense because people commonly drop the www part outside of a proper URL and just write the domain name, so most FPs on the aggressive host name matching are avoided.
Comment 5 John Hardin 2016-10-28 15:03:20 UTC
I couldn't reproduce it and I'm not reluctant to try to tune a rule I can't get a hit for. But, I will restore it with the broader exclusion and see how it does in masscheck.
Comment 6 John Hardin 2016-10-28 15:13:03 UTC
(In reply to John Hardin from comment #5)
> I couldn't reproduce it and I'm not reluctant to try to tune a rule I can't
> get a hit for.

Oops. That should be: I *am* reluctant to try to tune a rule I can't get a hit for.

It didn't hit on any form of that URI.

I reenabled it and made your recommended change; we'll see what masscheck says.

$ svn commit 20_uri_obfu_ws.cf
Sending        svn/trunk/rulesrc/sandbox/jhardin/20_uri_obfu_ws.cf
Transmitting file data .done
Committing transaction...
Committed revision 1767032.
Comment 7 RW 2016-10-28 15:31:22 UTC
$ printf  "\n\nhttps://www.sci-net.de" |spamassassin -D 2>&1 | grep -Eo "ran body rule URI_OBFU_WWW.*"
ran body rule URI_OBFU_WWW ======> got hit: "www.sci-net"

$ printf  "\n\nhttp://www.sci-net.de" |spamassassin -D 2>&1 | grep -Eo "ran body rule URI_OBFU_WWW.*"

$ printf  "\n\nwww.sci-net.de" |spamassassin -D 2>&1 | grep -Eo "ran body rule URI_OBFU_WWW.*"
ran body rule URI_OBFU_WWW ======> got hit: "www.sci-net"
Comment 8 Kevin A. McGrail 2018-08-26 21:33:30 UTC
The rule is not being published. Closing.