Bug 4945 - Add support for open-whois.org
Summary: Add support for open-whois.org
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 enhancement
Target Milestone: 3.2.0
Assignee: SpamAssassin Developer Mailing List
URL: http://open-whois.org
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-06-07 03:27 UTC by Jamie Penman-Smithson
Modified: 2009-07-20 07:07 UTC
Description Jamie Penman-Smithson 2006-06-07 03:27:13 UTC
bl.open-whois.org is a list of domains registered through anonymous or proxy
domain registration services, such as Katz Domain Trust, Domains by Proxy and
SecureWhois.
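
A minimal sketch of how a lookup name for such a list is typically formed, assuming bl.open-whois.org is queried like a conventional domain-based DNS list (the report itself does not spell out the query format). The function name `rhsbl_query_name` is hypothetical, and the sketch only builds the name; it performs no DNS lookup.

```python
def rhsbl_query_name(domain: str, zone: str = "bl.open-whois.org") -> str:
    """Build the DNS name to look up for a domain-based list.

    Assumption: the list follows the usual convention of prepending the
    domain under test to the list zone. By that convention, an answer to
    an A-record query for this name means "listed".
    """
    # Normalise: trim whitespace, lower-case, drop any trailing root dot.
    cleaned = domain.strip().lower().rstrip(".")
    if not cleaned:
        raise ValueError("empty domain")
    return f"{cleaned}.{zone}"


if __name__ == "__main__":
    print(rhsbl_query_name("Example.COM."))
```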

There are some example rules here:
<http://open-whois.org/01_openwhois.cf>
Comment 1 Justin Mason 2006-08-02 09:30:29 UTC
I've added these to my sandbox -- thanks!

Let's see how they do in the weekly mass-check ( http://ruleqa.spamassassin.org/
after Saturday).

BTW, if they do perform well, note that we won't actually incorporate them into
a release without say-so from the admins of open-whois.org; having a ruleset
enabled by default in SA can cause massive load on the servers, so we'd have to
be sure they're ready for that.
Comment 2 Jamie Penman-Smithson 2006-08-02 10:57:35 UTC
(In reply to comment #1)
> I've added these to my sandbox -- thanks!
> 
> Let's see how they do in the weekly mass-check ( http://ruleqa.spamassassin.org/
> after Saturday).
> 
> BTW, if they do perform well, note that we won't actually incorporate them into
> a release without say-so from the admins of open-whois.org; having a ruleset
> enabled by default in SA can cause massive load on the servers, so we'd have to
> be sure they're ready for that.

I probably should have made it clearer that I'm the admin of open-whois.org :)

There's also a recently updated list of rules at
http://open-whois.org/01_openwhois.cf (which includes a few extra anonymising
services)
Comment 3 Justin Mason 2006-08-02 12:11:09 UTC
'I probably should have made it clearer that I'm the admin of open-whois.org :)'

ok, that's cool then ;)

'There's also a recently updated list of rules at
http://open-whois.org/01_openwhois.cf (which includes a few extra anonymising
services)'

yep -- those are the ones I've picked up.
Comment 4 Jamie Penman-Smithson 2006-08-02 19:30:37 UTC
(In reply to comment #3)
> 'There's also a recently updated list of rules at
> http://open-whois.org/01_openwhois.cf (which includes a few extra anonymising
> services)'
> 
> yep -- those are the ones I've picked up.

Unless you fetched them within the last 10 hours, they're the old ones. Which
reminds me, I should probably stick a 'date modified' comment in there somewhere.
Comment 5 Justin Mason 2006-08-02 19:56:41 UTC
ok, I've picked up the new ones then ;)
Comment 6 Justin Mason 2006-10-12 06:04:21 UTC
in recent mass-checks, these have been doing nicely:
http://ruleqa.spamassassin.org/?daterev=20061007-r453869-n&s_defcorpus=on&rule=&srcpath=sandbox%2Fjm%2F20_openwhois&s_zero=on&s_detail=checked+&g=Change

These are good:

(spam hits are out of 261633 messages; ham hits are out of 62951 messages)

MSECS    SPAM%   SPAM HITS  HAM%    HAM HITS  S/O    RANK  SCORE  NAME
0.00000  1.4394  3766       0.0000  0         1.000  0.84  0.00   T_WHOIS_AITPRIV
0.00000  1.1310  2960       0.0064  5         0.994  0.81  0.00   T_WHOIS_PRIVPROT
0.00000  0.7407  1938       0.0032  3         0.996  0.78  0.00   T_DNS_FROM_OPENWHOIS
0.00000  0.2389  626        0.0000  0         1.000  0.66  0.00   T_WHOIS_SECUREWHOIS
0.00000  0.1697  444        0.0016  1         0.991  0.62  0.00   T_WHOIS_WHOISGUARD
0.00000  0.1407  369        0.0000  0         1.000  0.60  0.00   T_WHOIS_REGISTERFLY
0.00000  0.1173  307        0.0000  0         1.000  0.58  0.00   T_WHOIS_NAMEKING

These either don't hit enough spam or hit too much ham:
	
MSECS    SPAM%   SPAM HITS  HAM%    HAM HITS  S/O    RANK  SCORE  NAME
0.00000  0.1888  494        0.0683  43        0.734  0.57  0.00   T_WHOIS_DMNBYPROXY
0.00000  0.0520  137        0.0032  3         0.942  0.51  0.00   T_WHOIS_UNLISTED
0.00000  0.0084  22         0.0000  0         1.000  0.45  0.00   T_WHOIS_PRIVACYPOST
0.00000  0.0031  9          0.0000  0         1.000  0.45  0.00   T_WHOIS_GKGPROXY
0.00000  0.0023  7          0.0000  0         1.000  0.44  0.00   T_WHOIS_WHOISPROT
0.00000  0.0019  5          0.0000  0         1.000  0.44  0.00   T_WHOIS_SECINFOSERV
0.00000  0.0011  3          0.0000  0         1.000  0.44  0.00   T_WHOIS_NOLDC
0.00000  0.0019  5          0.0016  1         0.546  0.44  0.00   T_WHOIS_MONIKER_PRIV
0.00000  0.0008  3          0.0000  0         1.000  0.44  0.00   T_WHOIS_SPAMFREE
0.00000  0.0008  3          0.0000  0         1.000  0.44  0.00   T_WHOIS_CONTACTPRIV
0.00000  0.0191  50         0.1048  66        0.154  0.38  0.00   T_WHOIS_MYPRIVREG
0.00000  0.1242  325        0.5449  344       0.186  0.31  0.00   T_WHOIS_NETSOLPR

I guess that's entirely to be expected. ;)
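
For readers unfamiliar with the S/O column: it appears to be the spam hit rate divided by the sum of the spam and ham hit rates, so 1.000 means a rule hit no ham at all and values below 0.5 mean it hit proportionally more ham than spam. The sketch below recomputes it from the SPAM% and HAM% columns under that assumption. The function name `so_ratio` is hypothetical, and because the listed percentages are rounded, a recomputed value can differ in the last digit for low-volume rules.

```python
def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Spam/overall ratio from the SPAM% and HAM% columns.

    Assumption: S/O = spam% / (spam% + ham%), which matches the rows
    checked below but is not stated anywhere in this report.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit no messages at all")
    return spam_pct / total


if __name__ == "__main__":
    # (name, SPAM%, HAM%) taken from the tables above
    rows = [
        ("T_WHOIS_AITPRIV", 1.4394, 0.0000),
        ("T_WHOIS_DMNBYPROXY", 0.1888, 0.0683),
        ("T_WHOIS_NETSOLPR", 0.1242, 0.5449),
    ]
    for name, spam_pct, ham_pct in rows:
        print(f"{name}: {so_ratio(spam_pct, ham_pct):.3f}")
```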

The high-scoring ones seem to correlate with specific spammers or spam
gangs; e.g. one of the rules overlaps noticeably with a specific URL format!

The next issue is one for us -- we need to figure out how to publish these,
since we obviously can't promote all the rules. Ideally, we could leave the
nightly rule-promotion code to do it. However, right now it is not allowed to
auto-promote network rules, since they are too "heavyweight" in impact.

What I've done is just insert 'publish NAMEOFRULE' lines for the good ones
above... that works, but will require occasional revisiting as hit rates change :(
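
The 'publish' lines above were inserted by hand. As an illustration only, the sketch below shows how such lines could be regenerated from mass-check results when hit rates change. The thresholds `MIN_SPAM_PCT` and `MIN_SO` and the function `publish_lines` are hypothetical; they happen to reproduce the good/bad split in the tables above but are not the criteria the rule-promotion code uses. Rule names are copied as they appear in the tables, which may differ from the names used in the actual rule files.

```python
# (name, SPAM%, S/O) for a few rows copied from the tables above
RESULTS = [
    ("T_WHOIS_AITPRIV", 1.4394, 1.000),
    ("T_WHOIS_WHOISGUARD", 0.1697, 0.991),
    ("T_WHOIS_DMNBYPROXY", 0.1888, 0.734),
    ("T_WHOIS_PRIVACYPOST", 0.0084, 1.000),
    ("T_WHOIS_NETSOLPR", 0.1242, 0.186),
]

MIN_SPAM_PCT = 0.1  # hypothetical: rule must hit at least 0.1% of spam
MIN_SO = 0.99       # hypothetical: rule must be almost entirely spam-only


def publish_lines(results, min_spam_pct=MIN_SPAM_PCT, min_so=MIN_SO):
    """Return one 'publish NAME' line per rule that clears both thresholds."""
    return [
        f"publish {name}"
        for name, spam_pct, so in results
        if spam_pct >= min_spam_pct and so >= min_so
    ]


if __name__ == "__main__":
    for line in publish_lines(RESULTS):
        print(line)
```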

anyway, they're in now! marking fixed...
Comment 7 Justin Mason 2009-07-20 07:07:09 UTC
removed due to the host's domain lapsing!  see bug 6157