SA Bugzilla – Bug 4945
Add support for open-whois.org
Last modified: 2009-07-20 07:07:09 UTC
bl.open-whois.org is a list of domains registered through anonymous or proxy domain registration services, such as Katz Domain Trust, Domains by Proxy and SecureWhois. There are some example rules here: <http://open-whois.org/01_openwhois.cf>
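A list like this is normally consulted with a DNS lookup rather than a download. The Python sketch below assumes bl.open-whois.org followed the common right-hand-side blocklist (RHSBL) convention, where the domain is prepended to the list's zone and a 127.0.0.x answer means "listed". The function names and the answer values in the example are hypothetical; this thread does not document the list's actual response codes. The sub-test check is only loosely modelled on the two forms SpamAssassin's `urirhssub` rules accept (an exact dotted address, or a decimal bitmask), and the sketch makes no network queries.

```python
import ipaddress


def rhsbl_query_name(domain: str, zone: str = "bl.open-whois.org") -> str:
    """Return the DNS name to look up for `domain` in an RHSBL-style zone.

    Assumption: the list uses the common right-hand-side convention of
    prepending the domain to the zone (example.com -> example.com.<zone>).
    """
    name = domain.strip().rstrip(".").lower()
    if not name:
        raise ValueError("domain must not be empty")
    return f"{name}.{zone.rstrip('.')}."


def subtest_matches(answer: str, subtest: str) -> bool:
    """Test one A-record answer against a rule's sub-test.

    Loosely modelled on urirhssub: a dotted IPv4 sub-test must match the
    answer exactly, while a plain decimal number is used as a bitmask over
    the returned address (treated here as a 32-bit integer).
    """
    returned = int(ipaddress.IPv4Address(answer))
    if "." in subtest:
        return returned == int(ipaddress.IPv4Address(subtest))
    return (returned & int(subtest)) != 0


if __name__ == "__main__":
    # Hypothetical answer values; the real list's codes are not known here.
    print(rhsbl_query_name("Example.COM."))           # example.com.bl.open-whois.org.
    print(subtest_matches("127.0.0.6", "4"))          # True: bit 4 is set in .6
    print(subtest_matches("127.0.0.2", "127.0.0.2"))  # True: exact match
```

Keeping the query-name builder separate from the answer check means the same sub-test logic works whichever DNS resolver library actually performs the lookup.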
I've added these to my sandbox. Thanks! Let's see how they do in the weekly mass-check (http://ruleqa.spamassassin.org/ after Saturday). BTW, if they do perform well, note that we won't actually incorporate them into a release without say-so from the admins of open-whois.org; having a ruleset enabled by default in SA can cause massive load on the servers, so we'd have to be sure they're ready for that.
(In reply to comment #1)
> I've added these to my sandbox -- thanks!
>
> let's see how they do in the weekly mass-check ( http://ruleqa.spamassassin.org/
> after saturday).
>
> BTW, if they do perform well, note that we won't actually incorporate them into
> a release without say-so from the admins of open-whois.org; having a ruleset
> enabled by default in SA can cause massive load on the servers, so we'd have to
> be sure they're ready for that.

I probably should have made it clearer that I'm the admin of open-whois.org :) There's also a recently updated list of rules at http://open-whois.org/01_openwhois.cf (which includes a few extra anonymising services).
'I probably should have made it clearer that I'm the admin of open-whois.org :)'

OK, that's cool then ;)

'There's also a recently updated list of rules at http://open-whois.org/01_openwhois.cf (which includes a few extra anonymising services)'

Yep, those are the ones I've picked up.
(In reply to comment #3)
> 'There's also a recently updated list of rules at
> http://open-whois.org/01_openwhois.cf (which includes a few extra anonymising
> services)'
>
> yep -- those are the ones I've picked up.

Unless you got them 10 hours ago, they're the old ones. Which reminds me, I should probably stick a 'date modified' comment in there somewhere.
ok, I've picked up the new ones then ;)
In recent mass-checks, these have been doing nicely:
http://ruleqa.spamassassin.org/?daterev=20061007-r453869-n&s_defcorpus=on&rule=&srcpath=sandbox%2Fjm%2F20_openwhois&s_zero=on&s_detail=checked+&g=Change

(SPAM HITS are out of 261633 spam messages; HAM HITS are out of 62951 ham messages.)

These are good:

  MSECS    SPAM%   SPAM HITS  HAM%    HAM HITS  S/O    RANK  SCORE  NAME
  0.00000  1.4394  3766       0.0000  0         1.000  0.84  0.00   T_WHOIS_AITPRIV
  0.00000  1.1310  2960       0.0064  5         0.994  0.81  0.00   T_WHOIS_PRIVPROT
  0.00000  0.7407  1938       0.0032  3         0.996  0.78  0.00   T_DNS_FROM_OPENWHOIS
  0.00000  0.2389  626        0.0000  0         1.000  0.66  0.00   T_WHOIS_SECUREWHOIS
  0.00000  0.1697  444        0.0016  1         0.991  0.62  0.00   T_WHOIS_WHOISGUARD
  0.00000  0.1407  369        0.0000  0         1.000  0.60  0.00   T_WHOIS_REGISTERFLY
  0.00000  0.1173  307        0.0000  0         1.000  0.58  0.00   T_WHOIS_NAMEKING

And these don't really hit enough spam, or hit too much ham:

  MSECS    SPAM%   SPAM HITS  HAM%    HAM HITS  S/O    RANK  SCORE  NAME
  0.00000  0.1888  494        0.0683  43        0.734  0.57  0.00   T_WHOIS_DMNBYPROXY
  0.00000  0.0520  137        0.0032  3         0.942  0.51  0.00   T_WHOIS_UNLISTED
  0.00000  0.0084  22         0.0000  0         1.000  0.45  0.00   T_WHOIS_PRIVACYPOST
  0.00000  0.0031  9          0.0000  0         1.000  0.45  0.00   T_WHOIS_GKGPROXY
  0.00000  0.0023  7          0.0000  0         1.000  0.44  0.00   T_WHOIS_WHOISPROT
  0.00000  0.0019  5          0.0000  0         1.000  0.44  0.00   T_WHOIS_SECINFOSERV
  0.00000  0.0011  3          0.0000  0         1.000  0.44  0.00   T_WHOIS_NOLDC
  0.00000  0.0019  5          0.0016  1         0.546  0.44  0.00   T_WHOIS_MONIKER_PRIV
  0.00000  0.0008  3          0.0000  0         1.000  0.44  0.00   T_WHOIS_SPAMFREE
  0.00000  0.0008  3          0.0000  0         1.000  0.44  0.00   T_WHOIS_CONTACTPRIV
  0.00000  0.0191  50         0.1048  66        0.154  0.38  0.00   T_WHOIS_MYPRIVREG
  0.00000  0.1242  325        0.5449  344       0.186  0.31  0.00   T_WHOIS_NETSOLPR

I guess that's entirely to be expected. ;) The high-scoring ones seem to correlate with specific spammers or spam gangs; for example, one of the rules overlaps noticeably with a specific URL format!

The next issue is one for us: we need to figure out how to publish these, since we obviously can't promote all the rules. Ideally, we could leave that to the nightly rule-promotion code. However, right now it is not allowed to auto-promote network rules, since they are too "heavyweight" in impact. What I've done is just insert 'publish NAMEOFRULE' lines for the good ones above. That works, but will require occasional revisiting as hit rates change :(

Anyway, they're in now! Marking fixed...
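The S/O column in the mass-check table is the fraction of a rule's combined hit rate that falls on spam, S/O = SPAM% / (SPAM% + HAM%); for T_WHOIS_PRIVPROT that is 1.1310 / (1.1310 + 0.0064) ≈ 0.994, and the formula matches the other rows checked as well. The Python sketch below recomputes S/O from the table's percentages and generates 'publish NAME' lines from it, as a way to make the "occasional revisiting" mechanical. The thresholds (S/O ≥ 0.99 and SPAM% ≥ 0.1) are hypothetical: they happen to reproduce the good/poor split in this run, but they are not criteria stated in the thread, and this is not SpamAssassin's nightly rule-promotion code. For very low hit rates, rounding in the displayed percentages can shift the third decimal (T_WHOIS_MONIKER_PRIV computes to 0.543 from the rounded figures, versus 0.546 in the table).

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    name: str
    spam_pct: float  # SPAM% column from the mass-check table
    ham_pct: float   # HAM% column from the mass-check table


def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's combined hit rate that falls on spam."""
    total = spam_pct + ham_pct
    if total == 0:
        return 0.0  # rule never fired; treat as useless instead of dividing by zero
    return spam_pct / total


def publish_lines(results, min_so: float = 0.99, min_spam_pct: float = 0.1):
    """Return 'publish NAME' lines for rules that pass both hypothetical thresholds."""
    lines = []
    for r in results:
        if so_ratio(r.spam_pct, r.ham_pct) >= min_so and r.spam_pct >= min_spam_pct:
            lines.append(f"publish {r.name}")
    return lines


if __name__ == "__main__":
    # A few rows copied from the table above.
    results = [
        RuleResult("T_WHOIS_AITPRIV", 1.4394, 0.0000),
        RuleResult("T_WHOIS_PRIVPROT", 1.1310, 0.0064),
        RuleResult("T_WHOIS_DMNBYPROXY", 0.1888, 0.0683),
        RuleResult("T_WHOIS_PRIVACYPOST", 0.0084, 0.0000),
        RuleResult("T_WHOIS_NETSOLPR", 0.1242, 0.5449),
    ]
    print("\n".join(publish_lines(results)))
    # publish T_WHOIS_AITPRIV
    # publish T_WHOIS_PRIVPROT
```

Requiring a minimum SPAM% alongside S/O is what drops rules like T_WHOIS_PRIVACYPOST: a perfect 1.000 S/O on 22 spam hits says little about whether the extra DNS lookups are worth their load.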
Removed due to the host's domain lapsing! See bug 6157.