SA Bugzilla – Bug 3827
[review] SURBL ccTLD list updated, please update SA TLD code
Last modified: 2004-10-02 09:52:30 UTC
>>> | name,ai,au,bd,bh,ck,eg,et,fk,il,in,kh,kr,mk,mt,na,
>>> | np,nz,pg,pk,qa,sa,sb,sg,sv,ua,ug,uk,uy,vn,za,zw

I have updated the ccTLDs for these, removed some duplicates, and added some data for a few other ccTLDs. The results are in:

http://spamcheck.freeapp.net/two-level-tlds

Really this is just for completeness since geographic domains other than .us aren't used in spams too often.
ok, looking at the diff... I'm pretty happy with additions, so I'm left with the removes that are questionable (aka, why were they removed?):

< org\.au
< notaires\.fr
< nui\.hu
< nm\.kr
< ac\.pa
< co\.sv
Those were corrected typos and duplicates.
committed for 3.1, r47440. will attach a 3.0.1 patch shortly
Created attachment 2387: patch to update to the list provided
BTW, on the advice of a German registrar and others, we've removed the .de entries from the list. They are not proper generic ccTLDs. .de apparently has no reserved second level generic geographic TLDs. Removed are:

nic.de denic.de moebel.de glueckwunsch.de buecher.de boerse.de kueche.de buero.de fluege.de kuechen.de aerzte.de reisebuero.de doctoren.de aero.de museum.de coop.de pro.de
+1 Some of these don't really seem like we should be bothering to break them out. somerandomname.com can all be one domain as far as I'm concerned. I think ccTLDs really need to represent some organizational unit larger than a simple company or organization.
+1 but note: 'somerandomname.com can all be one domain as far as I'm concerned. I think ccTLDs really need to represent some organizational unit larger than a simple company or organization.' the point is not how big the registrar is -- it's if a spammer can obtain NS and SOA records in a zone under that domain. if they can, then we have a possible hole that we'll miss in our rules; if they can't, we don't have a problem.
'the point is not how big the registrar is -- it's if a spammer can obtain NS and SOA records in a zone under that domain. if they can, then we have a possible hole that we'll miss in our rules; if they can't, we don't have a problem.' If delegations are available under a given domain, then those delegations can be abused independently of the parent domain. So lists like this help us know which child domain levels need to be checked.
Created attachment 2388: same as before, minus .de
Subject: Re: SURBL ccTLD list updated, please update SA TLD code

On Tue, Sep 28, 2004 at 07:18:56PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> BTW, on the advice of a German registrar and others, we've removed the .de
> entries from the list. They are not proper generic ccTLDs. .de apparently has
> no reserved second level generic geographic TLDs. Removed are:

ok, updated in 3.1 and in the 3.0.1 patch.
-0.5: From the maintenance POV that's a nightmare. Can we please split that list up into two REs, one consisting of official second-level domains and one of other instances which offer those? Actually, I don't think we should hard-code the list of unofficial domains anyway (there will always be n+1 more providers for such things), but if you really want to, please split the lists. An alternative for 3.1 or 3.2 could be to read this list from a file in DATADIR which can be updated from somewhere with a simple wget call.
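To make the "read this list from a file" alternative concrete, here is a minimal sketch in Python (not the Perl that SpamAssassin itself is written in). The file format is an assumption made for illustration only: one domain per line, `#` comments, optional leading dot. The function names are hypothetical, and no such file exists in the thread.

```python
def load_two_level_tlds(lines):
    """Parse an iterable of text lines into a frozenset of boundary domains.

    Assumed (hypothetical) format: one entry per line, '#' starts a comment,
    blank lines are ignored, an optional leading dot is stripped.
    """
    entries = set()
    for raw in lines:
        # Drop anything after a '#', then trim whitespace and normalise case.
        line = raw.split("#", 1)[0].strip().lower()
        if line:
            entries.add(line.lstrip("."))
    return frozenset(entries)


def load_two_level_tlds_from_path(path):
    """Read the list from a file on disk (the path is chosen by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return load_two_level_tlds(handle)
```

Keeping the parser separate from the file access means the same code can be fed a downloaded file, a bundled default, or a test fixture.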
Hmmm... as the code is already in 3.0.0 it's probably the best idea to just update it. But especially those .fr, .hu and .jp "word"-domains (maybe I missed some) look as useless as the .de ones. In the initial report Jeff wrote 'Really this is just for completeness since geographic domains other than .us aren't used in spams too often.' -- then IMO we should not include any of the others: the bigger the RE, the more overhead we have.
Subject: Re: [review] SURBL ccTLD list updated, please update SA TLD code

On Wed, Sep 29, 2004 at 05:03:30PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> In the initial report Jeff wrote 'Really this is just for completeness since
> geographic domains other than .us aren't used in spams too often.' -- then IMO
> we should not include any of the others as bigger the RE as more overhead we
> have.

Yeah, I actually was noticing that the 4TLD and 3TLD domains are REs, but the 2TLD ones are simple "example.tld" strings. So in theory, we could just grab the last two sections of the FQDN, then do a hash lookup.

For the 3.0.1 code, I'd like to just do the update. For the 3.1 code, I'm tempted to change it around. If things like SURBL are only going to list actual domains, we need to deal with that correctly.
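The "grab the last two sections of the FQDN, then do a hash lookup" idea might look roughly like the following Python sketch. It is not the SpamAssassin implementation: the set contents are a tiny hypothetical sample, the function name is made up, and only the two-level case is handled (the 3TLD and 4TLD entries mentioned above are out of scope here).

```python
# Hypothetical sample only; the real list is far longer.
TWO_LEVEL_TLDS = {"co.uk", "org.uk", "com.au", "co.nz"}


def registered_domain(fqdn, two_level_tlds=TWO_LEVEL_TLDS):
    """Return the part of fqdn that would be looked up in a domain blocklist."""
    labels = fqdn.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        # Nothing to strip: already a bare domain (or a bare TLD pair).
        return ".".join(labels)
    # Set membership test on the last two labels replaces the regexp match.
    last_two = ".".join(labels[-2:])
    keep = 3 if last_two in two_level_tlds else 2
    return ".".join(labels[-keep:])
```

A set lookup costs roughly the same no matter how many entries the list has, which addresses the overhead concern raised about a growing RE.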
'If things like SURBL are only going to list actual domains, we need to deal with that correctly.' what d'you mean -- actual registrar-boundary domains, or any domain that a spammer could possibly register, even if not with an ICANN-recognized registrar boundary?
Subject: Re: [review] SURBL ccTLD list updated, please update SA TLD code

On Wed, Sep 29, 2004 at 05:16:40PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> what d'you mean -- actual registrar-boundary domains, or any domain that a
> spammer could possibly register, even if not with an ICANN-recognized registrar
> boundary?

I'd say registrar boundaries. I want to find all the proper domains.
Does that mean domains like "medecin.fr" would stay in? I think the principle of these is that doctors could register subdomains under that one, etc.
Is medecin.fr an "official" subdomain of the French NIC (whatever it is called)? (And is it actually abused?) If not, what's the difference from other (free) "ccTLDs", like maybe gmxhome.de (just one which came to my mind) or somerandomprovider.fr? It's just like the removed .de domains, which were actually just a random list of generic words. We just can't keep an up-to-date list of every provider which offers third-level domains in our codebase, especially not in one big RE. (I must admit that I don't exactly know what this RE is used for, but the above is true anyway.)

For 3.0 I'm fine (aka +1) if the RE is updated as suggested. If not impossible, I'd love to see those generic-word domains go from the list so that only the official boundaries stay. If the SURBL code needs to have these, they can stay in, but that has to be changed to something more dynamic for 3.1.
I think I should spell out what the pros and cons of listing non-ICANN-registrar domain boundaries are, since there seems to be some confusion. When we initially considered how SURBL and other RHSBL-style domain tests should work, we considered the possible abusable holes that spammers could use. This is one of them. Here's how it works:

1. if we only list ICANN-registrar domain boundaries (ie, "com", "co.uk", "info", "cn" et al), then we have a smaller regexp and less maintenance

2. however, if "sh.cn" is a small company that offers for-free or for-pay subdelegation to third parties, and a spammer registers "foo.sh.cn", but there are nonspam domains at "bar.sh.cn", "baz.sh.cn", we cannot list them (because we'd have to list "sh.cn" and hit all the nonspam domains too). in other words, we have a hole in our rules and in SURBL.

3. therefore we should list any "registrar boundary" where a company or organisation allows third parties to register domains under their domain, even if it's not an "official" one. (what is an "official" one anyway? do ICANN maintain a list of all the sub-ccTLD delegators, like whoever deals with registration for .co.uk, .ac.uk, et al?)

So the danger is that if we cut the list down, we'll provide a hole spammers can drive through. If you all are OK with that, then fine ;)
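The hole described in the three points above can be illustrated with a small Python sketch. This is an illustration only, not SpamAssassin or SURBL code: `lookup_key` and the boundary sets are hypothetical, and "sh.cn" is simply the example used in the comment.

```python
def lookup_key(host, boundaries):
    """Return the domain that would be checked against a blocklist.

    Finds the longest suffix of host that is a known boundary and keeps
    one extra label to its left.
    """
    labels = host.lower().rstrip(".").split(".")
    # i = 0 is the whole name, so longer suffixes are tried first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in boundaries:
            return ".".join(labels[max(i - 1, 0):])
    # Unknown boundary: fall back to the last two labels.
    return ".".join(labels[-2:])


official_only = {"com", "cn", "co.uk"}
with_sh_cn = official_only | {"sh.cn"}

# Without "sh.cn", spam and nonspam customers collapse to the same key.
print(lookup_key("www.foo.sh.cn", official_only))  # sh.cn
print(lookup_key("bar.sh.cn", official_only))      # sh.cn

# With "sh.cn" listed as a boundary, they can be told apart.
print(lookup_key("www.foo.sh.cn", with_sh_cn))     # foo.sh.cn
print(lookup_key("bar.sh.cn", with_sh_cn))         # bar.sh.cn
```

In the first case a blocklist can only list "sh.cn" as a whole, which is exactly the trade-off point 2 describes.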
'2. however, if "sh.cn" is a small company that offers for-free or for-pay subdelegation to third parties, and a spammer registers "foo.sh.cn", but there are nonspam domains at "bar.sh.cn", "baz.sh.cn", we cannot list them (because we'd have to list "sh.cn" and hit all the nonspam domains too).'

k, thanks for the explanation. But my point still stands: there are n providers which offer third-level domains, with lim(n)->oo. We simply can't list all those providers in one big RE (or even a hash) because (a) there are too many of them and (b) they may change under our feet. When I grep through my spam I could maybe tell you 10 such providers which are actually abused and aren't listed there (one off the top of my head which is very much abused: 0catch.com) and think of a list of ones which aren't abused yet (like gmxhome.de and internet-provider.net). So unless those currently listed are abused very much, I don't see a point in including them.

'(what is an "official" one anyway? do ICANN maintain a list of all the sub-ccTLD delegators, like whoever deals with registration for .co.uk, .ac.uk, et al?)'

I don't think ICANN maintains such a list; some NICs even changed their mind at some time and started to offer direct second-level domains, too. But at least for the "official" ones the possibility that such a domain is abused is higher than with some random provider.
oh -- forgot to mention: +1 ;)
r51815