W3C receives an immense amount of DTD traffic, with the User-Agent header often identifying the client only as Python or Java (http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic). In a number of cases, people affected by our automated blocking have told us that they are running Xalan and/or Xerces to validate XML or to perform XSL transforms. We have directed some of those we have corresponded with to your catalog instructions (http://xerces.apache.org/xerces2-j/faq-xcatalogs.html). The vast majority of Xalan/Xerces installations most likely implement neither catalogs nor caching of external DTDs and other schemata. It would also seem that the resolver honors neither HTTP response codes nor caching directives (http://www.ietf.org/rfc/rfc2616.txt). Better than a default catalog would be a caching XML Catalog resolver, which I understand is part of Glassfish (http://norman.walsh.name/2007/09/07/treadLightly). There are other Java libraries contributing to this traffic as well. Xalan and Xerces are widely used, important libraries. Your assistance in reducing this excessive traffic to W3C and to others hosting standards schemata would be greatly appreciated.
Ted, I'm not sure what you're suggesting we do. Changing default behaviours has the potential to break many applications. We simply cannot do that. There are well-documented ways for applications to avoid or reduce network access, including the use of XML Catalogs, custom entity resolvers, and the grammar caching facilities supported by Xerces and also by the JAXP standard. The tools are there. People should be using them. I believe improving the situation is a matter of education. The more folks you block with a 503 response, the more of them will realize that they need to do something and will have to change their applications for them to work again.
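
For anyone landing here after being blocked, here is a minimal sketch of the custom entity resolver approach mentioned above. It uses only the standard JAXP and SAX interfaces. The class name `LocalDtdResolver` and its `register` and `parse` helpers are hypothetical names chosen for illustration, and the one-line DTD text passed to `register` in the usage note is a stand-in for a complete local copy of the real DTD, which an application would normally ship on its classpath or reference through an XML Catalog.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical resolver: serves registered DTDs from memory, never from the network. */
class LocalDtdResolver implements EntityResolver {
    // Public identifier -> local DTD text (assumption: the application
    // bundles its own copies of the DTDs it needs).
    private final Map<String, String> localDtds = new HashMap<>();

    LocalDtdResolver register(String publicId, String dtdText) {
        localDtds.put(publicId, dtdText);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String dtd = (publicId == null) ? null : localDtds.get(publicId);
        if (dtd == null) {
            // Returning null would let the parser open systemId itself,
            // which is exactly the network fetch we want to avoid.
            throw new SAXException(
                "No local copy registered for " + publicId + " (" + systemId + ")");
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses XML text with the given resolver installed on the DocumentBuilder. */
    static Document parse(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

An application would call something like `new LocalDtdResolver().register("-//W3C//DTD XHTML 1.0 Strict//EN", dtdText)` once at startup and pass the resolver to every parse. Throwing for unregistered identifiers is a deliberate choice in this sketch: a missing local copy shows up as an error during testing rather than as silent traffic to the host of the DTD.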