ProjectHelper has for some time used various tricks to ensure that the URL it passes to the XML parser escapes some nasty filename chars like '#': revision 1.29 date: 2000/09/14 14:04:38; author: bodewig; state: Exp; lines: +3 -3 Provide the SAXParser with the File instead of an InputSource to give it a chance to parse relative URIs. PR: 54 Submitted by: Achim Dannecker <A.Dannecker@gmx.de> revision 1.45 date: 2001/01/16 13:36:38; author: conor; state: Exp; lines: +15 -1 Handle directories with # in them by passing the parser an inputstream, rather than a name. Submitted by: Yossie Teitz <yossie@reachcast.co.il> This is currently handled (now in ProjectHelperImpl.java) by the following code: String uri = "file:" + buildFile.getAbsolutePath().replace('\\', '/'); for (int index = uri.indexOf('#'); index != -1; index = uri.indexOf('#')) { uri = uri.substring(0, index) + "%23" + uri.substring(index + 1); } which works for this one case, but not others, and furthermore uses undocumented aspects of the JRE's treatment of file: URLs which may break on some JRE implementations. JDK 1.2 had the File.toURL() method, but it did not solve the escaping problem correctly; see Java bug #4472849 (see link for details). Fortunately JDK 1.4 lets you do this: file.toURI().toURL().toExternalForm() which ought to provide a URL string with all suitable escapes, ready to be sent to the parser, with semantics guaranteed by the Java platform. I suggest that ProjectHelperImpl run code similar to this via reflection when running on JDK 1.4, rather than the current ad-hoc code.
Do you have patch for this, Jesse? We could put into ant1.6
File.toURI() is since 1.4..... Does Ant still have a JDK 1.1 requirement?
No trace of FIXED in CVS, sorry.
The escaping code has now been factored to FileUtils, all ASCII chars that need escaping are now getting treated correctly. If you ever wanted to provide a patch, this would be the place to put it into.
Well I have a patch now, which uses URI on JDK 1.4 and otherwise uses the old logic. All unit tests seem to pass for me on Linux, JDK 1.4.2, offline=true; otherwise YMMV. Not sure how safe this is for 1.6, might be more appropriate for 1.7 - up to judgement of the committers. Of especial interest: 1. A couple of unit tests were IMHO incorrect and I changed them enough to pass (hopefully still asserting what should be true). Please inspect carefully. Note that Java does *not* specify what precise form file: URIs should take on various platforms, only some aspects of their behavior, so the unit tests have to live on the edge a bit. 2. I don't know where the convention of using URLs such as "file:./foo.xml" or "file:foo.xml" (rather than simply "foo.xml") for entity includes came from, but it certainly looks like it violates common sense as well as the URI specification. (Relative URIs have no protocol and are resolved from a base URI which does.) JDK 1.4's new File(URI) rejects such URIs (correctly IMHO, though I could imagine them being resolved relative to the JVM's current working directory, however useless that might be). Ant's faq.xml used to recommend this "file:./etc" style for whatever reason. The patch corrects the FAQ. However, for compatibility with existing (IMHO erroneous) build scripts, file: URIs which do not have an absolute path beginning with '/' (and are thus illegal) are just passed through to the old parsing mechanism even on JDK 1.4. IncludeTest thus continues to pass using both syntaxes. 2a. While fromURI thus accepts relative paths for compatibility, toURI attempts to always produce a URI encoding an absolute path for safety. 3. Fixed up some places in the codebase that were failing to use the FileUtils methods to create or interpret file: URLs and were thus in danger of behaving incorrectly for some pathnames. 4. Removed obsolete FILE_PROTOCOL_PREFIX from the XSLTLiaison interface. It was not being used anywhere, and was not really correct anyway. (F_P_P + file.getAbsolutePath() is not a safe way to make URIs, and the extra "//" at the end of the prefix is gratuitous since there will never be an authority for a file URI.)
Created attachment 8706 [details] Suggested patch (needs review)
See also: http://developer.java.sun.com/developer/bugParade/bugs/4273532.html
BTW I've noticed that in Ant 1.6b3, if I try to run a build script contained in a directory whose name contains UTF-8-encoded non-ASCII characters (on Linux: locale en_US.UTF-8, glibc 2.3.2, kernel 2.4.20), it fails right away with an ArrayIndexOutOfBoundsException (e.g. index 269) from FileUtils.toURI: if (isSpecial[c]) { <-- here Ant 1.5.3 runs the same script just fine, and everything else in Java and in fact the rest of my operating system deals with, and displays, the directory correctly... I presume the previously attached patch would solve this bug (on JDK 1.4), though I haven't tried it. A quick fix for Ant 1.6 might be to change this line to read: if (c < 256 && isSpecial[c]) { which I think would at least pass through Unicode characters untouched, which Java should be able to handle. The only special characters you really care about - e.g. '#' or '?' - are in ASCII anyway.
See bug #26642 re. my last comment.
Note that Peter added the recommended c < 256 check on 2/3/2004.
...documented in bug 26642 which is mentioned above.
*** Bug 32718 has been marked as a duplicate of this bug. ***
Patches submitted, thanks. Note that toURI is implemented with java.net.URI.toASCIIString(), otherwise the non ascii characters are not properly % escaped. I have also had to do some small changes for the FileUtilsTest. fromURI("file:///foo") ==> C:\foo before it was \foo Actually, this creates an inconsistency in the behavior of Ant between ant running on JDK 1.2 or 1.3 and Ant running on JDK 1.4 or 1.5. This should be fixed in the implemetation of fromURI.
Is this still open?
Seems to have already been applied on 16 Dec 2005, SVN rev 357131.