UTF-8 files generated by Windows applications such as Notepad have a BOM (byte-order mark) added at the beginning of the file. When I try to import a UTF-8 .sql file with the "sql" Ant task, it is not able to read the file. If I erase the BOM, all is OK. Of course I added encoding="UTF-8" to the task definition. A BOM is not mandatory in a UTF-8 file. The BOM is composed of the bytes EF BB BF, which in ISO-8859-1 are the characters "ï»¿". Perhaps I missed a parameter in the sql task definition?
In fact, according to http://www.unicode.org/unicode/faq/utf_bom.html#25 a UTF-8 file can have this BOM. It seems it is Java that does not handle it when opening an InputStreamReader with the "utf-8" charsetName. To solve my problem I made a little (and not good) hack in the runTransaction method of SQLExec.java. I replaced the line:
---
Reader reader = (encoding == null) ? new FileReader(tSrcFile)
    : new InputStreamReader(new FileInputStream(tSrcFile), encoding);
---
with this block:
---
Reader reader = new FileReader(tSrcFile);
if (reader.read() == 0xEF && reader.read() == 0xBB && reader.read() == 0xBF) {
    reader.close();
    // Has to be UTF8
    reader = new InputStreamReader(new FileInputStream(tSrcFile), "utf-8");
    // Read the BOM char
    reader.read();
} else {
    reader.close();
    reader = (encoding == null) ? new FileReader(tSrcFile)
        : new InputStreamReader(new FileInputStream(tSrcFile), encoding);
}
---
What does it do? It opens the file and checks whether it starts with the UTF-8 BOM (EF BB BF). If the BOM exists, it opens an InputStreamReader with the UTF-8 charset and reads the first char (the BOM); otherwise it does the same as before. If the file is shorter than 3 bytes, it will raise an exception, I guess.
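The behaviour described above, where an InputStreamReader opened with "utf-8" passes the BOM through as an ordinary character, can be checked with a short sketch. The class name BomDemo and the method firstCharAsUtf8 are made up for illustration, and the bytes are held in memory rather than read from a .sql file:
---
```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical demo class, not part of Ant.
class BomDemo {
    /** Decodes the bytes as UTF-8 and returns the first char read, or -1 if there is none. */
    static int firstCharAsUtf8(byte[] data) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), "UTF-8")) {
            return reader.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF (the UTF-8 BOM) followed by the start of a statement.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'S', 'E', 'L'};
        // The decoder hands back U+FEFF instead of 'S'.
        System.out.printf("first char: U+%04X%n", firstCharAsUtf8(withBom));
    }
}
```
---
The three BOM bytes decode to the single character U+FEFF, so anything parsing the reader's output sees that character before the first statement.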
This is an issue for so many other classes in Ant that any local fix for the sql task would be wrong. Sun's own docs don't talk about a BOM for UTF-8, so it's pretty likely they'll not support it properly. Even the Unicode FAQ you link to says "Note that some recipients of UTF-8 encoded data do not expect a BOM." and says it wouldn't make any difference to the endianness of the stream. So Notepad is allowed to do that, but it's useless and dangerous. Obviously Java clients are in the "do not expect a BOM" department. I'm not really sure what to do here. Your patch should probably only apply if the requested or native encoding is UTF-8, and it should be farmed out into a helper class so that other tasks can reuse it.
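A minimal sketch of what such a helper class might look like, under stated assumptions: the name BomSkippingReaders and its open method are hypothetical and do not exist in Ant. It works on raw bytes through a PushbackInputStream, skips the BOM only when the requested encoding (or the platform default, when none is given) is UTF-8, and copes with input shorter than 3 bytes:
---
```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not an existing Ant class.
final class BomSkippingReaders {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private BomSkippingReaders() {
    }

    /** Opens a Reader over the stream; a null encoding means the platform default. */
    static Reader open(InputStream in, String encoding) throws IOException {
        Charset charset = (encoding == null)
                ? Charset.defaultCharset() : Charset.forName(encoding);
        if (!StandardCharsets.UTF_8.equals(charset)) {
            // Any other encoding is left alone.
            return new InputStreamReader(in, charset);
        }
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // Read up to 3 bytes; stop early at end of stream.
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so the decoder sees them.
            pushback.unread(head, 0, n);
        }
        return new InputStreamReader(pushback, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] sql = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'S', 'E', 'L', 'E', 'C', 'T'};
        try (Reader reader = open(new ByteArrayInputStream(sql), "UTF-8")) {
            System.out.println((char) reader.read()); // prints "S"
        }
    }
}
```
---
A task such as sql could then, in principle, call open(new FileInputStream(tSrcFile), encoding) in place of constructing the reader itself. Whether that is the right place for it in Ant is a separate question.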
There is a long-standing open Java bug for this: http://developer.java.sun.com/developer/bugParade/bugs/4508058.html