The current implementation of <junitreport> has no support for JUnit test cases with non-ASCII strings in their assertion messages or system output. When non-ASCII strings are used, the rendered summary carries the wrong encoding declaration and can be unreadable. This (possible) bug is caused by the fact that the current implementation of <junitreport> always declares the encoding of the generated summary as "US-ASCII"; this is hard-coded in the stylesheets, src/etc/junit-frames.xsl and src/etc/junit-noframes.xsl. The attached patch adds an optional attribute named 'encoding' to the <junitreport> task. The default value of this attribute is "US-ASCII", so the current behaviour is preserved when the option is not used. This patch is against Ant rev. 952676 (https://svn.apache.org/repos/asf/ant/core/trunk@952676)
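For reference, the hard-coded declaration looks roughly like this in the stylesheets (a sketch; the exact attribute list in the real files may differ):

```xml
<!-- Sketch of the hard-coded declaration in junit-frames.xsl /
     junit-noframes.xsl (actual stylesheet contents may differ slightly): -->
<xsl:output method="html" indent="yes" encoding="US-ASCII"/>
```

Since `<xsl:output>` attributes cannot reference a runtime `<xsl:param>` in XSLT 1.0, a task-level 'encoding' attribute would typically be applied from the Java side instead, e.g. via `transformer.setOutputProperty(OutputKeys.ENCODING, encoding)`.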
Created attachment 25571 [details] a sample build.xml and a JUnit testcase to reproduce the problem of non-ASCII strings in a junitreport task
Created attachment 25572 [details] proposed patch to add an option to specify encoding in a junitreport task
Would seem simpler and more friendly to just set the encoding to UTF-8 unconditionally.

Anyway, I cannot reproduce a problem on Ubuntu with Ant 1.8.2 dev; the output has

<META http-equiv="Content-Type" content="text/html; charset=US-ASCII">
...
<code>junit.framework.AssertionFailedError: expected:<[123]> but was:<[一二三]><br>
at Test1.test1(Unknown Source)<br>

which displays fine. Of course it would still be preferable to use UTF-8 encoding.
Created attachment 25581 [details] Simpler patch
(In reply to comment #3)
> Would seem simpler and more friendly to just set the encoding to UTF-8
> unconditionally.
>
> Anyway I cannot reproduce a problem on Ubuntu with Ant 1.8.2 dev;

Let me add more to comment #1 first. The problem was actually in the stderr output shown in the summary, not in the assertion results. The procedure to reproduce the problem in stderr is as follows: after generating the test summary in the way described above, click "<none>" at the bottom of the table, then "1" below "Errors", and then "System.err »". The file will contain "123 vs. �Œ " instead of "123 vs. 一二三", which is what is written in the source code.

As for Jess's solution, I don't think it would be sufficient to hard-code UTF-8 instead of US-ASCII, because some Java environments have default encodings that are not compatible with UTF-8. For example, Sun's JDK for Japanese Windows has MS932 as its default encoding. I couldn't manage to write such a patch, but using the system's default encoding as the encoding of the junitreport stylesheets might be more reasonable.
(In reply to comment #5)
> The problem was actually in stderr outputs shown in the summary

Ah, yes - because it is written to a text file which cannot signify its encoding.

> I don't think it would be sufficient to hard-code UTF-8 instead of
> US-ASCII, because some Java environments have default encodings that are
> not compatible with UTF-8. For example, Sun's JDK for Japanese Windows
> has MS932 as its default encoding.

It doesn't really matter what the JDK's default encoding is; you can still write output in any encoding you like.

> I couldn't manage to make such a patch, but using the default encoding of
> the system as that of the stylesheets for junitreport might be more
> reasonable.

It's easy to do and I will attach a patch. The question is which patch is better.

The encoding used for the HTML pages does not matter much (since output will be written with character references if necessary); it is a bit nicer to use UTF-8, but not essential. Regarding the plain-text output, there are arguments in either direction:

1. Using the platform default encoding may be convenient if the web browser used to view the result is on the same machine as the one which ran <junitreport>, or otherwise happens to have the same default encoding, and the browser defaults to the platform encoding for pages specifying no encoding (and is unable to sniff the encoding).

2. Using UTF-8 ensures that no characters will ever be dropped as unencodable, i.e. output will never be lossy. At worst you may need to take special action to display the page in UTF-8.
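A minimal sketch of the point that the JDK's default encoding does not constrain explicit writes (the class and method names here are illustrative, not from the patch): a Writer built with an explicit charset encodes independently of file.encoding, so non-ASCII text survives a UTF-8 round trip even on a JVM whose default is MS932.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode a string through a Writer with an explicit charset, then decode
    // it back with the same charset; the JVM default encoding plays no part.
    static String roundTrip(String text, Charset cs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, cs)) {
            w.write(text);
        }
        return new String(buf.toByteArray(), cs);
    }

    public static void main(String[] args) throws IOException {
        // Even on a JVM whose default encoding is MS932 (Japanese Windows),
        // forcing UTF-8 for the output loses nothing:
        System.out.println(roundTrip("123 vs. 一二三", StandardCharsets.UTF_8));
    }
}
```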
Created attachment 25587 [details] Alternate patch; produces output in platform default encoding
Created attachment 25588 [details] Yet another patch, using HTML output only (only covers -frames and stderr)

Better yet, use HTML output for everything, so that the browser does not have to guess what the encoding is.
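A rough illustration of the approach (the actual patch in attachment 25588 may structure this differently): instead of emitting System.err as raw text, wrap it in a minimal HTML page whose <meta> declares the charset, so the browser never has to guess.

```xml
<!-- Hypothetical template sketch: emit System.err as an HTML page that
     declares its own encoding, rather than as a bare text file. -->
<xsl:template match="system-err">
  <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
      <title>System.err</title>
    </head>
    <body>
      <!-- <pre> preserves the original line breaks of the captured output -->
      <pre><xsl:value-of select="."/></pre>
    </body>
  </html>
</xsl:template>
```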
(In reply to comment #8)
> Created an attachment (id=25588) [details]
> Yet another patch, using HTML output only (only covers -frames and stderr)
>
> Better yet, use HTML output for everything, so that the browser does not
> have to guess what the encoding is.

Going with HTML output for everything seems to be the right choice. It's fairly simple and works for me on Ubuntu + OpenJDK (default encoding: UTF-8) and on Windows XP + Sun JDK (default encoding: MS932). In the Windows case, I didn't have to change the browser's settings, since the results were all properly encoded in UTF-8.

# Sorry for mistyping your name in my last comment #5, Jesse.
revision 954484