Bug 49418 - Add support for non-ASCII encoding to <junitreport> task
Status: RESOLVED FIXED
Alias: None
Product: Ant
Classification: Unclassified
Component: Optional Tasks
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: 1.8.2
Assignee: Ant Notifications List
URL:
Keywords: PatchAvailable, XSLTBug
Depends on:
Blocks:
 
Reported: 2010-06-09 20:48 UTC by Yusuke Matsubara
Modified: 2010-06-14 10:30 UTC

Attachments
a sample build.xml and a JUnit testcase to reproduce the problem of non-ASCII strings in a junitreport task (534 bytes, application/x-gzip)
2010-06-09 20:57 UTC, Yusuke Matsubara
proposed patch to add an option to specify encoding in a junitreport task (5.14 KB, patch)
2010-06-09 20:59 UTC, Yusuke Matsubara
Simpler patch (1.93 KB, patch)
2010-06-10 17:26 UTC, Jesse Glick
Alternate patch; produces output in platform default encoding (1.88 KB, patch)
2010-06-11 10:07 UTC, Jesse Glick
Yet another patch, using HTML output only (only covers -frames and stderr) (1.80 KB, patch)
2010-06-11 10:17 UTC, Jesse Glick

Description Yusuke Matsubara 2010-06-09 20:48:55 UTC
The current implementation of <junitreport> does not support JUnit test cases with non-ASCII strings in their assertion targets or system outputs.  When non-ASCII strings are used, the rendered summary declares an inappropriate encoding and can be unreadable.

This (possible) bug arises because the current implementation of <junitreport> always declares the encoding of the generated summary as "US-ASCII".  The value is hard-coded in the stylesheets src/etc/junit-frames.xsl and src/etc/junit-noframes.xsl.

The attached patch adds an optional attribute named 'encoding' to the <junitreport> task.  Its default value is "US-ASCII", so the current behaviour is kept when the option is not used.  The patch is against Ant rev. 952676 (https://svn.apache.org/repos/asf/ant/core/trunk@952676).
Comment 1 Yusuke Matsubara 2010-06-09 20:57:11 UTC
Created attachment 25571
a sample build.xml and a JUnit testcase to reproduce the problem of non-ASCII strings in a junitreport task
Comment 2 Yusuke Matsubara 2010-06-09 20:59:46 UTC
Created attachment 25572
proposed patch to add an option to specify encoding in a junitreport task
Comment 3 Jesse Glick 2010-06-10 17:25:33 UTC
Would seem simpler and more friendly to just set the encoding to UTF-8 unconditionally.

Anyway I cannot reproduce a problem on Ubuntu with Ant 1.8.2 dev; the output has

<META http-equiv="Content-Type" content="text/html; charset=US-ASCII">
...
<code>junit.framework.AssertionFailedError: expected:&lt;[123]&gt; but was:&lt;[&#19968;&#20108;&#19977;]&gt;<br>	at Test1.test1(Unknown Source)<br>

which displays fine. Of course it would still be preferable to use UTF-8 encoding.
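
The output above stays readable under a US-ASCII declaration because each non-ASCII character is written as a numeric character reference. A minimal Java sketch of that idea follows; the class and method names are hypothetical, this is not the code Ant or the XSLT processor uses, and it deliberately skips escaping of markup characters such as < and &.

```java
public class CharRefs {
    /** Replaces every code point outside US-ASCII with a decimal character reference. */
    public static String toAsciiWithCharRefs(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);                    // plain ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';');  // e.g. U+4E00 becomes &#19968;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u4e00\u4e8c\u4e09 is the three-character string from the test case in this report
        System.out.println(toAsciiWithCharRefs("\u4e00\u4e8c\u4e09"));
        // prints &#19968;&#20108;&#19977;
    }
}
```

A browser resolves such references regardless of the declared charset, which is why the HTML pages display correctly while a plain text file, which has no such escape mechanism, does not.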
Comment 4 Jesse Glick 2010-06-10 17:26:15 UTC
Created attachment 25581
Simpler patch
Comment 5 Yusuke Matsubara 2010-06-11 08:30:25 UTC
(In reply to comment #3)
> Would seem simpler and more friendly to just set the encoding to UTF-8
> unconditionally.
> 
> Anyway I cannot reproduce a problem on Ubuntu with Ant 1.8.2 dev;

Let me add more to comment #1 first.

The problem was actually in stderr outputs shown in the summary, not in assertion results.  To reproduce it, generate the test summary as described above, then click "<none>" at the bottom of the table, "1" below "Errors", and "System.err »".  The resulting file contains "123 vs. �Œ	" instead of the "123 vs. 一二三" written in the source code.

As for Jess's solution, I don't think it would be sufficient to hardcode UTF-8 instead of US-ASCII, because some Java environments have default encodings that are not compatible to UTF-8.  For example, Sun's JDK for Japanese Windows has MS932 as its default encoding.

I couldn't manage to make such a patch, but using the default encoding of the system as that of the stylesheets for junitreport might be more reasonable.
Comment 6 Jesse Glick 2010-06-11 10:06:48 UTC
(In reply to comment #5)
> The problem was actually in stderr outputs shown in the summary

Ah, yes - because it is written to a text file which cannot signify its encoding.

> I don't think it would be sufficient to
> hardcode UTF-8 instead of US-ASCII, because some Java environments have default
> encodings that are not compatible to UTF-8.  For example, Sun's JDK for
> Japanese Windows has MS932 as its default encoding.

Doesn't really matter what the JDK's default encoding is; you can still write output in any encoding you like.

> I couldn't manage to make such a patch, but using the default encoding of the
> system as that of the stylesheets for junitreport might be more reasonable.

It's easy to do and I will attach a patch.

The question is which patch is better. The encoding used for the HTML pages does not matter much (since output will be written with character references if necessary); it is a bit nicer to use UTF-8 but not essential. Regarding the plain text output, there are arguments in either direction:

1. Using platform default encoding may be convenient if the web browser used to view the result is on the same machine as the one which ran <junitreport>, or which otherwise happens to have the same default encoding, and the web browser is set to use the platform default encoding by default for pages specifying no encoding (and is unable to sniff the encoding).

2. Using UTF-8 ensures that no characters will ever be dropped as unencodable, i.e. output will never be lossy. At worst you may need to take a special action to display the page in UTF-8.
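
The lossiness argument in point 2 can be illustrated with a small Java sketch. The class and method names are hypothetical, and US-ASCII merely stands in for any charset that cannot represent the text; the actual platform default on a given machine may behave differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLoss {
    /** Encodes text to bytes in the given charset and decodes it again. */
    public static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) substitutes the charset's replacement
        // bytes for characters it cannot encode, so information can be lost.
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String original = "123 vs. \u4e00\u4e8c\u4e09";
        // UTF-8 can encode every character, so the round trip is exact.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original)); // prints true
        // US-ASCII cannot encode the three CJK characters; each becomes '?'.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII)); // prints 123 vs. ???
    }
}
```

The sketch also shows that the output charset is chosen explicitly by the writer, independent of the JDK's default encoding.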
Comment 7 Jesse Glick 2010-06-11 10:07:30 UTC
Created attachment 25587
Alternate patch; produces output in platform default encoding
Comment 8 Jesse Glick 2010-06-11 10:17:53 UTC
Created attachment 25588
Yet another patch, using HTML output only (only covers -frames and stderr)

Better yet, use HTML output for everything, so that the browser does not have to guess what the encoding is.
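
As a rough illustration of the HTML-only idea, the Java sketch below wraps captured plain text in a page that declares its own encoding. It is not the attached patch, which changes the XSL stylesheets; the class name, method names, and page layout are assumptions made for the example.

```java
public class HtmlWrap {
    /** Escapes the characters that have special meaning in HTML text content. */
    public static String escape(String text) {
        // '&' must be replaced first so the other entities are not double-escaped
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps plain text in a minimal HTML page that states its charset. */
    public static String wrap(String title, String body) {
        return "<html><head>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
            + "<title>" + escape(title) + "</title></head>"
            + "<body><pre>" + escape(body) + "</pre></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("System.err", "123 vs. \u4e00\u4e8c\u4e09"));
    }
}
```

The page only displays correctly if the bytes written to disk really are UTF-8, so the writer must use the same charset that the meta element declares.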
Comment 9 Yusuke Matsubara 2010-06-11 22:22:56 UTC
(In reply to comment #8)
> Created an attachment (id=25588)
> Yet another patch, using HTML output only (only covers -frames and stderr)
> 
> Better yet, use HTML output for everything, so that the browser does not have
> to guess what the encoding is.

Going to HTML output for everything seems to be the right choice.  It's fairly simple and works for me on Ubuntu + OpenJDK (default encoding: UTF-8) and on Windows XP + Sun JDK (default encoding: MS932).  In the Windows case, I didn't have to change the browser's settings, since the results were all encoded in UTF-8 properly.

# Sorry for mistyping your name in my last comment #5, Jesse.
Comment 10 Jesse Glick 2010-06-14 10:30:17 UTC
Fixed in revision 954484.