Bug 63436

Summary: JUnitLauncher Task does not embedd stdout/stderr output in cdata
Product: Ant Reporter: Michael Seele <mseele>
Component: Optional TasksAssignee: Ant Notifications List <notifications>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 1.10.6   
Target Milestone: 1.10.10   
Hardware: PC   
OS: All   
Bug Depends on:    
Bug Blocks: 65030    
Attachments: PATCH with bugfix

Description Michael Seele 2019-05-15 10:04:20 UTC
If you use <listener type="legacy-xml" sendSysErr="true" sendSysOut="true"/> in the Junit5 JUnitLauncher task, the output is not embedded in CDATA like it is in JUnit Task. 
This breaks the following junitreport Task if the sysout/syserr contains invalid xml data which is very likely to happen for sysout/syserr.

Example:

(correct) JUnit Task produces:
<system-out><![CDATA[test ouput]]></system-out>

(incorrect) JUnitLauncher task produces:
<system-out>test ouput</system-out>
Comment 2 Michael Seele 2019-05-17 09:31:33 UTC
Created attachment 36591 [details]
PATCH with bugfix
Comment 3 Jaikiran Pai 2019-05-18 04:53:43 UTC
Thank you for reporting this and the patch. However, the reason it's been coded that way was to avoid loading the entire system.out/system.err messages into memory and potentially running into OOM. This was the motivation to have (although a bit involved logic) it stream instead of holding the whole content in memory and then writing it into that report. This isn't a theoretical case either since issues like these OOMs have been reported in our JUnit task of Ant.

Having said that, the XML is expected to be well formed as well as usable by the junitreport. Can you please attach a sample xml which got generated and which is causing issues with the junitreport task?
Comment 4 Michael Seele 2019-05-20 06:57:55 UTC
(In reply to Jaikiran Pai from comment #3)
> Thank you for reporting this and the patch. However, the reason it's been
> coded that way was to avoid loading the entire system.out/system.err
> messages into memory and potentially running into OOM. This was the
> motivation to have (although a bit involved logic) it stream instead of
> holding the whole content in memory and then writing it into that report.
> This isn't a theoretical case either since issues like these OOMs have been
> reported in our JUnit task of Ant.
> 
> Having said that, the XML is expected to be well formed as well as usable by
> the junitreport. Can you please attach a sample xml which got generated and
> which is causing issues with the junitreport task?

You're right, XML "Files" inside sysout/err are escaped correct in junitreport task.

Maybe my problems came all from https://bz.apache.org/bugzilla/show_bug.cgi?id=63446

Anyway, even JUnit 5's org.junit.platform.reporting.legacy.xml.LegacyXmlReportGeneratingListener is using CDATA for sysout/err. 
And they do it in a way without having sysout/err in memory.

Please see:
https://github.com/junit-team/junit5/blob/master/junit-platform-reporting/src/main/java/org/junit/platform/reporting/legacy/xml/XmlReportWriter.java
Comment 5 Stefan Bodewig 2020-12-27 11:48:14 UTC
We are using DOMElementWriter.encode and add XMLStreamWriter.writCharacters on top of it, which may lead to double-escaped & characters as the code currently stands. If I read the code correctly encode will turn a < into &lt; and writeCharacters will turn that into &amp;lt; - will have to check.

Simply using CDATA instead of writeCharacters will not help as you need to make sure the characters you write do not contain the sequence ]]>. 

Michael can you give us an example of something that went wring with the existing code? I believe we should not invoke encode but leave that job to writeCharacters but we do need to strip out characters illegal in XML - see bug 65030.
Comment 6 Stefan Bodewig 2020-12-28 15:17:11 UTC
I performed some experiments and writeCharacters by itself takes care of replacing ampersands and friends, so encoding ourselves is hurting rather than helping. IMHO we should only deal with the characters illegal in XML (see bug #65030 ) and leave the rest of the input alone.

And of course, if there are additional problems, we should solve them.
Comment 7 Stefan Bodewig 2021-01-16 17:26:39 UTC
If I'm guessing correctly that double-encoding of sysout/err has been a problem, which I've fixed in master - so I'm closing this now.