Bug 49354 - XMLLayout writes illegal characters to XML file
Summary: XMLLayout writes illegal characters to XML file
Status: NEW
Alias: None
Product: Log4j - Now in Jira
Classification: Unclassified
Component: Layout
Version: 1.2
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: log4j-dev
URL:
Keywords:
Depends on:
Blocks: 58035
Reported: 2010-05-28 10:33 UTC by Myles Bunbury
Modified: 2015-06-14 21:47 UTC

Attachments
Log4j XML output containing illegal characters (472 bytes, application/octet-stream)
2010-05-28 10:37 UTC, Myles Bunbury

Description Myles Bunbury 2010-05-28 10:33:59 UTC
XMLLayout does not appear to escape or scrub the log message to deal with illegal characters. This can result in invalid XML output by log4j, which in turn can cause XML parsers downstream to blow up.

A corner case we encountered while using log4j v1.2.15 produced the attached XML output. The message, which usually has normal text, ended up having some illegal XML characters in it. Stylus Studio reports the error on line 2, column 119 as follows:
  FATAL ERROR: Invalid character (Unicode: 0x15)

This character is indeed illegal in XML, as per:
http://www.xml.com/axml/target.html#sec-cdata-sect

A nice summary can be found here:
http://www.coderanch.com/t/124970/XML/Invalid-Character-inside-CDATA

XMLLayout should ensure that what it writes is legal XML, either by escaping illegal characters, removing them, or replacing them.
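
To illustrate the "replacing" option, here is a minimal standalone sketch. It is not log4j code: the class and method names (XmlCharScrubber, isLegalXmlChar, firstIllegalIndex, scrub) are made up for this example, and the choice of U+FFFD as the replacement is an assumption. The range check follows the Char production of XML 1.0.

```java
// Hypothetical helper, not part of log4j. Replaces characters that are
// illegal in XML 1.0 with U+FFFD (the Unicode replacement character).
final class XmlCharScrubber {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first illegal character in s, or -1 if there is none. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Returns s with every illegal character replaced by U+FFFD. */
    static String scrub(String s) {
        if (s == null || firstIllegalIndex(s) < 0) {
            return s; // common case: nothing to fix, no copy made
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            // Unpaired surrogates come back as 0xD800..0xDFFF and are replaced too.
            out.appendCodePoint(isLegalXmlChar(cp) ? cp : 0xFFFD);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

The scan-first fast path means messages that are already clean are returned unchanged without allocating a new string, which may help with the performance concern for applications whose output is normally legal.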
Comment 1 Myles Bunbury 2010-05-28 10:37:29 UTC
Created attachment 25492
Log4j XML output containing illegal characters
Comment 2 Myles Bunbury 2010-07-08 11:17:48 UTC
(In reply to comment #0)
> XMLLayout should ensure that what it writes is legal XML, either by escaping
> illegal characters, removing them, or replacing them.

Since many applications probably know in advance that their logging output is XML-safe, a 'SafeXMLLayout' subclass might be a good solution: people could then choose safety or performance based on their requirements.
Comment 3 Curt Arnold 2010-07-08 22:45:18 UTC
The XSLTLayout in the extras companion should be immune to these types of problems since it uses the serializer in the JDK's XSLT processor.  Unless a transform is provided, it will just output the "raw" form that should be compatible with XMLLayout.  I have not done any performance benchmarking around it to compare it with XMLLayout.
Comment 4 Mat Gessel 2013-10-11 23:58:59 UTC
I have run into this a few times when logging errors that occur while authenticating via JNDI against an Active Directory server. Either JNDI or AD returns an error message that is terminated with a NUL (0x00) character. The NUL char is illegal in a CDATA section (or anywhere in an XML document for that matter).

I have represented the NUL char as <<NUL>> below. 

<log4j:event logger="com.co.authn.LDAPAuthenticator" timestamp="1349723665747" level="INFO" thread="http-8443-57">
<log4j:message>
<![CDATA[authenticate(user: bob, domain: foo.com): failed with javax.naming.AuthenticationException message [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903AA, comment: AcceptSecurityContext error, data 525, v1772<<NUL>>]]]>
</log4j:message>
</log4j:event>

Looks like the place to do the escaping is org.apache.log4j.helpers.Transform.appendEscapingCDATA().
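
A rough standalone sketch of what escaping at that point could look like. This is not the actual log4j source and not a patch: CdataSketch and appendCdata are hypothetical names, and dropping illegal characters (rather than replacing them) is just one of the options discussed above. It also splits any embedded "]]>" so the section cannot be terminated early.

```java
// Hypothetical sketch, not the real org.apache.log4j.helpers.Transform.
final class CdataSketch {
    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_END = "]]>";

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Appends text to buf as one or more CDATA sections containing only legal characters. */
    static void appendCdata(StringBuilder buf, String text) {
        buf.append(CDATA_START);
        if (text != null) {
            // Step 1: drop illegal characters. This must happen before step 2,
            // because removing a character can create a new "]]>" sequence.
            StringBuilder clean = new StringBuilder(text.length());
            for (int i = 0; i < text.length(); ) {
                int cp = text.codePointAt(i);
                if (isLegalXmlChar(cp)) {
                    clean.appendCodePoint(cp);
                }
                i += Character.charCount(cp);
            }
            // Step 2: split "]]>" across two sections: "]]" ends the current
            // section and ">" starts the next one.
            buf.append(clean.toString().replace(CDATA_END, "]]" + CDATA_END + CDATA_START + ">"));
        }
        buf.append(CDATA_END);
    }
}
```

With the message from the event above, the trailing NUL after "v1772" would simply be dropped, so the section would end in "v1772]]]>" (the final "]" of the message followed by the CDATA terminator).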