Bug 9888

Summary: Parser corrupts document end tag
Product: Crimson
Reporter: loney
Component: other
Assignee: Edwin Goei <edwingo>
Status: NEW ---    
Severity: major    
Priority: P3    
Version: 1.1   
Target Milestone: ---   
Hardware: Other   
OS: All   
Bug Depends on: 34387    
Bug Blocks:    

Description loney 2002-06-15 00:32:32 UTC
Running Crimson 1.1.1 on the input XML document shown below corrupts the input 
stream. The error is highly sensitive to the exact sequence of input 
characters. For example, the following line begins with a tab:

		   stopX="xxxxx"/>

If a space is substituted for the initial tab, the input parses successfully. 
Equally bizarrely, if the initial comment is removed, the input also parses 
successfully. 

An attempt has been made to mask out the letters in the input whose value 
appears not to affect the parse. These letters have been replaced with 'x'; 
they must be present for the parse to fail, but it appears that any letter can 
be substituted for 'x'.

This is not true of the remaining letters. E.g., substituting the string 
'xxxxxxxx' for all occurrences of the string 'xxxxxxxs' in the attribute values 
results in a successful parse.
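
The three substitutions described above could be generated mechanically, so that each variant can be fed to the parser in turn. The following is only a sketch: the class and method names are hypothetical, and it assumes that the "initial tab" is the first character of a line and that the "initial comment" is the first comment in the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Crimson: builds the input variants
// described in this report from the original document text.
class InputVariants {

    // Replace one leading tab with a space on every line that starts with a tab.
    static String tabToSpace(String xml) {
        return xml.replaceAll("(?m)^\\t", " ");
    }

    // Remove the first XML comment in the document.
    static String dropFirstComment(String xml) {
        return xml.replaceFirst("(?s)<!--.*?-->", "");
    }

    // Replace every occurrence of 'xxxxxxxs' with 'xxxxxxxx'.
    static String maskTrailingS(String xml) {
        return xml.replace("xxxxxxxs", "xxxxxxxx");
    }

    // All variants, keyed by a descriptive name, in a stable order.
    static Map<String, String> all(String xml) {
        Map<String, String> variants = new LinkedHashMap<>();
        variants.put("original", xml);
        variants.put("tab-to-space", tabToSpace(xml));
        variants.put("no-initial-comment", dropFirstComment(xml));
        variants.put("s-masked", maskTrailingS(xml));
        return variants;
    }
}
```

According to the observations above, only the "original" variant would be expected to fail under Crimson.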

Examining the stream content indicates that the parse corrupts the characters 
in the end tag. Execution results in the exception:

org.xml.sax.SAXParseException: Expected "</application>" to terminate element starting on line 5.
	at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3108)
	at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3102)
	at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1500)
	at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:500)
	at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
	at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)
	at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)

This error does not occur with Xerces 2. Needless to say, the problem takes a 
long time to isolate. The only workaround is to make random changes to valid 
input until something parses, which is hardly a satisfying solution.
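
For comparing parsers, a small JAXP harness along the following lines may help. It is a sketch, not the code that produced the trace above: the class name ParseCheck is hypothetical, and the factory class names one would pass on the command line (presumably org.apache.crimson.jaxp.DocumentBuilderFactoryImpl for Crimson and org.apache.xerces.jaxp.DocumentBuilderFactoryImpl for Xerces) are assumptions that require the corresponding jar on the classpath. The two-argument DocumentBuilderFactory.newInstance used here needs Java 6 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical harness: parses a document through JAXP and reports the outcome.
class ParseCheck {

    // factoryClass == null selects whatever JAXP picks by default;
    // otherwise the named DocumentBuilderFactory implementation is used.
    static String tryParse(byte[] xml, String factoryClass) {
        try {
            DocumentBuilderFactory factory = (factoryClass == null)
                    ? DocumentBuilderFactory.newInstance()
                    : DocumentBuilderFactory.newInstance(factoryClass, null);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml));
            return "OK";
        } catch (SAXParseException e) {
            // Well-formedness errors carry the position the parser reported.
            return "FAIL line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "FAIL: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ParseCheck <file.xml> [factory class name]");
            return;
        }
        // Read raw bytes so that tabs and line endings reach the parser unchanged.
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        String factoryClass = args.length > 1 ? args[1] : null;
        System.out.println(tryParse(xml, factoryClass));
    }
}
```

Running it once per factory class against a byte-exact copy of the test case should show whether the failure is specific to Crimson.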

Input content
-------------
<?xml version="1.0" encoding="UTF-8"?>

<!-- Cf. "Administrator's Guide" for information about this file -->

<application>
  <xxxxxxx xxxxxxxy="xxxxxx" xxxx="log4j"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.services.Logger"
           xxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxx.Log4jLogger"/>
  <xxxxxxx xxxxxxxy="formatter" xxxx="standard"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.MessageFormatter"
           xxxxxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxx.MessageFormatterImplMBean"
           provider="xxx.xxxxxxxxxx.xxxxxxxxx.xxxx.MessageFormatterImpl"/>
  <xxxxxxx xxxxxxxy="xxxxxxx.administrator" xxxx="jmx"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.ServiceAdministrator"
           xxxxxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.jmx.JmxHtmlServiceAdministratorMBean"
           provider="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.jmx.JmxHtmlServiceAdministrator"/>
  <xxxxxxx xxxxxxxy="id.generator" xxxx="IETF"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxx.IdGenerator"
           provider="xxx.xxxxxxxxxx.xxxxxxxxx.xxxx.IETFIdGenerator"/>
  <xxxxxxx xxxxxxxy="cache" xxxx="simple"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.Cache"
           xxxxxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.persistence.ObjectCacheMBean"
           provider="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxxxxx.ObjectCache"
	       stopX="xxxxx"/>
  <xxxxxxx xxxxxxxy="encryptor" xxxx="base64"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.Encryptor"
           provider="xxx.xxxxxxxxxx.xxxxxxxxx.xxxx.Base64Encryptor"/>
  <xxxxxxx xxxxxxxy="schema" xxxx="xx"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.Schema"
           xxxxxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.domain.SchemaMBean"
           provider="xxx.xxxxxxxxxx.xx.domain.SxSchema"/>
  <xxxxxxx xxxxxxxy="xxxxxxxxxxx" xxxx="transient"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.PersistentStoreFactory"
           xxxxxxxxxxxxxxxxxxxx="test.xxxxxxxxx.xxxxxxxxxxx.TransientStoreFactoryMBean"
           provider="test.xxxxxxxxx.xxxxxxxxxxx.TransientStoreFactory"/>
  <xxxxxxx xxxxxxxy="xxxxxxxxxxx" xxxx="xml"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.PersistentStoreFactory"
           xxxxxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxxxxx.PersistentStoreFactoryMBean"
           provider="xxx.xxxxxxxxxx.xx.xxxxxxxxxxx.xxx.SxXmlStoreFactory"/>
  <xxxxxxx xxxxxxxy="xxxxxxxxxxx" xxxx="ejb"
           xxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxs.PersistentStoreFactory"
           xxxxxxxxxxxxxxxxxxxx="xxx.xxxxxxxxxx.xxxxxxxxx.xxxxxxxxxxx.PersistentStoreFactoryMBean"
           provider="xxx.xxxxxxxxxx.xx.enterprise.xxxxxxxxxxx.SxEjbStoreFactory"/>
</application>

Comment 1 loney 2002-06-15 00:37:41 UTC
Unfortunately, Bugzilla mangles the formatting of the file included in the bug 
report. Each attribute should be on its own line. Send me a note if you'd like 
a true copy of the input test case.
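
Because the failure depends on exact whitespace, one way to exchange the test case without losing tabs is to print it with the whitespace made visible. The helper below is a hypothetical sketch (the class name and the escaping convention are assumptions, not an existing tool): it shows tabs as \t, carriage returns as \r, and marks each line end with $.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: renders whitespace visibly so that a test case can be
// compared character by character after passing through a web form or mail.
class VisibleWhitespace {

    static String show(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\t': out.append("\\t"); break;    // tab shown as \t
                case '\r': out.append("\\r"); break;    // carriage return shown as \r
                case '\\': out.append("\\\\"); break;   // keep the escaping unambiguous
                case '\n': out.append("$\n"); break;    // $ marks the end of each line
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: VisibleWhitespace <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // Assumes the file is UTF-8, as declared in the test case's XML prolog.
        System.out.print(show(new String(bytes, StandardCharsets.UTF_8)));
    }
}
```

Sending the file as a binary attachment remains the more reliable option; the rendering is mainly useful for checking that a received copy matches the original.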