Bug 60626

Summary: ArrayIndexOutOfBoundsException in EvilUnclosedBRFixingInputStream
Product: POI
Reporter: Joachim Piketz <pik>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: blocker    
Priority: P2    
Version: 3.16-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: VML file that causes the problem
Alternative implementation of EvilUnclosedBRFixingInputStream

Description Joachim Piketz 2017-01-23 08:32:46 UTC
Created attachment 34663
VML file that causes the problem

I have an Excel file that can't be loaded. I found that EvilUnclosedBRFixingInputStream has a problem with a VML file which was part of my Excel file.

The following sample code reproduces the problem:

String xmlFile = "vmlDrawing3.vml";
byte[] data = Files.readAllBytes(Paths.get(xmlFile));
ByteArrayInputStream bis = new ByteArrayInputStream(data);
EvilUnclosedBRFixingInputStream is = new EvilUnclosedBRFixingInputStream(bis);
DocumentHelper.readDocument(is);

The following exception is thrown, though not on all operating systems/JDK versions:

   Caused by: java.lang.ArrayIndexOutOfBoundsException: 2048
     at org.apache.xerces.impl.io.UTF8Reader.read(UTF8Reader.java:336)
     at org.apache.xerces.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753)
     at org.apache.xerces.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:834)
     at org.apache.xerces.impl.XMLScanner.scanAttributeValue(XMLScanner.java:772)
     at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:529)
     at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:181)
     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1653)
     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
     at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:875)
     at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:798)
     at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
     at org.apache.xerces.parsers.DOMParser.parse(DOMParser.java:230)
     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:298)
     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
     at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:137)
	...
Comment 1 Joachim Piketz 2017-01-23 08:34:52 UTC
I used a FilterInputStream to test if it works with another EvilUnclosedBRFixingInputStream implementation, and it did. I have attached the file, but not tested it any further.
Comment 2 Joachim Piketz 2017-01-23 08:36:02 UTC
Created attachment 34664
Alternative implementation of EvilUnclosedBRFixingInputStream
Comment 3 Andreas Beeker 2017-01-31 00:37:03 UTC
Not sure who copied from whom, but if this wasn't yours first, it would be nice if you had posted the source:

http://stackoverflow.com/a/40941512/2066598

https://github.com/Inbot/inbot-utils/blob/master/src/main/java/io/inbot/utils/ReplacingInputStream.java

Btw, I'm facing exactly the same problem with some user input, and from the looks of it, the new implementation is cleaner ...
Comment 4 Andreas Beeker 2017-02-08 01:26:50 UTC
The old implementation was somehow affected by the number of bytes that were read ... although there was a test for different buffer sizes.

I've updated the license references, which should be OK because the MIT license is compatible [1].

Fixed with r1782095.

[1] https://www.apache.org/legal/resolved#category-a
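Note for readers: the comments above describe a FilterInputStream-based replacement whose output should not depend on how many bytes the caller reads at a time. The following is only a minimal sketch of that idea, not the attached file and not the code committed in r1782095. The class name ReplacingStreamSketch, the helper readAll, and the assumption that the goal is to rewrite an unclosed "<br>" as "<br/>" are all illustrative. Matching is done byte by byte in read(), and the bulk read simply loops over it, so the caller's buffer size cannot influence the result.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Hypothetical sketch: replaces every occurrence of a byte pattern in a stream.
class ReplacingStreamSketch extends FilterInputStream {
    private final byte[] pattern;
    private final byte[] replacement;
    // Bytes read from the source that have not been emitted yet.
    private final ArrayDeque<Integer> lookahead = new ArrayDeque<>();
    // Replacement bytes waiting to be emitted; these are never re-scanned.
    private final ArrayDeque<Integer> pending = new ArrayDeque<>();
    private boolean eof;

    ReplacingStreamSketch(InputStream in, String pattern, String replacement) {
        super(in);
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        this.pattern = pattern.getBytes(StandardCharsets.UTF_8);
        this.replacement = replacement.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public int read() throws IOException {
        while (true) {
            if (!pending.isEmpty()) {
                return pending.poll();
            }
            // Keep the lookahead window as long as the pattern, if possible.
            while (!eof && lookahead.size() < pattern.length) {
                int b = in.read();
                if (b < 0) {
                    eof = true;
                } else {
                    lookahead.add(b);
                }
            }
            if (lookahead.isEmpty()) {
                return -1;
            }
            if (lookahead.size() == pattern.length && matchesPattern()) {
                lookahead.clear();
                for (byte r : replacement) {
                    pending.add(r & 0xff);
                }
                continue; // emit the replacement on the next loop iteration
            }
            return lookahead.poll();
        }
    }

    // FilterInputStream's bulk read would bypass read(), so route it through.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int b = read();
            if (b < 0) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset is not handled in this sketch
    }

    private boolean matchesPattern() {
        int i = 0;
        for (int b : lookahead) {
            if (b != (pattern[i++] & 0xff)) {
                return false;
            }
        }
        return true;
    }

    // Test helper: drains the stream using a caller-chosen buffer size.
    static String readAll(InputStream in, int bufferSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufferSize];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Wrapping a stream as new ReplacingStreamSketch(source, "<br>", "<br/>") and draining it with readAll at several buffer sizes should yield identical text each time. The sketch trades speed for simplicity (one deque operation per byte), and skip() and available() are left at their FilterInputStream defaults, so it is not a drop-in replacement for the POI class.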