Created attachment 34663 [details]
VML file that causes the problem

I have an Excel file that can't be loaded. I found that EvilUnclosedBRFixingInputStream has a problem with a VML file which was part of my Excel file.

The following sample code reproduces the problem:

    String xmlFile = "vmlDrawing3.vml";
    byte[] data = Files.readAllBytes(Paths.get(xmlFile));
    ByteArrayInputStream bis = new ByteArrayInputStream(data);
    EvilUnclosedBRFixingInputStream is = new EvilUnclosedBRFixingInputStream(bis);
    DocumentHelper.readDocument(is);

The following exception is thrown, though not on all operating systems and JDK versions:

    Caused by: java.lang.ArrayIndexOutOfBoundsException: 2048
        at org.apache.xerces.impl.io.UTF8Reader.read(UTF8Reader.java:336)
        at org.apache.xerces.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753)
        at org.apache.xerces.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:834)
        at org.apache.xerces.impl.XMLScanner.scanAttributeValue(XMLScanner.java:772)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:529)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:181)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1653)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
        at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:875)
        at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:798)
        at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
        at org.apache.xerces.parsers.DOMParser.parse(DOMParser.java:230)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:298)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
        at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:137)
        ...
I used a FilterInputStream to test whether it works with another EvilUnclosedBRFixingInputStream implementation, and it did. I have attached the file, but have not tested it any further.
Created attachment 34664 [details] Alternative implementation of EvilUnclosedBRFixingInputStream
Not sure who copied from whom, but if this wasn't yours first, it would have been nice if you had posted the source:

http://stackoverflow.com/a/40941512/2066598
https://github.com/Inbot/inbot-utils/blob/master/src/main/java/io/inbot/utils/ReplacingInputStream.java

By the way, I'm facing exactly the same problem with some user input, and from the looks of it, the new implementation is cleaner ...
The old implementation was somehow affected by the number of bytes that were read, although there was a test for different buffer sizes.

I've updated the license references; this should be OK, because the MIT license is compatible [1].

Fixed with r1782095.

[1] https://www.apache.org/legal/resolved#category-a
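For readers following along, here is a minimal sketch of how a FilterInputStream-based replacement can be made independent of the caller's buffer size. It is not the attached file and not the code committed in r1782095; the class name ReplacingStreamSketch and the helper fix() are hypothetical and chosen only for illustration. The idea is to do all matching in the single-byte read() with a look-ahead of pattern length, and to have the bulk read() delegate to it, so chunk boundaries cannot split a match.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

/** Hypothetical sketch: replaces one byte pattern with another while streaming. */
class ReplacingStreamSketch extends FilterInputStream {
    private final byte[] pattern;
    private final byte[] replacement;
    // Bytes already read from the source but not yet handed to the caller.
    private final ArrayDeque<Integer> pending = new ArrayDeque<>();
    // Replacement bytes waiting to be handed to the caller.
    private final ArrayDeque<Integer> out = new ArrayDeque<>();
    private boolean eof;

    ReplacingStreamSketch(InputStream in, byte[] pattern, byte[] replacement) {
        super(in);
        if (pattern.length == 0) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        this.pattern = pattern.clone();
        this.replacement = replacement.clone();
    }

    @Override
    public int read() throws IOException {
        if (!out.isEmpty()) {
            return out.remove();
        }
        // Fill the look-ahead window up to the pattern length.
        while (!eof && pending.size() < pattern.length) {
            int b = in.read();
            if (b < 0) {
                eof = true;
            } else {
                pending.add(b);
            }
        }
        if (pending.isEmpty()) {
            return -1;
        }
        if (pending.size() == pattern.length && windowMatches()) {
            pending.clear();
            for (byte r : replacement) {
                out.add(r & 0xFF);
            }
            // An empty replacement simply drops the match and continues.
            return out.isEmpty() ? read() : out.remove();
        }
        return pending.remove();
    }

    private boolean windowMatches() {
        int i = 0;
        for (int b : pending) {
            if (b != (pattern[i++] & 0xFF)) {
                return false;
            }
        }
        return true;
    }

    // Bulk read goes through read(), so the caller's buffer size cannot affect matching.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int b = read();
            if (b < 0) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    // Skip through read() as well, so buffered bytes are not bypassed.
    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() >= 0) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    /** Hypothetical helper: rewrites "<br>" to "<br/>" using the given read buffer size. */
    static String fix(String xml, int bufferSize) {
        byte[] src = xml.getBytes(StandardCharsets.UTF_8);
        try (InputStream is = new ReplacingStreamSketch(
                new ByteArrayInputStream(src),
                "<br>".getBytes(StandardCharsets.US_ASCII),
                "<br/>".getBytes(StandardCharsets.US_ASCII))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[bufferSize];
            int n;
            while ((n = is.read(buf, 0, buf.length)) != -1) {
                bos.write(buf, 0, n);
            }
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading byte by byte is slower than a block-oriented approach, so this trades throughput for simplicity; available() is left at the FilterInputStream default and may not account for buffered bytes.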