Created attachment 33300
MS Word file that raises an AIOOBE (POI v3.13-FINAL)

Hi,

We are trying to parse a number of MS Word documents using Tika v1.11 (POI v3.13-FINAL); however, an AIOOBE is raised when parsing the attached document. Even if the file is corrupted, shouldn't we get something other than an "Unexpected RuntimeException"?

Could you please have a look at this?

Thanks,
Panagiotis

Stacktrace:
Cause: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6ebda34c
Cause: 128
Caused by: java.lang.ArrayIndexOutOfBoundsException: 128
    at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:224)
    at org.apache.poi.util.ShortField.readFromBytes(ShortField.java:166)
    at org.apache.poi.util.ShortField.<init>(ShortField.java:91)
    at org.apache.poi.poifs.property.Property.<init>(Property.java:165)
    at org.apache.poi.poifs.property.DirectoryProperty.<init>(DirectoryProperty.java:69)
    at org.apache.poi.poifs.property.PropertyFactory.convertToProperties(PropertyFactory.java:79)
    at org.apache.poi.poifs.property.NPropertyTable.buildProperties(NPropertyTable.java:110)
    at org.apache.poi.poifs.property.NPropertyTable.<init>(NPropertyTable.java:66)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:416)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:228)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:164)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:164)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
From a quick look, the data we read asks for a huge property string while the actual byte array holding the data is much smaller, so the document really does appear to be incorrectly formatted. It also does not look good when opened in LibreOffice: lots of gibberish and no readable content as far as I can see. So for now we can only try to improve the error message here with an additional check.
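As a rough illustration of what such an additional check could look like, here is a minimal sketch. It is not the actual POI code or the eventual fix: the class `BoundsCheckedShortReader`, the method `readShort`, and the choice of `IOException` are all assumptions made for the example. The idea is simply to validate the offset against the buffer length before reading, so that a truncated or corrupt property block produces a descriptive error instead of a bare ArrayIndexOutOfBoundsException.

```java
import java.io.IOException;

// Hypothetical helper, not part of POI: reads a little-endian short
// only after checking that both bytes lie inside the buffer.
final class BoundsCheckedShortReader {
    private static final int SHORT_SIZE = 2;

    static short readShort(byte[] data, int offset) throws IOException {
        // Reject the read up front instead of letting the array access fail.
        if (data == null || offset < 0 || offset > data.length - SHORT_SIZE) {
            int available = (data == null) ? 0 : data.length;
            throw new IOException("Corrupt or truncated data: cannot read "
                    + SHORT_SIZE + " bytes at offset " + offset
                    + ", only " + available + " bytes available");
        }
        int low = data[offset] & 0xFF;       // least significant byte first
        int high = data[offset + 1] & 0xFF;  // most significant byte second
        return (short) ((high << 8) | low);
    }

    public static void main(String[] args) {
        byte[] block = new byte[128]; // assumed block size, for illustration only
        try {
            readShort(block, 128);    // one past the end of the buffer
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check along these lines, a caller such as Tika would receive a checked exception that names the offset and the available length, rather than wrapping an unexplained runtime exception.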