A recent run of the regression tests turned up a handful of new exceptions on xlsb files. I'm not able to replicate the problem, and the files I've reviewed are not truncated. I _think_ the problem is that we're relying on "skip" rather than "skipFully".

I propose modifying the one line in XSSFBParser and fixing LittleEndianInputStream's "skipFully" to throw an EOF exception if the skip isn't complete, to make it parallel with its "readFully".

The stack traces were all triggered by running getAbsPathMetadata():

org.apache.poi.xssf.binary.XSSFBParseException: End of file reached before expected. Tried to skip 105, but only skipped 61
    at org.apache.poi.xssf.binary.XSSFBParser.readNext(XSSFBParser.java:101)
    at org.apache.poi.xssf.binary.XSSFBParser.parse(XSSFBParser.java:66)
    at org.apache.poi.xssf.eventusermodel.XSSFBReader.getAbsPathMetadata(XSSFBReader.java:92)
    at org.apache.tika.parser.microsoft.ooxml.XSSFBExcelExtractorDecorator.buildXHTML(XSSFBExcelExtractorDecorator.java:87)
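For anyone following along, here is a rough sketch of the behaviour I have in mind. It is not the actual POI code: the class name SkipFullySketch and the static helper are hypothetical, and throwing java.io.EOFException is my assumption about what "parallel with readFully" should look like. The point is that InputStream.skip() is allowed to skip fewer bytes than requested even when more data is available, so a single call can't distinguish a short skip from a truncated file.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the LittleEndianInputStream implementation.
public class SkipFullySketch {

    /**
     * Skips exactly n bytes, looping because a single skip() call
     * may legitimately skip fewer bytes than requested.
     *
     * @throws EOFException if the stream ends before n bytes are skipped
     */
    public static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() returned 0: that alone does not prove EOF,
            // so probe with a one-byte read before giving up.
            if (in.read() == -1) {
                throw new EOFException("Tried to skip " + n
                        + ", but only skipped " + (n - remaining));
            }
            remaining--; // the probe read consumed one byte
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[100]);
        skipFully(in, 61); // succeeds: 100 bytes are available
        try {
            skipFully(in, 105); // fails: fewer than 105 bytes remain
        } catch (EOFException e) {
            System.out.println("EOF as expected: " + e.getMessage());
        }
    }
}
```

With something like this in place, a short return from skip() on an intact file would simply trigger another loop iteration, and only a genuinely truncated stream would raise the exception.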
It turns out that there are a few handfuls of places in our codebase where we should be using skipFully. Unless there are objections, I'll make those changes shortly.
r1857277