Created attachment 33191: One triggering file from govdocs1
While running regression testing for the release of Tika 1.11, we found a handful of new exceptions during initialization of some ppts (TIKA-1780). One example file attached.
Many apologies for not running these regression tests before we released 3.13! :(
Sample exception:
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.poi.hslf.usermodel.HSLFTextParagraph.applyParagraphIndents(HSLFTextParagraph.java:1260)
    at org.apache.poi.hslf.usermodel.HSLFTextParagraph.findTextParagraphs(HSLFTextParagraph.java:1171)
    at org.apache.poi.hslf.usermodel.HSLFTextParagraph.findTextParagraphs(HSLFTextParagraph.java:1081)
    at org.apache.poi.hslf.usermodel.HSLFTextParagraph.findTextParagraphs(HSLFTextParagraph.java:1017)
    at org.apache.poi.hslf.usermodel.HSLFTitleMaster.<init>(HSLFTitleMaster.java:41)
    at org.apache.poi.hslf.usermodel.HSLFSlideShow.buildSlidesAndNotes(HSLFSlideShow.java:334)
    at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:143)
    at org.apache.poi.hslf.extractor.PowerPointExtractor.<init>(PowerPointExtractor.java:136)
    at org.apache.poi.hslf.extractor.PowerPointExtractor.<init>(PowerPointExtractor.java:117)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:262)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:231)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:137)
    at org.apache.poi.stress.AbstractFileHandler.handleExtractingInternal(AbstractFileHandler.java:85)
    at org.apache.poi.stress.AbstractFileHandler.handleExtracting(AbstractFileHandler.java:64)
    at org.apache.poi.stress.HSLFFileHandler.testExtractor(HSLFFileHandler.java:65)
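For anyone skimming the trace: the failure is an ArrayList.get(1) on a list of size 1. The sketch below only illustrates that class of failure and one possible defensive guard. It is not POI's code and not necessarily what the eventual fix does; the class name IndentLookupSketch, both method names, and the fallback parameter are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not taken from HSLFTextParagraph.
class IndentLookupSketch {

    // Unchecked lookup: throws IndexOutOfBoundsException when index >= size,
    // which is the situation reported in the trace (index 1, size 1).
    static int uncheckedLookup(List<Integer> indents, int index) {
        return indents.get(index);
    }

    // Guarded lookup: returns the caller-supplied fallback instead of throwing
    // when the index is outside the list (an assumed policy, chosen for the sketch).
    static int guardedLookup(List<Integer> indents, int index, int fallback) {
        if (index < 0 || index >= indents.size()) {
            return fallback;
        }
        return indents.get(index);
    }

    public static void main(String[] args) {
        // One entry only, so index 1 is out of range.
        List<Integer> indents = new ArrayList<>(Arrays.asList(0));
        try {
            uncheckedLookup(indents, 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e.getMessage());
        }
        System.out.println("guarded lookup: " + guardedLookup(indents, 1, 0));
    }
}
```

Whether falling back to a default is the right behaviour for a malformed ppt is a separate question; the sketch only shows where a bounds check would sit.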
Fixed with r1711380 / r1711381. Please give it a try in Tika.
Will do... I'll probably have to push it to next week. Argh. Thank you!