We encountered this problem when using Tika 1.11, which embeds POI 3.13. The error below is logged to stderr when parsing some PPT files. Unfortunately I cannot attach such a file because they are confidential.

Although the error seems to be harmless, we get so many of these exceptions that the application log is flooded by them, so we had to create a patch.

How we patched it: in class org.apache.poi.hslf.record.TxMasterStyleAtom, line 72, an exception is caught and printed to stderr ("e.printStackTrace();"). We replaced that call with "POILogFactory.getLogger(TxMasterStyleAtom.class).log(POILogger.WARN, "Exception when reading available styles", e);".

java.lang.ArrayIndexOutOfBoundsException: 110
    at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:224)
    at org.apache.poi.hslf.model.textproperties.TabStopPropCollection.parseProperty(TabStopPropCollection.java:100)
    at org.apache.poi.hslf.model.textproperties.TextPropCollection.buildTextPropList(TextPropCollection.java:224)
    at org.apache.poi.hslf.record.TxMasterStyleAtom.init(TxMasterStyleAtom.java:157)
    at org.apache.poi.hslf.record.TxMasterStyleAtom.<init>(TxMasterStyleAtom.java:67)
    at sun.reflect.GeneratedConstructorAccessor498.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
    at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:128)
    at org.apache.poi.hslf.record.Environment.<init>(Environment.java:54)
    at sun.reflect.GeneratedConstructorAccessor690.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
    at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:128)
    at org.apache.poi.hslf.record.Document.<init>(Document.java:122)
    at sun.reflect.GeneratedConstructorAccessor688.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
    at org.apache.poi.hslf.record.Record.buildRecordAtOffset(Record.java:103)
    at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:286)
    at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:267)
    at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:178)
    at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:171)
    at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:219)
    at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:182)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
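
For illustration, the shape of the patch described above might look roughly like the sketch below. This is not the actual POI source: java.util.logging stands in for POILogger so that the example compiles without POI on the classpath, and the class name StyleAtomLoggingSketch and the helpers getShort and readStyles are hypothetical. It only shows the idea of routing the caught exception through a logger instead of calling e.printStackTrace().

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; names are illustrative and not taken from POI.
final class StyleAtomLoggingSketch {
    // Stand-in for POILogFactory.getLogger(TxMasterStyleAtom.class)
    private static final Logger LOG =
            Logger.getLogger(StyleAtomLoggingSketch.class.getName());

    // Reads a little-endian short; throws ArrayIndexOutOfBoundsException
    // if the record data is shorter than expected.
    static short getShort(byte[] data, int offset) {
        int b0 = data[offset] & 0xFF;
        int b1 = data[offset + 1] & 0xFF;
        return (short) ((b1 << 8) | b0);
    }

    // Returns true if the value could be read, false if reading failed
    // and the failure was logged.
    static boolean readStyles(byte[] data, int offset) {
        try {
            getShort(data, offset);
            return true;
        } catch (Exception e) {
            // Before the patch: e.printStackTrace();
            // After the patch: log at warning level with the exception attached.
            LOG.log(Level.WARNING, "Exception when reading available styles", e);
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] truncated = new byte[110];
        // Offset 110 is one past the end, similar to the failure in the trace.
        System.out.println(readStyles(truncated, 110));
    }
}
```

The benefit of going through a logger is that the application can then filter, redirect, or silence these warnings through its logging configuration, which is not possible with a hard-coded printStackTrace().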
Thank you for raising this. We should probably fix this in the handful of other places where we have a printStackTrace...