Created attachment 28900
offending Word document

An error (stack trace below) occurs when parsing a Word document in the old .doc format. When the same document is saved in .docx format, the error no longer occurs.

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@4c5cc942
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:133)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:400)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:101)
Caused by: java.lang.IndexOutOfBoundsException: Index: 151, Size: 79
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.poi.hwpf.model.ListTables.getOverride(ListTables.java:196)
    at org.apache.poi.hwpf.usermodel.Paragraph.newParagraph(Paragraph.java:108)
    at org.apache.poi.hwpf.usermodel.Range.getParagraph(Range.java:890)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:96)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 5 more
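For readers unfamiliar with this failure mode: the "Caused by" frame shows an ArrayList.get call with index 151 on a list of size 79. The following is a minimal, self-contained sketch of that situation and of one defensive alternative. It is not the POI code and not necessarily how POI resolved the issue; the class OverrideLookupSketch and the methods uncheckedGet and checkedGet are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; this is not code from POI or Tika.
final class OverrideLookupSketch {

    // Direct lookup: throws IndexOutOfBoundsException when the index
    // read from the document exceeds the number of stored entries.
    static <T> T uncheckedGet(List<T> entries, int index) {
        return entries.get(index);
    }

    // Defensive lookup: returns an empty Optional for an out-of-range
    // index, so the caller can skip the entry instead of aborting.
    static <T> Optional<T> checkedGet(List<T> entries, int index) {
        if (index < 0 || index >= entries.size()) {
            return Optional.empty();
        }
        return Optional.ofNullable(entries.get(index));
    }

    public static void main(String[] args) {
        // Build a list of 79 entries, matching "Size: 79" in the trace.
        List<String> overrides = new ArrayList<>();
        for (int i = 0; i < 79; i++) {
            overrides.add("override-" + i);
        }

        // Index 151 is out of range, so the defensive lookup is empty.
        System.out.println(checkedGet(overrides, 151).isPresent()); // prints "false"

        // The direct lookup fails the same way as in the report.
        try {
            uncheckedGet(overrides, 151);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("lookup failed for index 151");
        }
    }
}
```

Whether skipping an out-of-range entry is acceptable depends on what the entry means in the file format; the sketch only shows the mechanics of the exception.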
I verified that text and properties can be extracted successfully from this document with the current POI (3.12-beta1), so I am resolving this as fixed.