Created attachment 36998: example file embedded in govdocs1 296107.doc

We identified a fairly common regression when parsing old Excel files in the most recent regression tests for POI 4.1.2-rc2. In r1872302, a readByte() call was added to OldSheetRecord right after reading "field_4_sheetname_length". When the sheet name length is 0, there is no byte left to read, so we should check whether the length is 0 before trying to read that byte. This change causes ~550 new exceptions on the regression corpus.

Stacktrace:

Caused by: org.apache.poi.util.RecordFormatException
    at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:246)
    at org.apache.poi.hssf.record.RecordInputStream.readByte(RecordInputStream.java:255)
    at org.apache.poi.hssf.record.OldSheetRecord.<init>(OldSheetRecord.java:51)
    at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:242)
    at o.a.t.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
    at o.a.t.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:157)
    at o.a.t.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
    at o.a.t.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:280)
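A minimal sketch of the proposed length check, written in plain Java with java.nio.ByteBuffer instead of POI's RecordInputStream. The class SheetNameParser, the method parseSheetName, and the assumed layout (a 1-byte name length, then one extra byte only when the name is non-empty, then the name bytes) are hypothetical and only model the reported failure; this is not the code committed for the fix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified model of the sheet-name tail of an old sheet record.
 * Assumed layout: 1-byte name length, then (only if length > 0) one extra byte,
 * then `length` bytes of 8-bit characters. Not POI's actual OldSheetRecord code.
 */
public final class SheetNameParser {
    private SheetNameParser() {}

    public static String parseSheetName(byte[] recordTail) {
        ByteBuffer in = ByteBuffer.wrap(recordTail);
        int nameLength = in.get() & 0xFF; // treat the length byte as unsigned
        if (nameLength == 0) {
            // The guard suggested in the report: an empty name has no following
            // byte, so reading one here would run past the end of the record.
            return "";
        }
        in.get(); // hypothetical extra byte; its meaning is not modeled here
        byte[] name = new byte[nameLength];
        in.get(name); // throws BufferUnderflowException if the record is truncated
        return new String(name, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Empty sheet name: returns "" instead of reading past the record end.
        System.out.println("[" + parseSheetName(new byte[] {0}) + "]");
        // Non-empty sheet name "ABC" preceded by the extra byte.
        System.out.println(parseSheetName(new byte[] {3, 0, 'A', 'B', 'C'}));
    }
}
```

Without the nameLength == 0 branch, the first call in main would throw java.nio.BufferUnderflowException on the unconditional extra read, which plays the same role here as the RecordFormatException thrown by checkRecordPosition in the trace.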
Fixed in r1873863