Created attachment 38006: example of files

Hi, sometimes we get an error for the file that was modified in Excel (Microsoft Office 365). Library org.apache.poi:poi v5.0.0 (and the same result for other versions) might have some issue.

Simple example below to reproduce our issue:

{code}
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CheckXLSReading {

    public static void main(String[] args) throws IOException {
        InputStream inputStream = new FileInputStream("D:\\file_to_check.xls");
        Workbook workbook = new HSSFWorkbook(inputStream);
        System.out.println(workbook);
    }
}
{code}

1) First file file_to_check_365.xls has been modified in "Microsoft Excel for Microsoft 365 MSO (16.0.14228.20216) 64-bit". And we have the following error for this file:

Console output
{code}
Exception in thread "main" org.apache.poi.util.RecordFormatException: Not enough data (0) to read requested (2) bytes
    at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:246)
    at org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:265)
    at org.apache.poi.hssf.record.common.UnicodeString.<init>(UnicodeString.java:77)
    at org.apache.poi.hssf.record.SSTDeserializer.manufactureStrings(SSTDeserializer.java:57)
    at org.apache.poi.hssf.record.SSTRecord.<init>(SSTRecord.java:235)
    at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:79)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:289)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:255)
    at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:166)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:343)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:399)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:381)
    at CheckXLSReading.main(CheckXLSReading.java:12)
{code}

2) Second file file_to_check_2016.xls has been re-saved from the previous file only using "Microsoft Excel 2016 (16.0.5188.1000) MSO (16.0.5188.1000) 32-bit". And after that we don't have any errors.

Console output
{code}
org.apache.poi.hssf.usermodel.HSSFWorkbook@58c1670b
{code}

Could you please check this issue. Thank you in advance!
Hi,

We've also seen this issue, about 350 times in the last month.

The shared string table record reports more strings than are present in the file. In one example it reports ~4,500 when only ~1,350 strings are present.

It appears that the following code in SSTDeserializer.java was added to cope with this, but it (apparently) doesn't work:

{code}
if (in.available() == 0 && !in.hasNextRecord()) {
    LOG.atError().log("Ran out of data before creating all the strings! String at index {}", box(i));
    str = new UnicodeString("");
}
{code}

If I were to create a patch to fix this issue (with tests), how likely is it that it'll be accepted?

Simon

Using version 5.2.3.
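For illustration, the intent of the check quoted above (substitute an empty string once the data runs out, instead of failing) can be sketched in isolation. This is only a minimal sketch under stated assumptions: it is not POI's code and not the patch attached to bug 66412. The class name LenientStringTableReader, the method readStrings, and the byte layout (a 2-byte length prefix followed by UTF-8 bytes) are all hypothetical and far simpler than the real SST record format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT part of Apache POI and NOT the real SST layout.
class LenientStringTableReader {

    /**
     * Reads declaredCount strings from data. Assumed layout for this sketch:
     * each string is a 2-byte big-endian length followed by that many UTF-8 bytes.
     * If the data ends before declaredCount strings have been read, every
     * remaining entry becomes an empty string instead of raising an error.
     */
    static List<String> readStrings(ByteBuffer data, int declaredCount) {
        List<String> strings = new ArrayList<>();
        boolean exhausted = false;
        for (int i = 0; i < declaredCount; i++) {
            // Check that a length prefix is available BEFORE trying to read it.
            if (!exhausted && data.remaining() >= 2) {
                int length = data.getShort() & 0xFFFF;
                if (data.remaining() >= length) {
                    byte[] bytes = new byte[length];
                    data.get(bytes);
                    strings.add(new String(bytes, StandardCharsets.UTF_8));
                    continue;
                }
            }
            // Ran out of data (or the last string is truncated): pad with "".
            exhausted = true;
            strings.add("");
        }
        return strings;
    }

    public static void main(String[] args) {
        // Build a buffer holding two strings, then claim it holds five.
        ByteBuffer buf = ByteBuffer.allocate(64);
        for (String s : new String[] {"alpha", "beta"}) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) b.length).put(b);
        }
        buf.flip();
        List<String> result = readStrings(buf, 5);
        System.out.println(result.size() + " entries, first two: "
                + result.get(0) + ", " + result.get(1));
    }
}
```

The point of the sketch is the ordering: the "is there enough data?" test runs before any read is attempted, so an over-reported string count yields padding rather than an exception. Whether padding or truncating the table is the right behaviour for real files is a separate design question.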
We are happy to review and merge patches.
Thanks. I've created a patch bug here: https://bz.apache.org/bugzilla/show_bug.cgi?id=66412
*** Bug 66412 has been marked as a duplicate of this bug. ***
Thanks for the patch - added with r1906434.

For future reference, could you avoid creating 'patch' issues (first I've ever heard of such a concept)? You can attach them to the original issue.
Thank you for accepting my patch so quickly. For the record, I must have misinterpreted the Submitting Patches section in https://poi.apache.org/devel/guidelines.html#SubmittingPatches.