Bug 65543 - RecordFormatException: Not enough data (0) to read requested (2) bytes
Summary: RecordFormatException: Not enough data (0) to read requested (2) bytes
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 5.0.0-FINAL
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
Duplicates: 66412
Depends on:
Reported: 2021-09-01 07:32 UTC by Egor Yashkov
Modified: 2023-01-09 08:52 UTC

example of files (41.79 KB, application/x-7z-compressed)
2021-09-01 07:32 UTC, Egor Yashkov

Description Egor Yashkov 2021-09-01 07:32:00 UTC
Created attachment 38006
example of files


We sometimes get an error when reading a file that was modified in Excel (Microsoft Office 365). The library org.apache.poi:poi v5.0.0 (other versions give the same result) might have an issue.

A simple example that reproduces the issue:

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CheckXLSReading {
    public static void main(String[] args) throws IOException {
        InputStream inputStream = new FileInputStream("D:\\file_to_check.xls");

        Workbook workbook = new HSSFWorkbook(inputStream);
        workbook.close();
    }
}


1) The first file, file_to_check_365.xls, was modified in "Microsoft Excel for Microsoft 365 MSO (16.0.14228.20216) 64-bit". Reading it produces the following error:

Console output
Exception in thread "main" org.apache.poi.util.RecordFormatException: Not enough data (0) to read requested (2) bytes
	at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:246)
	at org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:265)
	at org.apache.poi.hssf.record.common.UnicodeString.<init>(UnicodeString.java:77)
	at org.apache.poi.hssf.record.SSTDeserializer.manufactureStrings(SSTDeserializer.java:57)
	at org.apache.poi.hssf.record.SSTRecord.<init>(SSTRecord.java:235)
	at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:79)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:289)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:255)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:166)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:343)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:399)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:381)
	at CheckXLSReading.main(CheckXLSReading.java:12)

2) The second file, file_to_check_2016.xls, is the previous file re-saved, with no other changes, using "Microsoft Excel 2016 (16.0.5188.1000) MSO (16.0.5188.1000) 32-bit". Reading it produces no errors.

Console output

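For readers unfamiliar with the message in the trace for the first file: it is what a bounds-checked reader reports when 2 bytes are requested and 0 remain. The following is a minimal sketch of that kind of check, not POI's actual RecordInputStream code. The class name BoundedReaderSketch, the method checkPosition, and the use of IllegalStateException are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a record reader; this is NOT POI's RecordInputStream.
class BoundedReaderSketch {
    private final ByteBuffer buf;

    BoundedReaderSketch(byte[] recordData) {
        // Little-endian byte order is assumed here for the sketch.
        this.buf = ByteBuffer.wrap(recordData).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Refuse to read past the end of the data, using the same message
    // wording as the trace (the exception type is an assumption).
    private void checkPosition(int required) {
        int available = buf.remaining();
        if (available < required) {
            throw new IllegalStateException("Not enough data (" + available
                    + ") to read requested (" + required + ") bytes");
        }
    }

    short readShort() {
        checkPosition(2);
        return buf.getShort();
    }

    public static void main(String[] args) {
        BoundedReaderSketch in = new BoundedReaderSketch(new byte[] {0x05, 0x00});
        System.out.println(in.readShort()); // consumes the only two bytes
        try {
            in.readShort(); // nothing is left, so the check fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second readShort() call fails because the buffer is exhausted, which is the same situation the trace shows inside UnicodeString.<init>.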
Could you please check this issue? Thank you in advance!
Comment 1 simon.carter 2023-01-03 15:39:18 UTC

We've also seen this issue, about 350 times in the last month.

The shared string table record reports more strings than are present in the file. In one example it reports ~4,500 strings when only ~1,350 are present.

It appears that the following code in SSTDeserializer.java was added to cope with this, but it (apparently) doesn't work:

if (in.available() == 0 && !in.hasNextRecord()) {
    LOG.atError().log("Ran out of data before creating all the strings! String at index {}", box(i));
    str = new UnicodeString("");
}
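To illustrate the idea behind that guard, here is a rough, self-contained sketch of reading a string table whose header over-reports the string count. It is not POI code and not the patch that was later committed. The class TolerantStringTableSketch, the method readStrings, and the simplified entry layout (a 2-byte little-endian length followed by that many ASCII bytes) are all assumptions for illustration; the real SST layout is more involved.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified string table: each entry is a 2-byte little-endian
// length followed by that many ASCII bytes. This is NOT the real SST layout.
class TolerantStringTableSketch {

    static List<String> readStrings(byte[] data, int declaredCount) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<String> strings = new ArrayList<>();
        boolean exhausted = false;
        for (int i = 0; i < declaredCount; i++) {
            // Check that a length prefix is available BEFORE reading it.
            if (!exhausted && buf.remaining() >= 2) {
                int length = buf.getShort() & 0xFFFF;
                if (buf.remaining() >= length) {
                    byte[] chars = new byte[length];
                    buf.get(chars);
                    strings.add(new String(chars, StandardCharsets.US_ASCII));
                    continue;
                }
            }
            // Ran out of data before reaching the declared count:
            // substitute an empty string, as the quoted snippet does.
            exhausted = true;
            strings.add("");
        }
        return strings;
    }

    public static void main(String[] args) {
        // Two real strings, but the caller claims there are five.
        byte[] data = {2, 0, 'h', 'i', 3, 0, 'a', 'b', 'c'};
        List<String> result = readStrings(data, 5);
        System.out.println(result.size() + " entries, first two: "
                + result.get(0) + ", " + result.get(1));
    }
}
```

Padding with empty strings keeps every index up to the declared count valid, at the cost of hiding the mismatch from callers. Stopping at the last real string would be the other obvious choice.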

If I were to create a patch to fix this issue (with tests) how likely is it that it'll be accepted?


Using version 5.2.3.
Comment 2 PJ Fanning 2023-01-03 15:53:09 UTC
We are happy to review and merge patches.
Comment 3 simon.carter 2023-01-05 16:21:16 UTC
Thanks. I've created a patch bug here: https://bz.apache.org/bugzilla/show_bug.cgi?id=66412
Comment 4 PJ Fanning 2023-01-06 23:35:01 UTC
*** Bug 66412 has been marked as a duplicate of this bug. ***
Comment 5 PJ Fanning 2023-01-06 23:52:50 UTC
Thanks for the patch - added with r1906434

For future reference, could you avoid creating 'patch' issues (first I've ever heard of such a concept)? You can attach them to the original issue.
Comment 6 simon.carter 2023-01-09 08:52:13 UTC
Thank you for accepting my patch so quickly. 

For the record, I must have misinterpreted the Submitting Patches section in https://poi.apache.org/devel/guidelines.html#SubmittingPatches.