Bug 65543 - RecordFormatException: Not enough data (0) to read requested (2) bytes
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 5.0.0-FINAL
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
Duplicates: 66412
Reported: 2021-09-01 07:32 UTC by Egor Yashkov
Modified: 2023-01-09 08:52 UTC

Attachments
example of files (41.79 KB, application/x-7z-compressed)
2021-09-01 07:32 UTC, Egor Yashkov

Description Egor Yashkov 2021-09-01 07:32:00 UTC
Created attachment 38006
example of files

Hi,

Sometimes we get an error when reading a file that was modified in Excel (Microsoft Office 365). The library org.apache.poi:poi v5.0.0 (other versions give the same result) might have an issue.

A simple example to reproduce the issue:

{code}
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CheckXLSReading {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes both the stream and the workbook, even on failure.
        try (InputStream inputStream = new FileInputStream("D:\\file_to_check.xls");
             Workbook workbook = new HSSFWorkbook(inputStream)) {
            System.out.println(workbook);
        }
    }
}
{code}

1) The first file, file_to_check_365.xls, was modified in "Microsoft Excel for Microsoft 365 MSO (16.0.14228.20216) 64-bit". We get the following error for this file:

Console output
{code}
Exception in thread "main" org.apache.poi.util.RecordFormatException: Not enough data (0) to read requested (2) bytes
	at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:246)
	at org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:265)
	at org.apache.poi.hssf.record.common.UnicodeString.<init>(UnicodeString.java:77)
	at org.apache.poi.hssf.record.SSTDeserializer.manufactureStrings(SSTDeserializer.java:57)
	at org.apache.poi.hssf.record.SSTRecord.<init>(SSTRecord.java:235)
	at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:79)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:289)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:255)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:166)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:343)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:399)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:381)
	at CheckXLSReading.main(CheckXLSReading.java:12)
{code}

2) The second file, file_to_check_2016.xls, is the previous file re-saved, with no other changes, using "Microsoft Excel 2016 (16.0.5188.1000) MSO (16.0.5188.1000) 32-bit". After re-saving we no longer get any errors.

Console output
{code}
org.apache.poi.hssf.usermodel.HSSFWorkbook@58c1670b
{code}

Could you please check this issue? Thank you in advance!
Comment 1 simon.carter 2023-01-03 15:39:18 UTC
Hi,

We've also seen this issue, about 350 times in the last month.

The shared string table record reports more strings than are present in the file. In one example it declared ~4,500 strings when only ~1,350 were present.

It appears that the following code in SSTDeserializer.java was added to cope with this, but it (apparently) doesn't work:

if (in.available() == 0 && !in.hasNextRecord()) {
   LOG.atError().log("Ran out of data before creating all the strings! String at index {}", box(i));
   str = new UnicodeString("");
} 
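
To illustrate the failure mode described above (a header that declares more strings than the data contains), here is a minimal, self-contained sketch. It does not use POI and is not the code from SSTDeserializer.java or from any patch. The class LenientStringTableSketch, its methods, and the toy layout (a 2-byte length followed by ISO-8859-1 bytes) are all assumptions made up for illustration. The idea is to check how many bytes remain before each read and pad with empty strings once the data runs out, instead of throwing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a shared string table reader; this is not POI code. */
public class LenientStringTableSketch {

    /**
     * Reads declaredCount strings from a toy layout: a 2-byte length followed by
     * that many ISO-8859-1 bytes. If the data ends before declaredCount strings
     * have been read, the remaining entries are filled with "" rather than failing.
     */
    static List<String> readStrings(ByteBuffer in, int declaredCount) {
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < declaredCount; i++) {
            if (in.remaining() < 2) {
                // The header promised more strings than the data holds: pad.
                strings.add("");
                continue;
            }
            int length = in.getShort() & 0xFFFF;
            if (in.remaining() < length) {
                // Truncated string body: skip to the end and pad from here on.
                in.position(in.limit());
                strings.add("");
                continue;
            }
            byte[] chars = new byte[length];
            in.get(chars);
            strings.add(new String(chars, StandardCharsets.ISO_8859_1));
        }
        return strings;
    }

    /** Builds a toy payload in the layout that readStrings expects. */
    static ByteBuffer encode(String... values) {
        int size = 0;
        for (String v : values) {
            size += 2 + v.length();
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (String v : values) {
            out.putShort((short) v.length());
            out.put(v.getBytes(StandardCharsets.ISO_8859_1));
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        // The payload holds two strings, but the caller claims there are four.
        List<String> strings = readStrings(encode("alpha", "beta"), 4);
        System.out.println(strings.size() + " entries: " + strings);
    }
}
```

Padding keeps the indexes of the strings that were read intact, so anything referring to them by position still resolves. Whether padding or stopping early is the better behaviour for a real reader is a separate design question.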

If I were to create a patch to fix this issue (with tests), how likely is it that it would be accepted?

Simon

Using version 5.2.3.
Comment 2 PJ Fanning 2023-01-03 15:53:09 UTC
We are happy to review and merge patches.
Comment 3 simon.carter 2023-01-05 16:21:16 UTC
Thanks. I've created a patch bug here: https://bz.apache.org/bugzilla/show_bug.cgi?id=66412
Comment 4 PJ Fanning 2023-01-06 23:35:01 UTC
*** Bug 66412 has been marked as a duplicate of this bug. ***
Comment 5 PJ Fanning 2023-01-06 23:52:50 UTC
Thanks for the patch - added with r1906434

For future reference, could you avoid creating 'patch' issues (first I've ever heard of such a concept)? You can attach them to the original issue.
Comment 6 simon.carter 2023-01-09 08:52:13 UTC
Thank you for accepting my patch so quickly. 

For the record, I must have misinterpreted the Submitting Patches section in https://poi.apache.org/devel/guidelines.html#SubmittingPatches.