Bug 48803 - File error: data may have been lost
Summary: File error: data may have been lost
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.6-FINAL
Hardware: PC Windows Vista
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-23 11:35 UTC by Michiel Leegwater
Modified: 2016-07-26 20:05 UTC (History)
0 users



Attachments
File with error (zipped, due to size limit) (238.00 KB, application/zip)
2010-02-23 11:39 UTC, Michiel Leegwater
Details
Same file (zipped) without error (file with error saved by MS Excel 2003) (238.58 KB, application/zip)
2010-02-23 11:40 UTC, Michiel Leegwater
Details
New file (zipped) without Rich Text (235.46 KB, application/zip)
2010-02-25 16:01 UTC, Michiel Leegwater
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michiel Leegwater 2010-02-23 11:35:23 UTC
While using POI to generate a new Excel file no problems occur. However, when reading the generated file using Excel 2003 (SP3) the message "File error: data may have been lost" shows up when opening the generated file. (File is attached with contents converted to gibberish (quite literally)). Otherwise the file seems quite fine when reading it with Excel 2003. However when opening it using Excel 2000 the error message also occurs, but the file is not readable.
Comment 1 Michiel Leegwater 2010-02-23 11:39:51 UTC
Created attachment 25041 [details]
File with error (zipped, due to size limit)
Comment 2 Michiel Leegwater 2010-02-23 11:40:47 UTC
Created attachment 25042 [details]
Same file (zipped) without error (file with error saved by MS Excel 2003)
Comment 3 Michael Zalewski 2010-02-24 18:35:11 UTC
The attached XLS contains two strings in the string table with a value of "gibberish".

The first one is a 'Rich Text' type, but there is only one format run. According to my reading of the Excel spec, a 'Rich Text' type should contain at least two distinct format runs.

Look at your code where you are generating a Rich Text string. If there is only one format, apply the style to the cell itself (instead of applying one format to the entire string).
Comment 4 Michiel Leegwater 2010-02-25 16:01:45 UTC
Created attachment 25060 [details]
New file (zipped) without Rich Text
Comment 5 Michiel Leegwater 2010-02-25 16:07:29 UTC
Thanks for your efforts Michael. A new 'gibberish' file has been added which, unfortunately has the same problem (or it looks like it). The Rich Text objects have been removed and the style issues have been implemented using the suggested cell.setCellStyle().
Comment 6 Michael Zalewski 2010-02-26 00:34:07 UTC
I see one more problem with the shared string table, but I have no idea how it comes about. In 'gibberish with errors.xls', the shared string table looks like this:

[SST] strings=97908 unique=4
0: 'gibberish' ref=2365 hash=a3697bb2
1: 'gibberish'5 ref=95543 hash=a3697bb2
EXTSST {strings_per_entry=8, sst_index=[{sst=1964, offset=12, unused=0}]}
[/SST]

The header claims that the table has 4 unique strings, but there are only 2.

In 'gibberish with errors 2.xls', the shared string table looks like this:

[SST] strings=97926 unique=2
0: 'gibberish' ref=97926 hash=a3697bb2
EXTSST {strings_per_entry=8, sst_index=[{sst=2057, offset=12, unused=0}]}
[/SST]

Now the table claims to have 2 unique strings, but there is only 1. It's also interesting to note: The two files have a different number of strings. In the first file, the total strings count is 97908, in the second it is 97926. In the above output fragments, ref= refers to the number of LABELSST cells that refer to a particular string. These numbers do add up, so POI is probably getting this number correct. (In the first file, 95543 + 2365 = 97908. In the second, there is only one SST entry, and it has 97926 cells which reference that value).

I guess your code must be creating the value 'gibberish' in two different ways. Perhaps you set one value and then change it? In any event, this is definately a bug in POI. Probably two bugs

1) POI should not allow a RichText value with only one format run to be put into the SST as a RichText. Instead, the format should be applied to the entire cell, and the value should be placed into the SST as simple text.

2) POI appears to be counting the unique strings in the SST incorrectly. It doesn't usually do this -- there are probably thousands of users who do not encounter this problem.
Comment 7 Michiel Leegwater 2010-03-11 09:01:31 UTC
The cells are only written once. When I disable the 'gibberish11' tab the problem is no longer there. It may seem that this problem is size related. The 'gibberish9' and 'gibberish10' tabs are generated using the same code.

Thanks for your investigations.
Comment 8 Michiel Leegwater 2011-02-11 05:59:41 UTC
Er, right. Revisited this issue today and found that setting the width of too many columns (more than HSSF supports) caused the problem.

Should calls to HSSFSheet.setColumnWidth(int, int) be protected with a range check?
Comment 9 Nick Burch 2011-02-11 07:22:50 UTC
(In reply to comment #8)
> Er, right. Revisited this issue today and found that setting the width of too
> many columns (more than HSSF supports) caused the problem.
> 
> Should calls to HSSFSheet.setColumnWidth(int, int) be protected with a range
> check?

HSSF should support the same number of files as excel does (256 for a .xls file). Are you finding something different?
Comment 10 Michiel Leegwater 2011-02-11 08:47:54 UTC
(In reply to comment #9)
> HSSF should support the same number of files as excel does (256 for a .xls
> file). Are you finding something different?
I am not finding something different. To comply with the column limitation the table was transposed, but the column widths were not.
Comment 11 Dominik Stadler 2016-07-26 20:05:33 UTC
As far as I can identify this should either be fixed or was some other problem in the first place, therefore I am closing this old bug report for now, please report a new bug if there is still something that seems to not work for you.