Bug 45778

Summary: Cloning autofilters leads to HUGE files
Product: POI
Reporter: Antti Koskimäki <antti.koskimaki>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: normal
Priority: P2
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Windows XP
Attachments: excel and sourcecode to reproduce situation

Description Antti Koskimäki 2008-09-11 05:14:47 UTC
Created attachment 22554 [details]
excel and sourcecode to reproduce situation

Cloning sheets that have several columns with autofilters leads to HUGE output files.

Opening the file in Excel and saving it again reduces the size dramatically.

Cloning the same sheets without autofilters produces a much smaller file.

- without autofilter     320k
- with autofilter      13080k
- same back from excel   580k
 
The size effect is the same with both svn r696683 (working autofilters, see fix 45720) and 3.0.2-FINAL (broken autofilters).

Source code to create the examples is attached.
Comment 1 Josh Micich 2008-10-23 12:29:45 UTC
fixed in svn r707450

Junit added.

ObjRecord was not reading the ftLbsData sub-record properly.  ftLbsData does not encode its length in the second ushort field; that field often has a value like 0x1FEE.  Due to another bug in RecordInputStream that allowed buffer read-overruns, the ftLbsData was being interpreted as being nearly 8KB instead of 20 bytes.
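A minimal, self-contained sketch of this failure mode (illustrative only: the class name, method name, and simplified record layout below are assumptions for demonstration, not POI's actual parsing code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SubRecordDemo {

    // Assumed generic sub-record layout: [sid: ushort][field: ushort][payload...]
    // For most Obj sub-records `field` is the payload length, but for
    // ftLbsData (sid 0x0013) it is not a length at all and often holds a
    // large value such as 0x1FEE.
    static int naiveRecordSize(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int field = buf.getShort(2) & 0xFFFF;
        return 4 + field; // blindly trusts `field` as a payload length
    }

    public static void main(String[] args) {
        // A 20-byte ftLbsData record whose second ushort happens to be 0x1FEE
        ByteBuffer ftLbsData = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
        ftLbsData.putShort(0, (short) 0x0013); // ftLbsData sid
        ftLbsData.putShort(2, (short) 0x1FEE); // NOT a length

        // The naive parser decides the 20-byte record is 4 + 0x1FEE = 8178 bytes,
        // i.e. nearly 8KB -- and a lenient input stream lets it read that far.
        System.out.println("naive size = " + naiveRecordSize(ftLbsData) + " bytes");
    }
}
```

With a strict input stream the over-read would fail fast; combined with the lenient RecordInputStream, each mis-sized record silently dragged ~8KB of unrelated bytes into the cloned sheet.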


I re-ran your test (cloning 50 sheets from a 22KB input workbook).
Here are the resulting file sizes:
before fix: 13,101KB
after fix:     588KB

Your test spreadsheet had 30 (autofilter) combo-boxes, which get cloned 50 times.  Assuming 8KB extra per combo-box, this would account for roughly 12MB of extra allocation.  I didn't attempt the same test (manually) with Excel, but the remaining difference from the 580KB that you got could be explained by POI's expansion of MULRK records.  Preserving MULRK, MULBLANK and SHAREDFMLA records is on the to-do list but currently not a high priority.
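A back-of-the-envelope check of the estimate above (a sketch, not taken from the patch; the 8KB-per-box figure is the approximation from this comment):

```java
public class BloatEstimate {

    // Rough estimate of the bloat: each ftLbsData sub-record mis-read as ~8KB,
    // once per combo-box, per cloned sheet.
    static long extraBytes(int comboBoxes, int clones, int extraPerBox) {
        return (long) comboBoxes * clones * extraPerBox;
    }

    public static void main(String[] args) {
        // 30 combo-boxes, 50 clones, ~8KB extra each (figures from the comment)
        long extra = extraBytes(30, 50, 8 * 1024);
        System.out.println(extra + " bytes ~= " + extra / 1_000_000 + " MB"); // ~12 MB
    }
}
```

That ~12MB lines up with the observed gap between the 13,101KB pre-fix output and the 588KB post-fix output.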
Comment 2 Antti Koskimäki 2008-10-24 00:05:28 UTC
> I re-ran your test (cloning 50 sheets from a 22KB input workbook).
> Here are the resulting file sizes:
> before fix: 13,101KB
> after fix:     588KB

Thanks, verified that this is OK.

But when I tried to stress-test the fix, I noticed that adding more cloning iterations corrupted the workbook at some point (limit=83, to be exact :=). A workbook without auto-filters does not seem to get corrupted, not even with 1000 clones.

When I tried to open the workbook I got several "File error, data may have been lost" pop-ups in a row, one for each (!) cloned sheet. Besides being fatal, this is also very annoying :=)

I'm not sure how closely this relates to this bug or its fix; it seems to reproduce with 3.2-FINAL too. But given that you mentioned buffer overruns, that I used the same test case to reproduce it, and that auto-filters seem to have something to do with it, I decided to re-open this bug instead of reporting a new one.

Comment 3 Josh Micich 2009-01-28 18:40:31 UTC
(In reply to comment #2)

BTW - the buffer overrun problem in RecordInputStream (which helped keep this bug silent) was fixed in svn r707778.

It would be better to close off this bugzilla to help track things more clearly.  I am pretty sure that the original bug is fixed.  The new bug you describe has different characteristics, and the fix will probably be unrelated to svn r707450.

Please open a new bugzilla for your second bug.  I suspect that this new bug will be of lower priority, since most POI users won't clone anywhere near 83 sheets.  However, if you need a fix urgently, you can speed things up by providing a patch, or at least by identifying the exact cause of the bug.