Bug 49050

Summary: [PATCH] Inefficient concatenation of ContinueRecord contents in AbstractEscherHolderRecord
Product: POI Reporter: Trejkaz (pen name) <trejkaz>
Component: HSSFAssignee: POI Developers List <dev>
Severity: major Keywords: PatchAvailable
Priority: P2    
Version: 3.5-FINAL   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Proposed patch
Proposed patch

Description Trejkaz (pen name) 2010-04-06 03:49:05 UTC
We encountered a spreadsheet with a lot of drawings in it, which resulted in a DrawingGroupRecord which was so large it had multiple separate ContinueRecord entries to make up all its data.

Subsequently, the gradual growing of the byte array as these were encountered became a performance issue, with the file taking around 20 minutes to load.

I can't provide a sample file, but I will provide a patch which reduces this case to around 3 seconds.
Comment 1 Trejkaz (pen name) 2010-04-06 03:52:15 UTC
Created attachment 25231 [details]
Proposed patch

Proposed patch fixes the issue for that specific record type.

I took a look at some other record types (e.g. DrawingRecord.)  A lot of them assume a single ContinueRecord and indeed I haven't seen a case of them having more than one.  But in the event that one day they do, this solution would work there as well.

Another thing I noticed is that other record types explicitly keep the ContinueRecord data separate and serialise it separately, whereas with EscherDrawingRecord this was not the case.  This may not be a problem though, I just thought it was worth noting.
Comment 2 Trejkaz (pen name) 2010-04-06 04:33:32 UTC
Created attachment 25232 [details]
Proposed patch

Newer version after our own review.
Comment 3 Nick Burch 2010-05-05 12:46:41 UTC
Thanks for the patch, applied in r941379.