Bug 9448

Summary: HSSF fails to properly handle extended strings over continuations
Product: POI
Reporter: Daniel Stephens <daniel>
Component: HSSF
Assignee: POI Developers List <dev>
Severity: critical    
Priority: P3    
Version: 1.5.1   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Daniel Stephens 2002-05-27 19:44:17 UTC
Reading an Excel-produced spreadsheet causes a NegativeArraySizeException
to be thrown in processString.

I've done some extensive delving to try to find the cause of this problem and
hopefully fix it. What it boils down to is that there are a few incorrect
assumptions made in bits of SSTRecord (specifically within manufactureStrings
and processContinueRecord) which fail if one has a string with extended
information that starts part way through a record and doesn't fit completely
within it (i.e. one which requires continue records).

I have a nasty suspicion that this is a design issue which would require
re-working the parser so that, instead of trying to build up an incomplete
string, it assembles an 'accumulated record' before string parsing, until
the whole string and its other data can be fished out (or appropriately ignored).
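
A minimal sketch of the 'accumulated record' idea, not HSSF's actual code: the
payloads of an SST record and the CONTINUE records that follow it are joined
into one buffer before any string parsing starts. RawRecord and
RecordAccumulator are hypothetical names invented for illustration. The sketch
also ignores the extra flags byte that a BIFF8 CONTINUE record carries when the
split falls inside character data, which a real parser would have to strip or
interpret.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

final class RecordAccumulator {
    static final int SST_SID = 0x00FC;      // BIFF8 record id for SST
    static final int CONTINUE_SID = 0x003C; // BIFF8 record id for CONTINUE

    /** Hypothetical holder for one raw record (not a POI class). */
    static final class RawRecord {
        final int sid;
        final byte[] data;

        RawRecord(int sid, byte[] data) {
            this.sid = sid;
            this.data = data;
        }
    }

    /** Joins an SST payload with the payloads of the CONTINUE records after it. */
    static byte[] accumulate(List<RawRecord> records) {
        if (records.isEmpty() || records.get(0).sid != SST_SID) {
            throw new IllegalArgumentException("expected an SST record first");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] first = records.get(0).data;
        out.write(first, 0, first.length);
        for (int i = 1; i < records.size(); i++) {
            RawRecord r = records.get(i);
            if (r.sid != CONTINUE_SID) {
                break; // the continuation chain ends at the first other record
            }
            out.write(r.data, 0, r.data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up payloads: an SST, one CONTINUE, then an unrelated record.
        List<RawRecord> records = Arrays.asList(
            new RawRecord(SST_SID, new byte[] {1, 2, 3}),
            new RawRecord(CONTINUE_SID, new byte[] {4, 5}),
            new RawRecord(0x00FF, new byte[] {9}));
        System.out.println(accumulate(records).length);
    }
}
```

With the whole payload in one buffer, the string parser never has to guess how
much of a string lives in the current record.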

I can if necessary provide a spreadsheet which exhibits this problem, though
it's pretty large.  It may well be that a simple example is enough, though:

Part way through a record there's a string which has a char_count of 44.
This is a wide string, so its initial byte count is 91, and it also has the
extended flag set, so that the total size expands up to 26719 bytes [Which is
obviously too big to fit in a single buffer].
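
The arithmetic behind those numbers can be sketched as follows, assuming the
usual BIFF8 string layout: a 2-byte character count, a 1-byte flags field, an
optional 2-byte rich-text run count, an optional 4-byte extended-data length,
then the character data, the formatting runs and the extended data.
SstStringSize is a hypothetical helper, and the extended length of 26624 is
only inferred from the totals above under the assumption that the string has
no rich-text runs; it was not read from the file.

```java
final class SstStringSize {
    static final int FLAG_WIDE = 0x01;     // 2 bytes per character
    static final int FLAG_EXTENDED = 0x04; // extended data follows the string
    static final int FLAG_RICH = 0x08;     // formatting runs follow the string

    /** Character count field (2) + flags byte (1) + character data. */
    static int baseSize(int charCount, int flags) {
        int bytesPerChar = (flags & FLAG_WIDE) != 0 ? 2 : 1;
        return 2 + 1 + charCount * bytesPerChar;
    }

    /** Base size plus the optional rich-text and extended parts. */
    static int totalSize(int charCount, int flags, int richRuns, int extendedLength) {
        int size = baseSize(charCount, flags);
        if ((flags & FLAG_RICH) != 0) {
            size += 2 + 4 * richRuns;     // run count field + 4 bytes per run
        }
        if ((flags & FLAG_EXTENDED) != 0) {
            size += 4 + extendedLength;   // length field + extended data
        }
        return size;
    }

    public static void main(String[] args) {
        int flags = FLAG_WIDE | FLAG_EXTENDED;
        System.out.println(baseSize(44, flags));            // 91
        // 26624 is an assumed value, chosen to match the reported total.
        System.out.println(totalSize(44, flags, 0, 26624)); // 26719
    }
}
```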

When manufactureStrings gets to this, it realizes it's going to cross records,
but the subsequent 'partial string' logic completely fails to take into account
both the additional length field BEFORE the string, and that some of the final
size might not actually be part of the character data, so it ends up vastly
overestimating how many characters the string would be.
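
To illustrate the overestimate, here is a hedged sketch that contrasts a naive
count with one that accounts for the optional length fields and caps the result
at char_count. It is not the code in SSTRecord; PartialStringEstimate and the
figure of 5000 bytes left in the record are made up for the example.

```java
final class PartialStringEstimate {
    /** Naive: treats every remaining byte in the record as character data. */
    static int naiveChars(int bytesLeft, boolean wide) {
        return bytesLeft / (wide ? 2 : 1);
    }

    /**
     * Skips the optional fields that sit between the flags byte and the
     * character data, and never reports more than the declared char count.
     * bytesLeft is counted from just after the flags byte.
     */
    static int charsInThisRecord(int charCount, int bytesLeft,
                                 boolean wide, boolean rich, boolean extended) {
        int overhead = (rich ? 2 : 0) + (extended ? 4 : 0);
        int usable = Math.max(0, bytesLeft - overhead);
        return Math.min(charCount, usable / (wide ? 2 : 1));
    }

    public static void main(String[] args) {
        // 44 wide characters, extended flag set, 5000 bytes left (made-up figure).
        System.out.println(naiveChars(5000, true));                         // 2500
        System.out.println(charsInThisRecord(44, 5000, true, false, true)); // 44
    }
}
```

Everything past those 44 characters in the remaining bytes would be extended
data, which is exactly the part the naive count mistakes for text.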

I can't see any way of fixing this without re-writing the continuation handling
as described above (and that's a little too major a change for me to attack
at this moment, especially since I only downloaded HSSF yesterday! 8-))
Comment 1 Andy Oliver 2002-05-27 20:46:29 UTC
Great detective work.  Yes, this is a known problem.  It means that SSTRecord is
going to become even MORE complicated and painful than it already is.  Glen is
working on refactoring this, I believe, and Marc mentioned an interest in
correcting the problem.

*** This bug has been marked as a duplicate of 7655 ***
Comment 2 Glen Stampoultzis 2002-05-28 00:16:56 UTC
Oooh, yes, nice job.  As Andy suggested, I'm looking at this area right now
(although for another reason), so I'll take this one.  SST is a meaty little

Do you have a testcase or spreadsheet that's causing this problem?  That would 
be a big help.