Bug 40073

Summary:	SSTDeserializer problem
Product:	POI	Reporter:	Sergey Dubovitskiy <Dubovitskiy>
Component:	HPSF	Assignee:	POI Developers List <dev>
Status:	RESOLVED FIXED
Severity:	normal
Priority:	P2
Version:	2.5-FINAL
Target Milestone:	---
Hardware:	Other
OS:	other
Attachments:	sstTest.xls

Description Sergey Dubovitskiy 2006-07-19 11:30:12 UTC

Hi
Gentlemen,

We are using POI 2.5 (2.5.1 shows the same problem) to parse Excel files and 
have encountered a problem with the SSTDeserializer class.

We have solved the problem, but we would like to ask whether our
corrections are acceptable.
Is it possible that this is a known problem and that an official fix already exists?

We would much appreciate any help.

Reproduce the problem:
Just try to open the attached file sstTest.xls.

The following exception occurs while opening the Excel file:
java.lang.NullPointerException
            at org.apache.poi.hssf.record.SSTRecord.getString(SSTRecord.java:277)
            at org.apache.poi.hssf.model.Workbook.getSSTString(Workbook.java:649)
            at org.apache.poi.hssf.usermodel.HSSFCell.<init>(HSSFCell.java:283)
            at org.apache.poi.hssf.usermodel.HSSFRow.createCellFromRecord(HSSFRow.java:198)
            at org.apache.poi.hssf.usermodel.HSSFSheet.setPropertiesFromSheet(HSSFSheet.java:156)
            at org.apache.poi.hssf.usermodel.HSSFSheet.<init>(HSSFSheet.java:110)
            at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:177)
            at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:210)
            at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:191)
 
Problem Description:
The problem occurs when a string in a CONTINUE record ends exactly at the end
of that record and a further CONTINUE record follows. The readStringRemainder
method reads the remainder of the current string but does not update the
continuationCharsRead variable with the new length. As a result, when the next
CONTINUE record is handled by processContinueRecord, isStringFinished returns
an invalid result, and the string is treated as unfinished.
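The bookkeeping described above can be illustrated with a deliberately simplified, hypothetical model. The class ContinuationModel and its constructor are invented for illustration; only the names charCount, continuationCharsRead, readStringRemainder and isStringFinished come from this report, and the real SSTDeserializer tracks considerably more state. The sketch shows why skipping the counter update leaves a fully read string looking unfinished.

```java
/**
 * Hypothetical, heavily simplified model of the counter logic described in
 * this report. This is NOT POI's SSTDeserializer; only the field and method
 * names are borrowed from the report.
 */
public class ContinuationModel {
    private final int charCount;        // total characters in the string
    private int continuationCharsRead;  // characters consumed so far

    public ContinuationModel(int charCount, int charsInFirstRecord) {
        this.charCount = charCount;
        this.continuationCharsRead = charsInFirstRecord;
    }

    /** True once every character of the string has been consumed. */
    public boolean isStringFinished() {
        return continuationCharsRead >= charCount;
    }

    /**
     * Reads the rest of the string from a CONTINUE record and returns how many
     * characters were read. When updateCounter is false this mimics the
     * reported bug; when true it mimics the proposed correction.
     */
    public int readStringRemainder(boolean updateCounter) {
        int remainder = charCount - continuationCharsRead;
        if (updateCounter) {
            continuationCharsRead += remainder;
        }
        return remainder;
    }

    public static void main(String[] args) {
        // A 10-character string: 6 characters in the first record, 4 in the CONTINUE record.
        ContinuationModel buggy = new ContinuationModel(10, 6);
        buggy.readStringRemainder(false);
        // prints "without update: finished=false"
        System.out.println("without update: finished=" + buggy.isStringFinished());

        ContinuationModel fixed = new ContinuationModel(10, 6);
        fixed.readStringRemainder(true);
        // prints "with update: finished=true"
        System.out.println("with update: finished=" + fixed.isStringFinished());
    }
}
```

In the buggy path the next CONTINUE record would be read as more of this string, which matches the symptom reported above.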


Solution:
We suggest correcting this problem by modifying readStringRemainder as follows:

private void readStringRemainder( final byte[] record )
    {
        int stringRemainderSizeInBytes = calculateByteCount( charCount - getContinuationCharsRead() );
        byte[] unicodeStringData = new byte[SSTRecord.STRING_MINIMAL_OVERHEAD
                + stringRemainderSizeInBytes];

        // write the string length
        LittleEndian.putShort( unicodeStringData, 0, (short) ( charCount - getContinuationCharsRead() ) );

        // write the options flag
        unicodeStringData[LittleEndianConsts.SHORT_SIZE] = createOptionByte( wideChar, richText, extendedText );

        // copy the bytes/words making up the string; skipping
        // past all the overhead of the str_data array
        arraycopy( record, LittleEndianConsts.BYTE_SIZE, unicodeStringData,
                SSTRecord.STRING_MINIMAL_OVERHEAD,
                stringRemainderSizeInBytes );

        // use special constructor to create the final string
        UnicodeString string = new UnicodeString( UnicodeString.sid,
                (short) unicodeStringData.length, unicodeStringData,
                unfinishedString );
        Integer integer = new Integer( strings.size() );

        addToStringTable( strings, integer, string );

        int newOffset = offsetForContinuedRecord( stringRemainderSizeInBytes );
    

        // ----------------------- CORRECTIONS BEGIN-------------------------
        /*
         * The original method does not update the continuationCharsRead
         * variable with the new string length (unfinished string length +
         * remaining string length). Because the string variable is the
         * concatenation of unfinishedString and the string remainder, its
         * length in characters can be used as the new value of
         * continuationCharsRead.
         */

        setContinuationCharsRead(string.getCharCount() );
      
        /*
         * If we have not reached the end of the current record, we have to call
         * manufactureStrings to process the remaining strings in this record.
         * Because manufactureStrings checks whether the end of the record has
         * been reached, it could be called unconditionally. The problem is that
         * manufactureStrings calls initVars first and resets continuationCharsRead,
         * which isStringFinished needs in order to work correctly the next time
         * processContinueRecord is called.
         */
        if (newOffset < record.length)
        {
            manufactureStrings( record, newOffset);
        }

        // ----------------------- CORRECTIONS END -------------------------

    }
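For reference, the first half of readStringRemainder above assembles a small synthetic string header before handing it to UnicodeString. A minimal standalone sketch of that byte layout follows, assuming STRING_MINIMAL_OVERHEAD is 3 (a 2-byte little-endian character count plus a 1-byte options field) and the BIFF8 option bits 0x01 (16-bit characters), 0x04 (extended data) and 0x08 (rich text). RemainderLayout and buildRemainder are hypothetical names, not POI API; rich-text runs and extended data that real strings may carry are ignored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/**
 * Hypothetical sketch (not POI code) of the bytes readStringRemainder builds:
 * [char count: 2 bytes LE][options: 1 byte][character data].
 */
public class RemainderLayout {
    // Assumed to mirror SSTRecord.STRING_MINIMAL_OVERHEAD: 2-byte length + 1-byte options.
    static final int STRING_MINIMAL_OVERHEAD = 3;

    // Assumed BIFF8 option bits: 0x01 = 16-bit chars, 0x04 = extended data, 0x08 = rich text.
    static byte createOptionByte(boolean wideChar, boolean richText, boolean extendedText) {
        int b = 0;
        if (wideChar) b |= 0x01;
        if (extendedText) b |= 0x04;
        if (richText) b |= 0x08;
        return (byte) b;
    }

    // Compressed strings use 1 byte per character, wide strings 2 bytes.
    static int calculateByteCount(int chars, boolean wideChar) {
        return wideChar ? chars * 2 : chars;
    }

    public static byte[] buildRemainder(byte[] continueRecord, int remainingChars,
                                        boolean wideChar, boolean richText, boolean extendedText) {
        int dataBytes = calculateByteCount(remainingChars, wideChar);
        ByteBuffer buf = ByteBuffer.allocate(STRING_MINIMAL_OVERHEAD + dataBytes)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) remainingChars);
        buf.put(createOptionByte(wideChar, richText, extendedText));
        // Skip the CONTINUE record's own leading options byte, as the patch does.
        buf.put(continueRecord, 1, dataBytes);
        return buf.array();
    }

    public static void main(String[] args) {
        // CONTINUE payload: options byte 0x00 (8-bit chars) followed by "abc".
        byte[] record = {0x00, 'a', 'b', 'c'};
        byte[] out = buildRemainder(record, 3, false, false, false);
        System.out.println(Arrays.toString(out)); // prints [3, 0, 0, 97, 98, 99]
    }
}
```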

 
 

 Thank you.
Sergey.

Comment 1 Sergey Dubovitskiy 2006-07-19 11:32:10 UTC

Created attachment 18618
sstTest.xls

Comment 2 Jason Height 2006-07-24 08:32:20 UTC

Works against latest SVN. SST handling was completely rewritten in the current
SVN. Release pending.

Jason