Bug 40073 - SSTDeserializer problem
Summary: SSTDeserializer problem
Alias: None
Product: POI
Classification: Unclassified
Component: HPSF
Version: 2.5-FINAL
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2006-07-19 11:30 UTC by Sergey Dubovitskiy
Modified: 2006-07-24 01:32 UTC
CC List: 0 users

sstTest.xls (28.50 KB, application/octet-stream)
2006-07-19 11:32 UTC, Sergey Dubovitskiy

Description Sergey Dubovitskiy 2006-07-19 11:30:12 UTC

We are using POI 2.5 (2.5.1 shows the same problem) to parse Excel files and 
have encountered a problem with the SSTDeserializer class.

We have fixed the problem locally, but we would like to check with you whether
our corrections are acceptable. Is it also possible that this is a known
problem for which an official fix already exists?

We would appreciate any help.

Reproducing the problem:
Open the attached file sstTest.xls.

The following exception occurs when the Excel file is opened:
            at org.apache.poi.hssf.record.SSTRecord.getString
            at org.apache.poi.hssf.model.Workbook.getSSTString
            at org.apache.poi.hssf.usermodel.HSSFCell.<init>(HSSFCell.java:283)
            at org.apache.poi.hssf.usermodel.HSSFRow.createCellFromRecord
            at org.apache.poi.hssf.usermodel.HSSFSheet.setPropertiesFromSheet
            at org.apache.poi.hssf.usermodel.HSSFSheet.<init>
            at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>
            at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>
            at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>
Problem Description:
The problem occurs when a string in a CONTINUE record ends exactly at the end
of that record and a further CONTINUE record follows. The readStringRemainder
method reads the remainder of the current string but does not update the
continuationCharsRead variable with the new length. As a result, when the next
CONTINUE record is handled by processContinueRecord, isStringFinished returns
an incorrect result and the string is treated as unfinished.
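To make the failure mode concrete, here is a small self-contained Java model of the bookkeeping described above. It is a hypothetical sketch, not POI code: the class ContinuationModel, its constructor parameters and the applyFix switch are invented for illustration, and only the names continuationCharsRead, readStringRemainder, isStringFinished and processContinueRecord are borrowed from the report. It assumes a string counts as finished exactly when the characters read so far equal its total character count.

```java
/**
 * Hypothetical, simplified model of the continuation bookkeeping described in
 * this report. NOT the POI SSTDeserializer; names only mirror the bug text.
 */
final class ContinuationModel {
    private final int charCount;        // total characters in the string
    private int continuationCharsRead;  // characters consumed so far
    private final boolean applyFix;     // update the counter after the remainder?

    ContinuationModel(int charCount, int charsInPreviousRecord, boolean applyFix) {
        this.charCount = charCount;
        this.continuationCharsRead = charsInPreviousRecord;
        this.applyFix = applyFix;
    }

    /** Reads the rest of the unfinished string from the current CONTINUE record. */
    void readStringRemainder() {
        int remainder = charCount - continuationCharsRead;
        // (copying of the remaining characters is omitted in this model)
        if (applyFix) {
            // unfinished part + remainder = whole string
            continuationCharsRead += remainder;
        }
    }

    boolean isStringFinished() {
        return continuationCharsRead == charCount;
    }

    /** What the next CONTINUE record is assumed to start with. */
    String processContinueRecord() {
        return isStringFinished() ? "new strings" : "remainder of unfinished string";
    }

    public static void main(String[] args) {
        ContinuationModel buggy = new ContinuationModel(10, 6, false);
        buggy.readStringRemainder();
        System.out.println("without fix: " + buggy.processContinueRecord());

        ContinuationModel fixed = new ContinuationModel(10, 6, true);
        fixed.readStringRemainder();
        System.out.println("with fix:    " + fixed.processContinueRecord());
    }
}
```

For a 10-character string split 6/4 across two records, the model without the update still sees only 6 characters read, so it treats the next CONTINUE record as carrying more of the same string; with the update it correctly expects new strings.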

We suggest correcting this problem by modifying readStringRemainder as follows:

    private void readStringRemainder( final byte[] record )
    {
        int stringRemainderSizeInBytes = calculateByteCount( charCount - getContinuationCharsRead() );
        byte[] unicodeStringData = new byte[SSTRecord.STRING_MINIMAL_OVERHEAD
                + stringRemainderSizeInBytes];

        // write the string length
        LittleEndian.putShort( unicodeStringData, 0, (short) ( charCount - getContinuationCharsRead() ) );

        // write the options flag
        unicodeStringData[LittleEndianConsts.SHORT_SIZE] = createOptionByte( wideChar, richText, extendedText );

        // copy the bytes/words making up the string; skipping
        // past all the overhead of the str_data array
        arraycopy( record, LittleEndianConsts.BYTE_SIZE, unicodeStringData,
                SSTRecord.STRING_MINIMAL_OVERHEAD, stringRemainderSizeInBytes );

        // use special constructor to create the final string
        UnicodeString string = new UnicodeString( UnicodeString.sid,
                (short) unicodeStringData.length, unicodeStringData,
                unfinishedString );
        Integer integer = new Integer( strings.size() );

        addToStringTable( strings, integer, string );

        int newOffset = offsetForContinuedRecord( stringRemainderSizeInBytes );

        // ----------------------- CORRECTIONS BEGIN -------------------------

        /*
         * The original code did not update the continuationCharsRead variable
         * with the new string length (unfinished string length + remaining
         * string length). Because string is the concatenation of
         * unfinishedString and the remainder, its length in characters can be
         * used as the new value of continuationCharsRead.
         */
        setContinuationCharsRead( string.getCharCount() );

        /*
         * If we have not reached the end of the current record, we must call
         * manufactureStrings to process the other strings in this record.
         * Since manufactureStrings checks whether the end of the record has
         * been reached, it could be called unconditionally. The problem is
         * that manufactureStrings calls initVars first, which resets
         * continuationCharsRead; isStringFinished needs that value to work
         * correctly when processContinueRecord is called for the next record.
         */
        if ( newOffset < record.length )
        {
            manufactureStrings( record, newOffset );
        }

        // ----------------------- CORRECTIONS END -------------------------
    }
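The second correction, calling manufactureStrings only when bytes remain in the record, can be illustrated the same way. This is again a hypothetical, simplified model: RemainderGuardModel and finishRemainder are invented names, the model does no real parsing, and its initVars only resets continuationCharsRead, which is the single side effect the report relies on.

```java
/**
 * Hypothetical, simplified model of the guard at the end of the corrected
 * readStringRemainder. NOT POI code.
 */
final class RemainderGuardModel {
    private final int charCount;
    private int continuationCharsRead;

    RemainderGuardModel(int charCount, int charsInPreviousRecord) {
        this.charCount = charCount;
        this.continuationCharsRead = charsInPreviousRecord;
    }

    /** Models only the reset the report describes. */
    private void initVars() {
        continuationCharsRead = 0;
    }

    private void manufactureStrings(byte[] record, int offset) {
        initVars();
        // (parsing of further strings starting at offset is omitted)
    }

    boolean isStringFinished() {
        return continuationCharsRead == charCount;
    }

    /** End of readStringRemainder; guarded=false calls manufactureStrings unconditionally. */
    void finishRemainder(byte[] record, int newOffset, boolean guarded) {
        continuationCharsRead = charCount; // first correction: whole string read
        if (!guarded || newOffset < record.length) {
            manufactureStrings(record, newOffset);
        }
    }

    public static void main(String[] args) {
        byte[] record = new byte[8];

        RemainderGuardModel unguarded = new RemainderGuardModel(10, 6);
        unguarded.finishRemainder(record, record.length, false);
        System.out.println("unguarded: finished=" + unguarded.isStringFinished());

        RemainderGuardModel guarded = new RemainderGuardModel(10, 6);
        guarded.finishRemainder(record, record.length, true);
        System.out.println("guarded:   finished=" + guarded.isStringFinished());
    }
}
```

When the string ends exactly at the end of the record (newOffset == record.length), the unguarded variant lets initVars wipe the counter, so isStringFinished() returns false; the guarded variant skips the call and keeps it true.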



 Thank you.
Comment 1 Sergey Dubovitskiy 2006-07-19 11:32:10 UTC
Created attachment 18618
Comment 2 Jason Height 2006-07-24 08:32:20 UTC
Works against latest SVN. SST handling was completely rewritten in the current
SVN. Release pending.