Bug 32286

Summary: A file created by OpenOffice from a comma separated field is very slow
Product: POI
Reporter: Ian Jackson <ijackson>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: The version created by open office
The original

Description Ian Jackson 2004-11-17 22:48:03 UTC
I created a file of comma-separated values and used OpenOffice to write an
Excel file.

Profiling smaller data sets with Java hprof, I found that over 30% of the time
was spent in SSTDeserializer.addToStringTable and the methods it called, mostly
creating the exception in put. Nothing else was over 5%. The percentage of time
in addToStringTable increased as my data set got larger.
   static public void addToStringTable( BinaryTree strings, Integer integer, UnicodeString string )
   {

       if ( string.isRichText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~8 ) ) );
       if ( string.isExtendedText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~4 ) ) );

       boolean added = false;
       while ( added == false )
       {
           try
           {
               strings.put( integer, string );
               added = true;
           }
           catch ( Exception ignore )
           {
               string.setString( string.getString() + " " );
           }
       }

   }
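
One way to see why the share of time grows with the data set: assuming
BinaryTree rejects a value that is already present (which the catch block
implies), the k-th copy of an identical string needs k-1 failed puts before its
space-padded form is unique, so n copies cost n(n-1)/2 exceptions in total. The
sketch below only simulates that retry loop with a HashSet. RetryCostSketch and
countRejectedPuts are hypothetical names, not POI code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of the retry loop; this is not POI code.
class RetryCostSketch {

    // Counts how many put attempts fail when `copies` identical strings are
    // added to a container that rejects duplicate values.
    static long countRejectedPuts(int copies, String value) {
        // Assumption: models the "values must be unique" rule of BinaryTree.
        Set<String> seen = new HashSet<>();
        long rejected = 0;
        for (int i = 0; i < copies; i++) {
            String candidate = value;
            // Mirrors addToStringTable: pad with a space and retry until unique.
            while (!seen.add(candidate)) {
                rejected++; // in the original code, one exception per rejection
                candidate = candidate + " ";
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " copies: " + countRejectedPuts(n, "x") + " rejected puts");
        }
    }
}
```

The count grows quadratically with the number of repeated values, which would
match a profile where the exception path dominates more and more as the file
gets larger.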


Of course, if the values are really expected to repeat, a different data
structure should be used, such as a plain hash map.
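
A minimal sketch of that suggestion, assuming the table only needs lookup by
index and does not need values to be unique. StringTableSketch and its methods
are illustrative names, not part of the POI API, and plain String stands in for
UnicodeString.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; class and method names are illustrative, not POI API.
class StringTableSketch {
    // Keyed by string-table index; a HashMap places no uniqueness rule on values.
    private final Map<Integer, String> byIndex = new HashMap<>();

    // Stores the string at the given index. Equal strings at different
    // indexes are accepted, so no exception and no padding are needed.
    void add(int index, String value) {
        byIndex.put(index, value);
    }

    String get(int index) {
        return byIndex.get(index);
    }

    int size() {
        return byIndex.size();
    }

    public static void main(String[] args) {
        StringTableSketch table = new StringTableSketch();
        table.add(0, "same");
        table.add(1, "same"); // duplicate value, stored as-is
        System.out.println(table.size() + " entries, index 1 = \"" + table.get(1) + "\"");
    }
}
```

With this shape each insert is a single put regardless of how many equal
strings are already stored, and the stored text is never altered.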
Comment 1 Ian Jackson 2004-11-17 22:49:32 UTC
Created attachment 13486 [details]
The version created by open office
Comment 2 Ian Jackson 2004-11-17 22:51:32 UTC
Created attachment 13487 [details]
The original

I cut 100 lines off the file to create the 900 records
Comment 3 Ian Jackson 2004-11-17 22:52:20 UTC
It should not contain any rich text.
Comment 4 Jason Height 2006-07-28 03:23:25 UTC
This is now corrected in SVN (or has been for quite some time). Previously we
didn't understand rich text, so the exception handler quoted above was used to
append a space to the end of the string to preserve its uniqueness.

Now that rich text handling is correct, that block of code is gone, i.e. we
don't use exceptions at that low level, so the code is much faster.

Marking as fixed.

Jason

*** This bug has been marked as a duplicate of 25039 ***