Bug 32286

Summary:	A file created by OpenOffice from a comma separated field is very slow
Product:	POI	Reporter:	Ian Jackson <ijackson>
Component:	HSSF	Assignee:	POI Developers List <dev>
Status:	RESOLVED DUPLICATE
Severity:	normal
Priority:	P2
Version:	unspecified
Target Milestone:	---
Hardware:	PC
OS:	Linux
Attachments:	The version created by open office; The original

Description Ian Jackson 2004-11-17 22:48:03 UTC

I created a file of comma-separated values and used OpenOffice to write an
Excel file.

Profiling smaller data sets with Java hprof, I found that over 30% of the time
was spent in SSTDeserializer.addToStringTable and the methods it called, mostly
creating the exception in put. Nothing else was over 5%. The percentage of time
in addToStringTable increased as my data set got larger.
   static public void addToStringTable( BinaryTree strings, Integer integer, UnicodeString string )
   {

       if ( string.isRichText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~8 ) ) );
       if ( string.isExtendedText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~4 ) ) );

       boolean added = false;
       while ( added == false )
       {
           try
           {
               strings.put( integer, string );
               added = true;
           }
           catch ( Exception ignore )
           {
               string.setString( string.getString() + " " );
           }
       }

   }


Of course, if you really expect that the values might be the same, a different
data structure should be used, such as a plain hash map.
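
A minimal sketch of that suggestion, assuming the string table only needs to be looked up by its index. StringTableSketch and its methods are hypothetical names for illustration, not POI code. Because only the Integer key has to be unique, two entries with the same text can coexist and nothing has to be caught or retried:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not POI code: a string table keyed by index only.
class StringTableSketch {
    private final Map<Integer, String> strings = new HashMap<>();

    // Stores the string under its table index. Duplicate values are fine
    // because a HashMap only requires the keys to be unique.
    void add(int index, String value) {
        strings.put(index, value);
    }

    // Returns the string stored at the given index, or null if absent.
    String get(int index) {
        return strings.get(index);
    }

    int size() {
        return strings.size();
    }

    public static void main(String[] args) {
        StringTableSketch table = new StringTableSketch();
        table.add(0, "same");
        table.add(1, "same"); // same text, different index: no exception
        System.out.println(table.size() + " entries, entry 1 = '" + table.get(1) + "'");
    }
}
```

The stored text is never modified here, unlike the retry loop above, which appends a space each time put fails.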

Comment 1 Ian Jackson 2004-11-17 22:49:32 UTC

Created attachment 13486 [details]
The version created by open office

Comment 2 Ian Jackson 2004-11-17 22:51:32 UTC

Created attachment 13487 [details]
The original

I cut 100 lines off the file to create the 900 records.

Comment 3 Ian Jackson 2004-11-17 22:52:20 UTC

It should not contain any rich text.

Comment 4 Jason Height 2006-07-28 03:23:25 UTC

This is now corrected in SVN (and has been for quite some time). Previously we
didn't understand rich text, so the exception in the code quoted in the
description was used to append a space to the end of the string to preserve its
uniqueness.

Now that rich text handling is correct, that block of code is gone, i.e. we
don't use exceptions at that low level, so the code is much faster.
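
To illustrate why the old workaround got slower as files grew, here is a rough sketch. It assumes a map whose put rejects duplicate values by throwing, which is what the retry loop in the description implies. UniqueValueMap and RetryCostSketch are hypothetical stand-ins, not the POI classes. It inserts the same text n times using the append-a-space retry and counts the exceptions thrown:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a map that rejects duplicate values; not POI code.
class UniqueValueMap {
    private final Map<Integer, String> byKey = new HashMap<>();
    private final Set<String> values = new HashSet<>();

    void put(Integer key, String value) {
        // Set.add returns false when the value is already present.
        if (!values.add(value)) {
            throw new IllegalArgumentException("duplicate value");
        }
        byKey.put(key, value);
    }
}

class RetryCostSketch {
    // Inserts the same text n times with the append-a-space retry loop
    // and returns how many exceptions were thrown along the way.
    static long countExceptions(int n) {
        UniqueValueMap map = new UniqueValueMap();
        long thrown = 0;
        for (int i = 0; i < n; i++) {
            String s = "same";
            boolean added = false;
            while (!added) {
                try {
                    map.put(i, s);
                    added = true;
                } catch (IllegalArgumentException e) {
                    thrown++;
                    s = s + " "; // same workaround as the old loop
                }
            }
        }
        return thrown;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " duplicates -> " + countExceptions(n) + " exceptions");
        }
    }
}
```

Under these assumptions the k-th copy of a string fails k-1 times before it fits, so n copies cost n(n-1)/2 exceptions in total. That quadratic growth would be consistent with the profile in the description, where the share of time in addToStringTable rose with the size of the data set.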

Marking as fixed.

Jason

*** This bug has been marked as a duplicate of 25039 ***