Bug 32286 - A file created by OpenOffice from a comma separated field is very slow
Status: RESOLVED DUPLICATE of bug 25039
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2004-11-17 22:48 UTC by Ian Jackson
Modified: 2006-07-27 20:23 UTC

The version created by open office (305.00 KB, application/octet-stream)
2004-11-17 22:49 UTC, Ian Jackson
The original (123.98 KB, text/plain)
2004-11-17 22:51 UTC, Ian Jackson

Description Ian Jackson 2004-11-17 22:48:03 UTC
I created a file of comma-separated values and used OpenOffice to write an
Excel file.

Profiling smaller data sets with Java hprof, I found that over 30% of the time
was spent in SSTDeserializer.addToStringTable and the methods it called, mostly
in creating the exception thrown by put. Nothing else was over 5%. The
percentage of time in addToStringTable increased as my data set got larger.
   static public void addToStringTable( BinaryTree strings, Integer integer,
UnicodeString string )
   {
       if ( string.isRichText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~8 ) ) );
       if ( string.isExtendedText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~4 ) ) );

       boolean added = false;
       while ( added == false )
       {
           try
           {
               strings.put( integer, string );
               added = true;
           }
           catch ( Exception ignore )
           {
               string.setString( string.getString() + " " );
           }
       }
   }

Of course, if you really expect that the values might be the same, a different
data structure should be used, such as a straight hash map.
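
To illustrate the suggestion, here is a minimal sketch of a string table kept
in a plain hash map keyed by index. It is an assumption about what such a
change could look like, not POI's actual code: the class name
IndexedStringTableSketch is hypothetical, and String stands in for
UnicodeString. Because a HashMap only requires unique keys, equal values at
different indexes are stored without any exception and without modifying the
string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not POI code: String stands in for UnicodeString.
class IndexedStringTableSketch {

    // A HashMap only requires unique keys, so equal values stored under
    // different indexes are accepted as-is: no exception, no padding.
    static void addToStringTable(Map<Integer, String> strings, Integer index, String value) {
        strings.put(index, value);
    }

    public static void main(String[] args) {
        Map<Integer, String> strings = new HashMap<>();
        addToStringTable(strings, 0, "same");
        addToStringTable(strings, 1, "same");
        addToStringTable(strings, 2, "same");
        System.out.println(strings.size());           // 3
        System.out.println("[" + strings.get(2) + "]"); // [same]
    }
}
```

The trade-off is that a plain map gives no reverse lookup from string to
index, so this only fits if that lookup is not needed at this point.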
Comment 1 Ian Jackson 2004-11-17 22:49:32 UTC
Created attachment 13486
The version created by open office
Comment 2 Ian Jackson 2004-11-17 22:51:32 UTC
Created attachment 13487
The original

I cut 100 lines off the file to create the 900 records
Comment 3 Ian Jackson 2004-11-17 22:52:20 UTC
It should not contain any rich text.
Comment 4 Jason Height 2006-07-28 03:23:25 UTC
This is now corrected in SVN (and has been for quite some time). Previously we
didn't understand rich text, so the exception in the code quoted in the
description was used to append a space to the end of the string to preserve its
uniqueness.

Now that rich text handling is correct, that block of code is gone, i.e. we
don't use exceptions at that low level, so the code is much faster.
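
As a rough illustration of why the old pad-and-retry loop got slower as the
data grew, the sketch below counts the exceptions thrown when n identical
strings are inserted. It is not the POI code: UniqueValueMap is a hypothetical
stand-in for a map that rejects duplicate values, and String stands in for
UnicodeString. Under those assumptions the k-th identical string collides with
every padded copy stored before it, so the total is n(n-1)/2 exceptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not POI code.
class PaddingCostSketch {

    // Stand-in for a map that refuses duplicate values.
    static final class UniqueValueMap {
        private final Map<Integer, String> byKey = new HashMap<>();
        private final Set<String> values = new HashSet<>();

        void put(Integer key, String value) {
            if (!values.add(value)) {
                throw new IllegalArgumentException("duplicate value");
            }
            byKey.put(key, value);
        }
    }

    // Inserts n copies of value using the pad-and-retry loop and
    // returns how many exceptions were thrown along the way.
    static long countExceptions(int n, String value) {
        UniqueValueMap map = new UniqueValueMap();
        long thrown = 0;
        for (int i = 0; i < n; i++) {
            String candidate = value;
            boolean added = false;
            while (!added) {
                try {
                    map.put(i, candidate);
                    added = true;
                } catch (IllegalArgumentException duplicate) {
                    thrown++;
                    candidate = candidate + " "; // pad to force uniqueness
                }
            }
        }
        return thrown;
    }

    public static void main(String[] args) {
        System.out.println(countExceptions(10, "x"));  // 45
        System.out.println(countExceptions(100, "x")); // 4950
    }
}
```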

Marking as fixed.


*** This bug has been marked as a duplicate of 25039 ***