32286 – A file created by OpenOffice from a comma-separated field is very slow

Summary: A file created by OpenOffice from a comma-separated field is very slow

Status:	RESOLVED DUPLICATE of bug 25039

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HSSF
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-11-17 22:48 UTC by Ian Jackson
Modified:	2006-07-27 20:23 UTC
CC List:	0 users

Attachments
The version created by OpenOffice (305.00 KB, application/octet-stream) 2004-11-17 22:49 UTC, Ian Jackson
The original (123.98 KB, text/plain) 2004-11-17 22:51 UTC, Ian Jackson

Description Ian Jackson 2004-11-17 22:48:03 UTC

I created a file of comma-separated values and used OpenOffice to write an
Excel file.

Profiling smaller data sets with Java hprof, I found that over 30% of the time
was spent in SSTDeserializer.addToStringTable and the methods it calls, mostly
creating the exception thrown by put. Nothing else was over 5%. The percentage
of time in addToStringTable increased as my data set got larger.
   static public void addToStringTable( BinaryTree strings, Integer integer, UnicodeString string )
   {

       if ( string.isRichText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~8 ) ) );
       if ( string.isExtendedText() )
           string.setOptionFlags( (byte) ( string.getOptionFlags() & ( ~4 ) ) );

       boolean added = false;
       while ( added == false )
       {
           try
           {
               strings.put( integer, string );
               added = true;
           }
           catch ( Exception ignore )
           {
               string.setString( string.getString() + " " );
           }
       }

   }


Of course, if you really expect that the values might be the same, a different
data structure should be used, such as a straight hash map.
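
A minimal sketch of the hash map suggestion, for illustration only. The class
StringTableSketch and its methods are hypothetical names, not POI's API, and
plain String stands in for UnicodeString. It assumes the table only needs
index-to-string lookups plus an optional reverse lookup, so duplicate values
are stored as they are, without exceptions and without padding the string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the string table; not POI's actual API.
class StringTableSketch {
    // index -> string; a plain HashMap accepts duplicate values.
    private final Map<Integer, String> byIndex = new HashMap<>();
    // string -> first index seen, kept only for reverse lookups.
    private final Map<String, Integer> firstIndexByString = new HashMap<>();

    // Stores the string under its index. A repeated value is simply stored
    // again; no exception is thrown and the string is left unmodified.
    void add(int index, String value) {
        byIndex.put(index, value);
        firstIndexByString.putIfAbsent(value, index);
    }

    // Returns the string stored at the index, or null if there is none.
    String get(int index) {
        return byIndex.get(index);
    }

    // Returns the first index at which the value was added, or null.
    Integer firstIndexOf(String value) {
        return firstIndexByString.get(value);
    }

    int size() {
        return byIndex.size();
    }
}
```

With this layout, adding the same text under two different indexes costs two
constant-time map operations each, regardless of how many duplicates the data
set contains.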

Comment 1 Ian Jackson 2004-11-17 22:49:32 UTC

Created attachment 13486
The version created by OpenOffice

Comment 2 Ian Jackson 2004-11-17 22:51:32 UTC

Created attachment 13487
The original

I cut 100 lines off the file to create the 900 records.

Comment 3 Ian Jackson 2004-11-17 22:52:20 UTC

It should not contain any rich text.

Comment 4 Jason Height 2006-07-28 03:23:25 UTC

This is now corrected in SVN (and has been for quite some time). Previously we
didn't understand rich text, so the exception handler in the code quoted in the
description was used to append a space to the end of the string to preserve its
uniqueness.

Now that rich text handling is correct, that block of code is gone, i.e. we
don't use exceptions at that low level, hence the code is much faster.

Marking as fixed.

Jason

*** This bug has been marked as a duplicate of 25039 ***