Bug 52086 - [PATCH] SSTRecord.serialize() performance improvement patch for huge hssf output
Summary: [PATCH] SSTRecord.serialize() performance improvement patch for huge hssf output
Status: NEEDINFO
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.7-FINAL
Hardware: PC All
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords: PatchAvailable
Depends on: 52084
Blocks:
  Show dependency tree
 
Reported: 2011-10-25 14:09 UTC by Shoji KUZUKAMI
Modified: 2015-10-29 09:11 UTC (History)
0 users



Attachments
zip file containing patch and chart image (23.90 KB, application/x-zip-compressed)
2011-10-25 14:09 UTC, Shoji KUZUKAMI
Details
verification of performance (21.58 KB, application/pdf)
2012-02-22 15:48 UTC, Shoji KUZUKAMI
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Shoji KUZUKAMI 2011-10-25 14:09:57 UTC
Created attachment 27844 [details]
zip file containing patch and chart image

This bug report is an extracted abstract from the mail 'SSTRecord.serialize()
performance improvement patch for huge hssf output' in the 'dev@poi.apache.org'
ML.
Please use the 'patch' program to patch the files, such as 'patch -p 0
< HOGE.patch'.
I tried to use eclipse to patch them, but I found that any trial will fail.

---
This patch contains some output performance hacks
around the packages of org.apache.poi.hssf.record.cont and
org.apache.poi.util.
The patch provides both 2~4x performance improvement and some conveniences
in the serialization of SST.

The essential feature of patch is to extend the LittleEndianOutput (
and the implementation classes )
for itself (themselves) to write out the String in the both formats of
ASCII and UTF16LE.
This extension internalizes the frequent polymorphism calls of
UnknownLengthRecordOutput#writeShort() or writeByte()
in the  ContinuableRecord#writeCharacterData().
The call internalization enables the jvm to avoid the polymorphism
cost along the technique of code inlining per class .

Furthermore, the template adapters of this extension are provided by
LittleEndianOutputAdapter, LittleEndianOutputByteStreamAdatper and
LittleEndianOutputFilterAdapter
to ease to build up the implementation class of LittleEndianOutput.
By using the class tree, I implemented
LittleEndianOutputBufferedRandomAccessFile for the performance check
needs,
which uses the random access file coupled with buffers as the output
destination and
also supports the DelayableLittleEndianOutput interface.

The features of patch can be enabled and disabled by flipping the two
boolean flags of
ContinuableRecordOutput.useFasterWrite and
UnknownLengthRecordOutput.useFasterWrite.
The performances at changing these flags are investigated below.

You can verify the correctness of this patch
by running the test 'testSSTRecord_DigestCheck()' contained in the
previous 'first-sstser-verify37.patch'
for each  example serialization method.
The example methods are Memory, DirectRandomAccessFile, StreamFile and
 LZFCompressFile in 'SerializationFunction'.
Please be careful to use the LZFCompressFile serialization method because
it requires the compress-lzf-0.8.4.jar or upper which can be fetched
from the maven repository.
See http://www.jarvana.com/jarvana/archive-details/com/ning/compress-lzf/0.8.4/compress-lzf-0.8.4.jar
for the more details of jar.

The performance of this patch is investigated in the table below by
invoking the newly created method
TestSSTRecord.testSSTRecordPerformance()
with the small code change from 'int N = 1<<10' to 'int N= 1<<20'
under the jdk(1.6.0_26) of option '-Xmx1224m -server'.
The elapsed time of 2^20 SSTReocrds serialization in seconds are
measured 60 times.
Then, the statistics is calculated by excluding 40 extreme
measurements for each serialization, avoiding ill measurements.
The value and plus minus sign represent the mean and the standard
deviation of serialization time.


java: oracle jdk 1.6.0_26
option: -Xmx1224m -server
cpu: Intel core i5-2400
OS: windows 7

Optimization    enum_SerializeFunction  Mean Time               Standard Deviation
---     ---     ---     --      ---
U@T/C@T Memory  0.248   +-      0.002   secs
U@T/C@T DirectRandomAccessFile  0.94    +-      0.024   secs
U@T/C@T StreamFile      0.936   +-      0.072   secs
U@T/C@T LZFCompressFile 0.362   +-      0.002   secs
---     ---     ---     --      ---
U@F/C@T Memory  0.213   +-      0.002   secs
U@F/C@T DirectRandomAccessFile  0.881   +-      0.046   secs
U@F/C@T StreamFile      0.827   +-      0.039   secs
U@F/C@T LZFCompressFile 0.438   +-      0.004   secs
---     ---     ---     --      ---
U@T/C@F Memory  0.744   +-      0.001   secs
U@T/C@F DirectRandomAccessFile  0.939   +-      0.029   secs
U@T/C@F StreamFile      0.901   +-      0.031   secs
U@T/C@F LZFCompressFile 0.658   +-      0.002   secs
---     ---     ---     --      ---
U@F/C@F Memory  1.011   +-      0.005   secs
U@F/C@F DirectRandomAccessFile  1.29    +-      0.003   secs
U@F/C@F StreamFile      0.837   +-      0.039   secs
U@F/C@F LZFCompressFile 0.902   +-      0.003   secs
---     ---     ---     --      ---
MANUAL_HACK      Memory 0.237   +-      0.002   secs
MANUAL_HACK      DirectRandomAccessFile 1.174   +-      0.094   secs
MANUAL_HACK        StreamFile   0.806   +-      0.042   secs
MANUAL_HACK      LZFCompressFile        0.384   +-      0.002   secs

Please see the 'ser_perf.png' image attached, which
contains a chart of the mean time for each serialize method in the
optimizations.
The 'U@[TF]/C@[TF]' of 'Optimization' column indicates whether the
flags of UnknownLengthRecordOutput.useFasterWrite and
ContinuableRecordOutput.useFasterWrite
are TRUE or FALSE.
In this case, the 'U@F/C@F' is identical to the original method.
The values of 'MANUAL' rows mean the ones in which  the current poi
class trees  around the SSTSerializer are fully refactored and
optimized as well as possible.
#Note.. The 'MANUAL' code is not included in this patch because the
class tree is too much changed.

From the table and chart,
I concluded that  the method
'ContinuableRecordOutput.useFasterWrite=true' and
'UnknownLengthRecordOutput.useFasterWrite=true'
is 2~4 times faster than the original method in the cpu dependent
cases such as the Memory and LZFCompressedFile.
Furthermore, from the result of full manual hack, the performance is
reasonable and optimal for the small source code changes of patch.

On the other hand, the worse performances in the disk dependent cases,
such as DirectRandomAccessFile and StreamFile,  are required to be
fixed.
Nevertheless  the patch cannot solve the cases.
In my experience, the disk resource are more limited in a server
processing than the cpu resource
although it is required to improve the disk performance in the disk
dependent cases.
These facts indicate that the compressed writer such as
LZFCompressedFile is necessary to improve the throughput of huge
'.xls's in a server.
Comment 1 Yegor Kozlov 2011-10-31 07:27:15 UTC
Thanks for the patch. I put it in my TODO list, but it will take some time to review.

Regards,
Yegor
Comment 2 Yegor Kozlov 2012-01-14 13:18:28 UTC
Finally I had time to review this patch, thanks for your patience. 

I made a small change to initialize the useFasterWrite from a system property:

private static final boolean useFasterWrite = Boolean.getBoolean("org.apache.poi.sstFastWrite");

this way I can test both modes without re-compiling the code. 

The patch does improve performance but not that much as in your tests. In the best case I got 25% faster which is far from "2~4x performance improvement" observed by you. 

In my tests I ran TestSSTRecord#testSSTRecordPerformance() three times in two sets, either with org.apache.poi.sstFastWrite=true or org.apache.poi.sstFastWrite=false.

Below is the console output:

-Dorg.apache.poi.sstFastWrite=true
serializer	Memory	 time	0.328	+-	0.003	secs
serializer	Memory	 time	0.302	+-	0.004	secs
serializer	Memory	 time	0.319	+-	0.001	secs

-Dorg.apache.poi.sstFastWrite=false
serializer	Memory	 time	0.381	+-	0.002	secs
serializer	Memory	 time	0.364	+-	0.004	secs
serializer	Memory	 time	0.379	+-	0.001	secs


My test environment:

java: oracle jdk 1.6.0_29 64 bit
option: -Xmx1224m -server
cpu: Intel core i5-2400
OS: windows 7 64bit, 8GB RAM
size of SST: 1<<20
serializer function: Memory

If the performance gain is only 25% then I would stay with current code and not made such big changes. 
Also, can you provide some high-level tests that show how performance improves when saving real .xls files. How much does SST serialization take from the total time spent in workbook.write() ?  
 

Regards,
Yegor
Comment 3 Yegor Kozlov 2012-02-22 12:17:40 UTC
I'm changing the status to NEEDINFO until my questions are answered.

Yegor
Comment 4 Shoji KUZUKAMI 2012-02-22 15:45:35 UTC
I'll attach a comprehensive test document for this patch performance.
All of the tests on the document is based on the POI-3.7 release.
Although the situations of mine and yours are not identical,
the performance seems to improve 2x~4x by my patch  independent to some jvms and cpus of 32bit or 64bit.


(In reply to comment #2)
> Finally I had time to review this patch, thanks for your patience. 
> 
> I made a small change to initialize the useFasterWrite from a system property:
> 
> private static final boolean useFasterWrite =
> Boolean.getBoolean("org.apache.poi.sstFastWrite");
> 
> this way I can test both modes without re-compiling the code. 
> 
> The patch does improve performance but not that much as in your tests. In the
> best case I got 25% faster which is far from "2~4x performance improvement"
> observed by you. 
> 
> In my tests I ran TestSSTRecord#testSSTRecordPerformance() three times in two
> sets, either with org.apache.poi.sstFastWrite=true or
> org.apache.poi.sstFastWrite=false.
> 
> Below is the console output:
> 
> -Dorg.apache.poi.sstFastWrite=true
> serializer    Memory     time    0.328    +-    0.003    secs
> serializer    Memory     time    0.302    +-    0.004    secs
> serializer    Memory     time    0.319    +-    0.001    secs
> 
> -Dorg.apache.poi.sstFastWrite=false
> serializer    Memory     time    0.381    +-    0.002    secs
> serializer    Memory     time    0.364    +-    0.004    secs
> serializer    Memory     time    0.379    +-    0.001    secs
> 
> 
> My test environment:
> 
> java: oracle jdk 1.6.0_29 64 bit
> option: -Xmx1224m -server
> cpu: Intel core i5-2400
> OS: windows 7 64bit, 8GB RAM
> size of SST: 1<<20
> serializer function: Memory
> 
> If the performance gain is only 25% then I would stay with current code and not
> made such big changes. 
> Also, can you provide some high-level tests that show how performance improves
> when saving real .xls files. How much does SST serialization take from the
> total time spent in workbook.write() ?  
> 
> 
> Regards,
> Yegor

(In reply to comment #3)
> I'm changing the status to NEEDINFO until my questions are answered.
> 
> Yegor

(In reply to comment #2)
> Finally I had time to review this patch, thanks for your patience. 
> 
> I made a small change to initialize the useFasterWrite from a system property:
> 
> private static final boolean useFasterWrite =
> Boolean.getBoolean("org.apache.poi.sstFastWrite");
> 
> this way I can test both modes without re-compiling the code. 
> 
> The patch does improve performance but not that much as in your tests. In the
> best case I got 25% faster which is far from "2~4x performance improvement"
> observed by you. 
> 
> In my tests I ran TestSSTRecord#testSSTRecordPerformance() three times in two
> sets, either with org.apache.poi.sstFastWrite=true or
> org.apache.poi.sstFastWrite=false.
> 
> Below is the console output:
> 
> -Dorg.apache.poi.sstFastWrite=true
> serializer    Memory     time    0.328    +-    0.003    secs
> serializer    Memory     time    0.302    +-    0.004    secs
> serializer    Memory     time    0.319    +-    0.001    secs
> 
> -Dorg.apache.poi.sstFastWrite=false
> serializer    Memory     time    0.381    +-    0.002    secs
> serializer    Memory     time    0.364    +-    0.004    secs
> serializer    Memory     time    0.379    +-    0.001    secs
> 
> 
> My test environment:
> 
> java: oracle jdk 1.6.0_29 64 bit
> option: -Xmx1224m -server
> cpu: Intel core i5-2400
> OS: windows 7 64bit, 8GB RAM
> size of SST: 1<<20
> serializer function: Memory
> 
> If the performance gain is only 25% then I would stay with current code and not
> made such big changes. 
> Also, can you provide some high-level tests that show how performance improves
> when saving real .xls files. How much does SST serialization take from the
> total time spent in workbook.write() ?  
> 
> 
> Regards,
> Yegor
Comment 5 Shoji KUZUKAMI 2012-02-22 15:48:13 UTC
Created attachment 28363 [details]
verification of performance
Comment 6 Yegor Kozlov 2012-02-27 08:36:17 UTC
Thanks for the comprehensive report. It appears that the observed results depend on how you run the test: from IDE or from ant. I ran my tests from IDE and got only 25%, you ran from Ant and got a "2~4x" improvement .

I still want to see how this patch affects performance when saving real excel documents. The test that was used to generate the report is too "in vitro" : it does not tell how much time serialization of SST takes in comparison with total time spent in workbook.write(OutputStream). 

Yegor