Summary: | [PATCH] SSTRecord.serialize() performance improvement patch for huge hssf output | ||
---|---|---|---|
Product: | POI | Reporter: | Shoji KUZUKAMI <kuz+poi> |
Component: | HSSF | Assignee: | POI Developers List <dev> |
Status: | NEEDINFO --- | ||
Severity: | enhancement | Keywords: | PatchAvailable |
Priority: | P2 | ||
Version: | 3.7-FINAL | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All | ||
Bug Depends on: | 52084 | ||
Bug Blocks: | |||
Attachments: | zip file containing patch and chart image; verification of performance |
Description
Shoji KUZUKAMI
2011-10-25 14:09:57 UTC
Thanks for the patch. I put it in my TODO list, but it will take some time to review.

Regards,
Yegor

Finally I had time to review this patch, thanks for your patience.

I made a small change to initialize useFasterWrite from a system property:

private static final boolean useFasterWrite =
    Boolean.getBoolean("org.apache.poi.sstFastWrite");

This way I can test both modes without recompiling the code.

The patch does improve performance, but not as much as in your tests. In the best case I got 25% faster, which is far from the "2~4x performance improvement" you observed.

In my tests I ran TestSSTRecord#testSSTRecordPerformance() three times in each of two sets, with org.apache.poi.sstFastWrite=true or org.apache.poi.sstFastWrite=false.

Below is the console output:

-Dorg.apache.poi.sstFastWrite=true
serializer Memory time 0.328 +- 0.003 secs
serializer Memory time 0.302 +- 0.004 secs
serializer Memory time 0.319 +- 0.001 secs

-Dorg.apache.poi.sstFastWrite=false
serializer Memory time 0.381 +- 0.002 secs
serializer Memory time 0.364 +- 0.004 secs
serializer Memory time 0.379 +- 0.001 secs

My test environment:

java: oracle jdk 1.6.0_29 64 bit
option: -Xmx1224m -server
cpu: Intel core i5-2400
OS: windows 7 64bit, 8GB RAM
size of SST: 1<<20
serializer function: Memory

If the performance gain is only 25%, then I would stay with the current code and not make such big changes.
Also, can you provide some high-level tests that show how performance improves when saving real .xls files? How much of the total time spent in workbook.write() does SST serialization take?

Regards,
Yegor

I'm changing the status to NEEDINFO until my questions are answered.

Yegor

I'll attach a comprehensive test document for this patch's performance. All of the tests in the document are based on the POI-3.7 release. Although your setup and mine are not identical, my patch seems to improve performance 2x~4x across several JVMs and CPUs, both 32-bit and 64-bit.
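For readers following the numbers, here is a minimal Java sketch of the two mechanics discussed above: reading the toggle with Boolean.getBoolean and turning the six posted timings into a speedup ratio. The class name SstFastWriteCheck and its helper methods are hypothetical and are not part of POI or of the patch; only the property name and the timings come from the comment. Note that the reviewer's version stores the flag in a static final field, so it is read once at class load, while this sketch reads it on every call. On the posted timings the mean ratio works out to about 1.18x, and the best-case ratio (slowest default run over fastest patched run) to about 1.26x, which is consistent with the "25%" figure.

```java
public class SstFastWriteCheck {
    /** Property name as given in the review comment. */
    static final String PROPERTY = "org.apache.poi.sstFastWrite";

    /** True only when the JVM was started with -Dorg.apache.poi.sstFastWrite=true. */
    static boolean fastWriteEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    /** Arithmetic mean of the given timings. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Smallest timing in the array. */
    static double min(double[] values) {
        double result = values[0];
        for (double v : values) {
            result = Math.min(result, v);
        }
        return result;
    }

    /** Largest timing in the array. */
    static double max(double[] values) {
        double result = values[0];
        for (double v : values) {
            result = Math.max(result, v);
        }
        return result;
    }

    public static void main(String[] args) {
        // Timings in seconds, copied from the console output in the comment.
        double[] fast = {0.328, 0.302, 0.319};
        double[] slow = {0.381, 0.364, 0.379};

        System.out.println("fast write enabled: " + fastWriteEnabled());
        System.out.printf("mean speedup: %.2fx%n", mean(slow) / mean(fast));
        System.out.printf("best-case speedup: %.2fx%n", max(slow) / min(fast));
    }
}
```

Run it with and without -Dorg.apache.poi.sstFastWrite=true to see the flag change; the speedup lines depend only on the hard-coded timings.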
(In reply to comments #2 and #3)
Created attachment 28363 [details]
verification of performance
Thanks for the comprehensive report. It appears that the observed results depend on how you run the test: from an IDE or from Ant. I ran my tests from the IDE and got only 25%; you ran from Ant and got a "2~4x" improvement.

I still want to see how this patch affects performance when saving real Excel documents. The test that was used to generate the report is too "in vitro": it does not tell how much time serialization of the SST takes in comparison with the total time spent in workbook.write(OutputStream).

Yegor
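To illustrate why the share of SST serialization in the total write time matters, here is a small Java sketch of the standard Amdahl's-law arithmetic: if serialization takes a fraction f of workbook.write() and becomes s times faster, the whole write becomes 1 / ((1 - f) + f / s) times faster. The class name WriteTimeShare and the fractions in main are hypothetical placeholders, not measurements from this report; the real fraction would have to come from profiling actual .xls saves.

```java
public class WriteTimeShare {
    /**
     * Overall speedup of the whole write when only the SST part gets faster.
     *
     * @param sstFraction share of total write time spent in SST serialization (0..1)
     * @param sstSpeedup  how many times faster SST serialization becomes (> 0)
     */
    static double overallSpeedup(double sstFraction, double sstSpeedup) {
        if (sstFraction < 0.0 || sstFraction > 1.0) {
            throw new IllegalArgumentException("sstFraction must be in [0, 1]");
        }
        if (sstSpeedup <= 0.0) {
            throw new IllegalArgumentException("sstSpeedup must be positive");
        }
        return 1.0 / ((1.0 - sstFraction) + sstFraction / sstSpeedup);
    }

    public static void main(String[] args) {
        // Hypothetical shares of write time; not taken from the report.
        double[] fractions = {0.05, 0.25, 0.50, 0.90};
        // Speedups discussed in the thread: 25%, 2x and 4x.
        double[] speedups = {1.25, 2.0, 4.0};

        for (double f : fractions) {
            for (double s : speedups) {
                System.out.printf("SST share %.0f%%, SST speedup %.2fx -> overall %.2fx%n",
                        f * 100.0, s, overallSpeedup(f, s));
            }
        }
    }
}
```

With a small share, even a 4x faster serializer barely changes the total save time, which is the gap a test on real documents would close.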