Bug 60325 - Poor performance in DirectoryNode.createDocument() for NPOIFSFileSystem
Summary: Poor performance in DirectoryNode.createDocument() for NPOIFSFileSystem
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS (show other bugs)
Version: 3.15-FINAL
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-01 04:40 UTC by Luke Quinane
Modified: 2019-03-10 20:46 UTC (History)
1 user (show)



Attachments
Sample project which compares NPOIFSFileSystem and OPOIFSFileSystem (87.86 KB, application/x-7z-compressed)
2016-11-01 04:40 UTC, Luke Quinane
Details
Screenshot from JVisualVM sampling (159.93 KB, image/png)
2017-03-19 11:50 UTC, Dominik Stadler
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Quinane 2016-11-01 04:40:07 UTC
Created attachment 34413 [details]
Sample project which compares NPOIFSFileSystem and OPOIFSFileSystem

When adding lots of documents entries to the file system the performance of the NPOIFSFileSystem implementation is ~10x slower than OPOIFSFileSystem.

The attached sample program is often stuck with stacks like this:
	  at java.nio.Buffer.<init>(Buffer.java:202)
	  at java.nio.ByteBuffer.<init>(ByteBuffer.java:281)
	  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:70)
	  at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
	  at org.apache.poi.poifs.nio.ByteArrayBackedDataSource.read(ByteArrayBackedDataSource.java:49)
	  at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:484)
	  at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169)
	  at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:142)
	  at org.apache.poi.poifs.filesystem.NPOIFSMiniStore.getBlockAt(NPOIFSMiniStore.java:71)
	  at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169)
	  at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:142)
	  at org.apache.poi.poifs.filesystem.NDocumentInputStream.readFully(NDocumentInputStream.java:248)
	  at org.apache.poi.poifs.filesystem.NDocumentInputStream.read(NDocumentInputStream.java:150)
	  at org.apache.poi.poifs.filesystem.DocumentInputStream.read(DocumentInputStream.java:125)
	  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	  at java.io.BufferedInputStream.skip(BufferedInputStream.java:380)
	  - locked <0x345> (a java.io.BufferedInputStream)
	  at org.apache.poi.poifs.filesystem.NPOIFSDocument.store(NPOIFSDocument.java:126)
	  at org.apache.poi.poifs.filesystem.NPOIFSDocument.<init>(NPOIFSDocument.java:84)
	  at org.apache.poi.poifs.filesystem.DirectoryNode.createDocument(DirectoryNode.java:422)
	  at Test.copyAllEntries(Test.java:83)
	  at Test.copyAllEntries(Test.java:77)
	  at Test.main(Test.java:49)

This problem crops up while creating MSG files with lots of recipients because each recipient requires several document entries.
Comment 1 Javen O'Neal 2017-02-17 05:15:59 UTC
Do you have any suggestions on how to improve the NPOIFSFileSystem implementation?
Comment 2 Dominik Stadler 2017-03-19 11:33:46 UTC
Hm, the ByteBuffer.wrap() call seems unlikely to be the time-culprit as it just populates some members, I'd try to use some profiler or APM tool here to find what is actually using up the time.
Comment 3 Dominik Stadler 2017-03-19 11:48:44 UTC
A quick analysis points more into the direction of NPOIFSMiniStore.getBlockAt() because it iterates block-by-block via an Iterator<ByteBuffer>, for large documents the offset can be high (i.e. in your sample between 500 and 1000 times for each call) and thus there are many loop-iterations with many it.next() calls to StreamBlockByteBufferIterator which has to perform more work to do these steps. 

Unfortunately this is quite core to the class, so not easily replaced with something more performing as far as I see :(.
Comment 4 Dominik Stadler 2017-03-19 11:50:16 UTC
Created attachment 34845 [details]
Screenshot from JVisualVM sampling
Comment 5 Nick Burch 2017-03-20 23:13:19 UTC
If we're able to change how we open NPOIFS from files (see my thread on dev@), we might be able to mmap in all file cases, and from that we might be able to mmap bigger blocks

That may then allow us to change NPOIFSMiniStore to avoid quite as much wrapping/buffering of the mini blocks too,

(Note the "might" in the above - this is untested and just a guess!)