Bug 55882 - Severe performance degradation when writing large numbers of custom properties to .DOCX/.XLSX files
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.8-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-13 20:30 UTC by Eric
Modified: 2015-08-20 19:18 UTC

Attachments
NetBeans profile snapshot of test of standard POI library (208.42 KB, application/octet-stream)
2013-12-13 20:30 UTC, Eric
NetBeans Profile snapshot of custom POI without contains() check (18.80 KB, application/octet-stream)
2013-12-13 21:29 UTC, Eric
Screen cap of profile showing contains timing (71.83 KB, image/jpeg)
2013-12-13 22:28 UTC, Eric
Screen cap of profile showing nextPid timing (69.38 KB, image/jpeg)
2013-12-13 22:29 UTC, Eric

Description Eric 2013-12-13 20:30:57 UTC
Created attachment 31112 [details]
NetBeans profile snapshot of test of standard POI library

When writing very large numbers of properties to the Custom Properties collection of new style office documents, the performance degrades badly.
A test of writing 3k properties to a .DOCX file took 400 seconds.

In my time trials, which timed each batch of 25 property writes, the first batch of 25 entries took 0.028 seconds, the batch at the 1k mark took 1.1 seconds, the batch at 2k took 4.3 seconds, and the batch at 3k took 10.1 seconds. I have reason to believe the dataset may grow much larger than 3k.
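
As a rough consistency check on these figures: the per-batch times grow close to quadratically with the number of properties already written (1.1 s, 4.3 s, 10.1 s at 1k, 2k, 3k is roughly 1 : 4 : 9). The sketch below assumes exactly that quadratic model, anchored on the 10.1 s batch at 3k, and sums it over all batches. The class and method names are made up for illustration and are not part of POI.

```java
public class BatchTimingEstimate {

    /**
     * Sums the estimated time of every batch up to totalProps, assuming
     * (this is a model, not a measurement) that the batch ending at
     * property n costs tRef * (n / nRef)^2 seconds.
     */
    static double estimateTotalSeconds(int totalProps, int batchSize,
                                       int nRef, double tRef) {
        double total = 0.0;
        for (int n = batchSize; n <= totalProps; n += batchSize) {
            double ratio = (double) n / nRef;
            total += tRef * ratio * ratio;
        }
        return total;
    }

    public static void main(String[] args) {
        // 3000 properties, batches of 25, anchored on 10.1 s at the 3k mark.
        double total = estimateTotalSeconds(3000, 25, 3000, 10.1);
        System.out.printf("estimated total: %.1f s%n", total);
    }
}
```

Under that assumption the estimate comes out on the order of 400 seconds, which is in line with the total measured for 3k properties.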

When profiling the code, most of the time in adding a property was spent in org.apache.poi.POIXMLProperties.addProperty()->org.apache.poi.POIXMLProperties.add()->org.apache.poi.POIXMLProperties.contains().
When I tested a custom build of POI with the call to .contains() removed, most of the time was instead spent in org.apache.poi.POIXMLProperties$CustomProperties.nextPid(). Within that, most of the time went to java.util.AbstractList$Itr.hasNext(), followed by java.util.AbstractList$Itr.next().

Another test of the custom POI library without the call to .contains() for 3k properties took roughly 200 seconds, or half the time of the standard library.

My test was a loop that appended the loop counter to both the base name and the base value text used for the property, for example "prop_Name_0001", "prop_value_0001".
Comment 1 Eric 2013-12-13 21:29:23 UTC
Created attachment 31113 [details]
NetBeans Profile snapshot of custom POI without contains() check

This profile snapshot was also "Profile All classes"
Comment 2 Eric 2013-12-13 22:28:50 UTC
Created attachment 31114 [details]
Screen cap of profile showing contains timing

The full profile file is too large to attach, so this is a screenshot showing the timing of .contains().
Comment 3 Eric 2013-12-13 22:29:49 UTC
Created attachment 31115 [details]
Screen cap of profile showing nextPid timing

The profile file is too large to attach, so this screen cap shows the timing of nextPid(), which is second to .contains() in time taken.
Comment 4 Andreas Beeker 2013-12-22 00:12:22 UTC
To solve this, there are two ways:
- either we introduce some kind of hash map caching of the underlying properties, and have to make sure that the map is synced when somebody changes something underneath
- or you call the getUnderlyingProperties() method and fill the underlying properties yourself

Although iterating over props.getProperties().getPropertyList() on every call is probably the reason for the bad performance, I'm a bit reluctant to implement caching here, as the second option is probably sufficient for your case and I don't need to worry about syncing.
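
To illustrate the difference, here is a minimal sketch that does not use POI at all. ScanningStore is a stand-in for the current behaviour (a full scan of the list for the duplicate check and for the next pid on every add), and TrackingWriter is a stand-in for user code that keeps its own set of names and its own pid counter while appending entries directly. All class names, fields, and the starting pid of 2 are assumptions made for the example, not POI API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PropertyWriterSketch {

    /** Hypothetical stand-in for one stored custom property. */
    static final class Prop {
        final int pid;
        final String name;
        final String value;

        Prop(int pid, String name, String value) {
            this.pid = pid;
            this.name = name;
            this.value = value;
        }
    }

    /** Rescans the whole list on every add (duplicate check and max pid). */
    static final class ScanningStore {
        final List<Prop> props = new ArrayList<>();

        void add(String name, String value) {
            int maxPid = 1; // assumed so that the first assigned pid is 2
            for (Prop p : props) {
                if (p.name.equals(name)) {
                    throw new IllegalArgumentException("duplicate: " + name);
                }
                if (p.pid > maxPid) {
                    maxPid = p.pid;
                }
            }
            props.add(new Prop(maxPid + 1, name, value));
        }
    }

    /** Caller-side bookkeeping: constant-time duplicate check and pid. */
    static final class TrackingWriter {
        final List<Prop> props = new ArrayList<>();
        private final Set<String> names = new HashSet<>();
        private int nextPid = 2; // assumed starting pid, mirrors ScanningStore

        void add(String name, String value) {
            if (!names.add(name)) {
                throw new IllegalArgumentException("duplicate: " + name);
            }
            props.add(new Prop(nextPid++, name, value));
        }
    }

    public static void main(String[] args) {
        ScanningStore slow = new ScanningStore();
        TrackingWriter fast = new TrackingWriter();
        for (int i = 1; i <= 3000; i++) {
            String name = String.format("prop_Name_%04d", i);
            String value = String.format("prop_value_%04d", i);
            slow.add(name, value);
            fast.add(name, value);
        }
        System.out.println(slow.props.size() + " / " + fast.props.size());
    }
}
```

The bookkeeping in TrackingWriter is only valid as long as nothing else modifies the list behind its back, which is the syncing concern with a cache inside the library. In user code that owns the document for the duration of the write, that condition is easy to guarantee. If the document already contains custom properties, the name set and the pid counter would have to be initialised from them first.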

Please correct me if I'm totally wrong, but I think that this many properties is unusual and that moving the logic into user code is OK.
Comment 5 Dominik Stadler 2015-08-20 19:18:02 UTC
No update on this one, and it seems there is a viable workaround that can be applied in user code here. Therefore I am closing this as WORKSFORME for now.