Bug 61722

Summary: Review ways to avoid new Integer()/Integer.valueOf() in performance sensitive places
Product: POI Reporter: Dominik Stadler <dominik.stadler>
Component: XSSFAssignee: POI Developers List <dev>
Status: RESOLVED WONTFIX    
Severity: enhancement    
Priority: P2    
Version: 4.0.x-dev   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Bug Depends on: 57840    
Bug Blocks:    

Description Dominik Stadler 2017-11-03 22:21:08 UTC
We did some changes in bug #57840 to use "new Integer()" instead of "Integer.valueOf()", because the caching for reduced memory use in "valueOf()" comes with a slight CPU penalty, see r1747837.

However later on, these were reverted again as some code cleanup tools flag these as "sub-optimal" and suggest to change them to "valueOf()". 

Before re-applying this again, let's do some research if there are better ways to do this as it seems we create lots of objects, mostly to look up values in maps. 

Maybe we should use something like Trove Maps which allows to use primitives in maps directly without the cost of converting to objects, see https://bitbucket.org/trove4j/trove for details.

See the performance test added in r1747878 for a starting point of an evaluation of the effect of changes.
Comment 1 Greg Woolsey 2017-11-03 23:32:18 UTC
This article[1] has an interesting set of tests, results, and summary info for multiple primitive collections frameworks, including Trove.

Most interesting points to me:
 * Trove isn't under active development, and hasn't been for a few years.  It was also slowest.
 * Fastutil is the overall winner for performance, although memory wasn't a consideration in these tests.
 * Fastutil is a 28M JAR file (!)

If we are only interested in Map<int, ? extends Object> type maps, perhaps a slimmed-down version of a stable build of Fastutil could be used.  It appears license compatible, and we could investigate an AutoJar or similar step added to the build to create an artifact bundled/released with POI.

[1] http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/
Comment 2 Dominik Stadler 2017-11-04 09:10:45 UTC
FYI, some quick and dirty test with the file from bug #57840, not representative as not done in clean environment:
* Origin: 3m35s
* With new Integer(): 3m22s
* With Trove and new Integer(): 2m28s
Comment 3 Dominik Stadler 2018-12-27 21:32:32 UTC
The difference between new Integer() and Integer.valueOf() were very small.

Collections like Trove could provide improvements in cases where very huge documents are handled, however I don't plan to invest more effort there for now.
Comment 4 Greg Woolsey 2019-01-03 00:07:52 UTC
My previous tests were with Java 6. With Java 8 (starting with 7, actually) things are a bit different, and I've found even better performance by increasing the size of the Integer cache, either via system property

-Djava.lang.Integer.IntegerCache.high=<size>

or JVM setting:

-XX:AutoBoxCacheMax=<size>

this makes Integer.valueOf(int) the fastest option.  I use a value greater than my expected maximum row count, typically 50000 to 100000, although I've not noticed any negative impacts of using 1000000 other than the expected time to load and memory to cache those simple objects.

I think rather than introduce a new dependency we stick with Integer.valueOf(int) now that we've moved to Java 8.