Bug 59383

Summary: Performance regression: DataFormatter no longer caches formats
Product: POI
Reporter: Nick C <fxfixer>
Component: SS Common
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: regression    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   

Description Nick C 2016-04-26 20:32:02 UTC
When bug 58532 was completed, the line of code that adds formats to the cache was removed. I noticed this has caused Tika to take twice as long when processing some Excel files with lots of dates/numbers. https://github.com/apache/poi/commit/e966499ad270cb4be32faf44df304bef212df632#diff-485693a9e07b752e358b6ea116d26e02L313
Comment 1 Javen O'Neal 2016-04-26 22:54:01 UTC
As of r1741114, getFormat caches the data format.

Skimming the code, createFormat does not cache the data format. Would caching the format returned by createFormat improve the speed over previous builds? If not, should createFormat be static?
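
For illustration only, the caching pattern under discussion might look roughly like the sketch below. This is a hypothetical stand-alone class, not POI's DataFormatter: the class name FormatCacheSketch, the creations counter, and the use of DecimalFormat as a stand-in for the real format parsing are all assumptions made for the example.

```java
import java.text.DecimalFormat;
import java.text.Format;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch of a per-pattern format cache; not POI code. */
class FormatCacheSketch {
    private final Map<String, Format> formats = new HashMap<>();
    private int creations = 0; // how many Format objects were actually built

    /** Returns the cached Format for a pattern, building it on first use. */
    Format getFormat(String pattern) {
        Format format = formats.get(pattern);
        if (format == null) {
            format = createFormat(pattern);
            // Storing the result is the step this bug is about: without it,
            // every lookup falls through and rebuilds the Format.
            formats.put(pattern, format);
        }
        return format;
    }

    /** Stand-in for the expensive step; assumes a DecimalFormat pattern. */
    private Format createFormat(String pattern) {
        creations++;
        return new DecimalFormat(pattern);
    }

    int creations() {
        return creations;
    }

    public static void main(String[] args) {
        FormatCacheSketch cache = new FormatCacheSketch();
        // Simulate many cells that share one number format.
        for (int row = 0; row < 400_000; row++) {
            cache.getFormat("#,##0.00");
        }
        System.out.println("Formats built: " + cache.creations());
    }
}
```

In this sketch, dropping the formats.put call makes creations grow with the number of cells rather than the number of distinct patterns, which is the kind of cost a sheet with many dates and numbers would hit.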
Comment 2 Nick C 2016-04-26 23:43:28 UTC
I patched my local copy, and one Excel file with over 400K rows of dates and numbers went from taking 1.5 minutes to about 30 seconds. Sadly, when you have lots of large Excel files, it adds up.
Comment 3 Javen O'Neal 2016-04-27 01:07:42 UTC
Could you attach your patch?
Comment 4 Nick C 2016-04-27 19:46:02 UTC
The patch I had was the same as what you applied in r1741114. Thanks for making the fix so quickly.
Comment 5 Javen O'Neal 2016-05-08 03:19:45 UTC
Resolved per comment 1.
Updated changelog in r1742764.
Comment 6 Kai G 2016-05-09 07:33:48 UTC
Was a released POI version affected by this, or only the current trunk?
Comment 7 Javen O'Neal 2016-05-09 15:46:47 UTC
The regression was introduced on 2015-10-25, so POI 3.14-beta1 through 3.15-beta1 were affected. Search for bug 58532 on https://poi.apache.org/changes.html