Bug 59383 - Performance regression: DataFormatter no longer caches formats
Alias: None
Product: POI
Classification: Unclassified
Component: SS Common
Version: unspecified
Hardware: PC All
Importance: P2 regression
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2016-04-26 20:32 UTC by Nick C
Modified: 2016-05-09 15:46 UTC
Description Nick C 2016-04-26 20:32:02 UTC
When bug 58532 was completed, the line of code that adds formats to the cache was removed. I noticed this has caused Tika to take twice as long when processing some Excel files with lots of dates/numbers. https://github.com/apache/poi/commit/e966499ad270cb4be32faf44df304bef212df632#diff-485693a9e07b752e358b6ea116d26e02L313
Comment 1 Javen O'Neal 2016-04-26 22:54:01 UTC
getFormat caches data format in r1741114.

Skimming the code, createFormat does not cache data format. Would caching the format returned by createFormat improve the speed over previous builds? If not, should createFormat be static?
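
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a format cache. It is not POI's actual code: the class FormatCacheSketch, its field names, and the creation counter are hypothetical and exist only for illustration, and the method names getFormat and createFormat simply mirror the ones mentioned in this thread. It assumes a format can be built from its pattern string alone and uses java.text.DecimalFormat as a stand-in.

```java
import java.text.DecimalFormat;
import java.text.Format;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the DataFormatter implementation.
public class FormatCacheSketch {
    // Cache keyed by the format pattern string.
    private final Map<String, Format> formats = new HashMap<>();
    // Counts how many Format objects were actually constructed.
    private int created = 0;

    // Stand-in for the expensive step: parsing a pattern into a Format.
    private Format createFormat(String formatStr) {
        created++;
        return new DecimalFormat(formatStr);
    }

    // Look up the cache first; create and store the format only on a miss.
    public Format getFormat(String formatStr) {
        Format format = formats.get(formatStr);
        if (format == null) {
            format = createFormat(formatStr);
            // Without this put, every call falls through to createFormat.
            formats.put(formatStr, format);
        }
        return format;
    }

    public int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        FormatCacheSketch cache = new FormatCacheSketch();
        // Many cells typically share a handful of format strings.
        for (int row = 0; row < 400_000; row++) {
            cache.getFormat("#,##0.00").format(row * 1.5);
        }
        System.out.println(cache.createdCount()); // prints 1
    }
}
```

If the put call is omitted, the loop above constructs a new format for every cell, which is the kind of repeated work this report describes. Like java.text.Format itself, this sketch is not thread-safe.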
Comment 2 Nick C 2016-04-26 23:43:28 UTC
I patched my local copy, and one Excel file with over 400K rows of dates and numbers went from taking 1.5 minutes to about 30 seconds. Sadly, when you have lots of large Excel files, it adds up.
Comment 3 Javen O'Neal 2016-04-27 01:07:42 UTC
Could you attach your patch?
Comment 4 Nick C 2016-04-27 19:46:02 UTC
The patch I had was the same as what you applied in r1741114. Thanks for making the fix so quickly.
Comment 5 Javen O'Neal 2016-05-08 03:19:45 UTC
Resolved per comment 1.
Updated changelog in r1742764.
Comment 6 Kai G 2016-05-09 07:33:48 UTC
Was a released POI version affected by this, or only the current trunk?
Comment 7 Javen O'Neal 2016-05-09 15:46:47 UTC
The regression was introduced on 2015-10-25, so POI 3.14-beta1 through 3.15-beta1 were affected. Search for bug 58532 on https://poi.apache.org/changes.html