|Summary:||Performance regression: DataFormatter no longer caches formats|
|Product:||POI||Reporter:||Nick C <fxfixer>|
|Component:||SS Common||Assignee:||POI Developers List <dev>|
Description Nick C 2016-04-26 20:32:02 UTC
When bug 58532 was completed, the line of code that adds formats to the cache was removed. I noticed that this has caused Tika to take twice as long when processing some Excel files with lots of dates/numbers. https://github.com/apache/poi/commit/e966499ad270cb4be32faf44df304bef212df632#diff-485693a9e07b752e358b6ea116d26e02L313
Comment 1 Javen O'Neal 2016-04-26 22:54:01 UTC
getFormat caches the data format again as of r1741114. Skimming the code, createFormat does not cache the data format. Would caching the format returned by createFormat improve the speed over previous builds? If not, should createFormat be static?
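For readers following along, the caching pattern under discussion can be sketched roughly as below. This is a minimal illustration, not the actual POI code: the class CachingFormatter, its fields, and the use of DecimalFormat as the "expensive" format are all assumptions made for the example. Only the method names getFormat and createFormat come from the discussion above.

```java
import java.text.DecimalFormat;
import java.text.Format;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a formatter that caches Format objects.
// This is NOT POI's DataFormatter; it only illustrates the pattern.
class CachingFormatter {
    // Cache keyed by the format string (an assumption for this sketch).
    private final Map<String, Format> formats = new HashMap<>();

    // Counts how many Format objects were actually built.
    int created = 0;

    // Check the cache first; build and store the Format only on a miss.
    Format getFormat(String formatStr) {
        Format format = formats.get(formatStr);
        if (format == null) {
            format = createFormat(formatStr);
            // Without this put, every call would rebuild the Format.
            formats.put(formatStr, format);
        }
        return format;
    }

    // The costly step: parsing the pattern. It does not touch the cache.
    private Format createFormat(String formatStr) {
        created++;
        return new DecimalFormat(formatStr);
    }
}
```

With the put in place, repeated lookups of the same format string return the same Format instance, so a sheet with many cells sharing a few formats pays the creation cost only once per format.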
Comment 2 Nick C 2016-04-26 23:43:28 UTC
I patched my local copy, and one Excel file with over 400K rows of dates and numbers went from taking 1.5 minutes to about 30 seconds. Sadly, when you have lots of large Excel files, it adds up.
Comment 3 Javen O'Neal 2016-04-27 01:07:42 UTC
Could you attach your patch?
Comment 4 Nick C 2016-04-27 19:46:02 UTC
The patch I had was the same as what you applied in r1741114. Thanks for making the fix so quickly.
Comment 5 Javen O'Neal 2016-05-08 03:19:45 UTC
Comment 6 Kai G 2016-05-09 07:33:48 UTC
Was a released POI version affected by this, or only the current trunk?