When bug 58532 was completed, the line of code that adds formats to the cache was removed. I noticed this has caused Tika to take twice as long when processing some Excel files with lots of dates/numbers. https://github.com/apache/poi/commit/e966499ad270cb4be32faf44df304bef212df632#diff-485693a9e07b752e358b6ea116d26e02L313
r1741114 makes getFormat cache the data format. From skimming the code, createFormat does not cache the data format. Would caching the format returned by createFormat improve the speed over previous builds? If not, should createFormat be static?
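A minimal sketch of the caching idea discussed here, assuming formats are keyed by their raw format string so each distinct string is parsed only once. The class `CachingFormatter`, its `createCount` counter and the simplified `createFormat` are hypothetical illustrations, not POI's actual `DataFormatter` code, which parses Excel format strings far more thoroughly.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.Format;
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not org.apache.poi.ss.usermodel.DataFormatter.
public class CachingFormatter {
    // Already-built formats, keyed by the raw format string.
    // Plain HashMap: this sketch assumes single-threaded use, like a per-instance cache.
    private final Map<String, Format> formats = new HashMap<>();

    // Counts how often the expensive creation path runs, to show the cache working.
    private int createCount = 0;

    public Format getFormat(String formatStr) {
        // computeIfAbsent calls createFormat only on a cache miss and stores the result.
        return formats.computeIfAbsent(formatStr, this::createFormat);
    }

    // Greatly simplified stand-in for parsing a format string (assumes Java patterns,
    // not Excel ones): anything containing 'y' or 'd' is treated as a date pattern.
    private Format createFormat(String formatStr) {
        createCount++;
        if (formatStr.indexOf('y') >= 0 || formatStr.indexOf('d') >= 0) {
            return new SimpleDateFormat(formatStr, Locale.ROOT);
        }
        return new DecimalFormat(formatStr, DecimalFormatSymbols.getInstance(Locale.ROOT));
    }

    public int getCreateCount() {
        return createCount;
    }
}
```

With this approach, formatting 400K cells that share a handful of format strings builds only a handful of `Format` objects; every later lookup is a map hit.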
I patched my local copy, and one Excel file with over 400K rows of dates and numbers went from taking 1.5 minutes to about 30 seconds. Sadly, when you have lots of large Excel files, it adds up.
Could you attach your patch?
The patch I had was the same as what you applied in r1741114. Thanks for making the fix so quickly.
Resolved per comment 1. Updated the changelog in r1742764.
Was a released POI version affected by this, or only the current trunk?
The regression was introduced on 2015-10-25, so POI 3.14-beta1 through 3.15-beta1 were affected. Search for bug 58532 on https://poi.apache.org/changes.html