Bug 48425

Summary: DateUtil.isCellDateFormatted() method is slow
Product: POI
Reporter: Jan <jan.stette>
Component: POI Overall
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.6-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Jan 2009-12-21 05:18:30 UTC
I have done some performance testing for code reading data from large spreadsheets using POI.  In this use case, I found that half of the CPU time was spent in a single method in POI: DateUtil.isCellDateFormatted(cell).  We call this method every time we extract a value from a cell in order to correctly create Date objects when cells contain dates.

Looking at this method, it spends most of its time in DateUtil.isADateFormat().  This method is very slow, as it performs seven regular expression substitutions on the formatString parameter and one additional regex match.  None of the regexes are precompiled, so they're all compiled on every call to this method.

I would suggest replacing the first five regexes with calls to a string substitution method that doesn't require regexes, as they are simple replacements.  For the remaining three regexes, I would suggest precompiling them instead of just calling String.replaceAll() and String.matches().

Comment 1 Yegor Kozlov 2009-12-22 00:03:45 UTC
A good catch, thanks. 

As you suggested, I replaced the first five regexes with a loop collecting characters into a buffer. The remaining three regexes are pre-compiled at class initialization time. 
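
For illustration only, and not the code committed to POI: a minimal sketch of the general approach, assuming a made-up set of escape sequences and two made-up patterns. The class name, method names, and regexes below are all hypothetical. The point is the shape of the change: simple substitutions are done in a single character loop, and the regexes that remain are compiled once into static fields.

```java
import java.util.regex.Pattern;

public class DateFormatSketch {
    // Hypothetical patterns, compiled once at class initialization
    // instead of on every call via String.replaceAll()/String.matches().
    private static final Pattern BRACKETED = Pattern.compile("\\[[^\\]]*\\]");
    private static final Pattern DATE_CHARS = Pattern.compile("[dmyhsDMYHS\\-/,. :]+");

    // Regex-free replacement for simple substitutions such as "\-" -> "-".
    // The set of escaped characters handled here is an assumption.
    static String stripEscapes(String fs) {
        StringBuilder sb = new StringBuilder(fs.length());
        for (int i = 0; i < fs.length(); i++) {
            char c = fs.charAt(i);
            if (c == '\\' && i + 1 < fs.length()) {
                char next = fs.charAt(i + 1);
                if (next == '-' || next == ',' || next == '.' || next == ' ') {
                    continue; // drop the backslash; 'next' is appended on the next pass
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Returns true if the cleaned-up format string contains only date characters.
    static boolean isDateLike(String formatString) {
        String fs = stripEscapes(formatString);
        fs = BRACKETED.matcher(fs).replaceAll(""); // e.g. removes "[$-409]"
        return DATE_CHARS.matcher(fs).matches();
    }
}
```

With these assumed patterns, `DateFormatSketch.isDateLike("yyyy\\-mm\\-dd")` is true and `DateFormatSketch.isDateLike("0.00")` is false.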

In my benchmark I measured the number of calls to DateUtil.isCellDateFormatted() made in ten seconds. The reworked code is significantly faster: the throughput is at least five times greater. 
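
A throughput measurement of this kind can be sketched roughly as follows. This is not the benchmark I ran: the class name, the helper `countCalls`, and the stand-in workloads are hypothetical, and a careful measurement would also need JIT warm-up and repeated runs.

```java
import java.util.regex.Pattern;

public class ThroughputSketch {
    private static final Pattern PRECOMPILED = Pattern.compile("[dmyhs\\-/,. :]+");

    // Counts how many times 'task' completes within the given time window.
    static long countCalls(Runnable task, long durationMillis) {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        long calls = 0;
        while (System.nanoTime() < deadline) {
            task.run();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        String format = "yyyy-mm-dd";
        // Stand-in workloads: the same match with and without precompilation.
        long uncompiled = countCalls(() -> format.matches("[dmyhs\\-/,. :]+"), 10_000);
        long precompiled = countCalls(() -> PRECOMPILED.matcher(format).matches(), 10_000);
        System.out.println("uncompiled calls:  " + uncompiled);
        System.out.println("precompiled calls: " + precompiled);
    }
}
```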

I committed the fix in r893105. 

Comment 2 Jan 2009-12-22 02:12:24 UTC
Great, thanks for the very quick response!