POI has a fair amount of code that reads

    for(int i=0; i<getNumberOfStyles(); i++) { // do something }

getNumberOfStyles() is called on every loop iteration. At best this is a lot of function calls. At worst, it is a lot of expensive function calls. We can pretty easily find cases where we can make POI faster by moving loop invariants (the value doesn't change over the course of the loop, and the function has no side effects) outside the loop.

    grep -r --exclude-dir=".svn" -P "for\s*\([^:\(]+\(\)[^:\)]+\)"

This finds calls to no-argument functions within a for-loop header, excluding for-each loops. There are currently 514 instances. Some may be for-loops over iterators, and some may call functions with side effects. The rest could probably be made faster without harming readability.

There are likely other expressions or function calls that are re-evaluated inside loops and could be pulled out of an inner loop, or out of the loops entirely, but these are a bit trickier to find with a regex.
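To make the proposed change concrete, here is a minimal sketch of the before and after. StyleTable, getStyleAt and the call counter are hypothetical stand-ins written for this example, not POI classes; only the getNumberOfStyles() name is borrowed from the loop above.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopInvariantSketch {

    /** Hypothetical container (not a POI class) that counts how often its size is queried. */
    static class StyleTable {
        private final List<String> styles = new ArrayList<>();
        int sizeCalls = 0;

        StyleTable(int count) {
            for (int i = 0; i < count; i++) {
                styles.add("style" + i);
            }
        }

        int getNumberOfStyles() {
            sizeCalls++;
            return styles.size();
        }

        String getStyleAt(int index) {
            return styles.get(index);
        }
    }

    /** Original pattern: the loop condition calls getNumberOfStyles() before every iteration. */
    static int visitNaive(StyleTable table) {
        int visited = 0;
        for (int i = 0; i < table.getNumberOfStyles(); i++) {
            if (!table.getStyleAt(i).isEmpty()) {
                visited++;
            }
        }
        return visited;
    }

    /** Hoisted pattern: valid only because the loop body does not change the table's size. */
    static int visitHoisted(StyleTable table) {
        int visited = 0;
        final int numberOfStyles = table.getNumberOfStyles();
        for (int i = 0; i < numberOfStyles; i++) {
            if (!table.getStyleAt(i).isEmpty()) {
                visited++;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        StyleTable naive = new StyleTable(1000);
        StyleTable hoisted = new StyleTable(1000);
        visitNaive(naive);
        visitHoisted(hoisted);
        System.out.println("naive size calls:   " + naive.sizeCalls);
        System.out.println("hoisted size calls: " + hoisted.sizeCalls);
    }
}
```

With 1000 entries the naive loop queries the size 1001 times (once per iteration plus the final failing check), while the hoisted loop queries it once. The hoisted form is only equivalent when nothing in the loop body adds or removes entries.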
Applied some changes (mostly to HSSF and XSSF classes) in r1751086.
More changes to SS, XSSF, POI util, and SignatureInfo in r1751131.
Updated VisioTextExtractor in r1751193.
Updated PowerPointExtractor in r1756345.
I am closing this as resolved for now. Before making more changes, we should run some micro-benchmarks to verify that they actually improve execution speed. The JVM may already be able to optimize away most of the overhead nowadays.
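As a starting point for such a check, here is a rough timing sketch using only System.nanoTime and a plain ArrayList. The class and method names are made up for this example, and it does not touch any POI code. A hand-rolled timer like this is easily distorted by JIT warm-up and dead-code elimination, so a proper harness such as JMH would be the better tool for numbers we intend to act on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

public class LoopBenchSketch {

    static final List<Integer> DATA = new ArrayList<>();
    static {
        for (int i = 0; i < 100_000; i++) {
            DATA.add(i);
        }
    }

    /** Written to on every round so the JIT cannot discard the loop results. */
    static volatile long sink;

    /** size() is re-evaluated in the loop condition. */
    static long sumNaive(List<Integer> data) {
        long sum = 0;
        for (int i = 0; i < data.size(); i++) {
            sum += data.get(i);
        }
        return sum;
    }

    /** size() is read once before the loop. */
    static long sumHoisted(List<Integer> data) {
        long sum = 0;
        final int n = data.size();
        for (int i = 0; i < n; i++) {
            sum += data.get(i);
        }
        return sum;
    }

    /** Runs warm-up rounds, then returns the average nanoseconds per timed round. */
    static long averageNanos(ToLongFunction<List<Integer>> loop, int rounds) {
        for (int r = 0; r < rounds; r++) {
            sink += loop.applyAsLong(DATA); // warm-up, not timed
        }
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += loop.applyAsLong(DATA);
        }
        return (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) {
        int rounds = 200;
        System.out.println("naive   ns/round: " + averageNanos(LoopBenchSketch::sumNaive, rounds));
        System.out.println("hoisted ns/round: " + averageNanos(LoopBenchSketch::sumHoisted, rounds));
    }
}
```

ArrayList.size() is a trivial getter that the JIT is likely to inline, so this particular pair may show little or no difference. The cases worth measuring in POI are the ones where the call in the loop condition does real work.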