Bug 52834 - ColumnHelper.getColumnWidth does not scale to large numbers of rows, with large numbers of merged regions
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.7-FINAL
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on: 58896
Blocks:
Reported: 2012-03-06 02:02 UTC by Aron N
Modified: 2016-06-17 06:30 UTC



Attachments
Patch to resolve issue against http://svn.apache.org/repos/asf/poi/tags/REL_3_7 (1.55 KB, patch)
2012-03-06 02:02 UTC, Aron N
Equivalent patch for release 3.11 (REL_3_11_FINAL tag) (5.09 KB, patch)
2015-09-29 17:40 UTC, Aron N

Description Aron N 2012-03-06 02:02:38 UTC
Created attachment 28423 [details]
Patch to resolve issue against http://svn.apache.org/repos/asf/poi/tags/REL_3_7

Running ColumnHelper.getColumnWidth on a large worksheet (>110,000 rows) with a large number of merged regions (>40,000) takes an extremely long time: one run went on for an entire weekend before it was noticed and aborted.

This method contains a loop over merged regions nested within a loop over rows, and several expensive methods are called inside the inner loop. With more than 110,000 rows and more than 40,000 merged regions, those calls are possibly made several billion times in my example.

However, many of these time-consuming calls return data that does not change while this method runs, so the code can be refactored to make each call only once (or once per row) and cache the result. The main problem calls are:

getNumMergedRegions()
getMergedRegion(i)
getRowNum()

With the code refactored thusly, the method ran within a few minutes.
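
For illustration only, here is a minimal Java sketch of the kind of hoisting described above. It is not the attached patch and it does not use any POI classes: FakeSheet, Region, demoSheet, countMergedCellsNaive and countMergedCellsCached are all hypothetical names invented for this example, and the assumption that the sheet accessors are expensive is modelled simply by counting how often they are called.

```java
import java.util.ArrayList;
import java.util.List;

public class MergedRegionScan {

    /** Hypothetical stand-in for a merged cell range (not POI's CellRangeAddress). */
    static final class Region {
        final int firstRow, lastRow, firstCol, lastCol;

        Region(int firstRow, int lastRow, int firstCol, int lastCol) {
            this.firstRow = firstRow;
            this.lastRow = lastRow;
            this.firstCol = firstCol;
            this.lastCol = lastCol;
        }

        boolean contains(int row, int col) {
            return row >= firstRow && row <= lastRow && col >= firstCol && col <= lastCol;
        }
    }

    /** Hypothetical sheet stand-in that counts calls to its (assumed costly) accessors. */
    static final class FakeSheet {
        final int rowCount;
        final List<Region> regions = new ArrayList<>();
        long accessorCalls = 0;

        FakeSheet(int rowCount) {
            this.rowCount = rowCount;
        }

        int getNumMergedRegions() {
            accessorCalls++;
            return regions.size();
        }

        Region getMergedRegion(int i) {
            accessorCalls++;
            return regions.get(i);
        }
    }

    /** Builds a sheet whose k-th region covers rows 10k..10k+1 and columns 0..1. */
    static FakeSheet demoSheet(int rows, int regionCount) {
        FakeSheet sheet = new FakeSheet(rows);
        for (int k = 0; k < regionCount; k++) {
            sheet.regions.add(new Region(10 * k, 10 * k + 1, 0, 1));
        }
        return sheet;
    }

    /** Problem pattern: the sheet is queried again on every inner iteration. */
    static int countMergedCellsNaive(FakeSheet sheet, int column) {
        int merged = 0;
        for (int row = 0; row < sheet.rowCount; row++) {
            for (int i = 0; i < sheet.getNumMergedRegions(); i++) {
                if (sheet.getMergedRegion(i).contains(row, column)) {
                    merged++;
                    break;
                }
            }
        }
        return merged;
    }

    /** Refactored pattern: region data is fetched once, then reused for every row. */
    static int countMergedCellsCached(FakeSheet sheet, int column) {
        int n = sheet.getNumMergedRegions();
        Region[] cached = new Region[n];
        for (int i = 0; i < n; i++) {
            cached[i] = sheet.getMergedRegion(i);
        }
        int merged = 0;
        for (int row = 0; row < sheet.rowCount; row++) {
            for (Region region : cached) {
                if (region.contains(row, column)) {
                    merged++;
                    break;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        FakeSheet naiveSheet = demoSheet(1000, 100);
        FakeSheet cachedSheet = demoSheet(1000, 100);
        int naive = countMergedCellsNaive(naiveSheet, 0);
        int cached = countMergedCellsCached(cachedSheet, 0);
        System.out.println("naive result=" + naive + ", accessor calls=" + naiveSheet.accessorCalls);
        System.out.println("cached result=" + cached + ", accessor calls=" + cachedSheet.accessorCalls);
    }
}
```

Both methods return the same answer; the cached variant touches the sheet accessors once per region plus once for the count, regardless of the number of rows, which is the property the refactoring relies on.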

A patch is attached with these suggested changes. The patch is based on the REL_3_7 tag.
Comment 1 Dominik Stadler 2015-09-29 13:34:28 UTC
As far as I can see, this method is no longer in ColumnHelper, and XSSFSheet.getColumnWidth() does not handle merged regions specially, so this no longer seems to be applicable.

Please post an updated patch and if possible some accompanying unit tests if this is still a problem for you.
Comment 2 Aron N 2015-09-29 17:40:37 UTC
Created attachment 33152 [details]
Equivalent patch for release 3.11 (REL_3_11_FINAL tag)

This patch provides equivalent functionality to the old one targeting REL_3_7.
Comment 3 Aron N 2015-09-29 17:42:36 UTC
@Dominik - while the code has been refactored, the problem remains. Some time ago I created an equivalent patch against version 3.11; I thought I'd posted it here, but apparently not. Anyhow, I just did so.

Sorry, no relevant unit tests at this point.