Bug 54283 - Very slow opening XSSF spreadsheets
Summary: Very slow opening XSSF spreadsheets
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.9-dev
Hardware: PC Linux
Importance: P2 normal with 1 vote
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-12 11:08 UTC by Jan
Modified: 2015-09-05 14:59 UTC
CC List: 2 users



Attachments
Screenshot from profiler, showing large number of calls when opening spreadsheet (212.11 KB, image/png)
2012-12-12 11:08 UTC, Jan

Description Jan 2012-12-12 11:08:51 UTC
Created attachment 29746
Screenshot from profiler, showing large number of calls when opening spreadsheet

I'm seeing very slow load times for XSSF spreadsheets. One example spreadsheet takes several minutes to open, whereas the same spreadsheet saved as .xls (HSSF) opens in a fraction of a second.

This is about the same spreadsheets mentioned in bug 54282, but this one looks harder to fix.

Attached is a screenshot from a profiler session that highlights the problem. Basically, when opening a single spreadsheet that contains two sheets, there is behaviour that looks like O(N^3) with respect to the number of columns in the spreadsheet. The sequence is roughly as follows:

- 1 call to XSSFWorkbook.onDocumentRead()
- 2 calls to XSSFSheet.onDocumentRead()
<...>
- 2 calls to ColumnHelper.cleanColumns()
- 5,317 calls to ColumnHelper.addCleanColIntoCols()
- 5,317 calls to ColumnHelper.sortColumns()
- 7,812,463 calls to Xobj.find_element_user() 
- 8,243,994,339 calls to Xobj.isElem() and QName.equals().

There's a similar bottleneck that goes like this:

- 1 call to XSSFWorkbook.onDocumentRead()
- 2 calls to XSSFSheet.onDocumentRead()
- 2 calls to ColumnHelper.cleanColumns()
- 5,317 calls to ColumnHelper.addCleanColIntoCols()
- 7,807,146 calls to ColumnHelper.getColArray()
- 8,243,994,339 calls to Xobj.isElem() and QName.equals().
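To make the reported growth concrete, here is a hypothetical Java cost model (not POI or XMLBeans source). It assumes, as the isElem()/QName.equals() counts suggest, that an indexed lookup such as find_element_user(i) walks the sibling list from the first child. If every column insert then re-reads all existing columns by index, the total number of node checks is (N-1)N(N+1)/6, i.e. O(N^3); reading them in one linear walk per insert brings it down to N(N-1)/2. The class and method names are illustrative only.

```java
// Hypothetical cost model, not POI or XMLBeans source code.
final class ColumnCostModel {

    /** Children kept in a singly linked list, so get(i) must walk i+1 nodes. */
    static final class LinkedChildren {
        private static final class Node {
            final int value;
            Node next;

            Node(int value) {
                this.value = value;
            }
        }

        private Node head;
        private Node tail;
        private int size;
        long elementChecks; // stands in for the Xobj.isElem()/QName.equals() count

        void add(int value) {
            Node node = new Node(value);
            if (head == null) {
                head = node;
            } else {
                tail.next = node;
            }
            tail = node;
            size++;
        }

        int size() {
            return size;
        }

        /** Indexed lookup that scans from the first child: one check per node visited. */
        int get(int index) {
            Node cur = head;
            elementChecks++;
            for (int k = 0; k < index; k++) {
                cur = cur.next;
                elementChecks++;
            }
            return cur.value;
        }

        /** One linear walk that copies every child: one check per node. */
        int[] toArray() {
            int[] out = new int[size];
            int k = 0;
            for (Node cur = head; cur != null; cur = cur.next) {
                elementChecks++;
                out[k++] = cur.value;
            }
            return out;
        }
    }

    /** Each insert re-reads all existing columns by index: O(n^2) per insert, O(N^3) total. */
    static long indexedRescanPerInsert(int columns) {
        LinkedChildren cols = new LinkedChildren();
        for (int c = 0; c < columns; c++) {
            for (int i = 0; i < cols.size(); i++) {
                cols.get(i);
            }
            cols.add(c);
        }
        return cols.elementChecks;
    }

    /** Each insert re-reads all existing columns in one walk: O(n) per insert, O(N^2) total. */
    static long snapshotRescanPerInsert(int columns) {
        LinkedChildren cols = new LinkedChildren();
        for (int c = 0; c < columns; c++) {
            int[] existing = cols.toArray(); // a real caller would inspect 'existing' here
            cols.add(c);
        }
        return cols.elementChecks;
    }

    public static void main(String[] args) {
        for (int n : new int[] {250, 500, 1000}) {
            System.out.printf("cols=%d indexed=%d snapshot=%d%n",
                    n, indexedRescanPerInsert(n), snapshotRescanPerInsert(n));
        }
    }
}
```

Doubling the column count multiplies the indexed count by roughly eight and the snapshot count by roughly four, which matches the shape of the profile above far better than the per-call cost of any single method.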

I realise that this code bottoms out in XMLBeans, so maybe this is partly an issue there. I did find this XMLBeans bug report, which sounds relevant, but it has been open for a couple of years: https://issues.apache.org/jira/browse/XMLBEANS-438

Still, I wonder if there's something that could be done in the POI code to avoid hitting the XMLBeans data structures so hard. As it is, it unfortunately renders the API unusable for certain spreadsheets.
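One way to read that suggestion, sketched here under loud assumptions: take a single snapshot of the column definitions, resolve and order them in plain Java collections, and hand back one array to write back in a single step, instead of touching the XML tree and re-sorting after every insert. Col, Range and BatchColumnCleaner are hypothetical stand-ins, not POI or XMLBeans types, and the "later definition of the same range wins" rule is an assumption for illustration; POI's real cleanColumns() logic is more involved and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of batching column clean-up in plain Java collections.
final class BatchColumnCleaner {

    /** Illustrative stand-in for one <col min=".." max=".." width=".."/> entry. */
    record Col(long min, long max, double width) {}

    private record Range(long min, long max) {}

    /**
     * Resolves and orders a snapshot of column definitions in memory.
     * Assumption for this sketch: a later definition of the exact same
     * [min, max] range replaces an earlier one.
     */
    static Col[] clean(Col[] snapshot) {
        Map<Range, Col> byRange = new LinkedHashMap<>();
        for (Col col : snapshot) {
            byRange.put(new Range(col.min(), col.max()), col); // later definition wins
        }
        List<Col> cols = new ArrayList<>(byRange.values());
        // One O(N log N) sort at the end instead of a sort after every insert.
        cols.sort(Comparator.comparingLong(Col::min).thenComparingLong(Col::max));
        return cols.toArray(new Col[0]);
    }

    public static void main(String[] args) {
        Col[] snapshot = {
            new Col(5, 7, 12.0),
            new Col(1, 1, 8.0),
            new Col(5, 7, 20.0), // redefines columns 5..7
        };
        System.out.println(Arrays.toString(clean(snapshot)));
    }
}
```

The design point is simply that the expensive structure is read once and written once, so the per-insert cost no longer depends on how XMLBeans locates the N-th child.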
Comment 1 vladk 2013-09-05 12:14:19 UTC
Rolled the Version field back to 3.9-dev; it had been changed accidentally.
Comment 2 Dominik Stadler 2014-08-31 19:07:46 UTC
Any chance you could attach a spreadsheet that is taking a long time to load for you?
Comment 3 Dominik Stadler 2015-09-05 14:59:42 UTC
There have been some performance fixes regarding XMLBeans lists; these have likely also made the case described here run faster.

Also, there has been no response for a long time, so I am closing this as WORKSFORME for now. Please reopen if you can attach a sample file that shows the problem.