Bug 54283

Summary: Very slow opening XSSF spreadsheets
Product: POI
Reporter: Jan <jan.stette>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED WORKSFORME    
Severity: normal
CC: vladk.dev, warwick_burrows
Priority: P2    
Version: 3.9-dev   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Screenshot from profiler, showing large number of calls when opening spreadsheet

Description Jan 2012-12-12 11:08:51 UTC
Created attachment 29746
Screenshot from profiler, showing large number of calls when opening spreadsheet

I'm seeing very slow load times for XSSF spreadsheets. One example spreadsheet takes several minutes to open, whereas the same spreadsheet saved as xls/HSSF opens in a fraction of a second.

This is about the same spreadsheets mentioned in bug 54282, but this one looks harder to fix.

Attached is a screenshot from a profiler session that highlights the problem. When opening a single workbook that contains two sheets, the behaviour looks like O(N^3) with respect to the number of columns in the spreadsheet. The sequence is roughly as follows:

- 1 call to XSSFWorkbook.onDocumentRead()
- 2 calls to XSSFSheet.onDocumentRead()
<...>
- 2 calls to ColumnHelper.cleanColumns()
- 5,317 calls to ColumnHelper.addCleanColIntoCols()
- 5,317 calls to ColumnHelper.sortColumns()
- 7,812,463 calls to Xobj.find_element_user() 
- 8,243,994,339 calls to Xobj.isElem() and QName.equals().

There's a similar bottleneck that goes like this:

- 1 call to XSSFWorkbook.onDocumentRead()
- 2 calls to XSSFSheet.onDocumentRead()
- 2 calls to ColumnHelper.cleanColumns()
- 5,317 calls to ColumnHelper.addCleanColIntoCols()
- 7,807,146 calls to ColumnHelper.getColArray()
- 8,243,994,339 calls to Xobj.isElem() and QName.equals().
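
The call counts above are consistent with each indexed lookup walking the sibling elements from the start. The following is a minimal sketch of that access pattern, not POI or XMLBeans code: `SiblingScanDemo`, `Node` and the visit counter are hypothetical names used only for illustration, and the linked list is an assumed stand-in for the XML store.

```java
/** Hypothetical stand-in for an XML-backed element list; not POI or XMLBeans code. */
final class SiblingScanDemo {
    /** Counts node visits so the cost of each access pattern is visible. */
    static long visits = 0;

    /** Singly linked node, mimicking sibling elements in an XML store. */
    static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    /** Builds a linked list holding the values 0..n-1. */
    static Node build(int n) {
        Node head = null;
        Node tail = null;
        for (int i = 0; i < n; i++) {
            Node node = new Node(i);
            if (head == null) { head = node; } else { tail.next = node; }
            tail = node;
        }
        return head;
    }

    /** Indexed lookup that walks from the head every time: O(index) per call. */
    static int get(Node head, int index) {
        Node cur = head;
        visits++;
        for (int i = 0; i < index; i++) {
            cur = cur.next;
            visits++;
        }
        return cur.value;
    }

    /** Reads every element through repeated indexed lookups; returns node visits. */
    static long visitsWithIndexedAccess(int n) {
        Node head = build(n);
        visits = 0;
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += get(head, i);
        }
        return visits;
    }

    /** Copies the list into an array in one pass, then reads the array; returns node visits. */
    static long visitsWithSnapshot(int n) {
        Node head = build(n);
        visits = 0;
        int[] snapshot = new int[n];
        int i = 0;
        for (Node cur = head; cur != null; cur = cur.next) {
            snapshot[i++] = cur.value;
            visits++;
        }
        long sum = 0;
        for (int v : snapshot) {
            sum += v;
        }
        return visits;
    }

    public static void main(String[] args) {
        System.out.println("indexed:  " + visitsWithIndexedAccess(1000));
        System.out.println("snapshot: " + visitsWithSnapshot(1000));
    }
}
```

For 1,000 elements the indexed pattern makes 500,500 node visits and the snapshot pattern makes 1,000. If the outer code repeats such a scan once per column, the total grows cubically, which would match the suspicion in the description.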

I realise that this code bottoms out in XMLBeans, so maybe this is partly an issue there. I did find this bug report on XMLBeans which sounds relevant, but it's been open for a couple of years: https://issues.apache.org/jira/browse/XMLBEANS-438

Still, I wonder if there's something that could be done in the POI code to avoid hitting the XMLBeans data structures so hard. As it is, it unfortunately renders the API unusable for certain spreadsheets.
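
One possible direction, sketched here under assumptions: the profile shows sortColumns() being called as often as addCleanColIntoCols(), which suggests a sort after every insertion. Collecting all columns first and sorting once would do far less work. The class below is a hypothetical illustration using plain integers as column indices; `SortOnceDemo` and its methods are invented names and do not reflect the actual ColumnHelper implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration; not the ColumnHelper implementation. */
final class SortOnceDemo {
    /** Number of comparisons performed by the counting comparator. */
    static long comparisons = 0;

    /** Comparator that counts how often it is invoked. */
    static final Comparator<Integer> COUNTING = (a, b) -> {
        comparisons++;
        return Integer.compare(a, b);
    };

    /** Mimics the profiled pattern: re-sort the whole list after each insertion. */
    static List<Integer> sortAfterEveryInsert(int[] cols) {
        List<Integer> out = new ArrayList<>();
        for (int c : cols) {
            out.add(c);
            out.sort(COUNTING);
        }
        return out;
    }

    /** Alternative: insert everything first, then sort a single time. */
    static List<Integer> sortOnce(int[] cols) {
        List<Integer> out = new ArrayList<>();
        for (int c : cols) {
            out.add(c);
        }
        out.sort(COUNTING);
        return out;
    }

    public static void main(String[] args) {
        int[] cols = new int[200];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = cols.length - i; // descending input
        }
        comparisons = 0;
        sortAfterEveryInsert(cols);
        System.out.println("sort after every insert: " + comparisons + " comparisons");
        comparisons = 0;
        sortOnce(cols);
        System.out.println("sort once:               " + comparisons + " comparisons");
    }
}
```

Both methods return the same sorted list. The difference is only in how many comparisons are made, and in the real code each comparison would reach into the XMLBeans store, so the saving would be multiplied by the cost of each lookup.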
Comment 1 vladk 2013-09-05 12:14:19 UTC
Rolled the Version field back to 3.9-dev; it had been changed accidentally.
Comment 2 Dominik Stadler 2014-08-31 19:07:46 UTC
Any chance you could attach a spreadsheet that is taking a long time to load for you?
Comment 3 Dominik Stadler 2015-09-05 14:59:42 UTC
There have been some performance fixes regarding XMLBeans lists; these likely also made the case described here run faster.

There has also been no response for a long time, so I am closing this as WORKSFORME for now. Please reopen if you can attach a sample file which shows the problem.