Bug 61034 - Call to XSSFReader.getSheetsData() returns duplicate sheets
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.16-FINAL
Hardware: PC Mac OS X 10.1
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2017-04-24 22:05 UTC by Mauricio Eastmond
Modified: 2017-05-05 14:04 UTC

Attachments
use this xlsx file to reproduce the issue. (32.01 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2017-04-24 22:05 UTC, Mauricio Eastmond
Proposed fix for XSSFReader.XMLSheetRefReader (675 bytes, patch)
2017-04-29 11:25 UTC, Sebastian Wikalinski

Description Mauricio Eastmond 2017-04-24 22:05:30 UTC
Created attachment 34947
use this xlsx file to reproduce the issue.

Call to XSSFReader.getSheetsData() returns duplicate sheets.

The attached xlsx file contains 6 sheets: Sheet1, Sheet2, ..., Sheet6.

The call to XSSFReader.getSheetsData() should return the 6 sheets in the iterator but it is returning 12 sheets. Each sheet is duplicated.

Steps to reproduce:

Run this code using the attached xlsx file:

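import java.io.InputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;

// sourceFilePath is the path to the attached xlsx file
OPCPackage p = OPCPackage.open(sourceFilePath);
XSSFReader reader = new XSSFReader(p);
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) reader.getSheetsData();
while (iter.hasNext()) {
  InputStream stream = iter.next();
  // getSheetName() reports the name of the sheet just returned by next()
  String sheetName = iter.getSheetName();
  stream.close();
  System.out.println(sheetName);
}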

The output is:
Sheet1
Sheet1
Sheet2
Sheet2
Sheet3
Sheet3
Sheet4
Sheet4
Sheet5
Sheet5
Sheet6
Sheet6

The expected output is:
Sheet1
Sheet2
Sheet3
Sheet4
Sheet5
Sheet6
Comment 1 Sebastian Wikalinski 2017-04-29 11:25:45 UTC
Created attachment 34964
Proposed fix for XSSFReader.XMLSheetRefReader

Hi,

We've run into the same issue with the new version of the library. The problem seems to occur only with xlsx files whose <sheet> elements in workbook.xml list their attributes in a particular order, with r:id before sheetId.

Example (which causes the problems):
<sheet name="Sheet6" r:id="rId6" sheetId="4"/>

While this one works correctly:
<sheet name="Sheet6" sheetId="4" r:id="rId6"/>

I've traced the root cause to a possible coding error in XMLSheetRefReader, for which I'm now providing a patch. I haven't tested it very thoroughly, but it seems to fix the problem.
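
To illustrate how attribute order alone could produce this kind of duplication, here is a minimal sketch. It is not the XMLSheetRefReader source and not the attached patch; the class SheetRefOrderDemo, the Attr type, and the collect method are hypothetical names made up for this example. It assumes a loop that records a sheet reference as soon as both the name and the relationship id have been seen, and shows that any attribute following them (such as sheetId) makes the check pass a second time unless the loop stops after the first match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of POI.
class SheetRefOrderDemo {

    // Stand-in for one parsed attribute of a <sheet> element (local name and value).
    static final class Attr {
        final String localName;
        final String value;

        Attr(String localName, String value) {
            this.localName = localName;
            this.value = value;
        }
    }

    // Collects "id:name" references from the attributes of a single <sheet> element.
    // When stopAfterMatch is false, the "both values known" check runs again for
    // every attribute that follows, which is the assumed source of duplicates.
    static List<String> collect(List<Attr> attrs, boolean stopAfterMatch) {
        List<String> refs = new ArrayList<>();
        String name = null;
        String id = null;
        for (Attr a : attrs) {
            if (a.localName.equals("name")) {
                name = a.value;
            } else if (a.localName.equals("id")) {
                id = a.value;
            }
            if (name != null && id != null) {
                refs.add(id + ":" + name);
                if (stopAfterMatch) {
                    break;
                }
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        // <sheet name="Sheet6" r:id="rId6" sheetId="4"/>
        List<Attr> idBeforeSheetId = Arrays.asList(
                new Attr("name", "Sheet6"), new Attr("id", "rId6"), new Attr("sheetId", "4"));
        // <sheet name="Sheet6" sheetId="4" r:id="rId6"/>
        List<Attr> idLast = Arrays.asList(
                new Attr("name", "Sheet6"), new Attr("sheetId", "4"), new Attr("id", "rId6"));

        System.out.println(collect(idBeforeSheetId, false).size()); // 2
        System.out.println(collect(idLast, false).size());          // 1
        System.out.println(collect(idBeforeSheetId, true).size());  // 1
    }
}
```

In this sketch the second ordering is safe only because r:id happens to be the last attribute; stopping after the first match makes the result independent of attribute order.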
Comment 2 Javen O'Neal 2017-04-29 17:28:58 UTC
Mauricio, thanks for the Excel test file and test case!
Sebastian, thanks for the fix!

Applied in r1793223. Will be included in POI 3.17-beta1.

https://svn.apache.org/viewvc?view=revision&revision=1793223
Comment 3 Tim Allison 2017-05-05 14:04:55 UTC
Sorry, all.  1) shouldn't have introduced this and 2) should have caught this with tika-eval...argh.

Thank you for reporting this and supplying a patch.  Thank you, Javen!