Bug 61034

Summary: Call to XSSFReader.getSheetsData() returns duplicate sheets
Product: POI
Reporter: Mauricio Eastmond <mauricio.eastmond>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 3.16-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X 10.1   
Attachments: use this xlsx file to reproduce the issue.
Proposed fix for XSSFReader.XMLSheetRefReader

Description Mauricio Eastmond 2017-04-24 22:05:30 UTC
Created attachment 34947
use this xlsx file to reproduce the issue.

Call to XSSFReader.getSheetsData() returns duplicate sheets.

The attached xlsx file contains 6 sheets: Sheet1, Sheet2, ..., Sheet6.

The call to XSSFReader.getSheetsData() should return an iterator over the 6 sheets, but the iterator yields 12: each sheet appears twice.

Steps to reproduce:

Run this code using the attached xlsx file:

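import java.io.InputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;

OPCPackage p = OPCPackage.open(sourceFilePath);
XSSFReader reader = new XSSFReader(p);
// The concrete SheetIterator type exposes getSheetName() for the current sheet
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) reader.getSheetsData();
while (iter.hasNext()) {
  InputStream stream = iter.next();
  String sheetName = iter.getSheetName();
  stream.close();
  System.out.println(sheetName);
}
// Close the package without writing anything back to the file
p.revert();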

The output is:
Sheet1
Sheet1
Sheet2
Sheet2
Sheet3
Sheet3
Sheet4
Sheet4
Sheet5
Sheet5
Sheet6
Sheet6

The expected output is:
Sheet1
Sheet2
Sheet3
Sheet4
Sheet5
Sheet6
Comment 1 Sebastian Wikalinski 2017-04-29 11:25:45 UTC
Created attachment 34964
Proposed fix for XSSFReader.XMLSheetRefReader

Hi,

We've run into the same issue with the new version of the library. The problem seems to occur only with xlsx files whose <sheet> elements in workbook.xml list their attributes in a particular order.

Example that causes the problem:
<sheet name="Sheet6" r:id="rId6" sheetId="4"/>

This one, by contrast, works correctly:
<sheet name="Sheet6" sheetId="4" r:id="rId6"/>

I've traced the root cause to a possible coding error in XMLSheetRefReader, for which I'm now providing a patch. I haven't tested it very thoroughly, but it seems to fix the problem.
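To illustrate how a handler can end up depending on attribute order, here is a minimal, self-contained Java sketch using only the JDK's SAX API. It is a hypothetical stand-in, not POI's actual XMLSheetRefReader code or the attached patch; the class SheetRefDemo and the method collectRefs are invented for illustration. The buggy variant checks "have both name and id been seen?" inside the attribute loop without leaving it, so every attribute that follows r:id (here sheetId) records the sheet again. XML treats attribute order as insignificant, so a handler should not depend on it; the fixed variant decides once, after the loop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of an attribute-order bug; this is NOT the POI source.
public class SheetRefDemo {
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    // The two <sheet> orderings quoted in this report.
    static final String BAD_ORDER = "<workbook xmlns:r=\"" + R_NS + "\"><sheets>"
            + "<sheet name=\"Sheet6\" r:id=\"rId6\" sheetId=\"4\"/></sheets></workbook>";
    static final String GOOD_ORDER = "<workbook xmlns:r=\"" + R_NS + "\"><sheets>"
            + "<sheet name=\"Sheet6\" sheetId=\"4\" r:id=\"rId6\"/></sheets></workbook>";

    /** Returns one "name=id" entry per recorded sheet; fixed=false uses the buggy check. */
    static List<String> collectRefs(String xml, boolean fixed) throws Exception {
        List<String> refs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (!"sheet".equals(localName)) {
                    return;
                }
                String name = null;
                String id = null;
                for (int i = 0; i < attrs.getLength(); i++) {
                    String attr = attrs.getLocalName(i);
                    if ("name".equals(attr)) {
                        name = attrs.getValue(i);
                    } else if ("id".equals(attr)) {
                        id = attrs.getValue(i);
                    }
                    // Buggy: runs once per attribute, so each attribute seen after
                    // both name and id (e.g. sheetId) adds the same sheet again.
                    if (!fixed && name != null && id != null) {
                        refs.add(name + "=" + id);
                    }
                }
                // Fixed: decide once, after all attributes have been read.
                if (fixed && name != null && id != null) {
                    refs.add(name + "=" + id);
                }
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report r:id with local name "id"
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return refs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("buggy, r:id before sheetId: " + collectRefs(BAD_ORDER, false));
        System.out.println("buggy, sheetId before r:id: " + collectRefs(GOOD_ORDER, false));
        System.out.println("fixed, r:id before sheetId: " + collectRefs(BAD_ORDER, true));
    }
}
```

With the two snippets above, the buggy variant records Sheet6 twice for the name, r:id, sheetId order and once for name, sheetId, r:id, which matches the symptom reported here; the fixed variant records it once in both cases. The demonstration relies on the parser reporting attributes in document order, which the JDK's built-in parser does in practice, although SAX does not guarantee it.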
Comment 2 Javen O'Neal 2017-04-29 17:28:58 UTC
Mauricio, thanks for the Excel test file, test case!
Sebastian, thanks for the fix!

Applied in r1793223. Will be included in POI 3.17-beta1.

https://svn.apache.org/viewvc?view=revision&revision=1793223
Comment 3 Tim Allison 2017-05-05 14:04:55 UTC
Sorry, all. I shouldn't have introduced this, and I should have caught it with tika-eval... argh.

Thank you for reporting this and supplying a patch. Thank you, Javen!