Created attachment 34947
Use this xlsx file to reproduce the issue.

Call to XSSFReader.getSheetsData() returns duplicate sheets.

The attached xlsx file contains 6 sheets: Sheet1, Sheet2, ..., Sheet6. The call to XSSFReader.getSheetsData() should return the 6 sheets in the iterator, but it is returning 12 sheets. Each sheet is duplicated.

Steps to reproduce: run this code using the attached xlsx file:

    OPCPackage p = OPCPackage.open(sourceFilePath);
    XSSFReader reader = new XSSFReader(p);
    XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) reader.getSheetsData();
    while (iter.hasNext()) {
        InputStream stream = iter.next();
        String sheetName = iter.getSheetName();
        stream.close();
        System.out.println(sheetName);
    }

The output is:

    Sheet1
    Sheet1
    Sheet2
    Sheet2
    Sheet3
    Sheet3
    Sheet4
    Sheet4
    Sheet5
    Sheet5
    Sheet6
    Sheet6

The expected output is:

    Sheet1
    Sheet2
    Sheet3
    Sheet4
    Sheet5
    Sheet6
Created attachment 34964
Proposed fix for XSSFReader.XMLSheetRefReader

Hi,

We've run into the same issue with the new version of the library. The problem seems to be caused only by those xlsx files which have a specific order of the attributes inside the <sheet> tag of workbook.xml.

Example (which causes the problems):

    <sheet name="Sheet6" r:id="rId6" sheetId="4"/>

While this one works correctly:

    <sheet name="Sheet6" sheetId="4" r:id="rId6"/>

I've traced the root cause to a possible coding error in XMLSheetRefReader, for which I'm now providing a patch. I haven't tested it very thoroughly, but it seems to fix the problem.
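For anyone following along without the attachment: the patch itself is not reproduced in this thread, so the snippet below is only a sketch of the general idea, not the code from the patch and not POI's actual XMLSheetRefReader. It assumes a plain JDK SAX handler; the names SheetRefSketch, SheetRef, SheetRefHandler and parse are made up for illustration. The point it shows is that each <sheet> element is recorded exactly once, by looking up the name and r:id attributes by name rather than relying on their position in the tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: hypothetical names, not part of Apache POI.
final class SheetRefSketch {
    // Namespace bound to the "r" prefix in workbook.xml.
    static final String REL_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical holder for one <sheet> entry.
    static final class SheetRef {
        final String id;
        final String name;
        SheetRef(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final class SheetRefHandler extends DefaultHandler {
        final List<SheetRef> refs = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attrs) {
            if (!"sheet".equals(localName)) {
                return;
            }
            // Look attributes up by (namespace, local name), so their order
            // inside the tag does not matter.
            String name = attrs.getValue("", "name");
            String id = attrs.getValue(REL_NS, "id");
            // Add the reference once per element, after all lookups are done.
            if (name != null && id != null) {
                refs.add(new SheetRef(id, name));
            }
        }
    }

    // Parses a workbook.xml-like string and returns the sheet references found.
    static List<SheetRef> parse(String workbookXml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SheetRefHandler handler = new SheetRefHandler();
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(workbookXml)), handler);
            return handler.refs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse workbook XML", e);
        }
    }
}
```

With this approach, both attribute orders shown above should yield a single entry for Sheet6, provided the input declares the spreadsheetml and relationships namespaces the way a real workbook.xml does.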
Mauricio, thanks for the Excel test file and test case! Sebastian, thanks for the fix!

Applied in r1793223. Will be included in POI 3.17-beta1.
https://svn.apache.org/viewvc?view=revision&revision=1793223
Sorry, all. 1) shouldn't have introduced this and 2) should have caught this with tika-eval...argh. Thank you for reporting this and supplying a patch. Thank you, Javen!