Bug 56274 - SXSSF produces corrupt xlsx file from valid xlsx template
Summary: SXSSF produces corrupt xlsx file from valid xlsx template
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF (show other bugs)
Version: 3.10-FINAL
Hardware: PC All
: P2 blocker (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-16 15:10 UTC by Yaniv Kunda
Modified: 2014-05-21 17:08 UTC (History)
0 users



Attachments
Sample input file exhibiting the problem (15.84 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-03-16 15:10 UTC, Yaniv Kunda
Details
Patch for fixing the bug (and other minor optimizations) (24.67 KB, patch)
2014-03-19 09:26 UTC, Yaniv Kunda
Details | Diff
Patch for fixing the bug (1.41 KB, patch)
2014-04-27 10:58 UTC, Yaniv Kunda
Details | Diff
Unit test for bug #56274 (4.62 KB, patch)
2014-05-12 07:16 UTC, Yaniv Kunda
Details | Diff
Sample input file exhibiting the problem (required by unit test) (12.48 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-05-12 07:18 UTC, Yaniv Kunda
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yaniv Kunda 2014-03-16 15:10:42 UTC
Created attachment 31393 [details]
Sample input file exhibiting the problem

Upgrading from poi-3.9 to poi-3.10 caused a bug where xlsx files loaded as a template using XSSF, added with some data and written with SXSSF resulted in corrupt files.

Opening the resulting files with Excel 2013 results in the following Yes/No dialog:
"We found a problem with some content in 'Filename.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes."
Clicking yes shows the recovery result dialog, with a list of repaired table*.xml files under /xl/tables in the internal excel structure.

Attached is a sample input xlsx file exhibiting the problem, which can be reproduced by the following reduced test:

public static void main(String[] args) throws Exception {
    File inputFile = new File("ExecutiveReportTemplate.xlsx");
    File outputFile = new File("ExecutiveReportTemplate-new.xlsx");
    XSSFWorkbook inputWorkbook = new XSSFWorkbook(new FileInputStream(inputFile));
    SXSSFWorkbook outputWorkbook = new SXSSFWorkbook(inputWorkbook);
    outputWorkbook.write(new FileOutputStream(outputFile));
}

I examined the internal xml files, and the differences seem to be in the /xl/tables/table*.xml files -
The tableColumn elements have the same id and dataDxfId attributes, but different name attributes:
in poi-3.9 the name attributes correspond to the order of the columns in the template worksheet,
in poi-3.10 the name attributes seem to be out-of-order, and one of the columns have a duplicate name.
Comment 1 Yaniv Kunda 2014-03-19 09:26:59 UTC
Created attachment 31403 [details]
Patch for fixing the bug (and other minor optimizations)

I've located the bug XSSFTable.updateHeaders(), where the xmlFragment id is used as a cellnum - which will only be the same in the special case of adding columns to a sheet one by one, in order; inserting/moving columns will create xmlFragment ids not consistent with cell numbers.

I have fixed it by using a counter (starting from firstHeaderColumn) that advances along the table columns, and then used to fetch sheet cells by their actual cellnum.

The additional minor optimizations are not related, but I deemed them appropriate while reviewing XSSFTable - if a commiter wishes to discard them for the purpose of fixing this bug, then only the changes in the updateHeaders() method should be included.
Comment 2 Dominik Stadler 2014-04-18 15:06:06 UTC
I tried to take a look, but unfortunately your patch changes lots of whitespaces and other unrelated things like imports and thus is hard to apply cleanly.

Any chance you can produce a minimal patch with only the actually required changes? This will make it easier for others to review/apply the patch.

Thanks!
Comment 3 Yaniv Kunda 2014-04-27 10:58:11 UTC
Created attachment 31564 [details]
Patch for fixing the bug
Comment 4 Nick Burch 2014-04-28 11:24:06 UTC
Thanks for this updated patch

Is there any chance you could write a short unit test, which triggers the problem on 3.10, and shows it's fixed with the patch applied? That way, we can be sure it's solved, and also be sure it remains fixed into the future!
Comment 5 Yaniv Kunda 2014-05-12 07:16:26 UTC
Created attachment 31605 [details]
Unit test for bug #56274
Comment 6 Yaniv Kunda 2014-05-12 07:18:39 UTC
Created attachment 31606 [details]
Sample input file exhibiting the problem (required by unit test)

This is the sample input file for the attached unit test -
it should be placed in the /test-data/spreadsheet/ directory of the distribution
Comment 7 Nick Burch 2014-05-21 17:08:30 UTC
Thanks, patch and unit test applied in r1596624.