Bug 62906 - XSSFSheet.createTable(AreaReference) corrupts worksheet
Summary: XSSFSheet.createTable(AreaReference) corrupts worksheet
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 4.0.0-FINAL
Hardware: PC All
: P2 major (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
: 63372 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-11-13 03:53 UTC by Gili
Modified: 2019-06-22 06:26 UTC (History)
1 user (show)



Attachments
Proposed fix to bug 62906 (2.32 KB, patch)
2019-04-23 12:18 UTC, David Gauntt
Details | Diff
Test case to illustrate bug 62906 (1.23 KB, text/plain)
2019-04-23 12:26 UTC, David Gauntt
Details
Illustration of different kinds of table name conflicts (4.00 KB, text/plain)
2019-04-29 12:38 UTC, David Gauntt
Details
Test case to illustrate bug 62906 (7.02 KB, text/plain)
2019-05-04 16:20 UTC, David Gauntt
Details
Patch to XSSFTable.setTableName() (1.82 KB, patch)
2019-05-04 16:24 UTC, David Gauntt
Details | Diff
Patch to XSSSheet.createTable (3.10 KB, patch)
2019-05-04 16:28 UTC, David Gauntt
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gili 2018-11-13 03:53:34 UTC
Per https://stackoverflow.com/q/53173526/14731 (I am not the author) passing a non-null value into XSSFSheet.createTable(AreaReference) always generates a corrupt XLSX file.

I ran into issue for hours until I ran across this Stackoverflow post and I can confirm that passing in null works well. Obviously I'd like to pass a non-null value into the method (as the Javadoc explicitly indicates that the parameter should not be null) but no one seems to have figured out how to make this work cleanly.

Consider updating your sample code / testcases to demonstrate how this is meant to work.
Comment 1 PJ Fanning 2018-11-13 21:38:09 UTC
I've nade a few changes in the sample code and the CreateTable xlsx is produced ok and I can open it in Mac Numbers and LibreOffice without issues. I don't have M$ producta installed on my laptop.
Comment 2 David Gauntt 2019-04-22 11:04:03 UTC
*** Bug 63372 has been marked as a duplicate of this bug. ***
Comment 3 Gili 2019-04-23 01:16:46 UTC
PJ Fanning,

Please post your source-code and the output file. I will test both against Microsoft Office under Windows.
Comment 4 David Gauntt 2019-04-23 12:18:39 UTC
Created attachment 36545 [details]
Proposed fix to bug 62906

Since XSSF.createTable will corrupt a worksheet unless setDisplayName is called, this code calls setDisplayName at the end of createTable.  The name that is used is "TableN", where N is the smallest positive integer such that "TableN" is not used by a previously defined name.
Comment 5 David Gauntt 2019-04-23 12:21:46 UTC
Comment on attachment 36545 [details]
Proposed fix to bug 62906

XSSFSheet.createTable will corrupt a workbook unless CTTable.setDisplayName is called.  This patch to XSSFSheet.createTable calls setDisplayName(tableName), where tableName has the text "Table<N>", where <N> is the smallest positive integer such that "Table<N>" is not used by a previously defined name.
Comment 6 David Gauntt 2019-04-23 12:26:52 UTC
Created attachment 36546 [details]
Test case to illustrate bug 62906

This method returns an XSSFWorkbook with a single XSSFSheet containing a single XSSFTable.

If it is called with bApplyBugFix set to false, the resulting workbook will be reported by Microsoft Excel 2010 as corrupt.  If bApplyBugFix is set to true, the resulting workbook is opened by Microsoft Excel 2010 without problem.
Comment 7 PJ Fanning 2019-04-23 13:41:57 UTC
Comment on attachment 36545 [details]
Proposed fix to bug 62906

the tableNumber is already set by `int tableNumber = getPackagePart().getPackage().getPartsByContentType(XSSFRelation.TABLE.getContentType()).size() + 1` - any reason not to use this when setting the displayName?
Comment 8 David Gauntt 2019-04-23 15:50:40 UTC
That is a good idea; we can just remove the line "tableNumber=0".  It is still important to test that the name does not conflict with another name.  For example, one could have a non-table range named "Table1", and then add a table.  It will corrupt the file if the table is named "Table1".
Comment 9 PJ Fanning 2019-04-25 17:22:11 UTC
The CreateTable example (https://github.com/apache/poi/blob/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/CreateTable.java) has an explicit setDisplayName - is this not enough? I'm not sure if it makes sense for us to generate this name.
Comment 10 David Gauntt 2019-04-25 20:47:32 UTC
The table absolutely, positively has to have a display name set or the workbook will be corrupted.  It seems to me that it is best to set it to a default value, and allow the user to change that value if desired, than to let the user write code, run it, produce corrupted workbooks, and then look around to see if there is a code example that will fix the problem.  Think of it as a fail-safe approach to coding.

We may want to add another overload of createTable to allow the user to define a different table name instead of calling setDisplayName() after the fact.
Comment 11 Gili 2019-04-26 00:42:50 UTC
Absolutely!

David nailed it on the head. Debugging corrupt files is extremely annoying/difficult because Excel does not provide any meaningful hints of what is wrong.

You should have reasonable defaults, let users override them (if so desired), and I'd even go as far as preventing users from assigning tables a null name. It should be impossible to export a corrupt file, period.
Comment 13 David Gauntt 2019-04-26 13:01:48 UTC
I just looked at the patch to createTable that PJ posted.  It does not check to see if the table's display name conflicts with an existing name.

I just tried the following in Excel 2010:

1) Create a new workbook
2) Create a defined name in the workbook "Table1"
3) Add a table to Sheet1

The table created in step (3) has the name "Table2" in order to avoid conflicting with the previously defined name "Table1".  I haven't checked the patch, but I would bet that following the above procedure using POI would result in a corrupted workbook because there is named range called "Table1" and a table called "Table1".

I realize that this would be a rare occurrance (who names a non-table range "Table1"?) if if we can foresee it and the prevention is simple, I would recommend the prevention.
Comment 14 PJ Fanning 2019-04-26 14:26:25 UTC
there is existing code upping the tableNumber - so I don't want 2 blocks of code upping the number - so if you can provide a real test case that proves the existing code to up the number is wrong, then we can fix it.

https://github.com/apache/poi/blob/trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFSheet.java#L4116 - up to 4130
Comment 15 Gili 2019-04-28 01:03:55 UTC
PJ Fanning,

Here is a simpler testcase than what David mentioned:

1. Sheet contains "table1" (generated automatically or manually)
2. User adds another table and manually names it "table1"
3. User exports worksheet. Exported file is flagged as corrupt.

Expected behavior: Setting a table to a conflicting name should throw an exception (fail fast) instead of failing much later on (e.g. when the worksheet is exported and reopened in Excel).
Comment 16 Gili 2019-04-28 01:13:39 UTC
Another testcase:

1. User creates a table using sheet.createTable(reference). POI assigns it the name "table1".
2. User changes the table name to "table2" using setDisplayName()
3. User creates a second table using sheet.createTable(reference). POI assigns it the name "table2".
4. Worksheet is now in a bad state.
Comment 17 PJ Fanning 2019-04-28 15:33:21 UTC
Apache POI is a volunteer project with no affiliation to Microsoft.

If Microsoft Excel does not provide details about corruption in documents, then that is an issue you should bring to Microsoft.
Comment 18 Gili 2019-04-28 15:44:25 UTC
Sure... but the fact remains that POI is generating invalid worksheets. Nicer error message or not, this is a bug.
Comment 19 David Gauntt 2019-04-29 12:38:58 UTC
Created attachment 36555 [details]
Illustration of different kinds of table name conflicts

This code can be used to demonstrate conflicts between the name of a new table and previously defined tables or previously defined Name objects.  Details are included in the javadoc.

In brief, the workbook will be corrupted if two Tables have the same name, or a Table has the same name as a defined Name (e.g. a named range).  This is true whether the Name has workbook scope or worksheet scope, and whether the tables are in the same worksheet or different worksheets.

So long as no table has a null name, Excel 2010 can repair the corruption by renaming a Table.  However, there is no guarantee that this is true for other versions of Excel, or for other ooxml compliant spreadsheet programs such as OpenOffice.
Comment 20 David Gauntt 2019-05-04 16:20:34 UTC
Created attachment 36559 [details]
Test case to illustrate bug 62906

This JUnit test class runs 3 tests:

testDefaultTableName: fails if the the default name conflicts with existing XSSFTables or XSSFNames

testRangeNameConflict: fails if it is possible to rename an XSSFTable using the name of an existing XSSFName

testTableNameConflict: fails if it is possible to rename an XSSFTable using the name of an existing XSSFTable

The test also writes xlsx files to "build/custom-reports-test".  Open these files to ensure that they are not corrupted.
Comment 21 David Gauntt 2019-05-04 16:24:48 UTC
Created attachment 36560 [details]
Patch to XSSFTable.setTableName()

This patch includes a new static function, XSSFTable.isInvalidTableName, that returns true if the argument conflicts with an existing XSSFTable or XSSFName.

In addition, setDisplayName will throw an InvalidArgumentException if isInvalidTableName returns true.
Comment 22 David Gauntt 2019-05-04 16:28:46 UTC
Created attachment 36561 [details]
Patch to XSSSheet.createTable

This patch calls XSSFTable.isInvalidTableName (see previous patch) before calling setDisplayName.  If isInvalidTableName returns true, then tableNumber is incremented until isInvalidTableName returns false.
Comment 23 Dominik Stadler 2019-06-22 06:26:27 UTC
Implemented a fix via r1861819 which is a bit simpler and which does not introduce any conflicting name at all, hopefully this captures even more cases of possible inconsistencies.