Bug 57423 - shiftRows() produces a corrupted xlsx file
Summary: shiftRows() produces a corrupted xlsx file
Status: REOPENED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.11-FINAL
Hardware: PC All
: P2 blocker with 24 votes (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
: 59581 (view as bug list)
Depends on:
Blocks: 62711
  Show dependency tree
 
Reported: 2015-01-07 15:35 UTC by Andrew Wdziekonski
Modified: 2019-08-07 13:27 UTC (History)
3 users (show)



Attachments
java test code (2.76 KB, text/x-java-source)
2015-01-07 15:35 UTC, Andrew Wdziekonski
Details
xlsx file used for testing (9.21 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-01-07 15:37 UTC, Andrew Wdziekonski
Details
JUnit test class. (1.92 KB, text/plain)
2017-12-10 16:51 UTC, Luca
Details
Test case to illustrate bug 62906 (10.29 KB, text/plain)
2019-08-07 12:22 UTC, David Gauntt
Details
Partial bug fix for bug 57423 (1.05 KB, patch)
2019-08-07 12:42 UTC, David Gauntt
Details | Diff
Full bug fix for 57423 (2.07 KB, patch)
2019-08-07 13:27 UTC, David Gauntt
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Wdziekonski 2015-01-07 15:35:34 UTC
Created attachment 32353 [details]
java test code

XSSF implementation of Sheet.shiftRows() causes corrupted output xlsx file when the shift value is bigger than the number of row(s) being shifted. This is regardless of whether the shift value is negative or positive.

Excel 2010 on opening the output file says:
"Excel found unreadable content" and offers recovering the file by removing the unreadable content.

This can be observed in examples like the following:
negative shift of 1 row by less than -1
negative shift of 2 rows by less than -2
positive shift of 1 row by 2 or more 
positive shift of 2 rows by 3 or more

Please find java test code attached along with xlsx template file used for testing.
Comment 1 Andrew Wdziekonski 2015-01-07 15:37:07 UTC
Created attachment 32354 [details]
xlsx file used for testing
Comment 2 Dominik Stadler 2015-03-01 18:40:49 UTC
In a quick test on Linux with LibreOffice I could not reproduce this, the shifted rows looked fine with (4, 5, -3), let's try if it is reproducible on Windows with Excel...
Comment 3 Andrew Wdziekonski 2015-03-06 09:50:43 UTC
This is the behaviour I am having on windows excel 2010.
Comment 4 Dominik Stadler 2015-04-30 09:11:47 UTC
I verified that Excel complains about such a sheet. Analysis indicates that the cause is that we do not fix the order of rows in the data structure XSSFSheet->worksheet->sheetdata, which holds an array of rows. 

The shifting currently leaves this array unordered, which seems to be fine for LibreOffice/OpenOffice, but causes Excel to complain.

Unfortunately fixing seems to be a bit harder as the XmlBeans interfaces does not easily allow to change array contents without actually cloning the whole object, let's see if there is a way to do it correctly.
Comment 5 Dominik Stadler 2015-04-30 11:55:18 UTC
I have now added a testcase to class TestUnfixedBugs which verifies this bug, unfortunately this is complicated to fix because XmlBeans makes it hard to reorder things in an array...
Comment 6 Mark Murphy 2016-05-19 19:40:02 UTC
*** Bug 59581 has been marked as a duplicate of this bug. ***
Comment 7 Dominik Stadler 2016-11-08 21:27:12 UTC
Probably one of the XmlCursor.move... methods could be useful here... 

See https://xmlbeans.apache.org/docs/2.0.0/reference/org/apache/xmlbeans/XmlCursor.html#moveXml(org.apache.xmlbeans.XmlCursor)
Comment 8 Luca 2017-12-08 21:07:28 UTC
Not fixed in 3 years?
Comment 9 Javen O'Neal 2017-12-09 00:36:11 UTC
Feel free to submit a patch! Everything is open source.
Comment 10 Luca 2017-12-10 16:51:46 UTC
Created attachment 35597 [details]
JUnit test class.

Add JUnit test class.
Comment 11 Javen O'Neal 2017-12-13 05:15:03 UTC
Before:    After:
  A        A
1 a        <empty>
2 b        <empty>
3 c        c
4          a
5          b

sh.createRow(0).createCell(0).setCellValue("a");
sh.createRow(1).createCell(0).setCellValue("b");
sh.createRow(2).createCell(0).setCellValue("c");
sh.shiftRows(0, 1, 3); //move "a" and "b" 3 rows down

Output: xl/sheet1.xlsx
<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <dimension ref="A3:B5"/>
  <sheetViews><sheetView workbookViewId="0" tabSelected="true"/></sheetViews>
  <sheetFormatPr defaultRowHeight="15.0"/>
  <sheetData>
    <row r="4"><c r="A4" t="s"><v>0</v></c></row> <!-- "a" -->
    <row r="5"><c r="A5" t="s"><v>1</v></c></row> <!-- "b" -->
    <row r="3"><c r="A3" t="s"><v>2</v></c></row> <!-- "c" -->
  </sheetData>
  <pageMargins bottom="0.75" footer="0.3" header="0.3" left="0.7" right="0.7" top="0.75"/>
</worksheet>
Comment 12 Javen O'Neal 2017-12-13 06:23:07 UTC
(In reply to Luca from comment #10)
> Created attachment 35597 [details]
> JUnit test class.
> 
> Add JUnit test class.

Thanks for the unit test. Applied in r1817975 using `sh.shiftRows(0, 1, 3)`
Comment 13 David Gauntt 2018-09-21 17:04:33 UTC
This bug has reappeared with version 4.0.0.  The following code demonstrates the bug, and demonstrates a workaround.  If bFixBug is set to false, Excel 2011 for Mac indicates that the destination file is corrupted.  If bFixBug is set to true, the file opens without trouble.

	public static void doUnitTest(File file) {
		final XSSFWorkbook workbook = new XSSFWorkbook();
		final XSSFSheet sheet = workbook.createSheet();
		final boolean bFixBug = false;
		final int numRows = 5;
		final int numCols = 5;

		System.out.println("doUnitTest: starting");

		for (int nRow = 0; nRow < numRows; ++nRow) {
			final XSSFRow row = sheet.createRow(nRow);
			for (int nCol = 0; nCol < numCols; ++nCol) {
				final XSSFCell cell = row.createCell(nCol);
				cell.setCellType(CellType.STRING);
				cell.setCellValue(String.format("Row %d col %d", nRow, nCol));
			}
		}

		final int nFirstRow = 3;
		final int nLastRow = 4;
		final int nNumToShift = 1;
		final int nFirstDstRow = nFirstRow + nNumToShift;
		final int nLastDstRow = nLastRow + nNumToShift;

		sheet.shiftRows(nFirstRow, nLastRow, nNumToShift);
		if (bFixBug) {
			for (int nRow = nFirstDstRow; nRow <= nLastDstRow; ++nRow) {
				final XSSFRow row = sheet.getRow(nRow);
				if (row != null) {
					String msg = "Row[rownum=" + row.getRowNum()
							+ "] contains cell(s) included in a multi-cell array formula. "
							+ "You cannot change part of an array.";
					for (Cell c : row) {
						((XSSFCell) c).updateCellReferencesForShifting(msg);
					}
				}
			}

		}

		try (OutputStream fileOut = new FileOutputStream(file)) {
			workbook.write(fileOut);
		} catch (Exception e) {
			System.err.println(e.getMessage());
		} finally {
			try {
				workbook.close();
			} catch (IOException e) {
				System.err.println(e.getMessage());
			}
		}
		System.out.println("doUnitTest: done");
	}
Comment 14 Dominik Stadler 2019-07-01 14:55:47 UTC
Cannot reproduce this any longer with latest trunk, seems to have been fixed via r1860384
Comment 15 David Gauntt 2019-08-07 12:22:34 UTC
Created attachment 36711 [details]
Test case to illustrate bug 62906

This JUnit test case tests both shiftRows and shiftColumns, with nShift both larger than the size of the shifted region (gapped shift) and smaller (overlapped shift).  The shifts are done on a worksheet that has cells with both absolute and relative formulas in the source and destination regions that refer to cells outside the source and destination regions, and absolute and relative formulas outside the source and destination that refer to cells inside the source and destination.  There are no merge areas in the test worksheet.
Comment 16 David Gauntt 2019-08-07 12:31:06 UTC
As of 6 Aug 2019, running the test case that I just posted (attachment 36711 [details]) causes 2 failures; these are when rows are shifted up with a gap between the shift and the destination, and when rows are shifted down with a gap.

The immediate cause is a bug in XSSFSheet.rebuildRows.  This sorts the _rows map in XSSFSheet, but does not sort the row list in the CTWorksheet (getCTWorksheet().getSheetData().getRowList())
Comment 17 David Gauntt 2019-08-07 12:42:23 UTC
Created attachment 36712 [details]
Partial bug fix for bug 57423

In this patch to XSSFSheet.rebuildRows, if bRebuildCTSheetData==false, then the XSSFRow map (_rows) is sorted by row number, but the CTRow list (getCTWorksheet().getSheetData().getRowList()) is not.  This is the current behavior in trunk (as of 6 Aug 2019).

If bRebuildCTSheetData==true then both the list and the map are sorted by row number.  However, the test case in attachment 36711 [details] still fails because of the following error:

Testcase: testShiftUpGapped took 0.027 sec
	Caused an ERROR
null
org.apache.xmlbeans.impl.values.XmlValueDisconnectedException
	at org.apache.xmlbeans.impl.values.XmlObjectBase.check_orphaned(XmlObjectBase.java:1258)
	at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.isSetF(Unknown Source)
	at org.apache.poi.xssf.usermodel.helpers.XSSFRowColShifter.updateRowFormulas(XSSFRowColShifter.java:104)
	at org.apache.poi.xssf.usermodel.helpers.XSSFRowColShifter.updateSheetFormulas(XSSFRowColShifter.java:88)
	at org.apache.poi.xssf.usermodel.helpers.XSSFRowColShifter.updateFormulas(XSSFRowColShifter.java:74)
	at org.apache.poi.xssf.usermodel.helpers.XSSFRowShifter.updateFormulas(XSSFRowShifter.java:46)
	at org.apache.poi.xssf.usermodel.XSSFSheet.shiftRows(XSSFSheet.java:3032)
	at org.apache.poi.xssf.usermodel.XSSFSheet.shiftRows(XSSFSheet.java:2999)
	at org.apache.poi.ss.unfixedBugs.TestXSSFSheetShiftRowsAndColumns.runTest(TestXSSFSheetShiftRowsAndColumns.java:272)
	at org.apache.poi.ss.unfixedBugs.TestXSSFSheetShiftRowsAndColumns.testShiftUpGapped(TestXSSFSheetShiftRowsAndColumns.java:112)
Comment 18 David Gauntt 2019-08-07 13:27:48 UTC
Created attachment 36713 [details]
Full bug fix for 57423

This patch fixes the bug introduce by the patch in attachment 36712 [details].  If the CTRow list is sorted by the call to rebuildRows in shiftCommentsAndRows, then any call to some of the XSSFRow objects will throw an XmlValueDisconnectedException.  However, if the sort does not happen until the end of XSSFSheet.shiftRows, there is no problem.

This patch adds an overload to rebuildRows().  When rebuildRows() or rebuildRows(false) is called, the CTRow list is not sorted.  When rebuildRows(true) is called, the CTRow list is sorted.  The call to rebuildRows() at the end of shiftRows is then changed to rebuildRows(true).

The test case in attachment 36711 [details] now runs without failure, and I think that we are ready to close this bug again.

"I have killed two mummies!"
"It was the same mummy".
"Yes, but I killed it twice!"