Bug 63463 - Corrupt excel file using POI api ver 4.1 with shiftRows()
Summary: Corrupt excel file using POI api ver 4.1 with shiftRows()
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 4.0.0-FINAL
Hardware: PC All
: P2 regression with 1 vote (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-24 20:47 UTC by Adil Rashid
Modified: 2019-06-25 12:51 UTC (History)
2 users (show)



Attachments
Zip file contains Non corrupt file created using POI 3.17 and corrupt file created using POI 4.1 (105.18 KB, application/x-zip-compressed)
2019-05-24 20:47 UTC, Adil Rashid
Details
JUnit test case for bug 63463 (7.74 KB, text/plain)
2019-05-30 03:32 UTC, David Gauntt
Details
Patch to XSSFRow.shift(int) to fix bug 63463 (562 bytes, patch)
2019-05-30 03:36 UTC, David Gauntt
Details | Diff
Fix to incorrect value of procName (7.75 KB, text/plain)
2019-05-30 03:55 UTC, David Gauntt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Adil Rashid 2019-05-24 20:47:47 UTC
Created attachment 36599 [details]
Zip file contains Non corrupt file created using POI 3.17 and corrupt file created using POI 4.1

We were successfully able to generate XLSX file using an XLSX template with POI 3.17. We upgraded our development environment to 4.1 and now it is creating corrupt XLSX file. We are using POI api to insert rows dynamically and then writing data in the cells.

Issue: We are adding dynamic rows and writing data in cells but the cells are empty and file is corrupt.

I have attached non-corrupt excel file created in POI 3.17 and corrupt excel file created using POI 4.1. It might help figuring out the issue.

//Function adds a row in sheet
public static void addRow(Sheet worksheet, int rowNum) {

              Row newRow = worksheet.getRow(rowNum);
              Row sourceRow = worksheet.getRow(rowNum - 1);

              // If the row exist in destination, push down all rows by 1 else create a new row
              if (newRow != null) {
                     worksheet.shiftRows(rowNum, worksheet.getLastRowNum() + 1, 1);
                     newRow = worksheet.createRow(rowNum);
              } else {
                     newRow = worksheet.createRow(rowNum);
              }

              // Loop through source columns to add to new row
              for (int i = 0; i < sourceRow.getLastCellNum(); i++) {
                     Cell oldCell = sourceRow.getCell(i);
                     Cell newCell = newRow.createCell(i);

                     if (oldCell == null) {
                           newCell = null;
                           continue;
                     }

                     newCell.setCellStyle(oldCell.getCellStyle());
                     newCell.setCellType(oldCell.getCellTypeEnum());
              }

              // If there are are any merged regions in the source row, copy to new row
              for (int i = 0; i < worksheet.getNumMergedRegions(); i++) {
                     CellRangeAddress cellRangeAddress = worksheet.getMergedRegion(i);
                     if (cellRangeAddress.getFirstRow() == sourceRow.getRowNum()) {
                           CellRangeAddress newCellRangeAddress = new CellRangeAddress(
                                         newRow.getRowNum(),
                                         (newRow.getRowNum() + (cellRangeAddress.getLastRow() - cellRangeAddress
                                                       .getFirstRow())), cellRangeAddress
                                                       .getFirstColumn(), cellRangeAddress
                                                       .getLastColumn());
                           worksheet.addMergedRegion(newCellRangeAddress);
                     }
              }
       }


Thanks
Comment 1 Chris Dufour 2019-05-28 21:44:01 UTC
We also ran into this issue when upgrading from 3.1.5 4.1.0.

It appears the cause is the sheet.shiftRows function.

Our code shifted some rows up one column after removing some templating fields.

There is a snippet I extracted from within an (unzipped) XLSX sheet:

myDoc/xl/worksheets/sheet1.xml

<row r="11" spans="1:27" ht="19">
    <c r="A12" s="15" t="s">
        <v>67</v>
    </c>
    <c r="B12" s="16" t="s">
        <v>68</v>
    </c>
    <c r="C12" s="15" t="s">
        <v>69</v>
    </c>
    ...
</row>

It appears that the shift function is correctly updating the row property of the rows (r="11") but not the row/column value for the cell (r="A12"). This results in the XLSX file being in an invalid state, which Microsoft Excel, and likely other Spreadsheet programs, treat as corrupted.

All both of 4.1.0 and 4.0.1 appear to have this issue, meaning that it was probably caused by some change in the 4.0.0 release.

We are using XSSFWorkbook (workbook.getSheetAt(sheet).shiftRows(x, y, -1)), but this may also affect other workbook formats.
Comment 2 David Gauntt 2019-05-29 12:56:09 UTC
I ran into this problem a while back, developed a workaround, and forgot to submit a bug report.  My workaround is listed below; I will submit a patch sometime in the next week or two.

  sheet.shiftRows(nFirstRow, nLastRow, nNumToShift);
  if (bApplyBugFix) {
   for (int nRow = nFirstDstRow; nRow <= nLastDstRow; ++nRow) {
    final XSSFRow row = sheet.getRow(nRow);
    if (row != null) {
     String msg = "Row[rownum=" + row.getRowNum()
       + "] contains cell(s) included in a multi-cell array formula. "
       + "You cannot change part of an array.";
     for (Cell c : row) {
      ((XSSFCell) c).updateCellReferencesForShifting(msg);
     }
    }
   }
  }
Comment 3 David Gauntt 2019-05-30 03:27:57 UTC
Oops!  Left a couple of lines out of my workaround

  sheet.shiftRows(nFirstRow, nLastRow, nNumToShift);
  if (bApplyBugFix) {
   final int nFirstDstRow = nFirstRow + nNumToShift;
   final int nLastDstRow = nLastRow + nNumToShift;

   for (int nRow = nFirstDstRow; nRow <= nLastDstRow; ++nRow) {
    final XSSFRow row = sheet.getRow(nRow);
    if (row != null) {
     String msg = "Row[rownum=" + row.getRowNum()
       + "] contains cell(s) included in a multi-cell array formula. "
       + "You cannot change part of an array.";
     for (Cell c : row) {
      ((XSSFCell) c).updateCellReferencesForShifting(msg);
     }
    }
   }
  }
Comment 4 David Gauntt 2019-05-30 03:32:52 UTC
Created attachment 36603 [details]
JUnit test case for bug 63463

This test case tests both XSSFSheet.shiftRows and XSSFSheet.shiftColumns().  It verifies both that the cell addresses are consistent after shiftRows and after shiftColumns, and that a merge area below and to the right of the shift point remains correct.
Comment 5 David Gauntt 2019-05-30 03:36:46 UTC
Created attachment 36604 [details]
Patch to XSSFRow.shift(int) to fix bug 63463

This patch fixes bug 63464.  Before the fix, XSSFRow.shift(int) called setRowNum(int) after updateCellReferencesForShifting(String).  With this fix, setRowNum is called before updateCellReferencesForShifting.
Comment 6 David Gauntt 2019-05-30 03:50:59 UTC
Comment on attachment 36604 [details]
Patch to XSSFRow.shift(int) to fix bug 63463

>    /**
>     * update cell references when shifting rows
>     *
>     * @param n the number of rows to move
>     */
>    protected void shift(int n) {
>        int rownum = getRowNum() + n;
>        String msg = "Row[rownum=" + getRowNum() + "] contains cell(s) included in a multi-cell array formula. " +
>                "You cannot change part of an array.";
>        setRowNum(rownum);  //  Bug 63463
>        for(Cell c : this){
>            ((XSSFCell)c).updateCellReferencesForShifting(msg);
>          }
>        //setRowNum(rownum); Bug 63463
>    }
Comment 7 David Gauntt 2019-05-30 03:55:37 UTC
Created attachment 36605 [details]
Fix to incorrect value of procName

This is a minor fix to my earlier test case, correcting the messages written by testShiftOneColumnAndTestMergeRegion.
Comment 8 PJ Fanning 2019-05-30 07:58:04 UTC
I merged https://svn.apache.org/viewvc?view=revision&revision=1860384

I'll see about trying to merge in the new test code later.
Comment 9 Adil Rashid 2019-05-30 19:27:46 UTC
Thanks a lot...patch worked
We are facing another issue after upgrade. Previously formulas were getting evaluated automatically but now we have add following line of code after each setCellFormula

cellTotal.setCellFormula(collectionTotal); // existing line
evaluator.evaluateFormulaCell(cellTotal); // we got to add line to evaluate the formula


Where evaulator is instance of FormulaEvaluator.

We do not want to add above lines of code because it will make us to change code any many places.

Please advise.

Thanks
Comment 10 PJ Fanning 2019-05-30 21:00:04 UTC
David - I merged your test class but 2 tests fail so I have them set to ignore, for now.
Comment 11 David Gauntt 2019-06-03 12:06:43 UTC
The failing tests were probably the concerning merge regions.  I am preparing to release a patch to fix problems in shiftMergeRegions, and a more complete test case.
Comment 12 David Gauntt 2019-06-03 12:09:57 UTC
There are some policy issues to decide before finishing my patch of shiftMergeRegions.

My current version passes the test cases that I wrote for merge regions that are unaffected by a shift, merge regions that are entirely overwritten by a shift, and merge regions that are moved but not resized.  However, there are some ambiguous cases for which I have a couple of proposed rules:

    1) If it is possible to define a merge area after the shift so it contains all cells that have not been overwritten and that were in the original merge area, and it does not contain any cells that were not in the original region, then the new merge area is kept.  However, if it is not possible to define a new merge area this way, the merge area is deleted.
    2) If the shift has caused the first cell in the merge area to be overwritten, or if the first cell in the merge area after the shift is different than the first cell below the shift, the value in the first cell before the shift is copied to the first cell after the shift. 

Are there any objections to using these rules while implementing shiftMergeRegions?


I have some examples to illustrate these rules (the row indices in these examples are one-based):


1) The merge area is A5:A8, and rows 1 to 4 are shifted downward 2 rows, overwriting A5 and A6.  The value in A5 would be copied to A7, and the merge range be shrunk to A7:A8.


2) The merge area is A1:A4, and rows 5 to 7 are shifted upward 2 rows overwriting rows 3 and 4, the merge are would be shrunk to A1:A2.


3) The merge area is A1:A4, and rows 1 to 2 are shifted downward 4 rows, overwriting rows 5 and 6.  The merge area is now A3:A6, and the value in the original A1 is copied to A3.


4) The merge area is A1:A3, and rows 1 to 2 are shifted downward 4 rows.  The cells in the original merge area are now at locations A3, A5, and A6.  Since they are not contiguous, the merge area is deleted.


5) The merge area is A4:A7, and rows 5 and 6 are shifted upward 4 rows, overwriting rows 1 and 2.  The cells in the original merge area are now at A1, A2, A4, and A7.  The merge area is deleted.
Comment 13 Norman 2019-06-25 12:50:11 UTC
I'd like to point out that we already have tickets for the broken shiftRows in 4.x

https://bz.apache.org/bugzilla/show_bug.cgi?id=62711
https://bz.apache.org/bugzilla/show_bug.cgi?id=57423

These should possibly be taken into consideration, and maybe closed along with this ticket.