Bug 65464 - removeFormula() on main shared formula corrupts Excel workbook
Summary: removeFormula() on main shared formula corrupts Excel workbook
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 5.0.0-FINAL
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-26 15:02 UTC by Nico Seegert
Modified: 2021-10-09 12:31 UTC (History)
0 users



Attachments
Minimal Java Source File (3.05 KB, text/plain)
2021-07-26 15:02 UTC, Nico Seegert
Details
Stripped down excel template (13.34 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2021-07-26 15:04 UTC, Nico Seegert
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nico Seegert 2021-07-26 15:02:56 UTC
Created attachment 37968 [details]
Minimal Java Source File

Dear Apache community,

we noticed the following bug reproducible with the attached source file and template:

Using removeFormula() on the defining / main cell of a shared formula array breaks the resulting Excel file and it cannot be repaired by Excel upon opening.

This issue seems related, but not identical to the reported issue 

Bug 65306 - shiftRows remove row contains shared formula master cause corrupt file
Comment 1 Nico Seegert 2021-07-26 15:04:22 UTC
Created attachment 37969 [details]
Stripped down excel template
Comment 2 PJ Fanning 2021-07-26 17:31:13 UTC
looks related to https://bz.apache.org/bugzilla/show_bug.cgi?id=58106 and that that fix did not fix this use case
Comment 3 PJ Fanning 2021-07-26 17:51:35 UTC
for some reason, V16 is stored in original doc without being marked as using a shared formula (t="shared")

```
         <c r="V16" s="5">
            <f>U16/R16</f>
            <v>0.75738345902523652</v>
         </c>
```

so when V15 is no longer a shared formula, the code makes V16 the new master but doesn't mark it as t="shared"

```
         <c r="V16" s="5">
            <f ref="V16:V29">U16/R16</f>
            <v>0.75738345902523652</v>
         </c>
```
Comment 4 PJ Fanning 2021-07-26 18:08:37 UTC
if you add 

```
        writer.insertIntoWorkbook("SheetWithSharedFormula", 15, 21, 98765.);
```

this results in an out xlsx that is still corrupt (as far as Excel is concerned) but it repairs this xlsx ok.

The problem does appear to be with the input xlsx and its strange V16 cell.
Comment 5 PJ Fanning 2021-07-26 19:24:30 UTC
I have a PR that fixes this issue (Excel is still not entirely happy with the calc-chain, as per my previous comment) but the file can be opened and automatically repaired. https://github.com/apache/poi/pull/241
Comment 6 PJ Fanning 2021-07-28 09:37:29 UTC
committed with r1891849
Comment 7 Nico Seegert 2021-07-28 09:59:47 UTC
Hello PJ Fanning,

thank you very much for your quick response and the implementation of the fix. 

I do not know how Excel was able to produce such a workbook with missing t="shared" entry, but as we are not in control of creating the workbook, it is important for us that manipulation of the workbook with Apache POI does not produce a corrupt workbook / or at least one that be repaired and opened.

Do I understand correctly then that this fix will go into the next POI release (5.1)?

Best regards
Nico
Comment 8 PJ Fanning 2021-07-28 13:25:13 UTC
Release info is kept at https://poi.apache.org/changes.html
Comment 9 PJ Fanning 2021-07-28 13:25:22 UTC
Release info is kept at https://poi.apache.org/changes.html
Comment 10 Nico Seegert 2021-08-02 16:20:07 UTC
Hello PJ Fanning,

as you said, the created Excel File is still corrupt upon opening but can be repaired now.
However, the following code fix in XSSFSheet should create a non-corrupt Excel (at least when I tested it locally) without the need to be repaired:

Replace

```
    nextF.setT(STCellFormulaType.SHARED); //https://bz.apache.org/bugzilla/show_bug.cgi?id=65464
```

by

```
    nextF.setT(STCellFormulaType.SHARED); //https://bz.apache.org/bugzilla/show_bug.cgi?id=65464
    if (!nextF.isSetSi()) {
        nextF.setSi(f.getSi());
    }
```

I do not know whether the if-check is necessary, as I do not know the code base too well. However, the addition above created non-corrupt Excel files for us.

It would be great if this fix (or similar) can be added to 5.0.1. - is a new bug ticket needed?

Best regards,
Nico
Comment 11 PJ Fanning 2021-08-02 17:22:43 UTC
Thanks Nico - committed with r1891962
Comment 12 Nico Seegert 2021-08-03 06:48:21 UTC
No problem, thank you for implementing it so quickly :-)
Comment 13 PJ Fanning 2021-10-09 12:31:24 UTC
I had to revert this change - it was causing problems with https://bz.apache.org/bugzilla/show_bug.cgi?id=65268

The test case was retained from this patch was retained and seems to behave ok except the anomalous cell in original file that did not have the right t=shared value - that is no longer corrected - but the cell still ends up with the right formula