Bug 65464

Summary: removeFormula() on main shared formula corrupts Excel workbook
Product: POI Reporter: Nico Seegert <nico.seegert>
Component: XSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 5.0.0-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Minimal Java Source File
Stripped down excel template

Description Nico Seegert 2021-07-26 15:02:56 UTC
Created attachment 37968 [details]
Minimal Java Source File

Dear Apache community,

we noticed the following bug reproducible with the attached source file and template:

Using removeFormula() on the defining / main cell of a shared formula array breaks the resulting Excel file and it cannot be repaired by Excel upon opening.

This issue seems related, but not identical to the reported issue 

Bug 65306 - shiftRows remove row contains shared formula master cause corrupt file
Comment 1 Nico Seegert 2021-07-26 15:04:22 UTC
Created attachment 37969 [details]
Stripped down excel template
Comment 2 PJ Fanning 2021-07-26 17:31:13 UTC
looks related to https://bz.apache.org/bugzilla/show_bug.cgi?id=58106 and that that fix did not fix this use case
Comment 3 PJ Fanning 2021-07-26 17:51:35 UTC
for some reason, V16 is stored in original doc without being marked as using a shared formula (t="shared")

```
         <c r="V16" s="5">
            <f>U16/R16</f>
            <v>0.75738345902523652</v>
         </c>
```

so when V15 is no longer a shared formula, the code makes V16 the new master but doesn't mark it as t="shared"

```
         <c r="V16" s="5">
            <f ref="V16:V29">U16/R16</f>
            <v>0.75738345902523652</v>
         </c>
```
Comment 4 PJ Fanning 2021-07-26 18:08:37 UTC
if you add 

```
        writer.insertIntoWorkbook("SheetWithSharedFormula", 15, 21, 98765.);
```

this results in an out xlsx that is still corrupt (as far as Excel is concerned) but it repairs this xlsx ok.

The problem does appear to be with the input xlsx and its strange V16 cell.
Comment 5 PJ Fanning 2021-07-26 19:24:30 UTC
I have a PR that fixes this issue (Excel is still not entirely happy with the calc-chain, as per my previous comment) but the file can be opened and automatically repaired. https://github.com/apache/poi/pull/241
Comment 6 PJ Fanning 2021-07-28 09:37:29 UTC
committed with r1891849
Comment 7 Nico Seegert 2021-07-28 09:59:47 UTC
Hello PJ Fanning,

thank you very much for your quick response and the implementation of the fix. 

I do not know how Excel was able to produce such a workbook with missing t="shared" entry, but as we are not in control of creating the workbook, it is important for us that manipulation of the workbook with Apache POI does not produce a corrupt workbook / or at least one that be repaired and opened.

Do I understand correctly then that this fix will go into the next POI release (5.1)?

Best regards
Nico
Comment 8 PJ Fanning 2021-07-28 13:25:13 UTC
Release info is kept at https://poi.apache.org/changes.html
Comment 9 PJ Fanning 2021-07-28 13:25:22 UTC
Release info is kept at https://poi.apache.org/changes.html
Comment 10 Nico Seegert 2021-08-02 16:20:07 UTC
Hello PJ Fanning,

as you said, the created Excel File is still corrupt upon opening but can be repaired now.
However, the following code fix in XSSFSheet should create a non-corrupt Excel (at least when I tested it locally) without the need to be repaired:

Replace

```
    nextF.setT(STCellFormulaType.SHARED); //https://bz.apache.org/bugzilla/show_bug.cgi?id=65464
```

by

```
    nextF.setT(STCellFormulaType.SHARED); //https://bz.apache.org/bugzilla/show_bug.cgi?id=65464
    if (!nextF.isSetSi()) {
        nextF.setSi(f.getSi());
    }
```

I do not know whether the if-check is necessary, as I do not know the code base too well. However, the addition above created non-corrupt Excel files for us.

It would be great if this fix (or similar) can be added to 5.0.1. - is a new bug ticket needed?

Best regards,
Nico
Comment 11 PJ Fanning 2021-08-02 17:22:43 UTC
Thanks Nico - committed with r1891962
Comment 12 Nico Seegert 2021-08-03 06:48:21 UTC
No problem, thank you for implementing it so quickly :-)
Comment 13 PJ Fanning 2021-10-09 12:31:24 UTC
I had to revert this change - it was causing problems with https://bz.apache.org/bugzilla/show_bug.cgi?id=65268

The test case was retained from this patch was retained and seems to behave ok except the anomalous cell in original file that did not have the right t=shared value - that is no longer corrected - but the cell still ends up with the right formula