Issue 126381

Summary: OO corrupts formulas when reading Excel spreadsheet
Product: Calc Reporter: J.Ty. <jty>
Component: open-importAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Normal    
Priority: P5 (lowest) CC: jty, oliver.brinzing, orcmid
Version: 4.1.1   
Target Milestone: ---   
Hardware: PC   
OS: Windows 7   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
an example spreadsheet corrupted by OO none

Description J.Ty. 2015-06-23 07:56:59 UTC

    
Comment 1 J.Ty. 2015-06-23 08:07:04 UTC
Created attachment 84806 [details]
an example spreadsheet corrupted by OO

OO corrupts spreadsheets created under Excel, which contain references to whole rows in the form '1:1'.

E.g., Excel's meaningful =INDEX(1:1,INDEX(2:2,3))
becomes OO's =INDEX($A1:$AMJ1INDEX($A2:$AMJ2)), which makes no sense and produces an error.

The most pervert form is that Excel's =INDEX(2:2,2)
is turned into OO's =INDEX($A2:$AMJ2), which is syntactically correct, but produces generally a different value.
Comment 2 orcmid 2015-06-23 20:52:09 UTC
SUMMARY (the TLDR)
I changed this to be a problem with open-import, since it involves conversion.
I also found that the problem is with the conversion to and/or recording of OpenFormula incorrectly, leading to errors of various kinds.  The use of full-row references in these INDEX formulas is not interoperable and involves a combination of defects.  It doubtless extends to cases beyond use in INDEX.

DEMONSTRATION

I confirmed J.Ty's example XSLX using Microsoft Office 2013 Excel.  It reopens fine.

 - - - Making an ODS from Excel and interchanging that - - - 

I used Excel 2013 to save the example as an .ODS file. Excel reopens the saved .ODS just fine.

Note, what is shown to users and what is recorded in the file can be different.
For example, the XSLX cell A4 formula entry "=INDEX(1:1,INDEX(2:2,3))" is *recorded* in the Excel-produced ODS file as "of:=INDEX([.1:.1];INDEX([.2:.2];3))". This is perfectly valid in accordance with section 5.8 References of the OpenFormula Specification part of ODF 1.2.  It might also be valid to record it as "of:=INDEX([.1];INDEX([.2];3)" since the start and end rows are the same each time.  Details at 6.14.6 INDEX determine that the second parameter is the number of the column to select since the data source is a single row vector.

Using the Excel-produced ODS, LibreOffice 4.3.5.2 *presents* the A4 formula as "=INDEX(1:1,INDEX(2:2,3))" and the results for all of A4:A6 are correct. 

However, on performing a Save As of the opened file, that LibreOffice saved version is not reopened correctly.  The re-saved A4:A6 on opening in LibreOffice *present* the same formulas but the cells show value "Err:502". 

When LibreOffice recorded that  ODS, the A4 formula is *recorded* as "of:=INDEX(1:1;INDEX(2:2;3))" and similarly for A5:A6.  That is, the formulas are rewritten in the Save As result to a form where LibreOffice fails.  When that file is opened in Excel 2013, Excel 2013 reports that the file is defective.  If Excel is allowed to repair the file, it will have the formulas dropped and the values preserved.

Using the Excel-produced ODS, Apache OpenOffice 4.1.1 Calc "presents" the A4 formula as "=INDEX(1:1;INDEX(2:2;3))" and similarly for the rest of A4:A6.  However, AOO shows "#NAME?" for the value of each of those cells. When the saved version from LibreOffice is opened in AOO, the results are the same ("Err:502") as when LibreOffice opens that file it saved.

 - - - Importing the .XSLX - - -

Now that we see there are serious asymmetries in how OpenFormula (recorded in the ODS) INDEX (and full row-/column selection) is interpreted by three different products, where some products can't even read what they have written, and AOO can't read any of it, let's go back to the import of .XSLX.

On opening of the J.Ty's original XSLX file in LibreOffice, LibreOffice 4.3.5.2 *presents* mangled formulas for A4:A6, showing "=INDEX($A1:$AMJ1INDEX($A2:$AMJ2))", "=INDEX($A2:$AMJ2)", and "=INDEX($A1:$AMJ1A5)".  The cell values are shown as "Err:508", "2", and "Err:509".  It is cleared that the presented formulas are a defective rewriting of the row references that corrupts the formula as presented.  On saving of that file to a .ODS from LibreOffice, the A4:A6 formulas as saved are "of:=INDEX([.$A1:.$AMJ1]INDEX([.$A2:.$AMJ2]))", "of:=INDEX([.$A2:.$AMJ2])", and "of:=INDEX([.$A1:.$AMJ1][.A5])".  Excel 2013 reports that file as corrupted.

On opening of the J.Ty's original XSLX file in Apache OpenOffice 4.1.1, the A4:A6 values are "Err:508", 3, and "Err:509" but the *presented* formulas are the same as presented by LibreOffice.  When the opened file is saved to a .ODS file by AOO, the formulas in the file are the same as those from LibreOffice also.  Excel 2013 reports that file as corrupted.

The defect is confirmed.  There appear to be multiple problems involved.
Comment 3 Oliver Brinzing 2015-06-24 18:48:42 UTC
.