Issue 123429

Summary: Calc is unacceptably slow in opening the attached XLSX file (Cornell)
Product: Calc Reporter: Andrea Pescetti <pescetti>
Component: open-import Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Normal    
Priority: P3 CC: Armin.Le.Grand, boltthrower, issues, rainerbielefeld_ooo_qa, rb.henschel, villeroy
Version: 3.4.0 Keywords: performance, regression
Target Milestone: ---   
Hardware: All   
OS: All   
See Also: https://issues.apache.org/ooo/show_bug.cgi?id=123919
Issue Type: DEFECT Latest Confirmation in: 4.0.1
Developer Difficulty: ---
Attachments:
Description Flags
another testcase none

Description Andrea Pescetti 2013-10-05 20:56:34 UTC
The XLSX file available at
http://www.birds.cornell.edu/clementschecklist/download/
takes extremely long to open with OpenOffice 4.0.1.

Some users report that OpenOffice seems to freeze. Rob and I managed to open the file, but we had to wait about 15 minutes with the progress bar stuck at about 25% (and no particular CPU/RAM usage).

I'll make a copy available for reference.

3.4.1 has a similar behavior.

The issue is discussed here:
http://comments.gmane.org/gmane.comp.apache.openoffice.user/2246
Comment 1 Andrea Pescetti 2013-10-05 20:58:37 UTC
A copy of the document (just in case the original one is replaced) is available at
http://people.apache.org/~pescetti/tmp/2013-10-i123429/
Comment 2 Andreas Säger 2013-10-05 23:42:50 UTC
This "spreadsheet" has a first sheet with 107,817 format ranges in 75 unique format ranges. After removing the formatting overkill the file loads at acceptable speed.
Nothing to bother about.
Comment 3 Armin Le Grand 2013-10-11 11:40:13 UTC
ALG: Checked, indeed it loads after a loooong time. This would need to be measured to see if (and what) could be optimized and where the time is spent.
Comment 4 Armin Le Grand 2013-10-11 14:11:28 UTC
ALG: Looks as if in the 1st part of the load the 69248 strings get read; the long part after this is actually adding them to the sheet. There seems to be a lot of processing involved to get from 'RichStrings' to cell contents. I do not know much about it (yet)...
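For anyone who wants to check the string count independently of the importer, here is a minimal sketch in Python. It assumes the shared strings part has been extracted from the XLSX package (conventionally xl/sharedStrings.xml) and counts the <si> entries, as well as how many of them carry rich-text runs (<r>). The sample XML and the function name count_strings are made up for illustration; this says nothing about how the AOO import code itself works.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by sharedStrings.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Tiny hypothetical sample, NOT taken from the Cornell file
SAMPLE_SST = """<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
     count="3" uniqueCount="3">
  <si><t>plain</t></si>
  <si><r><t>rich </t></r><r><rPr><b/></rPr><t>text</t></r></si>
  <si><t>another</t></si>
</sst>"""


def count_strings(sst_xml):
    """Return (total string items, items containing rich-text runs)."""
    root = ET.fromstring(sst_xml)
    items = root.findall("m:si", NS)
    # A string item is "rich" here if it has at least one <r> run child
    rich = sum(1 for si in items if si.find("m:r", NS) is not None)
    return len(items), rich


if __name__ == "__main__":
    total, rich = count_strings(SAMPLE_SST)
    print(f"{total} strings, {rich} with rich-text runs")
```

To run it against a real file, the part can be read with zipfile.ZipFile(path).read("xl/sharedStrings.xml") and passed to count_strings; the part name is the usual one but is not guaranteed for every package.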
Comment 5 giuseppe d'ambrosio 2013-10-23 10:29:04 UTC
Created attachment 81800 [details]
another testcase

1 sheet, 9 rows x 11 cols: opening with Excel 2010 takes 0.3 sec.,
while OO takes ~10 min.
Comment 6 Regina Henschel 2013-10-23 11:31:02 UTC
The document does not only contain the rows with real content, but thousands of rows of the kind
<row r="15" spans="1:11" x14ac:dyDescent="0.25" ><c r="A15" s="4" /></row>
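The number of such rows can be counted with a short script. The following is a rough sketch in Python, assuming the sheet XML has been extracted from the package: it treats a row as "empty" when none of its cells has a value (<v>), a formula (<f>) or an inline string (<is>). The sample XML and the name count_rows are illustrative only and are not taken from the attachment.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Tiny hypothetical sample modelled on the row quoted above
SAMPLE_SHEET = """<worksheet
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
  <sheetData>
    <row r="1" spans="1:11"><c r="A1" s="1" t="s"><v>0</v></c></row>
    <row r="15" spans="1:11" x14ac:dyDescent="0.25"><c r="A15" s="4"/></row>
    <row r="16" spans="1:11" x14ac:dyDescent="0.25"><c r="A16" s="4"/></row>
  </sheetData>
</worksheet>"""


def cell_has_content(cell):
    """True if the cell carries a value, a formula or an inline string."""
    return any(cell.find(f"m:{tag}", NS) is not None for tag in ("v", "f", "is"))


def count_rows(sheet_xml):
    """Return (total rows, rows whose cells are all style-only)."""
    root = ET.fromstring(sheet_xml)
    total = 0
    empty = 0
    for row in root.iterfind("m:sheetData/m:row", NS):
        total += 1
        if not any(cell_has_content(c) for c in row.findall("m:c", NS)):
            empty += 1
    return total, empty


if __name__ == "__main__":
    total, empty = count_rows(SAMPLE_SHEET)
    print(f"{total} rows, {empty} without any cell content")
```

For a real document the sheet part can be read with zipfile.ZipFile(path).read(...); the part name (often xl/worksheets/sheet1.xml) is an assumption and should be checked against the package contents.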
Comment 7 Regina Henschel 2013-10-23 13:50:26 UTC
Following Microsoft's tip in http://msdn.microsoft.com/en-us/library/ff726673%28v=office.14%29.aspx#xlMinUsedRange I can see that the last used cell in the attached document is K65536.
Comment 8 Andrea Pescetti 2013-12-26 11:39:06 UTC
Comment on attachment 81800 [details]
another testcase

I moved the "BG8" file and related discussion (comments #5-#6-#7) to https://issues.apache.org/ooo/show_bug.cgi?id=123919 since it's unclear whether the root cause is the same.

Let's keep this issue open for the "Cornell" file only.
Comment 9 Andrea Pescetti 2013-12-26 11:49:40 UTC
Note that 3.3.0 opens the "Cornell" file (Clements Checklist) in <30 seconds, which is acceptable considering the size, and much different than the 15+ minutes needed on 4.x. Regression.
Comment 10 Rainer Bielefeld 2014-04-18 07:34:55 UTC
Additional Info:
----------------
(a) time for opening the document in WIN 7
(a1) Gnumeric 1.12.9:      5s
(a2) LibO 4.2:             4s
(a3) MS Excel Viewer:      4s
(a4) Kingsoft:             4s
(a5) Softmaker FreeOffice: 5s
(b) Already a problem with AOO 3.4.0, I killed the process after 4 minutes