Issue 128552 - oox::xls::SheetDataBuffer::setCellFormat() takes over 3 hours to load XLSX spreadsheet
Summary: oox::xls::SheetDataBuffer::setCellFormat() takes over 3 hours to load XLSX sp...
Status: CONFIRMED
Alias: None
Product: Calc
Classification: Application
Component: open-import
Version: 4.2.0-dev
Hardware: All All
Importance: P5 (lowest) Normal
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: interop_OOXML, performance
Depends on:
Blocks:
 
Reported: 2023-01-07 04:33 UTC by damjan
Modified: 2023-01-07 04:35 UTC
CC List: 0 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: 4.2.0-dev
Developer Difficulty: ---


Attachments

Description damjan 2023-01-07 04:33:09 UTC
A large XLSX spreadsheet takes over 3 hours to load. The time appears to be spent in the method oox::xls::SheetDataBuffer::setCellFormat() of main/oox/source/xls/sheetdatabuffer.cxx, doing linear searches on the std::map maXfIdRanges.

Since the sheet has 100000+ rows of cells and the linear search over ranges is done once per cell, the algorithmic complexity is O(n*m) for n cells and m ranges, resulting in terrible performance.

We really need to improve the data structure and algorithm. Perhaps the ranges should be stored in some kind of spatial index, like an R-tree, so that each cell can quickly find the range it fits in without having to traverse them all.
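
To make the idea concrete, here is a minimal sketch of one possible cheaper lookup. It is not an R-tree and it is not the existing oox code: it swaps in a simpler technique, assuming cells arrive in row-major order (row by row, as a sheet is normally read). Under that assumption, only the most recent range in each column can still grow, so a per-column hash lookup can replace the scan over all ranges. FormatRange, FormatRangeBuilder, addCell() and finish() are hypothetical names made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a formatted cell range; not the real oox type.
struct FormatRange {
    int32_t col1, row1, col2, row2; // inclusive bounds
    int32_t xfId;                   // format id shared by the whole range
};

// Hypothetical helper that collects per-cell formats into ranges.
class FormatRangeBuilder {
public:
    // Assumption: cells are added in row-major order, each cell at most once.
    void addCell(int32_t col, int32_t row, int32_t xfId) {
        auto it = openByColumn_.find(col);
        if (it != openByColumn_.end()) {
            FormatRange& r = ranges_[it->second];
            if (r.xfId == xfId && r.row2 + 1 == row) {
                r.row2 = row; // same format directly below: extend the strip
                return;
            }
        }
        // Otherwise start a new one-cell strip and make it the open one
        // for this column.
        openByColumn_[col] = ranges_.size();
        ranges_.push_back({col, row, col, row, xfId});
    }

    // Merge neighbouring column strips that cover the same rows and share
    // a format, so the result contains rectangular ranges.
    std::vector<FormatRange> finish() const {
        std::vector<FormatRange> sorted = ranges_;
        std::sort(sorted.begin(), sorted.end(),
                  [](const FormatRange& a, const FormatRange& b) {
                      return std::tie(a.row1, a.row2, a.xfId, a.col1) <
                             std::tie(b.row1, b.row2, b.xfId, b.col1);
                  });
        std::vector<FormatRange> merged;
        for (const FormatRange& r : sorted) {
            if (!merged.empty()) {
                FormatRange& last = merged.back();
                if (last.row1 == r.row1 && last.row2 == r.row2 &&
                    last.xfId == r.xfId && last.col2 + 1 == r.col1) {
                    last.col2 = r.col2; // adjacent strip: widen the range
                    continue;
                }
            }
            merged.push_back(r);
        }
        return merged;
    }

private:
    std::vector<FormatRange> ranges_;
    // Index into ranges_ of the strip that may still grow, per column.
    std::unordered_map<int32_t, std::size_t> openByColumn_;
};
```

With this shape, each addCell() call costs one hash lookup on average rather than a scan over every range, and finish() sorts the strips once at the end. Whether the row-major assumption holds for every import path, and whether the resulting ranges match what the current code produces, would need to be checked against sheetdatabuffer.cxx.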
Comment 1 damjan 2023-01-07 04:35:00 UTC
When I commented out the contents of oox::xls::SheetDataBuffer::setCellFormat(), the spreadsheet loaded in only 8 minutes, although obviously without any formatting.