Apache OpenOffice (AOO) Bugzilla – Issue 128552
oox::xls::SheetDataBuffer::setCellFormat() takes over 3 hours to load XLSX spreadsheet
Last modified: 2023-01-07 04:35:00 UTC
A large XLSX spreadsheet takes over 3 hours to load. The time appears to be spent in the method oox::xls::SheetDataBuffer::setCellFormat() of main/oox/source/xls/sheetdatabuffer.cxx, doing linear searches on the std::map maXfIdRanges. There are 100,000+ rows of cells, and the linear search over ranges is done once per cell. The algorithm is therefore O(n*m), where n is the number of cells and m is the number of ranges, and performance is terrible. We really need to improve the data structure and algorithm. Perhaps the ranges should be stored in some kind of spatial index, such as an R-tree, so that each cell can quickly find the range it belongs to without traversing them all.
When I commented out the contents of oox::xls::SheetDataBuffer::setCellFormat(), the spreadsheet loaded in only 8 minutes, although obviously without any formatting.