Summary: | Performance regression on addition of merged regions | ||
---|---|---|---|
Product: | POI | Reporter: | David North <dtn-asfbugs> |
Component: | XSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Linux | ||
Bug Depends on: | |||
Bug Blocks: | 59212 |
Description
David North
2016-01-18 14:43:47 UTC
Would Sheet.addMergedRegion(CellRangeAddress region, boolean checkForOverlapping) (not implemented) solve this problem? Adding a merged region *safely* (without creating a potentially corrupted workbook) is inherently a O(n) operation, as it need to check all merged regions in the sheet for potential overlap. A merged region could be added unsafely (potential workbook corruption), but this should not be the default behavior. If adding multiple merged regions, it still requires O(n) time for each merged region addition. The checking could be deferred until a later time before saving the workbook, but wouldn't save any CPU cycles unless merged regions were added and removed between opening and saving the workbook. > which showed up as quite a sharp performance regression in one of our apps Could I get a rough number of merged regions in a sheet so I could profile this? I don't think there's much we can do to make the code faster, so we'll probably need to provide a method that bypasses validation. To make it crystal clear that using the bypassed-validation option can produce a corrupt workbook, I was thinking of calling it public int Sheet.addMergedRegionUnsafe(CellRangeAddress region), as the signature from comment 1 does not stress the consequences of checkForOverlapping=False. Thanks Javen. I agree blindly adding merged regions without the guards is an inherently unsafe operation, but in our case we're authoring the workbook from scratch in one run, so I think it's reasonable to give us API to say "we're confident we're adding non-overlapping regions". In terms of API, we could perhaps call the alternative method addMergedRegionUnsafe(CellRangeAddress region) or use an enum as an alternative to the boolean. The number of merged regions on the sheet in question is 23,002 - which I'll admit is quite a lot! Fixed in r1727776. Use XSSFSheet.addMergedRegionsUnsafe(CellRangeAddress) if you want to add a merged region without incurring the cost of checking for potential intersecting regions. XSSFSheet.validateMergedRegions can be called after the fact if you want to defer the validation. Updated changelog in r1727779. |