Summary: | Quadratic complexity: org.apache.poi.POIXMLDocumentPart#findExistingRelation(POIXMLDocumentPart part) | ||
---|---|---|---|
Product: | POI | Reporter: | Tim Helmstedt <tim.helmstedt> |
Component: | XSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | ||
Priority: | P2 | ||
Version: | 3.14-FINAL | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: | Test Case |
Fixed in https://github.com/apache/poi/pull/38 not sure the usual way to get commit access. Thanks for finding this performance inefficiency and submitting a patch. Applied in r1765935. These changes will be included in 3.16 beta 1. Pull requests are fine.[1] [1] https://poi.apache.org/guidelines.html#SubmittingPatches |
Created attachment 34395 [details] Test Case This was found through adding 2000 images to a sheet via org.apache.poi.xssf.usermodel.XSSFWorkbook#addPicture(byte[], int). This isn't the only path to findExistingRelation so there's likely more ways to hit this issue. Attached is a test case which exercises this path. The source of the quadratic slowdown comes from the defensive copy we do in org.apache.poi.openxml4j.opc.PackagePart#getRelationships. In findExistingRelation we iterate over this copy, and for each element we end up incurring another defensive copy via getRelatedPart->isRelationshipExists. I can't see a clear reason why org.apache.poi.openxml4j.opc.PackagePart#isRelationshipExists should create a copy of _relationships, since we're not leaking them - just returning a boolean. Using the field directly solves my issue and won't have any functional impact as we're not filtering at this stage.