Summary: | createName is slow when there are many Names (10000) | ||
---|---|---|---|
Product: | POI | Reporter: | GuoHui Chen <cgh_chen> |
Component: | XSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED DUPLICATE | ||
Severity: | enhancement | ||
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All |
Description
GuoHui Chen
2016-05-06 03:29:43 UTC
Adding a name requires checking the name manager for existing names to avoid defining the same name at the same scope (my guess is this would result in a corrupt workbook). POI uses a naïve implementation, shown in comment 0, which requires O(n) time for a naïve implementation. We could perform this check in O(1) using a hash table with a hashable tuple (scope, name) as the key. A less elegant, inferior solution that runs in O(1) uses nested hash tables: the first layer having scope (sheet name or global) keys and the second layer having name keys (inner and outer key could swapped). This comes at the cost of higher memory consumption and increasing the complexity of the code (and therefore higher chance for bugs). Given the code from comment 0, I'm not surprised that adding N names is slow, as it is performing O(N²) operations. Here's what you could do: 1) Provide a patch with a speed-optimized implementation with 100% test coverage. 2) Provide a patch with a non-validating version of setNameName (probably called setNameNameUnsafe) [1] 3) access the CT* classes yourself, either with introspection, subclassing, or forking POI, which gives you direct access to the CTName data structure. This would complicate upgrading POI in the future. [1] Relevant discussion on dev@poi mailing list http://apache-poi.1045710.n5.nabble.com/Preventing-corrupt-workbooks-td5722973.html class NameManagerKeyTuple() { public final String scope, name; public constructor NameManagerKeyTuple(String scope, String name) { this.scope = scope; // how will you represent global scope? Null?, How will you update this data structure if a sheet is renamed or re-ordered? Keeping this data structure in sync with sheet renames is the biggest hurdle. this.name = name; // and assert not null and convert to canonical case a la String.toLower(Locale.ROOT) } @Override public int hashCode() { return scope.hashCode() + name.hashCode(); // whatever you use for global scope, make sure it has a consistent hash code. Null requires null check to prevent NPE. } @Override public interest equals(Object other) { if (!(other instanceof NameManagerKeyTuple)) { return false; } NameManagerKeyTuple o = (NameManagerKeyTuple) other; return scope.equals(o.scope) && name.equals(o.name); // again, check for NPE if values could be null; } } NameManager private final Map<NameManagerKeyTuple, CTName> names = new HashMap<>(); Moved getNumberOfNames outside of the loop in r1748832. Using a StringBuilder as suggested in comment 0 provides no performance benefit. |