The XMLBeans recommendation is that when accessing long XML documents, or when performance matters, one should use XmlCursor and not getXXXList(). In order to provide a short-term solution, I propose we sweep the code and replace all usages of getXXXList() with getXXXArray() - which is much faster since it creates a snapshot of the xml data instead of a live view. This will be trivial for read-only operations (e.g. bug #56556) and easy for write operations – just read the array, work on it or create a new one, and set the result back to the proxy. Also, some attempts to minimize this – using toArray() on the returned list, can be reduced to getXXXArray(), since toArray() iterates over the elements, which is very inefficient in the XmlBeans impl. We should decide how to document the new (and old) usages of getXXXArray(), which are deprecated (and probably will be as long as the XMLBeans project is dead) - either by ignoring it altogether, or by adding @SuppressWarnings("deprecation") on all usages (thus always using a variable) optionally with a comment describing why the deprecated method is used (e.g. a link to this bug). In addition, other methods (e.g. CTCol.getMax()) are sometimes called several times in the same scope when their result is not changing, which is another redundant piece of XMLBeans code being called.
Created attachment 31921 [details] 56854-1 This is a first sweep of proxy method usages, which includes mostly usage of array-based proxy methods instead of list-based when appropriate, and other repeat calls of methods inlined. test-all runtime dropped from ~80 seconds to ~60 on my machine, mostly due to the changes in XSSF and ColumnHelper.
Thank you for the patch - applied with r1620997 Most changes were trivial and the tests for ColumnHelper were successful, so should be ok. Added a few deprecation-ignores. for the time being, I'll leave the bug entry open, waiting for the next diff.
Created attachment 32020 [details] 56854-2 Additional optimizations, mainly in XSSFSheet
Thank you for the second patch - applied with r1624922 XSSFSheet.removeMergedRegion(int) looked too complicated, I haven't seen a reason, why not to use the default remove-element-by-index impl.
You're right - sometimes it takes development to visit the "complicated" station when travelling from "a mess" to "elegant" :-) Note: the patch for XSSFSheet.removeMergedRegion/s is what I originally done in https://issues.apache.org/bugzilla/attachment.cgi?id=31960&action=diff for issue 55280 (https://issues.apache.org/bugzilla/show_bug.cgi?id=55280) (In reply to Andreas Beeker from comment #4) > Thank you for the second patch - applied with r1624922 > XSSFSheet.removeMergedRegion(int) looked too complicated, I haven't seen a > reason, why not to use the default remove-element-by-index impl.
Thanks for linking the other bug - I've already had a quick look yesterday where these XSSFSheet.removeMergedRegion(s) methods came from ... :) I haven't setup a profile environment for the test cases, yet (... the builds are now faster, when looking at builds.apache.org, but this only a very rough estimation ...) Yaniv, in case you already have it in place (and also have some old stats to compare to), it would be good, if you also validate/profile the derivation from your patch - sometimes it's hard to decide if something is just refurbished old code or if it's like this on purpose.
It seems the things discussed here are mostly done and applied since some time already, I think it best to open new bugs for any additional optimizations that are proposed.