Created attachment 31802 [details] patch to add a map from name to sheet XSSFWorkbook.getSheet(String) showed up as a hot-spot in profiling, as the project I'm on calls it very often to resolve cross-sheet formula references. Currently it simply walks the list of sheets looking for one with a matching name. The attached patch adds a map from name to sheet, which reduces the time our operation spends in this method by about 80%.
Looking at the patch, we already have a list of XSSFSheet objects, and this adds a map too by name. I can't help but wonder if we couldn't have a special kind of sorted map that provides both the old "give me the sheets" of the list plus the new "lookup by name" It's possible that my approach would be more complex or slower though, even if it would mean only one object to update instead of two Thoughts?
Having just looked at this again: the map is a LinkedHashMap and most of the required operations are possible using only the map with no worse efficiency than before (walk the map's values() which is no worse than walking the previous List). Unfortunately it all falls down at setSheetOrder(sheetname, pos). With LinkedHashMap it would have to remove and reinsert every sheet between its desired position and the end. You would need a kind of LinkedHashMap with the ability to reorder its internal list. I don't know of one and I'm not sure adding one and maintaining it would be an improvement over the patch as it stands.
I'm not sure if this is a big problem in almost all other use cases of POI as usually there are only a few sheets and getSheet() is not called that often as well. So by optimizing this one specific case we would actually add a bit more to some other methods and also make the code more complex and harder to maintain. How many sheets do you have in those workbooks? And how often do you actually call it? I wouldn't optimize this inside POI if it is simply called much too often in your case. Would it maybe be better to solve your specific case here with a small cache in your code which remembers these sheet-name to index mapping outside of POI core code and then uses getSheetAt() to have quick access?
Further review at our end suggests this optimization is no longer necessary, so the simplest thing to do is WONTFIX this one for now. I had pondered a TreeSet or some other ordered set implementation as a single point of truth for the sheets in a workbook, but it can wait for concrete requirements.