Created attachment 36475 [details] XSSFFont.java revised to speed hashCode, toString, and equals by ~1000 I recently discovered that when writing Excel workbooks, 75% of my CPU time was occupied by the method XSSFFont.equals. I worked around this by the following: 1) Add instance fields _hashCode and _text to XSSFFont, initially set to null. 2) When either hashCode() or toString() are called and _hashCode==null, then both _hashCode and _text are calculated and saved, and the value of _hashCode or _text are returned. 3) Set the value of each of these fields to null whenever a setter (or getCTFont) is called. 3) When equals is called, it uses the saved values of _hashCode and _text. The result is significant speed improvement of more than x1000. The following method calculates the time required to calculate font.equals(font), and the time required to calculate font.oldEquals(font) (where oldEquals is the original equals method). The results on my iMac are equals: timePerCall=139 ns oldEquals: timePerCall=216113 ns The test code is below; my revised XSSFFont.java source is attached. private static void testXSSFFontEquals(XSSFWorkbook workbook) { final int numTests = 1000; final XSSFFont font = workbook.getFontAt(0); long start = System.nanoTime(); for (int n = 0; n < numTests; ++n) { font.equals(font); } long end = System.nanoTime(); System.out.println(String.format("\r\nequals: timePerCall=%d ns", (end - start) / numTests)); start = System.nanoTime(); for (int n = 0; n < numTests; ++n) { font.oldEquals(font); } end = System.nanoTime(); System.out.println(String.format("oldEquals: timePerCall=%d ns", (end - start) / numTests)); }
Comment on attachment 36475 [details] XSSFFont.java revised to speed hashCode, toString, and equals by ~1000 Wouldn't it be better to have: private void updateHashCode() { _text = _ctFont.toString(); _hashCode = _text.hashCode(); }
I hang my head in shame; yes, of course that would be better. The whole point of this suggestion is to avoid repeating the same calculation over and over. I have made the change in my code.
Thanks David. I'm still a little bit worried that the old code and the new code are a little flawed in that the hashCode logic is meant to match the equals logic, ie if you change state that means the equals state will change should mean that the hashCode changes.
PJ, Thank you for taking the time to look at this. The only time that the behavior of equals() changes is when the value of _text changes. The only time that _text changes is when updateHashCode() is called, and the only time that happens is when hashCode() or toString() are called after there is a change of state. Thus, the only time that the behavior of equals() changes is when _hashCode has been recalculated, and the only time that _hashCode is recalculated is after a change of state. Does that address your concern?
I found a bug in my proposed code for XSSFFont.equals. If hashCode() had not been called before calling equals(), a NullPointException would have been thrown. The code below replaces the reference to _hashCode with a call to hashCode() @Override public boolean equals(Object o) { if (this == o) { return true; } if (!(o instanceof XSSFFont)) { return false; } //if (_hashCode != o.hashCode()) { if (hashCode() != o.hashCode()) { return false; } XSSFFont cf = (XSSFFont) o; return toString().equals(cf.toString()); }
Unfortunately your changed file contains many unrelated code-style changes, are you able to produce a patch with only the changes that are necessary?
Created attachment 36483 [details] XSSFFont.java with code-style changes reverted This is the same as the original patch, but with the code style changes reverted and one bug fixed. • Every tab character in the file has been replaced with 4 spaces • The references to _hashCode in equals() has been replaced by calls to hashCode()
Sorry, but this is still not better, there are still many changes, it seems long-lines are cut, e.g. * By default, Microsoft Office Excel 2007 uses the Calibri font in font size 11 becomes * By default, Microsoft Office Excel 2007 uses the Calibri font in font * size 11 in your file. Also things like * @param font CTFont become * @param font * CTFont These make reviewing the patch hard. Maybe you can adjust your IDE settings to not reformat the code and then send the changes as a minimal patch, see http://poi.apache.org/devel/guidelines.html#SubmittingPatches for how you can create those.
Created attachment 36487 [details] Patch directly from version 4.0.1 This version of the patch was generated as follows: 1) Download poi source into Eclipse using git. 2) Create a branch at tag REL_4_0_1 3) Make changes in XSSFFont.java 4) Commit changes I have compared the code to tag REL_4_0_1, and verified that the only changes are those associated with the bug fix.
The attachment is not a patch file - it is a very out of date java file. I also wonder why we need to optimise the equals and hashCode. I ran some of the XSSFWorkbook unit tests and found no evidence of repeated calls to XSSFFont.equals. I think it would be better to track down the use case where the excessive calls to equals are made and to try to avoid repeated equals calls if possible. In theory, if we have a use case for calling equals a lot on our XSSF classes, then we have a potential issue on many other XSSF classes.