Bug 63234 (is, slow, XSSFont.equals) - XSSFFont.equals, hashCode very slow; revised code presented
Summary: XSSFFont.equals, hashCode very slow; revised code presented
Status: NEW
Alias: is, slow, XSSFont.equals
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 4.0.x-dev
Hardware: All All
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-05 20:33 UTC by David Gauntt
Modified: 2021-10-10 10:59 UTC (History)
0 users



Attachments
XSSFFont.java revised to speed hashCode, toString, and equals by ~1000 (20.38 KB, text/plain)
2019-03-05 20:33 UTC, David Gauntt
Details
XSSFFont.java with code-style changes reverted (22.42 KB, patch)
2019-03-12 21:28 UTC, David Gauntt
Details | Diff
Patch directly from version 4.0.1 (23.67 KB, patch)
2019-03-13 22:31 UTC, David Gauntt
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Gauntt 2019-03-05 20:33:35 UTC
Created attachment 36475 [details]
XSSFFont.java revised to speed hashCode, toString, and equals by ~1000

I recently discovered that when writing Excel workbooks, 75% of my CPU time was occupied by the method XSSFFont.equals.

I worked around this by the following:
1) Add instance fields _hashCode and _text to XSSFFont, initially set to null.
2) When either hashCode() or toString() are called and _hashCode==null, then both _hashCode and _text are calculated and saved, and the value of _hashCode or _text are returned.
3) Set the value of each of these fields to null whenever a setter (or getCTFont) is called.
3) When equals is called, it uses the saved values of _hashCode and _text.

The result is significant speed improvement of more than x1000.  The following method calculates the time required to calculate font.equals(font), and the time required to calculate font.oldEquals(font) (where oldEquals is the original equals method).  The results on my iMac are

equals: timePerCall=139 ns
oldEquals: timePerCall=216113 ns

The test code is below; my revised XSSFFont.java source is attached.

	private static void testXSSFFontEquals(XSSFWorkbook workbook) {
		final int numTests = 1000;
		final XSSFFont font = workbook.getFontAt(0);

		long start = System.nanoTime();
		for (int n = 0; n < numTests; ++n) {
			font.equals(font);
		}
		long end = System.nanoTime();

		System.out.println(String.format("\r\nequals: timePerCall=%d ns", (end - start) / numTests));

		start = System.nanoTime();
		for (int n = 0; n < numTests; ++n) {
			font.oldEquals(font);
		}
		end = System.nanoTime();
		System.out.println(String.format("oldEquals: timePerCall=%d ns", (end - start) / numTests));

	}
Comment 1 PJ Fanning 2019-03-05 22:23:02 UTC
Comment on attachment 36475 [details]
XSSFFont.java revised to speed hashCode, toString, and equals by ~1000

Wouldn't it be better to have:


	private void updateHashCode() {
		_text = _ctFont.toString();
		_hashCode = _text.hashCode();
	}
Comment 2 David Gauntt 2019-03-05 22:36:20 UTC
I hang my head in shame; yes, of course that would be better.  The whole point of this suggestion is to avoid repeating the same calculation over and over.

I have made the change in my code.
Comment 3 PJ Fanning 2019-03-06 10:32:35 UTC
Thanks David. I'm still a little bit worried that the old code and the new code are a little flawed in that the hashCode logic is meant to match the equals logic, ie if you change state that means the equals state will change should mean that the hashCode changes.
Comment 4 David Gauntt 2019-03-06 15:07:45 UTC
PJ,

Thank you for taking the time to look at this.

The only time that the behavior of equals() changes is when the value of _text changes.  The only time that _text changes is when updateHashCode() is called, and the only time that happens is when hashCode() or toString() are called after there is a change of state.

Thus, the only time that the behavior of equals() changes is when _hashCode has been recalculated, and the only time that _hashCode is recalculated is after a change of state.

Does that address your concern?
Comment 5 David Gauntt 2019-03-08 15:55:14 UTC
I found a bug in my proposed code for XSSFFont.equals.  If hashCode() had not been called before calling equals(), a NullPointException would have been thrown.

The code below replaces the reference to _hashCode with a call to hashCode()

	@Override
	public boolean equals(Object o) {
		if (this == o) {
			return true;
		}
		if (!(o instanceof XSSFFont)) {
			return false;
		}

		//if (_hashCode != o.hashCode()) {

		if (hashCode() != o.hashCode()) {
			return false;
		}

		XSSFFont cf = (XSSFFont) o;
		return toString().equals(cf.toString());
	}
Comment 6 Dominik Stadler 2019-03-10 09:27:20 UTC
Unfortunately your changed file contains many unrelated code-style changes, are you able to produce a patch with only the changes that are necessary?
Comment 7 David Gauntt 2019-03-12 21:28:17 UTC
Created attachment 36483 [details]
XSSFFont.java with code-style changes reverted

This is the same as the original patch, but with the code style changes reverted and one bug fixed.

• Every tab character in the file has been replaced with 4 spaces

• The references to _hashCode in equals() has been replaced by calls to hashCode()
Comment 8 Dominik Stadler 2019-03-13 21:54:09 UTC
Sorry, but this is still not better, there are still many changes, it seems long-lines are cut, e.g.

     * By default, Microsoft Office Excel 2007 uses the Calibri font in font size 11

becomes 

     * By default, Microsoft Office Excel 2007 uses the Calibri font in font
     * size 11

in your file. 

Also things like

     * @param font CTFont

become 

     * @param font
     *            CTFont


These make reviewing the patch hard. 

Maybe you can adjust your IDE settings to not reformat the code and then send the changes as a minimal patch, see http://poi.apache.org/devel/guidelines.html#SubmittingPatches for how you can create those.
Comment 9 David Gauntt 2019-03-13 22:31:00 UTC
Created attachment 36487 [details]
Patch directly from version 4.0.1

This version of the patch was generated as follows:

1) Download poi source into Eclipse using git.
2) Create a branch at tag REL_4_0_1
3) Make changes in XSSFFont.java
4) Commit changes

I have compared the code to tag REL_4_0_1, and verified that the only changes are those associated with the bug fix.
Comment 10 PJ Fanning 2021-10-10 10:59:34 UTC
The attachment is not a patch file - it is a very out of date java file.

I also wonder why we need to optimise the equals and hashCode. I ran some of the XSSFWorkbook unit tests and found no evidence of repeated calls to XSSFFont.equals.

I think it would be better to track down the use case where the excessive calls to equals are made and to try to avoid repeated equals calls if possible.

In theory, if we have a use case for calling equals a lot on our XSSF classes, then we have a potential issue on many other XSSF classes.