68119 – Significant overhead in javax.el.CompositeELResolver.convertToType

Bug 68119 - Significant overhead in javax.el.CompositeELResolver.convertToType

Summary: Significant overhead in javax.el.CompositeELResolver.convertToType

Status:	RESOLVED FIXED

Alias:	None

Product:	Tomcat 9
Classification:	Unclassified
Component:	EL (show other bugs)
Version:	9.0.x
Hardware:	All All

Importance:	P2 normal (vote)
Target Milestone:	-----
Assignee:	Tomcat Developers Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2023-11-09 18:20 UTC by John Engebretson
Modified:	2024-02-07 17:24 UTC (History)
CC List:	1 user (show)

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description John Engebretson 2023-11-09 18:20:59 UTC

A high-volume, latency-sensitive application shows that approximately 0.8% of cpu is spent executing javax.el.CompositeELResolver.convertToType.  Of this, > 95% is spent locally within the method, and the remainder is spent within javax.el.ELResolver.convertToType.

The source code for CompositeELResolver.convertToType is:


    @Override
    public Object convertToType(ELContext context, Object obj, Class<?> type) {
        context.setPropertyResolved(false);
        int sz = this.size;
        for (int i = 0; i < sz; i++) {
            Object result = this.resolvers[i].convertToType(context, obj, type);
            if (context.isPropertyResolved()) {
                return result;
            }
        }
        return null;
    }

The source code for ELResolver.convertToType is:

    public final Object convertToType(ELContext context, Object obj, Class<?> type) {
        context.setPropertyResolved(false);
        return null;
    }

The high cost of CompositeELResolver.convertToType is because the loop uses vtable lookup for method invocations (INVOKEVIRTUAL); the calls cannot be inlined because ELResolver.convertToType can be overridden by any of approximately 15 implementations.

However, within the Tomcat codebase, the only implementation that actually DOES anything is this CompositeELResolver.  In effect, we're paying method invocation overhead that is guaranteed to do nothing except, perhaps, recursively do more nothing.

Recognizing the potential need for custom implementations of ELResolvert.convertToType, we may want to maintain the current method but strip out the "known" implementations, such as those defined in ELContextImpl.getDefaultResolver().  This would allow the JIT compiler to optimize away loops over arrays of length zero, one, or two while still accommodating applications that directly call CompositeResolver.add().  

Note that the array CompositeELResolver.resolvers is also used in other methods (for better reasons) and this may require a distinct array for convertToType.


ELResolver.convertToType is relatively cheap, but is called so often that the aggregate is detectable by our tools.  While the task of setting false to false is truly redundant, the changes suggested above are sufficient to eliminate the penalty with no further effort.

Performance tests with a zero-length array show a 95% reduction in runtime; removing the code completely shows a 100% reduction in runtime.

Comment 1 Mark Thomas 2023-11-27 18:45:17 UTC

Fixed in:
- 11.0.x for 11.0.0-M15 onwards
- 10.1.x for 10.1.17 onwards
-  9.0.x for  9.0.84 onwards
-  8.5.x for  8.5.97 onwards

Some simple tests show similar (~20x) improvement but any additional feedback (it would mean building from source) would be welcome.

I'm assuming you are comfortable building from source but if there are any difficulties, let me know and I can provide a binary for testing.

Comment 2 John Engebretson 2023-11-28 15:33:46 UTC

Thanks, I was indeed able to build from source, and 9.84 shows a *dramatic* decrease in latency under high cpu.  The data is from a low-quality test in the development environment but I'm quite happy.  Will update with prod data when we get it deployed, probably in January.

Heap dump confirms that the array size is zero.

Comment 3 John Engebretson 2024-02-07 17:24:55 UTC

This optimization was effective in production and reduced the method cost by approximately 2/3rds, saving more than 0.5% of cpu.

The remaining time comes from another invokevirtual in the method which I overlooked.  I've opened a separate issue to address that.  https://bz.apache.org/bugzilla/show_bug.cgi?id=68596