A high-volume, latency-sensitive application shows that approximately 0.8% of cpu is spent executing javax.el.CompositeELResolver.convertToType. Of this, > 95% is spent locally within the method, and the remainder is spent within javax.el.ELResolver.convertToType. The source code for CompositeELResolver.convertToType is: @Override public Object convertToType(ELContext context, Object obj, Class<?> type) { context.setPropertyResolved(false); int sz = this.size; for (int i = 0; i < sz; i++) { Object result = this.resolvers[i].convertToType(context, obj, type); if (context.isPropertyResolved()) { return result; } } return null; } The source code for ELResolver.convertToType is: public final Object convertToType(ELContext context, Object obj, Class<?> type) { context.setPropertyResolved(false); return null; } The high cost of CompositeELResolver.convertToType is because the loop uses vtable lookup for method invocations (INVOKEVIRTUAL); the calls cannot be inlined because ELResolver.convertToType can be overridden by any of approximately 15 implementations. However, within the Tomcat codebase, the only implementation that actually DOES anything is this CompositeELResolver. In effect, we're paying method invocation overhead that is guaranteed to do nothing except, perhaps, recursively do more nothing. Recognizing the potential need for custom implementations of ELResolvert.convertToType, we may want to maintain the current method but strip out the "known" implementations, such as those defined in ELContextImpl.getDefaultResolver(). This would allow the JIT compiler to optimize away loops over arrays of length zero, one, or two while still accommodating applications that directly call CompositeResolver.add(). Note that the array CompositeELResolver.resolvers is also used in other methods (for better reasons) and this may require a distinct array for convertToType. ELResolver.convertToType is relatively cheap, but is called so often that the aggregate is detectable by our tools. While the task of setting false to false is truly redundant, the changes suggested above are sufficient to eliminate the penalty with no further effort. Performance tests with a zero-length array show a 95% reduction in runtime; removing the code completely shows a 100% reduction in runtime.
Fixed in: - 11.0.x for 11.0.0-M15 onwards - 10.1.x for 10.1.17 onwards - 9.0.x for 9.0.84 onwards - 8.5.x for 8.5.97 onwards Some simple tests show similar (~20x) improvement but any additional feedback (it would mean building from source) would be welcome. I'm assuming you are comfortable building from source but if there are any difficulties, let me know and I can provide a binary for testing.
Thanks, I was indeed able to build from source, and 9.84 shows a *dramatic* decrease in latency under high cpu. The data is from a low-quality test in the development environment but I'm quite happy. Will update with prod data when we get it deployed, probably in January. Heap dump confirms that the array size is zero.
This optimization was effective in production and reduced the method cost by approximately 2/3rds, saving more than 0.5% of cpu. The remaining time comes from another invokevirtual in the method which I overlooked. I've opened a separate issue to address that. https://bz.apache.org/bugzilla/show_bug.cgi?id=68596