Bug 64872 - Inefficient enum resolution in JSPs
Summary: Inefficient enum resolution in JSPs
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 10
Classification: Unclassified
Component: Jasper (show other bugs)
Version: unspecified
Hardware: All Linux
: P2 enhancement (vote)
Target Milestone: ------
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-04 19:48 UTC by John Engebretson
Modified: 2021-01-22 17:22 UTC (History)
0 users



Attachments
Extract from Tomcat8 optimizations (5.21 KB, text/plain)
2020-11-12 20:17 UTC, John Engebretson
Details

Note You need to log in before you can comment on or make changes to this bug.
Description John Engebretson 2020-11-04 19:48:08 UTC
Our large public-facing application uses many custom JSP tags with enum properties, for example:

<my:tag enumProperty="${'hotFudge'}"/>

The tag implements MyTag contains a corresponding property:

 public void setEnumProperty(MyEnum enumProperty)

and the enum looks like:

public enum MyEnum {
  hotFudge, whippedCream, cherries;
}

The compiled JSP performs the full resolution of 'hotFudge' -> MyEnum.hotFudge on *every* request, even though the JSP compiler has enough information to directly reference the enum.  This results in a significant amount of needless work; profiling suggests it is around 10% of our JSP processing time.

Note that the same optimization applies when the datatype is a hard-coded String (like above) or the datatype is boolean/Boolean.

Replacing these evaluations with direct references results in:
1. Lower cpu from not re-processing
2. Lower heap usage because the expression caches are smaller
3. Lower object creation rates because fewer helper objects are created
4. Possible error catching during JSP compilation: if someone refers to a non-existent enum, that can be caught immediately.
Comment 1 Mark Thomas 2020-11-09 08:21:42 UTC
Moving this to an enhancement.

Section JSP.2.9 of the JSP specification sets out how references such as "hotFudge" are to be resolved. Enums are resolved at step 4 (static fields). For the optimisation to be valid, the compiler needs to be sure that "hotFudge" will not be resolved in steps 1, 2 or 3.

Step 1 is simple to eliminate. It is the ImplicitObjectELResolver as long as the name does not match one of the implicit objects then the compiler knows in advance that this step won't resolve the reference.

Step 2 is potentially problematic. For on-request compilation the compiler can check whether any resolvers have been added and - possibly - whether one of those resolvers would resolve the reference or not. Pre-compilation is more difficult as the compiler cannot know in advance if a resolver is going to be added via JspApplicationContext.addELResolver(ELResolver) as resolvers may be added by components of which the JSP compiler is not aware.

Step 3 is simple to eliminate as, for a single attribute, the Stream EL resolver will not be used.

What is the proposed solution for step 2?
Comment 2 John Engebretson 2020-11-09 16:01:16 UTC
I think you're right that the letter of step #2 proscribes this change.  My subsequent text in no way attempts to undermine or change that statement.

In the vast majority of cases, when a JSP author uses the value "${true}" and the property type is Boolean, the author intends Boolean.TRUE.  A case that violates that would be closer to a bug than desired behavior.

In the vast majority of cases, when a JSP author refers to an enum by implicit name using the value "${'hotFudge'}" and such an enum exists, the author intends to use that.  I can conceive of a situation when centralized code wants to override all values of 'hotFudge' but that is rare at best, and alternative solutions exist.

In the vast majority of cases, when a JSP author uses a constant string "${'constant'}" and the targeted property is a String, they intend the property to contain the string "constant".  Here again I can conceive of a desire to override all strings with the value "constant" but this would also be rare, and alternative solutions exist.

Most applications would choose significant efficiency improvements over the loss of these specific capabilities.

Which leaves us in the situation of a very clear specification versus a very clear performance win.  How can we resolve this?  My rough ideas:

1. Provide a configuration flag, in which one value enables these optimizations and one value maintains strict compliance with the spec.  (default value is a separate question)
2. Fudge the spec, with the argument that the efficiency gains outweigh a few edge cases.

Thoughts?  Thank you!
Comment 3 Mark Thomas 2020-11-10 16:37:34 UTC
Fudging the spec is not an option. Providing an option that provides non-spec compliant behaviour and is disabled by default is an option.

That said, I do wonder why expression language is being used in these instances rather than a scriptlet.

<my:tag enumProperty="<%= hotFudge %>"/>

rather than

<my:tag enumProperty="${'hotFudge'}"/>

should give you exactly the behaviour you desire shouldn't it?

I'd also expect it be possible to implement with some form of global search and replace. That approach strikes me as a rather less risky solution (both from an implementation point of view and a testing one) than an optimisation in the JSP compiler. It also means you would not be reliant on a Tomcat specific performance tweak should you decide to switch containers at some point.
Comment 4 John Engebretson 2020-11-10 16:51:04 UTC
"Fudging the spec is not an option." Understood.

"That said, I do wonder why expression language is being used in these instances rather than a scriptlet."  EL was originally used for this because a) it hid the details of the enums, b) it worked, c) no one understood the performance impacts vs. scriptlets, d) scriptlets are discouraged internally because of maintainability issues.

Your search-and-replace suggestion could work, but given the size of our codebase and the sheer number of variations, it isn't feasible.

My view of this suggestion is a mechanism for making the existing behavior much faster - with the unfortunate exception that you pointed out.  Making adoption free or trivial (config flag) will maximize that benefit for Tomcat users and would give Tomcat an efficiency advantage over similar containers; in our case performance matters most and we wouldn't consider switching off Tomcat unless similar features were available on a competitor.

It sounds like you could support an "accelerated JSP" mode that defaults to off, and documentation clearly describes how it violates spec?
Comment 5 Mark Thomas 2020-11-10 17:44:05 UTC
Yes, I could support a feature like that.

The tricky part is going to be implementing it. The JSP engine doesn't have access to the internals of the EL parser. Just musing on that, are all the uses of the form property="${'literal-string'}" so the process we are trying to short-cut is the literal to appropriate type coercion?
Comment 6 John Engebretson 2020-11-10 18:43:55 UTC
We have a Tomcat8 implementation deployed that addresses the following cases:

Boolean/boolean properties:
value="true"
value="${true}"
value="${'true'}"

String properties:
value="${'myString'}"

Enum properties:
value="hotFudge"
value="${'hotFudge'}"

We considered the literal conversions to be quite safe, but the others were debated. 
- For booleans we decided that anyone who wrote true but wanted false could fix their own problem.
- For enums, our codebase contained many places where engineers forgot to use the literal syntax, so there was substantial value in taking a risk.  We settled on applying the optimization when the string matched the name of an enum value, else allow the current EL evaluation to occur as normal.  The optimized JSPs would clearly reflect the author's intent when valid but fallback to previous behavior when intent wasn't clear.

We are satisfied with the balance we struck, but I'm interested in your take.
Comment 7 Mark Thomas 2020-11-11 09:54:02 UTC
That all seems reasonable to me.

How big is the patch? Generally, the smaller the patch the fewer the concerns around size of patch vs how widely used the feature is likely to be.
Comment 8 John Engebretson 2020-11-11 14:05:10 UTC
Our internal patch contained more than one change (the release process was challenging) but separating out this change comes to around 60 lines of code across two files.

However, that patch is not up to Tomcat's standards for inclusion (unit tests, a few design considerations, etc.) so for this purpose I consider it "inspiration" rather than something we can directly integrate.  And to be clear, I am personally responsible for those deficiencies.  :)
Comment 9 Mark Thomas 2020-11-12 18:42:42 UTC
Understood. Any chance you could provide that patch anyway? Saves re-inventing the wheel.
Comment 10 John Engebretson 2020-11-12 18:45:42 UTC
I'll have to run it past the lawyers, etc.  I'll submit the request immediately.  Previous submissions have taken between one day and two weeks but this should be an easy one.
Comment 11 Mark Thomas 2020-11-12 18:48:20 UTC
Tx.
Comment 12 John Engebretson 2020-11-12 20:17:12 UTC
Created attachment 37561 [details]
Extract from Tomcat8 optimizations

Approval was rapid, and the extract is attached.  It is not a patch file because of the issues described previously, however I added comments that I hope will help.  During this process I noticed what appears to be double-coverage of the Boolean/boolean case; that could be an error during development or perhaps I can't remember the good reason for it.
Comment 13 Mark Thomas 2020-11-13 14:09:22 UTC
Thanks for that.

I think it should be possible to use the extension point added for bug 54239 to implement this. I'll take a look.
Comment 14 Mark Thomas 2020-11-16 17:35:33 UTC
Here is a first pass at a possible approach using ELInterpreter.

This doesn't address the attr="enumValue" case but my reading of the attached patch is that it doesn't handle that case either. Since that isn't EL a different approach is required. Enum isn't explicitly listed in JSP.1-11 but it works because there is a default PropertyEditor for Enums. I suspect bypassing that would be safe in most cases but there is always the possibility of custom PropertyEditors so I think a similar hook to ELInterpreter (StringInterpreter?) would be required.

https://github.com/markt-asf/tomcat/commit/8e7cfc707124b3bb86967d86f9220ed5ab0e5f49

Let me know what you think.
Comment 15 John Engebretson 2020-11-16 17:46:15 UTC
Reading over your commit, it is vastly superior to my implementation, and I am grateful for your efforts.  :)


The attached patch addresses the attr="enumValue" case using a code branch towards the bottom of the file, with this comment:

// jengebr - added a branch for enums that are not exact literals, for example "hotFudge" 



Two other points:
1. Good call on the inverted quotes, attr='${"hotFudge"}'.  That never even crossed my mind.
2. How is this activated?  System property, host context, etc.
Comment 16 Mark Thomas 2020-11-16 18:00:28 UTC
Glad you think it is heading in the right direction. It might not look it but the important parts are heavily based on the patch you provided.

I am only going from source code rather than testing but I don't think that enum branch will have an effect. If I am reading the nested if statements correctly it is in a "if (attr.isELInterpreterInput())" which means "hotFudge" won't enter that branch because it isn't EL. The "hotFudge" case is handled later by the PropertyEditor branch.

Activation is via:
- ServletContext attribute (set programmatically)
- ServletContext attribute init parameter (set in web.xml)

Full details at:
https://github.com/markt-asf/tomcat/blob/master/java/org/apache/jasper/compiler/ELInterpreterFactory.java#L50
Comment 17 John Engebretson 2020-11-16 18:08:18 UTC
Thanks for the activation info, that's perfect.

I've verified the enum case works as I intended on our production servers; the compiled java code correctly points directly to the enum rather than any runtime evaluation.  That suggests my code extract missed something relevant.

This particular case is *not* a must-have, but certainly is a nice-to-have.  Is there a practical way to cover it without overly complicating what you've already written?
Comment 18 Mark Thomas 2020-11-17 09:09:26 UTC
ACK re the Enum handling. I'll take another look at options for that. I'm still leaning towards something like a "StringInterpreter".
Comment 19 John Engebretson 2020-11-17 14:57:21 UTC
A StringInterpreter is a good idea.  I've had the nagging feeling that there are other places where this pattern is useful and I just haven't found them yet.... centralizing the logic will make that easier.

Now that I mention other places, etc. - constant numbers?
Comment 20 Mark Thomas 2020-11-17 17:48:10 UTC
Most of the String conversion (outside of EL) should be reasonably optimal already. Enums look to be by far the most likely potential optimisation there unless folks are using custom PropertyEditors.

The custom ELInterpreter could easily be extended for numerics. char / Character is the only remaining standard EL conversion. Could easily add that too. Actually, aligning with the standard type conversion rules in the EL spec is likely the way to go.
Comment 21 John Engebretson 2020-11-18 22:39:44 UTC
Sounds like a good plan!
Comment 22 Mark Thomas 2020-11-25 12:22:55 UTC
Updated branch:
https://github.com/markt-asf/tomcat/tree/bz-64872

The tests for the optimised ELInterpreter show about 10% improvement. The tests for the custom StringInterpreter (that handles attr="EnumConstant") show an order of magnitude improvement.

If this looks good to you, the next step will be a discussion on the dev@ list to figure out which bits ship as part of standard Tomcat.
Comment 23 John Engebretson 2020-11-25 15:23:21 UTC
My constant reaction is "I'm glad you're doing this and not me."  Your expertise is creating a far better implementation than I did!  Thank you for your hard work.

The code looks solid; the only thing that caught my eye was the non-static logger.  That's typically bad practice for production code.  Are you sure it's necessary?
Comment 24 Mark Thomas 2020-11-25 19:17:18 UTC
There can be issues around loggers and class loading with Jasper. Tomcat has a custom LogManager that ID's logger by a combination of name and ThreadContextClassLoader (rather than just name).

Depending on when the class is loaded you can see issues. These classes were affected but there are ways around that. I'll take a look.
Comment 25 Christopher Schultz 2020-11-26 04:12:57 UTC
I noticed that some of the boxed/primitive handling (e.g. boolean, char) are handled in separate branches and others (e.g. long, int, short) are handled together in a single branch (with sub-branches using type.isPrimitive).

Is there any reason to use the two separate styles or did you just change your mind halfway through the implementation?
Comment 26 Mark Thomas 2020-11-26 09:51:45 UTC
The duplication of the try/catch and Type.valueOf() struck me as too much to duplicate for the numerical types.

char and boolean were much more borderline whether it there was enough code duplication to merit merging them. In the end I opted not to but you could probably make a case for either approach.
Comment 27 John Engebretson 2020-12-07 18:07:09 UTC
For planning purposes, approximately when can we expect to see this in Tomcat 9 and 10?
Comment 28 Mark Thomas 2020-12-08 15:13:05 UTC
There needs to be agreement from the Tomcat committers to add all of this. I'm confident the new interface will be accepted. I'm not sure about the optimised implementations.

I can't be any moire precise that 2021Q1 at the moment.
Comment 29 John Engebretson 2020-12-08 15:16:55 UTC
Thank you!  :)
Comment 30 Mark Thomas 2021-01-22 16:44:04 UTC
Fixed in:
- 10.0.x for 10.0.1 onwards
- 9.0.x for 9.0.42 onwards
- 8.5.x for 8.5.62 onwards

Regarding the non-static logger, nearly all the logs in Jasper are non-static to avoid issues with memory leaks on web application reload (the explanation for that is a longish story - ask on dev@ if interested). The impact is a small increase in memory footprint if more than one web application is deployed on a Tomcat instance.
Comment 31 John Engebretson 2021-01-22 17:22:21 UTC
Wondeful, thank you very much!  And the non-static Logger does make sense as a safety measure.