A high-volume, latency sensitive application reports that `ImplicitObjectELResolverImpl.getValue()` uses approximately 0.2% of cpu, of which most (0.18%) comes from the call to `Arrays.binarySearch` for an array of strings. This problem was previously noted on https://bz.apache.org/bugzilla/show_bug.cgi?id=53895 but the solution was rejected for various reasons. I instead propose replacing the binary search with a switch statement on the string values. This requires Java 7 and was not an option when the previous issue was closed. The attached speed test shows that the `switch` statement is approximately 66% faster. I don't think it matters but I ordered the case statements in a rough approximation of their frequency in our application. The attached speed test returns results such as the following: Done with old in 1522 Done with old in 1636 Done with old in 1606 Done with old in 1641 Done with old in 1583 Done with old in 1587 Done with old in 1629 Done with old in 1627 Done with old in 1607 Done with old in 1569 Done with new in 474 Done with new in 453 Done with new in 450 Done with new in 452 Done with new in 449 Done with new in 450 Done with new in 454 Done with new in 452 Done with new in 459 Done with new in 447
Created attachment 38935 [details] Reproducer & speed test
I haven't studied what the Java compiler does with switch(String) statements, but I believe that numeric-based switch statements have two flavors: lookup vs table. One of them is faster but is only possible under certain circumstances. Introducing Strings likely complicates things further. If we just need to convert a String value into a number, there's a data structure pretty much purpose-built for that: a hash table. What if you implement the lookup as HashMap<String,Integer>.get()?
I used javap to examine the bytecode of the reproducer - the full output is attached but the critical portion is below. In short, it hashes each constant and switches based on that hash. This is superior to binary search because it's constant-time and superior to HashMap because there is no object instantiation/traversal. protected void runNewTest(); descriptor: ()V flags: (0x0004) ACC_PROTECTED Code: stack=5, locals=5, args_size=1 0: iconst_m1 1: istore_1 2: iconst_0 3: istore_2 4: goto 363 7: getstatic #40 // Field TEST_SCOPES:[Ljava/lang/String; 10: iload_2 11: aaload 12: astore_3 13: aload_3 14: dup 15: astore 4 17: invokevirtual #62 // Method java/lang/String.hashCode:()I 20: lookupswitch { // 11 -1354757532: 120 -1221270899: 133 -697829026: 146 -272077475: 159 -214877152: 172 -10806556: 185 106436749: 198 872295685: 211 889144335: 224 1291689221: 237 2073857551: 250 default: 324 } 120: aload 4 122: ldc #18 // String cookie 124: invokevirtual #66 // Method java/lang/String.equals:(Ljava/lang/Object;)Z 127: ifne 294 130: goto 324 133: aload 4 135: ldc #20 // String header 137: invokevirtual #66 // Method java/lang/String.equals:(Ljava/lang/Object;)Z 140: ifne 283 143: goto 324 146: aload 4
Created attachment 38938 [details] decompiled reproducer
Create Pull Request for the proposed change, have also shared the performance comparison for Old vs Switch implementation vs Hashmap implementation - https://github.com/apache/tomcat/pull/666
Anurag's solution is clearly faster than mine, I like it.
Hey Christopher Schultz, can you help out with the Pull Request i raised.
The alternative approach is not faster. The test that showed it was was not a fair test. I will be applying a variation of the original patch.
Created attachment 39050 [details] JMH test to compare binarySearch to switch(scope) # JMH version: 1.37 # VM version: JDK 21, OpenJDK 64-Bit Server VM, 21+35-2513 # VM invoker: /Library/Java/JavaVirtualMachines/jdk-21.jdk/Contents/Home/bin/java # VM options: <none> # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) # Warmup: 5 iterations, 5 s each # Measurement: 5 iterations, 5 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Benchmark: org.sample.ScopeLookup.binarySearch # Run progress: 0.00% complete, ETA 00:01:40 # Fork: 1 of 1 # Warmup Iteration 1: 32873899.937 ops/s # Warmup Iteration 2: 32037578.845 ops/s # Warmup Iteration 3: 32101040.458 ops/s # Warmup Iteration 4: 32619813.449 ops/s # Warmup Iteration 5: 32459923.794 ops/s Iteration 1: 32878096.534 ops/s Iteration 2: 32782993.884 ops/s Iteration 3: 32802020.501 ops/s Iteration 4: 32357564.918 ops/s Iteration 5: 34414429.950 ops/s Result "org.sample.ScopeLookup.binarySearch": 33047021.158 ±(99.9%) 3046287.685 ops/s [Average] (min, avg, max) = (32357564.918, 33047021.158, 34414429.950), stdev = 791111.238 CI (99.9%): [30000733.472, 36093308.843] (assumes normal distribution) # JMH version: 1.37 # VM version: JDK 21, OpenJDK 64-Bit Server VM, 21+35-2513 # VM invoker: /Library/Java/JavaVirtualMachines/jdk-21.jdk/Contents/Home/bin/java # VM options: <none> # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) # Warmup: 5 iterations, 5 s each # Measurement: 5 iterations, 5 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Benchmark: org.sample.ScopeLookup.caseSearch # Run progress: 50.00% complete, ETA 00:00:50 # Fork: 1 of 1 # Warmup Iteration 1: 45942620.330 ops/s # Warmup Iteration 2: 43150765.515 ops/s # Warmup Iteration 3: 44932581.558 ops/s # Warmup Iteration 4: 45360554.270 ops/s # Warmup Iteration 5: 43506970.095 ops/s Iteration 1: 44393187.971 ops/s Iteration 2: 46287069.970 ops/s Iteration 3: 46461809.778 ops/s Iteration 4: 45619371.053 ops/s Iteration 5: 46167433.097 ops/s Result "org.sample.ScopeLookup.caseSearch": 45785774.374 ±(99.9%) 3233722.237 ops/s [Average] (min, avg, max) = (44393187.971, 45785774.374, 46461809.778), stdev = 839787.396 CI (99.9%): [42552052.137, 49019496.610] (assumes normal distribution) # Run complete. Total time: 00:01:41 REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell. NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise extra caution when trusting the results, look into the generated code to check the benchmark still works, and factor in a small probability of new VM bugs. Additionally, while comparisons between different JVMs are already problematic, the performance difference caused by different Blackhole modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons. Benchmark Mode Cnt Score Error Units ScopeLookup.binarySearch thrpt 5 33047021.158 ± 3046287.685 ops/s ScopeLookup.caseSearch thrpt 5 45785774.374 ± 3233722.237 ops/s
This test seems to show that there is indeed a speed improvement by using a switch statement instead of a binary search, somewhere in the 30% range, and it seems consistent. I didn't get the kind of speedup that John sees, but that may be due to environmental differences. Scope lookups probably occur many times in a given JSP, so this can probably add-up, especially when seeing a report that this is a hot-spot. In order to get a good randomized test where the JIT and/or CPU were not cheating by discovering that we were e.g. always searching for "applicationScope", I used java.util.Random.nextInt to randomly-choose one of the scopes for each test. I did not check to see how much of the testing time was taken up by the Random.nextInt invocation, but both compared methods are using the same random technique. I'm happy to receive a critique of my JMH test if it does not show what I think it shows.
(In reply to Mark Thomas from comment #8) > The alternative approach is not faster. The test that showed it was was not > a fair test. I will be applying a variation of the original patch. To be clear the original patch/approach from John Engebretson is faster than alternative patch/approach proposed by Anurag Dubey. Both are faster than binary search.
Created attachment 39051 [details] Alternative enum lookup approach JMH
(In reply to Mark Thomas from comment #11) > (In reply to Mark Thomas from comment #8) > > The alternative approach is not faster. The test that showed it was was not > > a fair test. I will be applying a variation of the original patch. > > To be clear the original patch/approach from John Engebretson is faster than > alternative patch/approach proposed by Anurag Dubey. > > Both are faster than binary search. I agree that the earlier test shared for Alternative enum lookup approach didn't involve a switch lookup. I have created another test involving switch after the enum lookup, which looks faster than other 2, thoughts on this ? Benchmark Mode Cnt Score Error Units LookupVsSwitch.binarySearch thrpt 15 12.175 ? 0.472 ops/us LookupVsSwitch.fromLookupAndSwitch thrpt 15 19.759 ? 0.878 ops/us LookupVsSwitch.fromSwitch thrpt 15 15.106 ? 0.520 ops/us
The fromLookupAndSwitch technique seems to be implemented in an awkward way. Why bother converting from String -> Enum and then using a switch on the enum, instead of just (a) implementing a getScopeNumber() method in each enum instance or (b) just using convenient ordinal values which match the value you actually want? I would think this would be the best implementation if you wanted to use Enums: enum Scope { applicationScope(0), cookie(1), header(2), ... public static int fromLookupAndSwitch(String scope) { Scope scope = lookupMap.get(scope); if(null == scope) { } else { return scope.ordinal(); } } } I understand that this test is no longer "from lookup and switch" but I decided to keep the method name the same. I guess the point is if you use a hash, you don't have to use a switch at all.
... also what about Scope.valueOf(scopeName) and then you don't have to implement your own map? It's possible that a manually-constructed Map would be faster than the default mapping provided by java.lang.Enum, but there is less code required.
(In reply to Christopher Schultz from comment #14) > The fromLookupAndSwitch technique seems to be implemented in an awkward way. > Why bother converting from String -> Enum and then using a switch on the > enum, instead of just (a) implementing a getScopeNumber() method in each > enum instance or (b) just using convenient ordinal values which match the > value you actually want? > > I would think this would be the best implementation if you wanted to use > Enums: > > enum Scope { > applicationScope(0), cookie(1), header(2), ... > > public static int fromLookupAndSwitch(String scope) { > Scope scope = lookupMap.get(scope); > if(null == scope) { > } else { > return scope.ordinal(); > } > } > } > > I understand that this test is no longer "from lookup and switch" but I > decided to keep the method name the same. > > I guess the point is if you use a hash, you don't have to use a switch at > all. That is to mimic the outer switch which anyways applies in the getValue method. ref- https://github.com/anuragdy/tomcat/blob/4257836d0930c853f862dfe5f0aad4c24cefec31/java/javax/servlet/jsp/el/ImplicitObjectELResolver.java#L90
Fair enough, but if we are going to try to squeeze as much performance out of this, then the correct process is with delegated dispatch: enum Scope { applicationScope(0) { public Object getScopedValue(ELContext ctx, Object base, Object property) { PageContext page = (PageContext) context .getContext(JspContext.class); context.setPropertyResolved(base, property); return ScopeManager.get(page).getApplicationScope(); } }, cookie(1) { public Object getScopedValue(ELContext ctx, Object base, Object property) { PageContext page = (PageContext) context .getContext(JspContext.class); context.setPropertyResolved(base, property); return ScopeManager.get(page).getCookie(); } }, ... } Then: Scope scope = Scope.valueOf(property); if(null != scope) { return scope.getScopedValue(ctx, base, property); } else { return null; } If you are going to perform a lookup, make sure you get the best bang for your buck.
(In reply to Christopher Schultz from comment #17) > If you are going to perform a lookup, make sure you get the best bang for > your buck. +1 I'll put together a patch along those lines.
Fixed in: - 11.0.x for 11.0.0-M12 onwards - 10.1.x for 10.1.14 onwards - 9.0.x for 9.0.81 onwards - 8.5.x for 8.5.94 onwards
Thank you to everyone involved!
This change reached production and decreased the cost of javax.servlet.jsp.el.ImplicitObjectELResolver.getValue() by approximately 80%.