Bug 66178 - Reduce garbage from strings in AbstractArchiveResourceSet.getResource
Summary: Reduce garbage from strings in AbstractArchiveResourceSet.getResource
Status: RESOLVED WORKSFORME
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 8.5.x-trunk
Hardware: Macintosh All
: P3 normal (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-25 21:25 UTC by Joe Jackson
Modified: 2022-11-21 14:04 UTC (History)
0 users



Attachments
possible patch to reduce garbage (3.55 KB, patch)
2022-07-25 21:25 UTC, Joe Jackson
Details | Diff
after fix (731.05 KB, image/png)
2022-07-27 13:40 UTC, Joe Jackson
Details
before fix (568.57 KB, image/png)
2022-07-27 13:41 UTC, Joe Jackson
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Joe Jackson 2022-07-25 21:25:13 UTC
Created attachment 38343 [details]
possible patch to reduce garbage

If you have a large-ish number of jars (something like 1200 in my case) then we end up calling the getResource method of AbstractArchiveResourceSet for each jar every time that we load a new resource. This code does a few substrings and string concatenation that ends up generating a lot of garbage due to the fact that it's called constantly.

This happens even if the resource is cached because of the code in Cache.getResource()

If the entry is found in the cache we call it here
        if (cacheEntry != null && !cacheEntry.validateResource(useClassLoaderResources)) {


Using JFR to capture some profiling data I was able to see that replacing the string concatenation with a StringBuilder reduced the garbage by about 3x - 4x. In the original, it created something like 20 mb when loading a page that serves multiple resources and after the change, it only allocated about 7 mb. 

Attached is a git patch showing a possible fix. I'd be happy to make a PR as well.
Comment 1 Joe Jackson 2022-07-25 21:26:51 UTC
This is most likely happening in the newer versions of tomcat, this just happens to be the one I am using. I believe we are going to 9 soon.
Comment 2 Christopher Schultz 2022-07-25 23:52:33 UTC
Is this patch trying to reduce the use of temporary Strings in the heap?

I recently had conversation with someone about string concatenation in Java and it's clear that modern compilers are more efficient when using source-level concatenation instead of programmer-imposed uses of StringBuilder.

The code is less clear (to read, and thus less maintainable) when StringBuilder is used and I'm not convinced that this will practically improve anything.

I'm a -0 on this for the time being.
Comment 3 Konstantin Kolinko 2022-07-26 09:59:03 UTC
1. What version of Java have you tested with? (And what version of Tomcat?)

String handling rather differs between versions of JRE.


2. I spotted one bug within the patch (attachment 38343 [details]). Maybe there are others:

> if (pathInJar.equals("")) {

As the pathInJar variable has been changed from a String to a StringBuilder, the above comparison is now broken.

It could be `if (pathInJar.length() == 0)`


3. I do not follow:

> 
> This happens even if the resource is cached because of the code in
> Cache.getResource()
> 
> If the entry is found in the cache we call it here
>         if (cacheEntry != null &&
> !cacheEntry.validateResource(useClassLoaderResources)) {
>

How does a CachedResource#validateResource(...) call map to path calculations in AbstractArchiveResourceSet?

- Looking into the code of CachedResource#validateResource(...), the validation is performed only once in awhile, depending on the TTL of a cache entry. The TTL of a Cache is configurable. Does changing the value of ttl improve your use case?

- I have not tested, but it sounds odd to delve into path calculations to validate an entry in a jar file. It would be better to validate the jar itself, as a whole, just once.
Comment 4 Joe Jackson 2022-07-26 13:07:30 UTC
Thanks for the quick feedback. I was also concerned about the readability trade-off. For my use case, we have a big focus on the garbage generation of the app and this showed up as a contributor so I figured it would be worth bringing up. 

We are using java 11.13 and tomcat 8.5.72.


No worries if you don't think the trade-off has value because of newer compilers. 

I do have a concern about the comments on TTL and it is something I looked into because as far as I can tell (maybe something is misconfigured on my end) we do validate regardless of it being returned from the cache or not. Increasing the TTL did not help with garbage generation. A better re-write may be a method to target a single jar instead of having to scan all of them. In our use case, I believe we already know the path to the jar which contains the resource we are trying to access. 


Here is the code showing we validate even when we get from the cache. 
https://github.com/apache/tomcat/blob/8.5.x/java/org/apache/catalina/webresources/Cache.java#L69

Here is the stack trace, note line 69 is the one where we did find the cached resource.

getResourceInternal:280, StandardRoot (org.apache.catalina.webresources)
validateResource:128, CachedResource (org.apache.catalina.webresources)
getResource:69, Cache (org.apache.catalina.webresources)
getResource:215, StandardRoot (org.apache.catalina.webresources)
getClassLoaderResource:224, StandardRoot (org.apache.catalina.webresources)
getResourceAsStream:1168, WebappClassLoaderBase (org.apache.catalina.loader)
Comment 5 Christopher Schultz 2022-07-26 14:55:24 UTC
(In reply to Joe Jackson from comment #4)
> For my use case, we have a big focus on the garbage generation of
> the app and this showed up as a contributor so I figured it would be worth
> bringing up. 

It seems like you are trading String garbage for StringBuilder garbage. Where is the savings? In most cases, you must convert SB to String anyway. Do you have metrics before/after the change?
Comment 6 Joe Jackson 2022-07-27 13:40:48 UTC
Created attachment 38348 [details]
after fix
Comment 7 Joe Jackson 2022-07-27 13:41:27 UTC
Created attachment 38349 [details]
before fix
Comment 8 Joe Jackson 2022-07-27 13:43:02 UTC
I've added the screenshots showing JFR profiling data before and after the patch. The StringBuilder does still generate garbage, but it's reduced by about 3x - 4x. Mainly because there is less need to create as many new strings with the changed code.
Comment 9 Christopher Schultz 2022-07-29 13:15:20 UTC
I still don't see the savings in the code, but I do see the difference in your metrics. We can only see the output, though. What kind of "load" are you putting on this? Technically, you could get that picture just by using 1/4 of the operations and it would look great.

Do you have some exercise code you could post to corroborate what you are showing here?
Comment 10 Christopher Schultz 2022-07-29 17:24:59 UTC
Upon further review, I wonder if the difference is that you have removed the use of String.substring() in a few locations, instead appending slices of the source Strings.
Comment 11 Joe Jackson 2022-08-16 19:47:37 UTC
Hey Christopher,

Sorry for the delay in getting back to you. I haven't been able to create a demo application, but basically, you would just need a high number of jars and then call StandardRoot.getResource() to hit the related code. 

As for the load I was testing it's just loading the same page in our app 5 times with the browser cache disabled which causes us to load a number of JS files from jars.
Comment 12 Mark Thomas 2022-08-17 11:54:03 UTC
Note that cache validation needs to be per URI (and not per JAR) to account for various use cases where web application resources are changed while a web application is running and those resources overlap with other resources (see PreResources and PostResources).
Comment 13 Mark Thomas 2022-08-18 10:39:50 UTC
I have been trying to recreate these results without success.

I am not comfortable applying a performance change without being able to reliably recreate the results.

Please provide the exact steps to recreate these results from a clean installation of the latest release of any supported Tomcat major version (8.5.x, 9.0.x, 10.0.x or 10.1.x).
Comment 14 Mark Thomas 2022-09-20 09:54:10 UTC
It has been more than a month and the requested information to enable the issue to be recreated has not been provided.

Without the information to reproduce the issue, this bug report will eventually be resolved as WORKSFORME.
Comment 15 Mark Thomas 2022-11-21 14:04:23 UTC
It has been more than 3 months since the request for more information. None has been provided so closing as per previous comment.

If the requested information is available at some point in the future, feel free to re-open this issue and attach it.