Bug 66541 - CachedResource for OSGi URL resources changes URL hashing behavior & exacerbates DNS issues
Summary: CachedResource for OSGi URL resources changes URL hashing behavior & exacerba...
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 8.5.x-trunk
Hardware: PC Linux
: P2 minor (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-23 04:00 UTC by Tom Whitmore
Modified: 2023-03-28 01:13 UTC (History)
0 users



Attachments
patch for CachedResource/ CachedResourceURLStreamHandler (5.13 KB, patch)
2023-03-26 21:10 UTC, Tom Whitmore
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Whitmore 2023-03-23 04:00:05 UTC
We run Tomcat with resources mounted from OSGi bundles. The OSGi URLs are of the form 'bundleentry://203.fwk668849042/META-INF/output.tld' and have a custom 'handler' assigned to handle them.

In Tomcat 8.5.48, changes were made to CachedResource (to fix "Intermittent JSP Caching/Compiling Issue while under load", 2b0aaedd76d8) which introduce CachedResourceURLStreamHandler & bypass the OSGi-supplied hashCode() behavior of the OSGi URLs.

Several of our end-users have now reported large delays (up to 40 minutes) in Tomcat startup. The problem is:
* TldScanner hashing URLs of TLDs -- about 150 of these -- to build its tldResourcePathTaglibXmlMap.
* The OSGi URLs are now having java.net.URLStreamHandler hash them & this attempts to resolve their Hostnames, where the OSGi (Equinox) handler did not.
* In the case of DNS misconfiguration on some platforms, which seems to be not uncommon amongst our end-users, Tomcat thus has to wait for 150 failed lookups (of OSGi bundle names) at 15 seconds each before starting.

Proposed solution approach: 
* Consider making CachedResourceURLStreamHandler delegate 'hashCode()' and 'equals()' to the underlying 'resourceURL'.
* This will preserve handler behaviors from the underlying URL and avoid introducing spurious DNS lookups for OSGi-loaded resources.
Comment 1 Tom Whitmore 2023-03-23 04:04:33 UTC
To clarify:  
* The OSGi URLs are now having CachedResourceURLStreamHandler (which inherits from java.net.URLStreamHandler) hash them; this attempts to resolve their Hostnames, where the OSGi (Equinox) handler did not.
Comment 2 Mark Thomas 2023-03-23 14:59:27 UTC
Thanks for the detailed explanation. That makes sense. I've applied a fix that should address the performance issue.

Fixed in:
- 11.0.x for 11.0.0-M5 onwards
- 10.1.x for 10.1.8 onwards
-  9.0.x for  9.0.74 onwards
-  8.5.x for  8.5.88 onwards

If you want to test the fix and provide feedback before the next release, I have provided a dev build here:

https://people.apache.org/~markt/dev/v8.5.88-dev/

Usual caveats apply for a dev build: It isn't an official release so use it at your own risk.
Comment 3 Tom Whitmore 2023-03-26 21:09:28 UTC
Hi, thanks very much for the quick response. Just checking up on the proposed fix -- looks like a small issue with the delegation logic, confirmed in debugger.

The CachedResourceURLStreamHandler is created for the wrapped URL ('associatedURL') and should delegate for this. It isn't the handler for the 'resourceURL' so won't be called for that.

I had already noticed the logic here was a bit subtle & easy to mislay -- I propose clarifying it by renaming 'associatedURL' to 'wrappedURL' which makes things easier to understand. 

"The Wrapped URL delegates to the Cache and the underlying Resource URL."

Patch attached.
Comment 4 Tom Whitmore 2023-03-26 21:10:59 UTC
Created attachment 38528 [details]
patch for CachedResource/ CachedResourceURLStreamHandler
Comment 5 Mark Thomas 2023-03-26 21:31:25 UTC
Thanks for the review. I agree the wording is misleading. I'm not sure wrappedURL is the best terminology either since it is the original resource URL that is wrapped. I think maybe one of wrapperURL, wrappingURL, outerURL os something along those lines would be even better.

I'm currently leaning towards wrapperURL but plan to look at this in more detail tomorrow.
Comment 6 Tom Whitmore 2023-03-26 23:03:45 UTC
Thanks. The "wrapped one" is the outer one; I use wrapped & underlying as a frequent terminology, it works well & is clear semantically.

> I'm not sure wrappedURL is the best terminology either since it is the original
> resource URL that is wrapped.

* I think this objection is not logically correct, it's confusing the subject with the verb/ transformation. 

* The fact that you apply an operation (wrapping) to an input to produce a result, does not mean that you should describe or identify the input with the operation (wrapping). 

* Using a verb to select inputs is misusing a linguistic shortcut from selectivity; used where there is no ambiguity about which is input & output but where there is a need to select which of many inputs is used. 

* Here however, there is only one input, there is ambiguity between inputs and outputs, and the question of selectivity (which URL is the handler being called with?) applies only in the domain of outputs.

* The 'wrappedURL' is wrapped with the new URLStreamHandler. The underlying 'resourceURL' is not itself wrapped. It is prima facie confusing to refer the resourceURL as itself being wrapped, since it itself is not.

Better might be to say "We wrap the underlying Resource URL with a Cache-aware  URL Handler, to produce the Wrapped URL". This clearly distinguishes the input (the Resource URL), the operation (wrapping) and the result (the Wrapped URL).

Language can sometimes be tricky. I think it's worth clearly distinguishing between _descriptive terminology_, which should be preferred, and _implicit selection_ which is sometimes useful but can -- as in this case -- also lead to unnecessary confusion.

Thanks Mark. I hope this helps!
Comment 7 Mark Thomas 2023-03-27 13:26:53 UTC
(In reply to Tom Whitmore from comment #6)
> Thanks. The "wrapped one" is the outer one; I use wrapped & underlying as a
> frequent terminology, it works well & is clear semantically.

On that we are going to have to agree to disagree.

The cache URL handler has references to both the original resource URL and that cache URL. I can see how either of those could be viewed as wrapped. That you and I seem to view this differently - and both us of seem fairly sure we are right - tells me any variation on the theme of "wrap" is not the right one.

Replacing "associatedURL" with "cacheURL" is an alternative possibility.

The first step is to fix the bug - which I'll do shortly.

The second step is to take another look at the logic. I'm not sure the "... == u1" tests are required at all. I think those methods are only going to be called from the URL instance that was created with the handler instance so they are always going to be true. If that is the case, I'll simplify (remove) the logic.

Finally there is the rename to make things clearer. Current front-runner is cacheURL, with the caveat a reason not to use it / a better idea may occur to me while working on this issue.
Comment 8 Mark Thomas 2023-03-27 15:55:47 UTC
Fixed.

Bug fixed and field renamed. The logic needed to stay and I added a comment to remind the next person looking at the code why this is.
Comment 9 Tom Whitmore 2023-03-27 23:14:51 UTC
Great, thanks Mark. 
* I think 'cacheURL' and 'resourceURL' sound reasonably clearly distinct. 
* I agree the 'url == cacheURL' logic is necessary; relative resource URL objects can be created inheriting their parent handlers.
* I personally wasn't certain the null-check on openConnection() was necessary unless that can somehow be called with a null. (But that's pre-existing to this issue.)

Do you have a build I can review?
Comment 10 Mark Thomas 2023-03-28 01:13:39 UTC
I have updated https://people.apache.org/~markt/dev/v8.5.88-dev/ with a new test / dev build. Same caveats apply.