Bug 49822

Summary: Add hash lb worker method
Product: Tomcat Connectors Reporter: Ryan Fox <rfox>
Component: CommonAssignee: Tomcat Developers Mailing List <dev>
Status: NEW ---    
Severity: enhancement CC: vinnieashley4ever
Priority: P2 Keywords: PatchAvailable
Version: 1.2.30   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Patch to add hash lb worker method
Patch to add hash lb worker method

Description Ryan Fox 2010-08-25 12:28:50 UTC
Created attachment 25938 [details]
Patch to add hash lb worker method

The attached patch adds a hash method to the LB worker.  This is useful for scenarios where back-end cache affinity is more desirable than other load balancing methods.

This works by taking a hash of the req_uri + query_string, seeding the random number generator with this value, and using the first random number returned mod num_of_workers.  This will consistently send a given uri/query string to a particular worker.  If that worker is unavailable, fall back to default method.
Comment 1 Ryan Fox 2010-08-25 17:23:39 UTC
Created attachment 25941 [details]
Patch to add hash lb worker method

Previous patch was incomplete
Comment 2 Christopher Schultz 2017-09-01 14:42:26 UTC
Note the age of the patch.
Comment 3 Mark Thomas 2021-01-29 16:44:39 UTC
Comment on attachment 25941 [details]
Patch to add hash lb worker method

Repair vandalism
Comment 4 Christopher Schultz 2021-01-29 23:04:44 UTC
I question the quality of this patch. There are a number of issues that I see:

1. The combining of the URI and query string is completely wrong. Only the query string is actually used. I'm not sure if s->query_string can be NULL (instead of simply empty-string), in which case this will crash the process.

2. There is no ensuring that the "req" "combined" URI+query-string is null-terminated. Out-of-bounds read is possible, given that the client controls the inputs.

3. There is a race-condition between seeding the PRNG (which is a process-wide change) and obtaining a random number from the PRNG. Two threads running at the same moment can step on each other and the random numbers may not be correct.

4. After hashing, why bother initializing a random-number generator and then generating a single random number? You already have a nice "random number" generated: the hash you just computed. Just use the hash directly.

Also, why not use a standard hashing algorithm? The one here appears to be the ELF hash algorithm[1] which is, I suppose, standard enough. I looked at libapr and it doesn't look like it has any specific hashing algorithms implemented, unless I am very much mistaken. I suppose ELF hash is as good as any out there. Suggestions are welcome.

[1] http://www.sco.com/developers/devspecs/gabi41.pdf pages 94-95, figure 5-12. Also https://en.wikipedia.org/wiki/PJW_hash_function
I wonder if the ELF hash algorithm requires some kind of attribution. It's such a short bit of code it can't really be implemented any other way.