Summary: | Proxy error doesnot enable the worker even retry property is enabled. | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Bhushan Jade <bhushan.jade> |
Component: | mod_proxy | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | NEW --- | ||
Severity: | major | CC: | rageratwork |
Priority: | P2 | ||
Version: | 2.4.18 | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | Linux |
Description
Bhushan Jade
2019-03-20 14:58:06 UTC
Does DNS change after the restart? (In reply to Eric Covener from comment #1) > Does DNS change after the restart? No. We are restarting apache service. Backend load balancer is AWS ELB. Which gives response ,when we hit directly using LOAD-BALANACER-DNS URL. We have setup like this : [Front end Server(Apache)]<---elb--->[Backend Server] Worker connection once lost,its not establishing again even there is retry=0. It starts when apache service restarted. Hello, it looks like I may have encountered this recently in our Production environment. We have two front end servers that connect to backend servers through an AWS load balancer similar to what is described here. Both began to fail about the same time with this error. I would like to try to recreate this in our Test environment but I'm unsure what triggered it. Does anyone have any pointers on how to reproduce it? I believe the issue is related to how AWS ELBs scale and how Apache workers are configured by default. From Apache docs: "When connection reuse is enabled, each backend domain is resolved only once per child process, and cached for all further connections until the child is recycled." When resolving an AWS ELB hostname "by default, Elastic Load Balancing will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request." Under load, AWS ELBs will scale to handle the traffic. (Not the same as scaling application servers behind the ELB). "The Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer when it scales so that the new resources have their respective IP addresses registered in DNS." I believe what is happening is, under load, previously unused workers are started, resolve the ELB host and receive a "new" IP for the ELB. Once the load subsides, the ELBs scale down and one or more IP addresses now cached by workers are no longer valid. Because the worker has cached that IP, it tries to use it the next time it receives a request which then fails with the described error. Regardless of the value of 'retry', that cached IP address will never be refreshed and the worker will always fail until it is recycled. Using the parameter 'disablereuse=on' (or 'enablereuse=off') will force the worker to resolve the hostname to get a new IP. Also note, it is better to leave 'retry' to its default value of 60 in the case a worker resolves an IP address that hasn't yet been removed from the DNS record when scaling down: "The DNS record that is created includes a Time-to-Live (TTL) setting of 60 seconds, with the expectation that clients will re-lookup the DNS at least every 60 seconds." |