The problem I am reporting here is that when name resolution of a load balancer member fails, the affected member is not marked as disabled (or put into the error state) and taken out of the set of actively balanced members. The bad member continues to receive requests and fail them.

What I believe should occur instead: the bad member should be marked as disabled (or put into the error state), a log entry made to the error log (this already occurs), and the request sent to a good member, if one is available. Subsequent requests should not be sent to the bad member until/unless it becomes available again. If no good member is available to handle a request, then an error response is appropriate.

This is a minor issue, because the member can be disabled at runtime through the balancer-manager, or the name can be mapped to an address, even one where nothing is listening on the right port -- as long as the name resolves and the connection fails, the member gets marked as being in error state and removed from the load-balancing rotation.

This problem was noticed in an environment where the same httpd config is used in multiple testing environments, which have varying numbers of cluster members actually provisioned and available. The idea was that the load balancer would determine which members were unavailable and skip over them.
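To illustrate the hosts-file workaround described above, an entry along these lines makes the name resolve while the connection still fails, so the member does get put into error state. The address 192.0.2.1 (from the TEST-NET-1 documentation range) is my example, not from any real deployment:

```
# /etc/hosts -- workaround sketch
# Map the unprovisioned member to an address where nothing listens on the
# target port; the name then resolves, the connect fails, and mod_proxy
# marks the member as being in error state.
192.0.2.1   box02
```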
Given this simplified config:

    <Proxy balancer://api-cluster>
        BalancerMember http://box01:8182/api
        BalancerMember http://box02:8182/api
    </Proxy>
    ProxyPass /api/ balancer://api-cluster/

When the box02 name is not resolvable, every other client request coming into the load balancer generates a response similar to:

    HTTP/1.1 502 Proxy Error
    Date: Fri, 08 Mar 2013 15:12:03 GMT
    Content-Length: 400
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>502 Proxy Error</title>
    </head><body>
    <h1>Proxy Error</h1>
    <p>The proxy server received an invalid response from an upstream server.<br />
    The proxy server could not handle the request <em><a href="/api/whatever">GET /api/whatever</a></em>.<p>
    Reason: <strong>DNS lookup failure for: box02</strong></p></p>
    </body></html>

This also goes to the error_log at error level:

    [Fri Mar 08 10:12:03 2013] [error] [client 127.0.0.1] proxy: DNS lookup failure for: box02 returned by /api/whatever
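When diagnosing this, it can help to tally how often the unresolvable member is still being tried. A minimal sketch: the log line format is copied from the entry above, but the sample file path and the tallying pipeline are mine, not part of httpd:

```shell
# Build a small sample log in the error_log format shown above.
cat > /tmp/sample_error_log <<'EOF'
[Fri Mar 08 10:12:03 2013] [error] [client 127.0.0.1] proxy: DNS lookup failure for: box02 returned by /api/whatever
[Fri Mar 08 10:12:05 2013] [error] [client 127.0.0.1] proxy: DNS lookup failure for: box02 returned by /api/other
EOF

# Count DNS lookup failures per member name; prints a count next to each
# member, e.g. "2 box02" for the sample above.
grep -o 'DNS lookup failure for: [A-Za-z0-9._-]*' /tmp/sample_error_log \
  | awk '{print $NF}' | sort | uniq -c | sort -rn
```

Running the same pipeline against the real error_log (e.g. /tmp/apache/myapache/logs/error_log in the repro below) shows the bad member never drops out of the rotation.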
Just adding notes on how to reproduce this problem with the current latest version, 2.4.4.

Build process:

- mkdir /tmp/apache/
- cd /tmp/apache/
- wget http://www.gtlib.gatech.edu/pub/apache/httpd/httpd-2.4.4.tar.gz
- wget http://www.gtlib.gatech.edu/pub/apache/apr/apr-1.4.6.tar.gz
- wget http://www.gtlib.gatech.edu/pub/apache/apr/apr-util-1.5.1.tar.gz
- wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.32.tar.gz
- tar xzf httpd-2.4.4.tar.gz
- tar xzf apr-1.4.6.tar.gz
- tar xzf apr-util-1.5.1.tar.gz
- tar xzf pcre-8.32.tar.gz
- mv apr-1.4.6 httpd-2.4.4/srclib/apr/
- mv apr-util-1.5.1 httpd-2.4.4/srclib/apr-util/
- cd /tmp/apache/pcre-8.32/
- ./configure --prefix=/tmp/apache/mypcre
- make && make install
- cd /tmp/apache/httpd-2.4.4/
- ./configure --prefix=/tmp/apache/myapache/ --enable-mods-shared=all --enable-mpms-shared=all --with-mpm=worker --disable-cgid --enable-proxy=shared --with-included-apr --with-pcre=/tmp/apache/mypcre/
- make && make install

Test config:

/tmp/apache/myapache/conf> cat httpd.conf

    ServerRoot /tmp/apache/myapache/
    Listen 8080
    LoadModule authz_core_module modules/mod_authz_core.so
    LoadModule mpm_worker_module modules/mod_mpm_worker.so
    LoadModule unixd_module modules/mod_unixd.so
    LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_http_module modules/mod_proxy_http.so
    LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
    LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so

    <Directory />
        AllowOverride None
        Require all denied
    </Directory>

    DocumentRoot /tmp/apache/myapache/htdocs
    <Directory /tmp/apache/myapache/htdocs>
        AllowOverride None
        Require all granted
    </Directory>

    ErrorLog logs/error_log
    LogLevel warn

    <Proxy balancer://api-cluster>
        BalancerMember http://box01:8182/api
        BalancerMember http://box02:8182/api
    </Proxy>
    ProxyPass /api/ balancer://api-cluster/

- start the server
- reproduce the problem by repeatedly doing: curl http://localhost:8080/api/whatever
I can confirm this happens on my 2.2.24 also. The app-03.local member listed below has no entry in my hosts file. My member numbers also vary based on environment.

centos 6.4
kernel 2.6.32-358.14.1.el6.x86_64
httpd 2.2.24
apr 1.4.6
apr-util 1.5.2

From the balancer-manager:

    Sch  Host          Stat  Route   Redir  F  Set  Acc  Wr  Rd
    ajp  app-03.local  Ok    app-03         1  0    2    0   0

From the logs:

    [08/Nov/2013:08:59:14 -0700] GET /XXX 502 Sz 17 BR 246 BS 166 TMSec 12412 TSec 0 Bal balancer://loadbalancer SessWrk - RealWrk app-03 WrkName ajp://app-03.local:8009 PID 31861 TID 140004374312704 UID Un0KUgoKCtMAAHx1OF8AAAAM VHost XXX

Config:

    <IfModule mod_proxy_balancer.c>
        ProxyPass /balancer-manager !
        ProxyPass /XXX balancer://loadbalancer/XXX
        ProxyPassReverse /XXX balancer://loadbalancer/XXX
    </IfModule>

    <IfModule mod_proxy.c>
        ProxyRequests off
        ProxyStatus On
        # Enable/disable the handling of HTTP/1.1 "Via:" headers.
        # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
        # Set to one of: Off | On | Full | Block
        ProxyVia Off
        ProxyPreserveHost On
        <IfModule mod_proxy_balancer.c>
            <Proxy balancer://loadbalancer>
                # Max is equal to the max threads a single tomcat can handle, divided by the
                # number of tomcats being balanced. So if a single tomcat is configured for
                # 300 max threads and there are 3 tomcats, you would set Max to 100 for each
                # balancer member.
                BalancerMember ajp://app-01.local:8009 route=app-01 loadfactor=1 max=200 acquire=2000 connectiontimeout=2 disablereuse=off keepalive=off ping=2 timeout=60 retry=60 ttl=120 flushpackets=on
                BalancerMember ajp://app-02.local:8009 route=app-02 loadfactor=1 max=200 acquire=2000 connectiontimeout=2 disablereuse=off keepalive=off ping=2 timeout=60 retry=60 ttl=120 flushpackets=on
                BalancerMember ajp://app-03.local:8009 route=app-03 loadfactor=1 max=200 acquire=2000 connectiontimeout=2 disablereuse=off keepalive=off ping=2 timeout=60 retry=60 ttl=120 flushpackets=on
                ProxySet stickysession=JSESSIONID|jsessionid
                ProxySet lbmethod=bybusyness
                ProxySet scolonpathdelim=On
                # Balancer timeout in seconds. If set, this will be the maximum time to wait
                # for a free worker. Default is not to wait.
                # (Acquire time * number of workers) / 1000?
                ProxySet timeout=6
                # A single or comma-separated list of HTTP status codes. If set, this will
                # force the worker into error state when the backend returns any status code
                # in the list.
                ProxySet failonstatus=500,503,502
            </Proxy>
        </IfModule>
        # end mod_proxy_balancer
    </IfModule>
    # end mod_proxy
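Worth noting: parameters like retry only matter once a member has actually been put into error state, which is exactly what does not happen on a DNS failure. As a sketch (member name taken from the config above, values are examples, not recommendations), a member that can be marked in error is skipped for retry seconds before being tried again:

```
<Proxy balancer://loadbalancer>
    # If the connect to app-03.local fails (name resolves, nothing listening),
    # the member goes into error state and is skipped for retry=60 seconds.
    # A DNS lookup failure bypasses this mechanism entirely.
    BalancerMember ajp://app-03.local:8009 route=app-03 retry=60
</Proxy>
```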
I just got bitten by this same behaviour as well.