The problem I am reporting here is that when name resolution of a load balancer member fails, the affected member is not marked as disabled (or put into the error state) and taken out of the set of actively balanced members. The bad member continues to receive requests and fail them.

What I believe should occur instead: the bad member should be marked as disabled (or put into the error state), a log entry made to the error log (this already occurs), and the request sent to a good member, if one is available. Subsequent requests should not be sent to the bad member until/unless it becomes available again. If no good member is available to handle a request, then an error response is appropriate.

This is a minor issue, because the member can be disabled at runtime through the balancer-manager, or the name can be mapped to an address, even one where nothing is listening on the right port -- as long as the name resolves and the connection fails, the member gets marked as being in error state and removed from the load-balancing rotation.

This problem was noticed in an environment where the same httpd config is used in multiple testing environments, which have varying numbers of cluster members actually provisioned and available. The idea was that the load balancer would determine which members were unavailable and skip over them.
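To illustrate the hosts-file workaround described above, an entry along these lines makes the name resolve while the connection still fails, so the member does get put into error state. The address 192.0.2.1 (from the TEST-NET-1 documentation range) is my example, not from any real deployment:

```
# /etc/hosts -- workaround sketch
# Map the unprovisioned member to an address where nothing listens on the
# target port; the name then resolves, the connect fails, and mod_proxy
# marks the member as being in error state.
192.0.2.1   box02
```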
Given this simplified config:

    <Proxy balancer://api-cluster>
        BalancerMember http://box01:8182/api
        BalancerMember http://box02:8182/api
    </Proxy>
    ProxyPass /api/ balancer://api-cluster/

When the box02 name is not resolvable, every other client request coming into the load balancer generates a response similar to:

    HTTP/1.1 502 Proxy Error
    Date: Fri, 08 Mar 2013 15:12:03 GMT
    Content-Length: 400
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>502 Proxy Error</title>
    </head><body>
    <h1>Proxy Error</h1>
    <p>The proxy server received an invalid response from an upstream server.<br />
    The proxy server could not handle the request <em><a href="/api/whatever">GET /api/whatever</a></em>.<p>
    Reason: <strong>DNS lookup failure for: box02</strong></p></p>
    </body></html>

This also goes to the error_log at error level:

    [Fri Mar 08 10:12:03 2013] [error] [client 127.0.0.1] proxy: DNS lookup failure for: box02 returned by /api/whatever
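When diagnosing this, it can help to tally how often the unresolvable member is still being tried. A minimal sketch: the log line format is copied from the entry above, but the sample file path and the tallying pipeline are mine, not part of httpd:

```shell
# Build a small sample log in the error_log format shown above.
cat > /tmp/sample_error_log <<'EOF'
[Fri Mar 08 10:12:03 2013] [error] [client 127.0.0.1] proxy: DNS lookup failure for: box02 returned by /api/whatever
[Fri Mar 08 10:12:05 2013] [error] [client 127.0.0.1] proxy: DNS lookup failure for: box02 returned by /api/other
EOF

# Count DNS lookup failures per member name; prints a count next to each
# member, e.g. "2 box02" for the sample above.
grep -o 'DNS lookup failure for: [A-Za-z0-9._-]*' /tmp/sample_error_log \
  | awk '{print $NF}' | sort | uniq -c | sort -rn
```

Running the same pipeline against the real error_log (e.g. /tmp/apache/myapache/logs/error_log in the repro below) shows the bad member never drops out of the rotation.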
Just adding notes on how to reproduce this problem with the current latest version, 2.4.4.

Build process:

- mkdir /tmp/apache/
- cd /tmp/apache/
- wget http://www.gtlib.gatech.edu/pub/apache/httpd/httpd-2.4.4.tar.gz
- wget http://www.gtlib.gatech.edu/pub/apache/apr/apr-1.4.6.tar.gz
- wget http://www.gtlib.gatech.edu/pub/apache/apr/apr-util-1.5.1.tar.gz
- wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.32.tar.gz
- tar xzf httpd-2.4.4.tar.gz
- tar xzf apr-1.4.6.tar.gz
- tar xzf apr-util-1.5.1.tar.gz
- tar xzf pcre-8.32.tar.gz
- mv apr-1.4.6 httpd-2.4.4/srclib/apr/
- mv apr-util-1.5.1 httpd-2.4.4/srclib/apr-util/
- cd /tmp/apache/pcre-8.32/
- ./configure --prefix=/tmp/apache/mypcre
- make && make install
- cd /tmp/apache/httpd-2.4.4/
- ./configure --prefix=/tmp/apache/myapache/ --enable-mods-shared=all --enable-mpms-shared=all --with-mpm=worker --disable-cgid --enable-proxy=shared --with-included-apr --with-pcre=/tmp/apache/mypcre/
- make && make install

Test config:

/tmp/apache/myapache/conf> cat httpd.conf

    ServerRoot /tmp/apache/myapache/
    Listen 8080
    LoadModule authz_core_module modules/mod_authz_core.so
    LoadModule mpm_worker_module modules/mod_mpm_worker.so
    LoadModule unixd_module modules/mod_unixd.so
    LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_http_module modules/mod_proxy_http.so
    LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
    LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so

    <Directory />
        AllowOverride None
        Require all denied
    </Directory>

    DocumentRoot /tmp/apache/myapache/htdocs
    <Directory /tmp/apache/myapache/htdocs>
        AllowOverride None
        Require all granted
    </Directory>

    ErrorLog logs/error_log
    LogLevel warn

    <Proxy balancer://api-cluster>
        BalancerMember http://box01:8182/api
        BalancerMember http://box02:8182/api
    </Proxy>
    ProxyPass /api/ balancer://api-cluster/

- start the server
- reproduce the problem by repeatedly doing: curl http://localhost:8080/api/whatever
I can confirm this happens on my 2.2.24 also. The app-03.local member listed below has no entry in my hosts file. My member numbers also vary based on environment.

centos 6.4
kernel 2.6.32-358.14.1.el6.x86_64
httpd 2.2.24
apr 1.4.6
apr-util 1.5.2

From the balancer-manager:

    Sch  Host          Stat  Route   Redir  F  Set  Acc  Wr  Rd
    ajp  app-03.local  Ok    app-03         1  0    2    0   0

From the logs:

    [08/Nov/2013:08:59:14 -0700] GET /XXX 502 Sz 17 BR 246 BS 166 TMSec 12412 TSec 0 Bal balancer://loadbalancer SessWrk - RealWrk app-03 WrkName ajp://app-03.local:8009 PID 31861 TID 140004374312704 UID Un0KUgoKCtMAAHx1OF8AAAAM VHost XXX

Config:

    <IfModule mod_proxy_balancer.c>
        ProxyPass /balancer-manager !
        ProxyPass /XXX balancer://loadbalancer/XXX
        ProxyPassReverse /XXX balancer://loadbalancer/XXX
    </IfModule>

    <IfModule mod_proxy.c>
        ProxyRequests off
        ProxyStatus On
        # Enable/disable the handling of HTTP/1.1 "Via:" headers.
        # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
        # Set to one of: Off | On | Full | Block
        ProxyVia Off
        ProxyPreserveHost On
        <IfModule mod_proxy_balancer.c>
            <Proxy balancer://loadbalancer>
                # Max is equal to the max threads a single tomcat can handle, divided by the
                # number of tomcats being balanced. So if a single tomcat is configured for
                # 300 max threads and there are 3 tomcats, you would set Max to 100 for each
                # balancer member.
                BalancerMember ajp://app-01.local:8009 route=app-01 loadfactor=1 max=200 acquire=2000 connectiontimeout=2 disablereuse=off keepalive=off ping=2 timeout=60 retry=60 ttl=120 flushpackets=on
                BalancerMember ajp://app-02.local:8009 route=app-02 loadfactor=1 max=200 acquire=2000 connectiontimeout=2 disablereuse=off keepalive=off ping=2 timeout=60 retry=60 ttl=120 flushpackets=on
                BalancerMember ajp://app-03.local:8009 route=app-03 loadfactor=1 max=200 acquire=2000 connectiontimeout=2 disablereuse=off keepalive=off ping=2 timeout=60 retry=60 ttl=120 flushpackets=on
                ProxySet stickysession=JSESSIONID|jsessionid
                ProxySet lbmethod=bybusyness
                ProxySet scolonpathdelim=On
                # Balancer timeout in seconds. If set, this will be the maximum time to wait
                # for a free worker. Default is not to wait.
                # (Acquire time * number of workers) / 1000?
                ProxySet timeout=6
                # A single or comma-separated list of HTTP status codes. If set, this will
                # force the worker into error state when the backend returns any status code
                # in the list.
                ProxySet failonstatus=500,503,502
            </Proxy>
        </IfModule>
        # end mod_proxy_balancer
    </IfModule>
    # end mod_proxy
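Worth noting: parameters like retry only matter once a member has actually been put into error state, which is exactly what does not happen on a DNS failure. As a sketch (member name taken from the config above, values are examples, not recommendations), a member that can be marked in error is skipped for retry seconds before being tried again:

```
<Proxy balancer://loadbalancer>
    # If the connect to app-03.local fails (name resolves, nothing listening),
    # the member goes into error state and is skipped for retry=60 seconds.
    # A DNS lookup failure bypasses this mechanism entirely.
    BalancerMember ajp://app-03.local:8009 route=app-03 retry=60
</Proxy>
```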
I just got bitten by this same behaviour as well.