Bug 48777 - proxy balancer not detecting correctly when host (BalancerMember) is down
Summary: proxy balancer not detecting correctly when host (BalancerMember) is down
Status: RESOLVED LATER
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_balancer
Version: 2.2.9
Hardware: PC Linux
Importance: P2 major
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: MassUpdate
Depends on:
Blocks:
 
Reported: 2010-02-19 13:47 UTC by Jean-Sébastien Frerot
Modified: 2018-11-07 21:09 UTC
CC List: 0 users

Description Jean-Sébastien Frerot 2010-02-19 13:47:28 UTC
Here is the configuration:

<Proxy balancer://images>
    BalancerMember http://10.100.0.28 retry=60 timeout=5
    BalancerMember http://10.100.0.29 retry=60 timeout=5
    BalancerMember http://10.100.0.30 retry=60 timeout=5
    BalancerMember http://10.100.0.44 retry=60 timeout=5
    ProxySet lbmethod=bytraffic timeout=5
    Order deny,allow
    Deny from all
    Allow from all 
</Proxy>

RewriteRule ^/common(.*) balancer://images$1 [P]
ProxyPassReverse /common balancer://images

How to reproduce the problem:
On one of the balancer members, drop all reply packets going back to the requesting (proxy) server:
[10.100.0.30]: sudo iptables -A OUTPUT -d 10.100.0.89 -j DROP
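With this rule active, packets from 10.100.0.89 still reach 10.100.0.30, but nothing comes back, so a connection attempt from the proxy host should time out rather than be refused. The following is a minimal sketch for checking that from the proxy host. The helper name classify_connect is hypothetical, and the 5-second default only mirrors the timeout=5 setting in the configuration above.

```python
import socket


def classify_connect(host, port, timeout=5.0):
    """Attempt one TCP connect and report how it ended.

    Returns "open", "timeout", "refused", or "error".
    Hypothetical helper for illustration; not part of httpd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No SYN-ACK arrived within the timeout (e.g. replies are dropped).
        return "timeout"
    except ConnectionRefusedError:
        # Host is reachable but nothing is listening on the port.
        return "refused"
    except OSError:
        # Any other socket-level failure (no route, unreachable, ...).
        return "error"
```

Run on 10.100.0.89, classify_connect("10.100.0.30", 80) would be expected to return "timeout" while the iptables rule is in place, and "refused" when only Apache is stopped on the member and the host itself is still up.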

If you go to the balancer-manager page on 10.100.0.89, you will see the node http://10.100.0.30 flip back and forth between "ok" and "err" without honoring the 60-second retry parameter. The Elected field also shows that requests are still being sent to this server.
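For reference, the behavior expected from retry=60 is that a member in the error state receives no requests until 60 seconds have passed since the failure. The sketch below models only that expectation. It is not mod_proxy's implementation, and the names Member, mark_failed, mark_ok, and usable are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Toy model of one BalancerMember and its retry window."""
    url: str
    retry: float = 60.0       # seconds, as in retry=60
    in_error: bool = False
    error_time: float = 0.0   # time of the most recent failure

    def mark_failed(self, now):
        # Enter (or refresh) the error state at time `now`.
        self.in_error = True
        self.error_time = now

    def mark_ok(self):
        # A successful request clears the error state.
        self.in_error = False

    def usable(self, now):
        # A healthy member is always usable; a failed one only
        # becomes eligible again once the retry window has elapsed.
        if not self.in_error:
            return True
        return now - self.error_time >= self.retry
```

Under this model, a member that failed at t=0 is skipped at t=59 and becomes eligible again at t=60, which is not what the balancer-manager page shows for http://10.100.0.30.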

tcpdump results:
12:46:11.468623 IP 10.100.0.89.39148 > 10.100.0.30.80: . ack 2312 win 23 <nop,nop,timestamp 925863575 1017403,nop,nop,sack 1 {2311:2312}>
12:46:11.495487 IP 10.100.0.89.39550 > 10.100.0.30.80: S 1784562342:1784562342(0) win 5840 <mss 1460,sackOK,timestamp 925863582 0,nop,wscale 9>
12:46:11.520143 IP 10.100.0.89.39050 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863588 1012082>
12:46:11.555322 IP 10.100.0.89.37903 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863597 1000084>
12:46:11.571507 IP 10.100.0.89.37680 > 10.100.0.30.80: F 526220930:526220930(0) ack 4086845075 win 17 <nop,nop,timestamp 925863601 997939>
12:46:11.571512 IP 10.100.0.89.39551 > 10.100.0.30.80: S 1775937170:1775937170(0) win 5840 <mss 1460,sackOK,timestamp 925863601 0,nop,wscale 9>
12:46:11.753822 IP 10.100.0.89.39471 > 10.100.0.30.80: S 1738586911:1738586911(0) win 5840 <mss 1460,sackOK,timestamp 925863647 0,nop,wscale 9>
12:46:11.774826 IP 10.100.0.89.38406 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863652 1005140>
12:46:11.778066 IP 10.100.0.89.37680 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863653 997939>
12:46:11.849827 IP 10.100.0.89.39139 > 10.100.0.30.80: F 459:459(0) ack 504 win 14 <nop,nop,timestamp 925863671 1013569>
12:46:11.933836 IP 10.100.0.89.39050 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863692 1012082>
12:46:12.001753 IP 10.100.0.89.38976 > 10.100.0.30.80: F 1357893028:1357893028(0) ack 641429889 win 142 <nop,nop,timestamp 925863708 1011320>
12:46:12.001775 IP 10.100.0.89.39552 > 10.100.0.30.80: S 1787212656:1787212656(0) win 5840 <mss 1460,sackOK,timestamp 925863708 0,nop,wscale 9>
12:46:12.015495 IP 10.100.0.89.39477 > 10.100.0.30.80: S 1748850970:1748850970(0) win 5840 <mss 1460,sackOK,timestamp 925863712 0,nop,wscale 9>
12:46:12.101656 IP 10.100.0.89.39483 > 10.100.0.30.80: S 1746295340:1746295340(0) win 5840 <mss 1460,sackOK,timestamp 925863733 0,nop,wscale 9>
12:46:12.194069 IP 10.100.0.89.37680 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863757 997939>
12:46:12.205816 IP 10.100.0.89.38976 > 10.100.0.30.80: F 0:0(0) ack 1 win 142 <nop,nop,timestamp 925863760 1011320>
12:46:12.206577 IP 10.100.0.89.39484 > 10.100.0.30.80: S 1741309974:1741309974(0) win 5840 <mss 1460,sackOK,timestamp 925863760 0,nop,wscale 9>
12:46:12.217813 IP 10.100.0.89.36664 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863763 991676>
12:46:12.325818 IP 10.100.0.89.38423 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863790 1005548>
12:46:12.518073 IP 10.100.0.89.38451 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863838 1006166>
12:46:12.563846 IP 10.100.0.89.39559 > 10.100.0.30.80: S 1796546847:1796546847(0) win 5840 <mss 1460,sackOK,timestamp 925863849 0,nop,wscale 9>
12:46:12.622322 IP 10.100.0.89.38976 > 10.100.0.30.80: F 0:0(0) ack 1 win 142 <nop,nop,timestamp 925863864 1011320>
12:46:12.770186 IP 10.100.0.89.39165 > 10.100.0.30.80: F 1343:1343(0) ack 1309 win 17 <nop,nop,timestamp 925863900 1016019>
12:46:12.770417 IP 10.100.0.89.39050 > 10.100.0.30.80: F 0:0(0) ack 1 win 17 <nop,nop,timestamp 925863900 1012082>
12:46:12.774981 IP 10.100.0.89.39489 > 10.100.0.30.80: S 1762397365:1762397365(0) win 5840 <mss 1460,sackOK,timestamp 925863902 0,nop,wscale 9>
12:46:12.789824 IP 10.100.0.89.39490 > 10.100.0.30.80: S 1751438195:1751438195(0) win 5840 <mss 1460,sackOK,timestamp 925863906 0,nop,wscale 9>
...
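To make the pattern in the capture easier to see, the lines can be tallied by TCP flag per second. This is a rough sketch that assumes the tcpdump line format shown above; the function name count_flags_per_second and the regular expression are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches lines like:
# 12:46:11.495487 IP 10.100.0.89.39550 > 10.100.0.30.80: S 1784562342:...
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d+ IP (\S+) > (\S+): (\S+) ")


def count_flags_per_second(lines, dst="10.100.0.30.80"):
    """Count packets sent to `dst`, keyed by (second, flags)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip "..." and anything else that is not a packet line
        second, _src, dest, flags = match.groups()
        if dest == dst:
            counts[(second, flags)] += 1
    return counts
```

In the excerpt above, the "S" entries are SYN segments, each from a different source port on 10.100.0.89, so the proxy keeps opening new connections to 10.100.0.30 while earlier connections are still retransmitting FINs.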


Note that this problem does not occur if I shut down Apache on the member. It only occurs when the server loses network connectivity. However, if I shut down Apache and then shut down the server itself, the same problematic behavior appears after 60 seconds.


Package versions on Debian:
apache2           2.2.9-10+lenny6
apache2.2-common  2.2.9-10+lenny6
Comment 1 William A. Rowe Jr. 2018-11-07 21:09:35 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.