Bug 61499 - TCP healthchecks failing falsely / not actually checking
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_hcheck
Version: 2.4.27
Hardware: PC Linux
Importance: P2 major
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on: 63010
Blocks:
 
Reported: 2017-09-06 18:26 UTC by M Jackson
Modified: 2019-06-14 06:09 UTC (History)

Attachments
patch removing the line, based on trunk (881 bytes, patch)
2018-09-11 13:40 UTC, Dominik Stillhard

Description M Jackson 2017-09-06 18:26:11 UTC
TCP-based health checks that worked in 2.4.25 are failing as of 2.4.27.

Configuration example:
<Proxy balancer://default>
  ProxyHCTemplate tcpvarnish hcmethod=tcp hcinterval=2 hcpasses=3 hcinterval=2
  BalancerMember http://172.28.211.254:6081 loadfactor=95 hctemplate=tcpvarnish
  BalancerMember http://172.28.212.107:6081 loadfactor=5 hctemplate=tcpvarnish
</Proxy>

ProxyPass "/balancer-manager" !
ProxyPass "/" "balancer://default/"
ProxyPassReverse "/" "balancer://default/"
ProxyPreserveHost on

<Location '/balancer-manager'>
    SetHandler balancer-manager
    Require all denied
    Require ip 172.16.0.0/12
</Location>

This configuration worked perfectly in 2.4.25 and earlier. After updating to 2.4.27, the balancer manager shows "Init HcFl" for both balancer members. When I examine the traffic with tcpdump, I do not see the TCP health checks attempting to run at all. Reverting to 2.4.25 fixes the issue.

Happens in amzn-linux build as well as a build from source.

tcpdump on the host that should be receiving the check is showing no activity.

Debug logs show the following:
[Wed Sep 06 17:40:51.549378 2017] [proxy:debug] [pid 4058:tid 140411104278272] proxy_util.c(2156): AH00942: HCTCP: has acquired connection for (172.28.211.254)
[Wed Sep 06 17:40:51.549380 2017] [proxy:debug] [pid 4058:tid 140411028743936] proxy_util.c(2156): AH00942: HCTCP: has acquired connection for (172.28.212.107)
[Wed Sep 06 17:40:51.549382 2017] [proxy:debug] [pid 4058:tid 140411104278272] proxy_util.c(2171): AH00943: HCTCP: has released connection for (172.28.211.254)
[Wed Sep 06 17:40:51.549384 2017] [proxy:debug] [pid 4058:tid 140411028743936] proxy_util.c(2171): AH00943: HCTCP: has released connection for (172.28.212.107)
[Wed Sep 06 17:40:51.549396 2017] [proxy_hcheck:debug] [pid 4058:tid 140411104278272] mod_proxy_hcheck.c(561): AH03251: Health check TCP Status (-1) for 562e10aa0f40.
[Wed Sep 06 17:40:51.549399 2017] [proxy_hcheck:debug] [pid 4058:tid 140411028743936] mod_proxy_hcheck.c(561): AH03251: Health check TCP Status (-1) for 562e10aa1520.
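
For anyone trying to reproduce this, per-module debug output like the lines above can typically be enabled with a LogLevel directive along the following lines. This is only a sketch, assuming httpd 2.4 per-module log levels; the base level of warn is an arbitrary choice and is not taken from the reported configuration.

```apache
# Assumed logging setup; not part of the original report.
# Keeps the global level at warn and raises only the two proxy modules to debug.
LogLevel warn proxy:debug proxy_hcheck:debug
```

With this in place, the AH00942/AH00943 (proxy) and AH03251 (proxy_hcheck) messages should appear in the error log on each check interval.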

root@172.28.211.167 (testwaf) [httpd]# nmap -p 6081 172.28.212.107

Starting Nmap 6.40 ( http://nmap.org ) at 2017-09-06 17:55 UTC
Nmap scan report for ip-172-28-212-107.ec2.internal (172.28.212.107)
Host is up (0.0013s latency).
PORT     STATE SERVICE
6081/tcp open  unknown
Comment 1 Luca Toscano 2017-10-29 21:32:17 UTC
Hi,

it might be due to https://github.com/apache/httpd/commit/77ebb516535eecd90b458c52647b08a4da82e84e

Any chance that you could try the 'no-proxy' variable via SetEnvIf?
Comment 2 Dominik Stillhard 2018-09-11 13:40:25 UTC
Created attachment 36143 [details]
patch removing the line, based on trunk
Comment 3 Dominik Stillhard 2018-09-11 13:42:06 UTC
I tested this in version 2.4.34 and the bug still exists.

The problem is that, in contrast to the HTTP checks, backend_addr is NULL in the function ap_proxy_connect_backend (proxy_util.c).
That is why the while-loop is never entered in that case; ap_proxy_check_connection returns APR_ENOSOCKET at that point.

This is because of this line in the function check_tcp (mod_proxy_hcheck.c): 
       backend->addr = hc->cp->addr;  

But this is already done in the function hc_get_backend (mod_proxy_hcheck.c), which is called just two lines before:
     (*backend)->addr = hc->cp->addr;

I don't completely understand why this double copy leads to backend->addr being NULL, but removing the line solves the problem.

I have tested this with httpd 2.4.34, and the TCP checks now arrive at the backend server.
The patch (based on trunk) simply removes this line; I have attached it anyway.
Comment 5 Jim Jagielski 2018-09-11 15:02:53 UTC
Yes, it does not do the double-set for HTTP so this is likely fluff left over from the rework around 2.4.27.

Thx for the patch
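
Since the HTTP-based checks do not go through the extra assignment, a template using an HTTP method might serve as a stopgap on affected versions until the fix is available. A minimal sketch based on the reporter's configuration, assuming the backends answer OPTIONS requests on the same port; the template name httpvarnish is hypothetical, and this has not been verified against 2.4.27:

```apache
# Hypothetical stopgap: an HTTP OPTIONS check in place of hcmethod=tcp.
# Assumes each backend answers OPTIONS with a 2xx or 3xx status on port 6081.
<Proxy balancer://default>
  ProxyHCTemplate httpvarnish hcmethod=OPTIONS hcinterval=2 hcpasses=3
  BalancerMember http://172.28.211.254:6081 loadfactor=95 hctemplate=httpvarnish
  BalancerMember http://172.28.212.107:6081 loadfactor=5 hctemplate=httpvarnish
</Proxy>
```

Unlike a plain TCP check, this also depends on the backend's HTTP handling, so it is not an exact substitute.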
Comment 6 AK 2018-09-18 08:50:43 UTC
Hi,
I have just patched version 2.4.34 (removed the line) and the TCP health checks started to work. But the httpd process is still consuming more and more memory. In approx. 20 hours its heap grew to 20GB. I have 260 TCP health checks configured.
Comment 7 Jim Jagielski 2018-09-18 13:23:33 UTC
By "still" do you mean it was doing so before as well?
Comment 8 Graham Leggett 2018-09-18 21:31:16 UTC
Backported to 2.4.35.
Comment 9 AK 2018-09-20 05:35:58 UTC
Memory consumption started to rise after the patch was applied. There was no such problem before the patch.
Comment 10 Christophe JAILLET 2019-06-14 06:09:12 UTC
Closing because the reported issue (i.e. the regression in 2.4.27) has been fixed and confirmed (see comment #6).

The memory issue is already reported in bug 63010, so I've just added a 'Depends on' tag.