Bug 63010 - mod_proxy_hcheck when health checks configured, esp. vs down service, consumes large memory
Summary: mod_proxy_hcheck when health checks configured, esp. vs down service, consumes large memory
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_hcheck
Version: 2.4.37
Hardware: PC All
Importance: P2 major with 2 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks: 61499
Reported: 2018-12-13 21:18 UTC by CFindley
Modified: 2019-06-14 06:09 UTC
CC List: 1 user



Attachments
error.log output covering 2-tenths of a second/5k lines (847.00 KB, text/plain)
2018-12-13 21:22 UTC, CFindley

Description CFindley 2018-12-13 21:18:29 UTC
Defect related to mod_proxy_hcheck

Found/reproduced in Apache httpd versions 2.4.33, 2.4.35, and 2.4.37 (mod_proxy_hcheck module) on Windows.

We added the hcheck module and then noticed that httpd started consuming large amounts of memory. Before configuring the module, httpd was stable at about 35 MB of RAM. With the module enabled, memory would grow to 3 GB and stabilize. In one repro it actually grew to 17 GB, and the machine ran out of pagefile space and crashed. We noticed that in the default configuration we were checking a service that was not running. After configuring that service to run, the memory used dropped significantly: instead of 3 GB, it grew to 150 MB and stabilized.

Even with the service running, there is a period shortly after startup during which the health check runs hundreds of times per second for each service. During this time memory consumption grows steadily. This lasts about 5 seconds, generating thousands of lines of trace-level logging and running the health check over a thousand times for each service. Then, for some reason, it stabilizes, and checks are seen at the expected 30-second interval. But the memory is not released, and the process retains an elevated footprint. This might just be the health check module requiring more memory than running without it, but it is concerning that the increased consumption appears tied to checking a service that is down. If a service is down you already have one problem; if the health check itself starts consuming large amounts of resources on top of that, you compound the problem.
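
One pattern consistent with this behavior is APR pool lifetime: memory taken from an apr_pool_t is only returned when that pool is cleared or destroyed, so per-check allocations made against a long-lived pool accumulate across check runs and are never freed while the server runs. A minimal standalone sketch of that pattern follows (hypothetical code, not httpd source; check_once and the loop counts are invented for illustration):

/* Hypothetical sketch, not httpd source: shows why per-check
   allocations from a long-lived APR pool accumulate, and how a
   subpool per iteration avoids it. */
#include <apr_pools.h>
#include <apr_strings.h>

static void check_once(apr_pool_t *p, int i)
{
    /* Allocates from p; the memory is only returned when p itself
       is cleared or destroyed. */
    char *line = apr_psprintf(p, "health check #%d: status %d", i, 200);
    (void)line;
}

int main(void)
{
    apr_pool_t *longlived;
    apr_initialize();
    apr_pool_create(&longlived, NULL);

    /* Leaky pattern: thousands of checks against one long-lived
       pool; resident memory grows with every iteration. */
    for (int i = 0; i < 100000; i++)
        check_once(longlived, i);

    /* Remedy: a subpool per check, destroyed when the check is done. */
    for (int i = 0; i < 100000; i++) {
        apr_pool_t *per_check;
        apr_pool_create(&per_check, longlived);
        check_once(per_check, i);
        apr_pool_destroy(per_check);
    }

    apr_pool_destroy(longlived);
    apr_terminate();
    return 0;
}

Whether mod_proxy_hcheck actually follows the leaky variant would need to be confirmed against its source; the sketch only illustrates why the footprint would never shrink after the startup burst.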


This line appeared 278 times in one second:
Condition ok1234 for 6af324730 (http://localhost:8843): passed

This sequence repeated; it is logging from a single thread checking one endpoint (please see the attached file for a full one-second interval of logging, 15k+ lines):
[Thu Dec 13 17:58:40.618694 2018] [proxy_hcheck:debug] [pid 21288:tid 748] mod_proxy_hcheck.c(829): AH03256: Threaded Health checking http://localhost:8843
[Thu Dec 13 17:58:40.618694 2018] [proxy:debug] [pid 21288:tid 748] proxy_util.c(2313): AH00942: HCOH: has acquired connection for (localhost)
[Thu Dec 13 17:58:40.619694 2018] [proxy:trace2] [pid 21288:tid 748] proxy_util.c(3010): HCOH: fam 2 socket created to connect to localhost
[Thu Dec 13 17:58:40.619694 2018] [proxy:debug] [pid 21288:tid 748] proxy_util.c(3042): AH02824: HCOH: connection established with 127.0.0.1:8843 (localhost)
[Thu Dec 13 17:58:40.620696 2018] [proxy:debug] [pid 21288:tid 748] proxy_util.c(3212): AH00962: HCOH: connection complete to 127.0.0.1:8843 (localhost)
[Thu Dec 13 17:58:40.649654 2018] [proxy_hcheck:debug] [pid 21288:tid 748] mod_proxy_hcheck.c(644): AH03254: HTTP/1.1 200 OK
[Thu Dec 13 17:58:40.649654 2018] [proxy_hcheck:trace2] [pid 21288:tid 748] mod_proxy_hcheck.c(797): Condition ok1234 for 6af324730 (http://localhost:8843): passed
[Thu Dec 13 17:58:40.649654 2018] [proxy:debug] [pid 21288:tid 748] proxy_util.c(2328): AH00943: HCOH: has released connection for (localhost)
[Thu Dec 13 17:58:40.649654 2018] [proxy_hcheck:debug] [pid 21288:tid 748] mod_proxy_hcheck.c(573): AH03251: Health check GET Status (0) for 6af324730.


Excerpts from httpd.conf:

...
# This defines a template for use by proxy_hcheck_module.
# Some other specific parameters are set directly on the BalancerMember lines.
ProxyHCExpr ok1234 {%{REQUEST_STATUS} =~ /^[1234]/}
ProxyHCTemplate hcsimple hcmethod=GET hcexpr=ok1234 hcuri=/favicon.ico
...
<Proxy balancer://samlservice-cluster>
BalancerMember http://localhost:8843 route=1 hcinterval=30 hcfails=2 hcpasses=1 hctemplate=hcsimple
</Proxy>
...
LogLevel debug
LogLevel proxy_hcheck:TRACE2
LogLevel proxy:TRACE2
LogLevel watchdog:TRACE2
...
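
For reference, the ok1234 expression above passes any status whose first digit is 1 through 4, so only 5xx responses fail the expression (a down service fails earlier, at connect time, before the expression is ever evaluated). A small standalone demonstration of the same match, using POSIX regex rather than httpd's expression parser (illustrative only):

/* Illustrative only: evaluates a few status codes against the same
   pattern the ok1234 expression uses, ^[1234]. */
#include <regex.h>
#include <stdio.h>

int main(void)
{
    const char *codes[] = { "200", "301", "404", "500", "503" };
    regex_t re;
    regcomp(&re, "^[1234]", REG_EXTENDED | REG_NOSUB);
    for (int i = 0; i < 5; i++)
        printf("%s -> %s\n", codes[i],
               regexec(&re, codes[i], 0, NULL, 0) == 0 ? "passed" : "failed");
    regfree(&re);
    return 0;
}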

Related:
A fix for this issue is already in place:
https://bz.apache.org/bugzilla/show_bug.cgi?id=62318

This issue has similarities; note the comments about consuming 20 GB of heap space:
https://bugs.httpd.apache.narkive.com/tyCY7SPh/bug-61499-new-tcp-healthchecks-failing-falsely-not-actually-checking

Also noticed this issue discussed on Reddit:
https://www.reddit.com/r/apache/comments/9rs4cd/apache_24_memory_leak_when_using_mod_proxy/
Comment 1 CFindley 2018-12-13 21:22:34 UTC
Created attachment 36324 [details]
error.log output covering 2-tenths of a second/5k lines
Comment 2 CFindley 2018-12-14 01:05:16 UTC
Note: we have a very lengthy and aged httpd.conf file which has cruft from years of fixing defects and adding new options, so it is entirely possible this is some artifact of that. I did try setting up a very simple configuration which just health checks a single non-existent service, and it seems to be well behaved. I can't share the config file as it contains sensitive (and embarrassing) information unfortunately. :)
Comment 3 pawel.c 2019-04-03 22:49:29 UTC
Also reproducible on 2.4.39, on Windows Server 2012.

This is a health check memory leak whose rate depends on check frequency.

To reproduce:
(1) You need more than one high-frequency health check, such as:
ProxyHCExpr wsOK {%{REQUEST_STATUS} =~ /405|200/}
ProxyHCTemplate X1 hcexpr=wsOK hcmethod=GET hcuri=WHATEVER_WORKS_FOR_YOU_1 hcinterval=5 hcpasses=2 hcfails=1
<Proxy balancer://clusterFxAgregation>
BalancerMember http://127.0.0.1:10179 route=127.0.0.1 retry=0 hctemplate=X1
BalancerMember http://10.115.51.15:10179 status=+H route=10.115.51.15 retry=0 hctemplate=X1
BalancerMember http://10.115.51.27:10179 status=+H route=10.115.51.27 retry=0 hctemplate=X1
</Proxy>

ProxyHCTemplate X2 hcexpr=wsOK hcmethod=GET hcuri=WHATEVER_WORKS_FOR_YOU_2 hcinterval=5 hcpasses=2 hcfails=1
<Proxy balancer://clusterFxAgregation>
BalancerMember http://127.0.0.1:10178 route=127.0.0.1 retry=0 hctemplate=X2
BalancerMember http://10.115.51.15:10178 status=+H route=10.115.51.15 retry=0 hctemplate=X2
BalancerMember http://10.115.51.27:10178 status=+H route=10.115.51.27 retry=0 hctemplate=X2
</Proxy>

Here these are defined as two virtual hosts in the same conf.

(2) Some nodes must be down, for instance 127.0.0.1 in the example above.

(3) Observe memory. The leak speed depends on the number of defined checks and their frequency. With a configuration like the one above and around 20 virtual hosts, it should be roughly 10 MB/min (a sketch for measuring this follows below).
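
To quantify a rate like 10 MB/min, one way is to sample the httpd process's private bytes once per minute. A minimal Windows-only sketch (a hypothetical helper, not part of httpd; the target PID is assumed to be passed on the command line; link with psapi.lib):

/* Hypothetical helper, not part of httpd: samples a process's
   private bytes once per minute so a leak rate such as 10 MB/min
   can be confirmed. Windows-only; link with psapi.lib. */
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <psapi.h>

int main(int argc, char **argv)
{
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : 0;
    HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                           FALSE, pid);
    if (!h) {
        fprintf(stderr, "cannot open pid %lu\n", (unsigned long)pid);
        return 1;
    }
    for (;;) {
        PROCESS_MEMORY_COUNTERS_EX pmc;
        if (GetProcessMemoryInfo(h, (PROCESS_MEMORY_COUNTERS *)&pmc,
                                 sizeof(pmc))) {
            printf("private bytes: %.1f MB\n",
                   pmc.PrivateUsage / (1024.0 * 1024.0));
            fflush(stdout);
        }
        Sleep(60 * 1000); /* one sample per minute */
    }
}

Run it against the httpd child PID while the health checks cycle; a steady climb between samples confirms the leak rate.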