Bug 65697

Summary: Initial state of proxy backend is 'Ok' when healthcheck was not run yet
Product: Apache httpd-2 Reporter: hendrik.jaeger
Component: mod_proxy_hcheckAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal    
Priority: P2    
Version: 2.4.51   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description hendrik.jaeger 2021-11-23 09:51:34 UTC
Symptoms

mod_status reports a proxy balancer backend as 'Init Ok' for the duration of 1 hcinterval after startup


Environment

I compiled apache 2.4.51 from source on Debian stable (bullseye).
httpd -V output:
Server version: Apache/2.4.51 (Unix)
Server built:   Nov 15 2021 16:33:58
Server's Module Magic Number: 20120211:118
Server loaded:  APR 1.7.0, APR-UTIL 1.6.1
Compiled using: APR 1.7.0, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_PROC_PTHREAD_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT="/home/henk/usr/local"
 -D SUEXEC_BIN="/home/henk/usr/local/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"


Steps to reproduce

httpd.conf:
ServerRoot "/home/henk/usr/local"
Listen 80
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule watchdog_module modules/mod_watchdog.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_hcheck_module modules/mod_proxy_hcheck.so
LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so
LoadModule unixd_module modules/mod_unixd.so
LoadModule status_module modules/mod_status.so
ServerAdmin you@example.com
ErrorLog "logs/error_log"
LogLevel warn
LogLevel proxy_hcheck:TRACE2 proxy:TRACE2 watchdog:TRACE2
<IfModule log_config_module>
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    <IfModule logio_module>
      LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
    </IfModule>
    CustomLog "logs/access_log" common
</IfModule>
<Proxy balancer://bz54657>
 BalancerMember http://hopefully.nonexistant.example.org hcmethod=GET
</Proxy>
<Location "/server-status">
    SetHandler server-status
</Location>
ProxyStatus On


request:
curl localhost/server-status?auto 2>/dev/null | grep -e Status -e Uptime


Expected result:
The worker should be marked 'Init HcFl' because its name cannot even be resolved.


Actual result:
% curl localhost/server-status?auto | grep -i -e status -e uptime
ServerUptimeSeconds: 29
ServerUptime: 29 seconds
Uptime: 29
ProxyBalancer[0]Worker[0]Status: Init Ok 
% curl localhost/server-status?auto | grep -i -e status -e uptime
ServerUptimeSeconds: 30
ServerUptime: 30 seconds
Uptime: 30
ProxyBalancer[0]Worker[0]Status: Init HcFl 


Suspected issues:
* The first health check is done 30s after startup because the default value of hcinterval (how often to do a healthcheck) is 30s.
* Backend workers are by default in the state 'Init Ok' until a healthcheck is run.


Expected behaviour:
* To be on the 'safe' side, backend workers should either not have a state directly after httpd startup until a healthcheck has run, that state should be 'undetermined' or 'failed' but not 'Ok'.
* The first healthcheck should be run during or immediately after httpd startup, not hcinterval after httpd startup.


Impact:
The mod_status page could be used for monitoring backend status and alerting if all backends are down but its content needs to be accurate for that to be useful.
Comment 1 hendrik.jaeger 2021-11-23 09:59:37 UTC
Some more context:
I was initially seeing a different behaviour: backends with non-resolvable hostnames (NXDOMAIN) were marked as 'Init Ok' constantly. I thought it was https://bz.apache.org/bugzilla/show_bug.cgi?id=54657.
Trying to reproduce it with the most recent release I was not successful but found the reported behaviour.
Even in the setting where we earlier (thought we) saw the behaviour, I was unable to reproduce it. Currently the only explanation is that we imagined things. I’ll keep an eye out for that and try to reproduce it and report a bug when I can.

Another note: https://bz.apache.org/bugzilla/show_bug.cgi?id=60757 sounded like a related issue.


Also, I forgot to provide logs:
[Tue Nov 23 10:43:53.933285 2021] [proxy:debug] [pid 789067:tid 140495479191360] mod_proxy.c(2693): AH01147: Defining worker 'http://hopefully.nonexistant.example.org' for balancer 'balancer://bz54657'
[Tue Nov 23 10:43:53.933303 2021] [proxy:debug] [pid 789067:tid 140495479191360] mod_proxy.c(2698): AH01148: Defined worker 'http://hopefully.nonexistant.example.org' for balancer 'balancer://bz54657'
[Tue Nov 23 10:43:53.933485 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(1244): AH02337: copying shm[0] (0x7fc7a71a8018) for balancer://bz54657
[Tue Nov 23 10:43:53.933543 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(2094): AH02338: copying shm[0] (0x7fc7a71a6018) for worker: http://hopefully.nonexistant.example.org
[Tue Nov 23 10:43:53.933549 2021] [proxy_hcheck:debug] [pid 789067:tid 140495479191360] mod_proxy_hcheck.c(1124): AH03265: watchdog callback registered (_proxy_hcheck_ for henksdebvm.henk.home.arpa)
[Tue Nov 23 10:43:53.933552 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(445): AH010033: Watchdog: Running with WatchdogInterval 1000ms
[Tue Nov 23 10:43:53.933555 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(454): AH02974: Watchdog: found parent providers.
[Tue Nov 23 10:43:53.933558 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(500): AH02977: Watchdog: found child providers.
[Tue Nov 23 10:43:53.933561 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(508): AH02978: Watchdog: Looking for child (_proxy_hcheck_).
[Tue Nov 23 10:43:53.933574 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(534): AH02979: Watchdog: Created singleton mutex (_proxy_hcheck_).
[Tue Nov 23 10:43:53.933576 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(508): AH02978: Watchdog: Looking for child (_singleton_).
[Tue Nov 23 10:43:53.933579 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(508): AH02978: Watchdog: Looking for child (_default_).
[Tue Nov 23 10:43:53.933584 2021] [core:warn] [pid 789067:tid 140495479191360] AH00098: pid file /home/henk/usr/local/logs/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Tue Nov 23 10:43:53.933973 2021] [watchdog:debug] [pid 789067:tid 140495479191360] mod_watchdog.c(582): AH02981: Watchdog: Created child worker thread (_proxy_hcheck_).
[Tue Nov 23 10:43:53.933984 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(2117): AH00925: initializing worker http://hopefully.nonexistant.example.org shared
[Tue Nov 23 10:43:53.933988 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(2177): AH00927: initializing worker http://hopefully.nonexistant.example.org local
[Tue Nov 23 10:43:53.934001 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(2208): AH00930: initialized pool in child 789067 for (hopefully.nonexistant.example.org) min=0 max=25 smax=25
[Tue Nov 23 10:43:53.934008 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(2117): AH00925: initializing worker proxy:reverse shared
[Tue Nov 23 10:43:53.934011 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(2177): AH00927: initializing worker proxy:reverse local
[Tue Nov 23 10:43:53.934022 2021] [proxy:debug] [pid 789067:tid 140495479191360] proxy_util.c(2208): AH00930: initialized pool in child 789067 for (*) min=0 max=25 smax=25
[Tue Nov 23 10:43:53.934020 2021] [watchdog:debug] [pid 789067:tid 140495478236928] mod_watchdog.c(168): AH02972: Singleton Watchdog (_proxy_hcheck_) running
[Tue Nov 23 10:43:53.934061 2021] [proxy_hcheck:debug] [pid 789067:tid 140495478236928] mod_proxy_hcheck.c(959): AH03258: _proxy_hcheck_ watchdog started.
[Tue Nov 23 10:43:53.934382 2021] [proxy_hcheck:debug] [pid 789067:tid 140495478236928] mod_proxy_hcheck.c(973): AH03313: apr_thread_pool_create() with 16 threads succeeded
[Tue Nov 23 10:44:23.952678 2021] [proxy_hcheck:debug] [pid 789067:tid 140495478236928] mod_proxy_hcheck.c(480): AH03248: Creating hc worker 559e6400ce10 for http://hopefully.nonexistant.example.org:80
[Tue Nov 23 10:44:23.952730 2021] [proxy:debug] [pid 789067:tid 140495478236928] proxy_util.c(2117): AH00925: initializing worker 559e6400ce10 shared
[Tue Nov 23 10:44:23.952738 2021] [proxy:debug] [pid 789067:tid 140495478236928] proxy_util.c(2177): AH00927: initializing worker 559e6400ce10 local
[Tue Nov 23 10:44:23.952763 2021] [proxy:debug] [pid 789067:tid 140495478236928] proxy_util.c(2208): AH00930: initialized pool in child 789067 for (hopefully.nonexistant.example.org) min=0 max=25 smax=25
[Tue Nov 23 10:44:23.952940 2021] [proxy_hcheck:debug] [pid 789067:tid 140495208568576] mod_proxy_hcheck.c(891): AH03256: Threaded Health checking http://hopefully.nonexistant.example.org
[Tue Nov 23 10:44:23.952958 2021] [proxy:debug] [pid 789067:tid 140495208568576] proxy_util.c(2526): AH00942: HCOH: has acquired connection for (hopefully.nonexistant.example.org)
[Tue Nov 23 10:44:23.954936 2021] [proxy_hcheck:debug] [pid 789067:tid 140495208568576] mod_proxy_hcheck.c(548): AH03249: DNS lookup failure for: hopefully.nonexistant.example.org:80
[Tue Nov 23 10:44:23.954965 2021] [proxy:debug] [pid 789067:tid 140495208568576] proxy_util.c(2541): AH00943: HCOH: has released connection for (hopefully.nonexistant.example.org)
[Tue Nov 23 10:44:23.954973 2021] [proxy_hcheck:debug] [pid 789067:tid 140495208568576] mod_proxy_hcheck.c(586): AH03251: Health check GET Status (1) for 559e6400ce10.
[Tue Nov 23 10:44:23.954976 2021] [proxy_hcheck:info] [pid 789067:tid 140495208568576] AH03303: Threaded Health check DISABLING http://hopefully.nonexistant.example.org
[Tue Nov 23 10:44:54.056673 2021] [proxy_hcheck:debug] [pid 789067:tid 140495191783168] mod_proxy_hcheck.c(891): AH03256: Threaded Health checking http://hopefully.nonexistant.example.org
[Tue Nov 23 10:44:54.056714 2021] [proxy:debug] [pid 789067:tid 140495191783168] proxy_util.c(2526): AH00942: HCOH: has acquired connection for (hopefully.nonexistant.example.org)
[Tue Nov 23 10:44:54.058509 2021] [proxy_hcheck:debug] [pid 789067:tid 140495191783168] mod_proxy_hcheck.c(548): AH03249: DNS lookup failure for: hopefully.nonexistant.example.org:80
[Tue Nov 23 10:44:54.058536 2021] [proxy:debug] [pid 789067:tid 140495191783168] proxy_util.c(2541): AH00943: HCOH: has released connection for (hopefully.nonexistant.example.org)
[Tue Nov 23 10:44:54.058545 2021] [proxy_hcheck:debug] [pid 789067:tid 140495191783168] mod_proxy_hcheck.c(586): AH03251: Health check GET Status (1) for 559e6400ce10.