Bug 60948 - Large TCP timeout delays hcheck disabling a node
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_hcheck
Version: 2.4.25
Hardware: Sun Solaris
Importance: P2 enhancement
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2017-03-31 14:59 UTC by Michael Renz
Modified: 2019-06-27 05:47 UTC (History)

Attachments
added new hcconnectiontimeout parameter (1.55 KB, patch)
2017-03-31 14:59 UTC, Michael Renz
I forgot to allow it in ProxyHCTemplate and the parameter is now optional (3.19 KB, patch)
2017-04-03 13:19 UTC, Michael Renz

Description Michael Renz 2017-03-31 14:59:41 UTC
Created attachment 34892
added new hcconnectiontimeout parameter

Using the latest patched mod_proxy_hcheck (with the patch from bug 60071), I ran into a problematic situation.
If a node fails completely and is no longer reachable via TCP/IP, the long Solaris TCP connect timeout means mod_proxy_hcheck DISABLEs the node very late.
mod_proxy_hcheck does not provide a connection-timeout parameter to shorten this.
On top of that, the thread pool defined via ProxyHCTPsize quickly fills up, with all available threads waiting for the timeout. A workaround is to increase ProxyHCTPsize to, e.g., 500. The underlying problem remains, though: once the node goes down, it is not DISABLED until the first health check times out. Solaris has a timeout of about 120 s, so the failed node still receives requests during that time. These requests run into the "connectiontimeout", but that is still not a good situation, because it slows down many requests.
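
To illustrate why a dedicated connect timeout matters, here is a minimal Python sketch. It is not the patch itself (which is C code in mod_proxy_hcheck.c); the helper name `probe` and its parameters are hypothetical. It only shows that an explicit timeout bounds how long a check waits on an unreachable node, instead of falling back to the operating system's default.

```python
import socket


def probe(host: str, port: int, timeout_s: float) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical helper for illustration only; it is not part of httpd.
    """
    try:
        # create_connection applies timeout_s to the connect attempt, so an
        # unreachable host cannot block the caller for the OS default timeout.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

With `timeout_s=1`, a probe of a host that silently drops packets should return False after roughly one second, which is the kind of bound hcconnectiontimeout=1 is meant to give the health check.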

I have patched mod_proxy_hcheck.c (mostly by copy and paste) and added a new parameter called "hcconnectiontimeout". With this new parameter, my tests now look good.
An example configuration looks like this:

   SSLProxyEngine On
   SSLProxyVerify none
   SSLProxyCheckPeerCN off
   SSLProxyCheckPeerName off
   SSLProxyCheckPeerExpire off

   ProxyHCTPsize 400
   ProxyHCExpr get {hc('body') =~ /OK/}
   ProxyHCTemplate server hcmethod=GET hcexpr=get hcfails=1 hcinterval=2 hcpasses=1 hcuri=/tester
   <Proxy balancer://group>
      BalancerMember https://192.168.0.2:8080 connectiontimeout=1 hcconnectiontimeout=1 hctemplate=server
      BalancerMember https://192.168.0.3:8080 connectiontimeout=1 hcconnectiontimeout=1 hctemplate=server
   </Proxy>
<VirtualHost *:80>

   ProxyPass "/" "balancer://group/" failontimeout=On timeout=2
   ProxyPassReverse "/" "balancer://group/"

</VirtualHost>

I hope this helps someone.
Comment 1 Michael Renz 2017-04-03 13:19:15 UTC
Created attachment 34893
I forgot to allow it in ProxyHCTemplate and the parameter is now optional
Comment 2 Thomas Meyer 2019-06-03 07:42:03 UTC
Hi, are there any updates on this?

An independent timeout for the health check HTTP request would be really helpful!

The patch looks okay. Is there anything I can do to get this merged?
Comment 3 jfclere 2019-06-27 05:47:32 UTC
It is confusing to have both connectiontimeout and hcconnectiontimeout.
I have committed http://svn.apache.org/viewvc?view=revision&revision=1862014
If that helps, please close the BZ.