Bug 65079

Summary: SSL Handshake failure causes requests to not be sent to load balanced application
Product: Apache httpd-2
Reporter: David Betterton <dbetterton>
Component: mod_proxy_http
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW
Severity: major
Priority: P2
Version: 2.4.43
Target Milestone: ---
Hardware: Other
OS: Linux

Description David Betterton 2021-01-14 14:32:10 UTC
We have a reverse proxy set up in AWS that is under regular attack. Some of the attacks produce an SSL handshake error (see below); when this happens, the proxy stops sending traffic to that backend server for around 15 minutes.


[Thu Dec 03 07:44:29.063701 2020] [proxy:error] [pid 3139:tid 139723423061760] (20014)Internal error (specific information not available): [client 193.16.9.98:53038] AH01084: pass request body failed to 10.21.0.252:8443 (<redacted>.com)
[Thu Dec 03 07:44:29.063720 2020] [proxy:error] [pid 3139:tid 139723423061760] [client 193.16.9.98:53038] AH00898: Error during SSL Handshake with remote server returned by /spip/
[Thu Dec 03 07:44:29.063724 2020] [proxy_http:error] [pid 3139:tid 139723423061760] [client 193.16.9.98:53038] AH01097: pass request body failed to 10.21.0.252:8443 (<redacted>.com) from 193.16.9.98 ()

At one point we were also getting an "All workers are in error state" error. That message is no longer appearing, but the application server is still seeing the same effect:

[Fri Dec 18 06:10:04.804824 2020] [proxy:error] [pid 20740:tid 140712213456640] (20014)Internal error (specific information not available): [client 54.234.1.200:51298] AH01084: pass request body failed to 10.21.0.252:8443 (ukuappc2.agileassets.com)
[Fri Dec 18 06:10:04.804850 2020] [proxy:error] [pid 20740:tid 140712213456640] [client 54.234.1.200:51298] AH00898: Error during SSL Handshake with remote server returned by /.env
[Fri Dec 18 06:10:04.804853 2020] [proxy_http:error] [pid 20740:tid 140712213456640] [client 54.234.1.200:51298] AH01097: pass request body failed to 10.21.0.252:8443 (ukuappc2.agileassets.com) from 54.234.1.200 ()
[Fri Dec 18 06:10:06.284584 2020] [proxy_balancer:error] [pid 20657:tid 140712548968192] [client 90.248.114.57:52517] AH01167: balancer://mybalancer: All workers are in error state for route (worker2), referer: https://cps.agileassets.com/ams_m25_prd/Kernel/w_work.jsp?AA_SID=0376d5de-d09c-4d6d-8c5f-63fa0b1684f8&window_id=14_br_inspection_manage
[Fri Dec 18 06:10:11.761057 2020] [proxy_balancer:error] [pid 20740:tid 140712347674368] [client 90.248.114.57:52520] AH01167: balancer://mybalancer: All workers are in error state for route (worker2), referer: https://cps.agileassets.com/ams_m25_prd/Kernel/w_work.jsp?AA_SID=0376d5de-d09c-4d6d-8c5f-63fa0b1684f8&window_id=14_br_inspection_manage
Comment 1 David Betterton 2021-01-14 14:35:39 UTC
This bug turns the attack into a denial-of-service situation, so it appears to be a security vulnerability.
Comment 2 Yann Ylavic 2021-01-14 14:53:34 UTC
Aren't you looking for the proxy worker/BalancerMember parameter "retry=" (or possibly "status=+i") described in [1]?

[1] https://httpd.apache.org/docs/2.4/en/mod/mod_proxy.html#proxypass
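For reference, these parameters attach to each BalancerMember; a minimal sketch (the balancer name and backend hostnames here are placeholders, not from this report):

```apache
<Proxy balancer://example>
    # retry=60: after a failure, put this worker back into rotation
    # after 60 seconds (the mod_proxy default) instead of a longer window
    BalancerMember https://backend1.example.com:8443 route=worker1 retry=60
    # status=+I: start the worker in "ignore errors" mode, so backend
    # errors never put it into the error state at all
    BalancerMember https://backend2.example.com:8443 route=worker2 status=+I
</Proxy>
```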
Comment 3 David Betterton 2021-01-14 15:25:43 UTC
(In reply to Yann Ylavic from comment #2)
> Aren't you looking for the proxy worker/BalancerMember parameter "retry="
> (or possibly "status=+i") described in [1]?
> 
> [1] https://httpd.apache.org/docs/2.4/en/mod/mod_proxy.html#proxypass

This is what we have:

<Proxy balancer://mybalancer>
    BalancerMember https://ukuappc1.agileassets.com:8443 route=worker1 redirect=worker2 timeout=3600 retry=900
    BalancerMember https://ukuappc2.agileassets.com:8443 route=worker2 redirect=worker1 timeout=3600 retry=900
</Proxy>

Would using forcerecovery=On be a better option ?

I don't see an option to "try to recover immediately, but only once (or a small number)"
Comment 4 Yann Ylavic 2021-01-14 16:48:14 UTC
(In reply to David Betterton from comment #3)
> 
> <Proxy balancer://mybalancer>
>     BalancerMember https://ukuappc1.agileassets.com:8443 route=worker1
> redirect=worker2 timeout=3600 retry=900
>     BalancerMember https://ukuappc2.agileassets.com:8443 route=worker2
> redirect=worker1 timeout=3600 retry=900
> </Proxy>

This configuration implies that when a BalancerMember enters the error state (like after the error from comment 1), it won't be retried/reused for 15 minutes.
I don't know your environment, but that seems quite a high value to me; aren't those errors transient (and for how long)?
If the other BalancerMember also encounters an error during those 15 minutes, then "All workers are in error state" and your service becomes unavailable.

> 
> Would using forcerecovery=On be a better option ?

This would make the balancer try to recover when all the workers are in error state, so it never fails a request without having tried. It can be used in addition to your existing configuration.
(Note that forcerecovery=on goes on the ProxyPass line, or with a ProxySet in the above <Proxy> block, not on each BalancerMember.)

If you don't want forcerecovery, you should consider lowering retry= at least.
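Putting the two suggestions together, the <Proxy> block from comment 3 might look like this sketch (the retry=60 value is illustrative only, not a recommendation for this environment):

```apache
<Proxy balancer://mybalancer>
    BalancerMember https://ukuappc1.agileassets.com:8443 route=worker1 redirect=worker2 timeout=3600 retry=60
    BalancerMember https://ukuappc2.agileassets.com:8443 route=worker2 redirect=worker1 timeout=3600 retry=60
    # forcerecovery is a balancer parameter, so it goes on ProxySet
    # (or on the ProxyPass line), not on each BalancerMember
    ProxySet forcerecovery=On
</Proxy>
```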
Comment 5 David Betterton 2021-01-14 17:42:39 UTC
(In reply to Yann Ylavic from comment #4)
> (In reply to David Betterton from comment #3)
> > 
> > <Proxy balancer://mybalancer>
> >     BalancerMember https://ukuappc1.agileassets.com:8443 route=worker1
> > redirect=worker2 timeout=3600 retry=900
> >     BalancerMember https://ukuappc2.agileassets.com:8443 route=worker2
> > redirect=worker1 timeout=3600 retry=900
> > </Proxy>
> 
> This configuration implies that when any BalancerMember is in error state
> (like after the error from comment 1), it won't be retried/reused before 15
> minutes.
> I don't know your environment but it's quite a high value for me, aren't
> those errors transient (how long)?
> If the other BalancerMember also encounters an error during these 15 minutes
> then "All workers are in error state" and your service becomes unavailable.
> 
> > 
> > Would using forcerecovery=On be a better option ?
> 
> This would try to recover if all the workers are in error state, thus never
> fail without having tried. It can be used in addition to your existing
> configuration.
> (Note that forcerecovery=on goes on the ProxyPass line or with a ProxySet in
> the above <Proxy> block, not with each BalancerMember.)
> 
> If you don't want forcerecovery, you should consider lowering retry= at
> least.

Thanks - we'll try lowering this and report back here
Comment 6 Ruediger Pluem 2021-01-14 20:27:10 UTC
(In reply to Yann Ylavic from comment #4)
> (In reply to David Betterton from comment #3)

> > 
> > Would using forcerecovery=On be a better option ?
> 
> This would try to recover if all the workers are in error state, thus never
> fail without having tried. It can be used in addition to your existing
> configuration.

Isn't forcerecovery=on the default?
Comment 7 Ruediger Pluem 2021-01-14 20:41:44 UTC
(In reply to David Betterton from comment #0)

> 140712548968192] [client 90.248.114.57:52517] AH01167:
> balancer://mybalancer: All workers are in error state for route (worker2),
> referer:

This error message can only appear if nofailover is set to On, either in the ProxyPass for this balancer or via ProxySet in the <Proxy> block. It does not show up in the <Proxy> block configuration quoted in comment 3. What are your ProxyPass directives?
Do you use "BalancerPersist on" somewhere in your configuration, or do you have the balancer-manager enabled (http://httpd.apache.org/docs/2.4/mod/mod_proxy_balancer.html#balancer_manager) and an admin set nofailover dynamically?
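For context, the two places nofailover could be set statically look like this (a sketch; the mount path "/" is hypothetical, the balancer name is taken from the log above):

```apache
# Either as a parameter on the ProxyPass line...
ProxyPass "/" "balancer://mybalancer/" nofailover=On

# ...or via ProxySet inside the balancer definition:
<Proxy balancer://mybalancer>
    ProxySet nofailover=On
</Proxy>
```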