Greetings Apache Devs,

A bug was filed mid last year (47207) under the expectation that HTTPD should mark servers in error state on a 500 status return. While I disagree that this should be universal, we have found ourselves in situations where we need the ability to mark members as PROXY_WORKER_IN_ERROR automatically on certain status code returns. This varies heavily by application and backend, but it is a feature worth adding. I feel this should be a separate bug report, since it adds a server administrator configuration parameter rather than applying the change proposed in 47207.

Problem examples:

Apache HTTPD server as reverse proxy to two WebSphere application servers:
Some applications take a significant amount of time to initialize. Until they are done initializing, WebSphere will rightfully return a 503. Depending on how long the application takes to initialize (30 minutes in the extreme case of in-memory databases), this could inadvertently leave a member in service during a period when it cannot service requests.

Apache HTTPD server as reverse proxy to another HTTPD reverse proxy:
Sometimes DMZ segments are broken up such that only reverse proxies with specific ProxyPass rules are allowed to traverse firewalls. Additionally, the data carried over those proxies is sometimes too sensitive to send in cleartext, so SSL may be needed. In some situations (expired server certificate, expired client certificate, misconfiguration), the target (second) HTTPD reverse proxy will throw a 502 that gets bubbled up to the first reverse proxy. Marking that proxy as unusable would be beneficial.

Apache HTTPD server as reverse proxy to any WebSphere application server:
In WebSphere, it is possible to have an application deployed but not running. When an application is deployed but not started, its context root is not bound in the web container, so a request for that context root returns a 404.
While it would be insane to mark a member out of service on every 404, this is just an example of a use case.

Apache HTTPD server as reverse proxy to any backend:
Some folks are brazen enough to say their application has handled every possible error condition, so a 500 returned to the user means the application server must be at fault and should be taken out of service. Again, this is madness, but there are other use cases (testing scripts) where taking instances out of service because of a 500 may be desirable.

The proposed solution:

Add a configuration directive for balancers called "ErrorOnStatus", with usage like:

  ErrorOnStatus=501,502,503,504

Construct an apr_array_header_t that is checked during the currently unused proxy_balancer_post_request method.

My thoughts: I chose apr_array_header_t because it does not impose a data type. My preference would be apr_hash_t, since lookups would presumably be faster, but I am concerned that it expects character-array datatypes for key and value. I am tidying up the patch now and will attach it soon.
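To sketch the intended logic (a standalone C illustration, not the actual patch: the helper names here are hypothetical, and plain C arrays stand in for apr_array_header_t), the directive's argument would be parsed once at configuration time into a list of status codes, and after each proxied response the balancer's post_request hook would check the backend's status code against that list:

```c
#include <stdlib.h>

#define MAX_STATUSES 16

/* Hypothetical stand-in for the config-time parser: split a
 * comma-separated list such as "501,502,503,504" into ints.
 * The real patch would build an apr_array_header_t instead. */
static int parse_status_list(const char *arg, int *out, int max)
{
    int n = 0;
    const char *p = arg;
    while (*p != '\0' && n < max) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p)
            break;              /* not a number: stop parsing */
        out[n++] = (int)v;
        if (*end != ',')
            break;              /* end of list */
        p = end + 1;
    }
    return n;
}

/* Hypothetical stand-in for the post_request check: did the backend
 * return one of the configured statuses? On a match, the real hook
 * would mark the worker as PROXY_WORKER_IN_ERROR. */
static int status_is_error(int status, const int *list, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (list[i] == status)
            return 1;
    }
    return 0;
}
```

With an apr_array_header_t the membership test is a linear scan like the one above, which is cheap for the handful of codes a typical configuration lists; apr_hash_t would give constant-time lookup at the cost of the key-type concern noted above.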
Created attachment 25148 [details]
Code modifications to support initial proposal
Created attachment 25150 [details]
Update

Noticed a mistake in the log message - it was logging the name of the balancer instead of the worker.
Created attachment 25153 [details]
Final proposed patch

Updated to set error_time in the proxy_worker_stat now that testing is complete.
The final patch has been added, and the functionality has been tested as follows:

<Proxy balancer://App_cluster>
    BalancerMember http://127.0.0.1:8001 route=1
    BalancerMember http://127.0.0.1:8002 route=2
    ProxySet lbmethod=byrequests stickysession=App_STICKY nofailover=Off erroronstatus=500,502
</Proxy>

127.0.0.1 is answered by an Apache instance with a valid CGI script and a buggy CGI script (which causes a 500).

Continuous hits to the valid script:
[1]http://127.0.0.1:8001 1 1 0 Ok 20 6.9K 3.1K
[2]http://127.0.0.1:8002 2 1 0 Ok 18 6.2K 3.7K

One hit to the script that generates the 500 (500 returned to the browser):
[1]http://127.0.0.1:8001 1 1 0 Err 21 7.3K 3.7K
[2]http://127.0.0.1:8002 2 1 0 Ok 18 6.2K 3.7K

Several hits to the valid script before the 60-second retry time:
[1]http://127.0.0.1:8001 1 1 0 Ok 34 12K 3.9K
[2]http://127.0.0.1:8002 2 1 0 Err 20 6.9K 4.8K

Two hits after the retry time expired:
[1]http://127.0.0.1:8001 1 1 0 Ok 35 12K 3.9K
[2]http://127.0.0.1:8002 2 1 0 Ok 21 7.3K 4.8K

At the same time, I ran a test case with several hits to the buggy CGI script - as expected, the force_recovery function forced the traffic through.
Thanks for the patch. Committed to trunk in r930125
*** Bug 47207 has been marked as a duplicate of this bug. ***
Created attachment 25788 [details]
Final patch for 2.2 branch

This is the final patch, including documentation, for the 2.2 branch.
Created attachment 25923 [details] Fixes the fix to be "failonstatus"
Added in 2.2.17