Bug 47011 - mod_proxy/mod_proxy_balancer hot-standby BalancerMembers not taking over immediately
Status: RESOLVED LATER
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_balancer
Version: 2.2.8
Hardware: PC Linux
Importance: P1 regression with 3 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: MassUpdate
Depends on:
Blocks:
 
Reported: 2009-04-09 13:32 UTC by mwhiteley
Modified: 2018-11-07 21:09 UTC
CC List: 1 user



Description mwhiteley 2009-04-09 13:32:33 UTC
After upgrading from Apache 2.2.6 to Apache 2.2.8 or later, when we gracefully shut down one of the Embedded Tomcat instances on our application server, we see a temporary outage (HTTP 503) from the proxy balancer before the hot-standby member (status=+H) takes over.

Layout:
- Application proxy server (Apache 2.2.8)
   - Proxies requests via mod_proxy/mod_proxy_balancer/mod_proxy_ajp to application server
- Application server (Java)
   - Runs master Java application server with Embedded Tomcat (Tomcat/5.5.17) on port 8009
   - Runs slave Java application server with Embedded Tomcat (Tomcat/5.5.17) on port 8008

When the application proxy server ran Apache 2.2.6, we could gracefully shut down the master Tomcat server (by calling embedded.stop()) and the hot-standby (status=+H) BalancerMember would immediately start serving requests. Since the upgrade to 2.2.8, clients receive "HTTP/1.1 503 This application is not currently available" for about one second while the active BalancerMember is shutting down, until the hot-standby takes over. These errors appear in both the Tomcat access log and the Apache access log.

I rebuilt and tested Apache 2.2.6, 2.2.8, 2.2.9 and 2.2.11 and confirmed that the problem appears in every version after 2.2.6. The problem persisted after switching the BalancerMember protocol from ajp:// to http://, so I think this rules out an AJP-specific issue. I could not reproduce it with Apache servers as the BalancerMembers, so I suspect it involves the interaction with Tomcat.

Proxy conf:
=================================
ProxyPassMatch /(.+/)?application.server$	balancer://production_server

<Proxy balancer://production_server/>
	BalancerMember ajp://server.domain.tld:8009/	lbset=1	retry=10 loadfactor=100
	BalancerMember ajp://server.domain.tld:8008/	lbset=2	retry=10 status=+H

	ProxySet lbmethod=bytraffic
</Proxy>
=================================
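To make the intended behavior of the configuration above concrete, here is a minimal C sketch of member selection. It is a simplified model, not mod_proxy_balancer's code: the `member` struct and `pick_member()` are hypothetical, the bytraffic/loadfactor weighting is ignored, and it assumes the documented rules that lower-numbered lbsets are tried first and that a +H member is used only when no other usable member is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified balancer member (not the real mod_proxy struct). */
typedef struct {
    const char *name;   /* e.g. "ajp://server.domain.tld:8009/" */
    int lbset;          /* lbset=N on the BalancerMember line */
    bool hot_standby;   /* status=+H */
    bool in_error;      /* member currently flagged as in error */
} member;

/* Walk lbset levels from lowest to highest. At each level, members whose
 * lbset is <= the current level are eligible; usable non-standby members
 * are preferred, and a usable hot standby is returned only if none exists.
 * Returns NULL when nothing is usable (the proxy would then answer 503). */
const member *pick_member(const member *m, size_t n)
{
    assert(m != NULL || n == 0);

    int max_lbset = 0;
    for (size_t i = 0; i < n; i++)
        if (m[i].lbset > max_lbset)
            max_lbset = m[i].lbset;

    for (int cur = 0; cur <= max_lbset; cur++) {
        /* pass 0: regular members; pass 1: hot standbys */
        for (int standby_pass = 0; standby_pass <= 1; standby_pass++) {
            for (size_t i = 0; i < n; i++) {
                if (m[i].lbset > cur)
                    continue;   /* not eligible at this level yet */
                if (m[i].hot_standby != (standby_pass == 1))
                    continue;   /* wrong kind of member for this pass */
                if (!m[i].in_error)
                    return &m[i];
            }
        }
    }
    return NULL;
}
```

With the 8009 member flagged in error, `pick_member()` returns the 8008 standby; with both flagged, it returns NULL. In this model a member is skipped only after something has flagged it in error, so a backend that still accepts connections and answers 503 itself keeps being selected.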
Comment 1 Ruediger Pluem 2009-04-09 13:40:10 UTC
Can you please check if the following patch fixes your issue (http://svn.apache.org/viewvc/httpd/httpd/branches/2.2.x/modules/proxy/mod_proxy.c?r1=713145&r2=739610&view=patch):

--- httpd/httpd/branches/2.2.x/modules/proxy/mod_proxy.c	2008/11/11 20:01:59	713145
+++ httpd/httpd/branches/2.2.x/modules/proxy/mod_proxy.c	2009/01/31 20:58:07	739610
@@ -1002,8 +1002,10 @@
              * We can not failover to another worker.
              * Mark the worker as unusable if member of load balancer
              */
-            if (balancer)
+            if (balancer) {
                 worker->s->status |= PROXY_WORKER_IN_ERROR;
+                worker->s->error_time = apr_time_now();
+            }
             break;
         }
         else if (access_status == HTTP_SERVICE_UNAVAILABLE) {
@@ -1013,6 +1015,7 @@
              */
             if (balancer) {
                 worker->s->status |= PROXY_WORKER_IN_ERROR;
+                worker->s->error_time = apr_time_now();
             }
         }
         else {
Comment 2 mwhiteley 2009-04-09 14:25:17 UTC
I tried the patch on a clean httpd-2.2.8, and also tried updating mod_proxy, mod_proxy_http, and mod_proxy_balancer in httpd-2.2.11 to revision 763402, with the same results.
Comment 3 mwhiteley 2009-04-22 12:30:27 UTC
After further research, I've traced this to proxy_util.c r582620 (which references PR 43472). Reverting only this change in 2.2.11 resolves the issue described in this report.
Comment 4 Ruediger Pluem 2009-04-22 13:35:51 UTC
Which connector are you using in Tomcat? The APR connector or the classic blocking connector?

Does it help when you add

ping=1

to the other BalancerMember parameters?
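For reference, `ping=1` asks mod_proxy_ajp to send an AJP13 CPing on the connection before forwarding a request and to wait up to 1 second for the CPong reply. The sketch below builds and checks those two 5-byte packets as described in the AJP13 protocol reference (server-to-container packets start with 0x12 0x34, container-to-server packets with 'A' 'B', followed by a 2-byte big-endian payload length). The function names are hypothetical and the sketch performs no I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { AJP13_CPING = 10, AJP13_CPONG = 9 };   /* AJP13 prefix codes */

/* Write a CPing request into buf. Returns the number of bytes written
 * (5), or 0 if buf is too small. */
size_t ajp_build_cping(uint8_t *buf, size_t cap)
{
    assert(buf != NULL);
    if (cap < 5)
        return 0;
    buf[0] = 0x12; buf[1] = 0x34;   /* magic: web server -> container */
    buf[2] = 0x00; buf[3] = 0x01;   /* payload length = 1 byte */
    buf[4] = AJP13_CPING;           /* prefix code 10 */
    return 5;
}

/* True if the bytes are exactly a CPong reply from the container. */
bool ajp_is_cpong(const uint8_t *buf, size_t len)
{
    return buf != NULL && len == 5 &&
           buf[0] == 'A' && buf[1] == 'B' &&   /* container -> server */
           buf[2] == 0x00 && buf[3] == 0x01 && /* payload length = 1 */
           buf[4] == AJP13_CPONG;              /* prefix code 9 */
}
```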
Comment 5 mwhiteley 2009-04-22 14:22:14 UTC
I'm not sure this is the answer you need, but we're using an Embedded Tomcat and set up the AJP connector as follows:
         Connector ajpConnector  = embedded.createConnector((java.net.InetAddress) null, this.ajpPort, "ajp");
         embedded.addConnector(ajpConnector);

Results of adding "ping=1" to the BalancerMembers:

Apache 2.2.11:
- Wouldn't start, output the following:
BalancerMember Ping/Pong timeout has wrong format

Apache 2.2.9:
- During the graceful shutdown described above, the balancer returned:
HTTP/1.1 500 Internal Server Error
- Logged:
[Wed Apr 22 16:05:53 2009] [error] (70014)End of file found: ajp_ilink_receive() can't receive header

Apache 2.2.8:
- Balancer always returned:
HTTP/1.1 503 Service Temporarily Unavailable
- Logged:
[Wed Apr 22 16:00:50 2009] [error] ajp_msg_append_uint8(): BufferOverflowException 4 4
[Wed Apr 22 16:00:50 2009] [error] ajp_handle_cping_cpong: ajp_marshal_into_msgb failed
[Wed Apr 22 16:00:50 2009] [error] (120001)APR does not understand this error code: proxy: AJP: cping/cpong failed to (null) (xxx.xxxxxxxxxx.com)
Comment 6 Ruediger Pluem 2009-04-22 23:16:11 UTC
Ah, my bad. To get this working with 2.2.11 you should apply

http://svn.apache.org/viewvc/apr/apr/branches/1.3.x/strings/apr_strings.c?r1=727605&r2=727604&pathrev=727605&view=patch

first (it's from http://svn.apache.org/viewvc?view=rev&revision=727605).
Please try again afterwards with 2.2.11.
Comment 7 mwhiteley 2009-04-23 10:30:19 UTC
After applying the apr_strings.c patch to 2.2.11, gracefully shutting down the primary Tomcat makes Apache log the following once:

[Thu Apr 23 12:07:59 2009] [error] (111)Connection refused: proxy: AJP: attempt to connect to xxx.xxx.xxx.xxx:8009 (xxxxxxx.xxxxxxx.com) failed
[Thu Apr 23 12:07:59 2009] [error] ap_proxy_connect_backend disabling worker for (xxxxxxx.xxxxxxx.com)
[Thu Apr 23 12:07:59 2009] [error] proxy: AJP: failed to make connection to backend: xxxxxxx.xxxxxxx.com

And the hot-standby takes the request as expected without an outage on the front-end.

The above worked with and without "ping=1".
Comment 8 William A. Rowe Jr. 2018-11-07 21:09:04 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.