I noticed that, after a failed worker has recovered, no request is forwarded to it although it is marked as OK in balancer-manager:

Load Balancer Manager for www.europarldv.ep.ec

Server Version: Apache/2.2.12 (Unix) DAV/2 mod_ssl/2.2.12 OpenSSL/0.9.8e
Server Built: Aug 5 2009 12:54:36

LoadBalancer Status for balancer://websdi

StickySession          Timeout  FailoverAttempts  Method
JSESSIONID|jsessionid  0        1                 bybusyness

Worker URL                          Route  RouteRedir  Factor  Set  Status  Elected  To   From
http://websdidv-node1.appsrv:64675  node1              1       0    Ok      250      81K  13M
http://websdidv-node2.appsrv:64675  node2              1       0    Ok      51       16K  2.6M

This issue does not occur with the default method (byrequests).

Here is my configuration:

    ProxyPass /parliament/ balancer://websdi/parliament/ stickysession=JSESSIONID|jsessionid lbmethod=bybusyness scolonpathdelim=On
    <Proxy balancer://websdi>
        BalancerMember http://websdidv-node1.appsrv:64675 route=node1
        BalancerMember http://websdidv-node2.appsrv:64675 route=node2
    </Proxy>

Server version: Apache/2.2.14 (Unix)
Server built:   Jan 28 2010 09:10:16
Server's Module Magic Number: 20051115:23
Server loaded:  APR 1.3.9, APR-Util 1.3.9
Compiled using: APR 1.3.9, APR-Util 1.3.9
Architecture:   32-bit
Server MPM:     Worker
  threaded:     yes (fixed thread count)
  forked:       yes (variable process count)
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/worker"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_FCNTL_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=128
 -D HTTPD_ROOT="/local/products/revproxy"
 -D SUEXEC_BIN="/local/products/revproxy/bin/suexec"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

System   = SunOS
Node     = eiciluxd5
Release  = 5.9
KernelID = Generic_122300-36
Machine  = sun4u
BusType  = <unknown>
Serial   = <unknown>
Users    = <unknown>
OEM#     = 0
Origin#  = 1
NumCPU   = 4
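For context on why a stale counter starves the recovered member: bybusyness picks the member with the smallest busy count, using lbstatus only as a tie-breaker. Below is a simplified sketch of that selection loop, not the literal httpd source; the loop is condensed and only the field/macro names follow mod_proxy.h.

    /* Simplified sketch of bybusyness member selection (illustrative only;
     * the real implementation lives in the proxy balancer sources). */
    #include "mod_proxy.h"

    static proxy_worker *pick_least_busy_sketch(proxy_worker **workers, int count)
    {
        proxy_worker *best = NULL;
        int i;

        for (i = 0; i < count; i++) {
            proxy_worker *w = workers[i];
            if (!PROXY_WORKER_IS_USABLE(w)) {
                continue;               /* skip members still marked in error */
            }
            if (best == NULL
                || w->s->busy < best->s->busy
                || (w->s->busy == best->s->busy
                    && w->s->lbstatus > best->s->lbstatus)) {
                best = w;               /* fewer in-flight requests wins */
            }
        }
        /* If a member's busy count is never decremented after failed requests,
         * it permanently looks "busier" than its peer and is never chosen,
         * which is exactly the symptom reported above. */
        return best;
    }

With byrequests the selection is driven by lbstatus alone rather than by the busy count, which is consistent with the problem not showing up under the default method.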
Created attachment 26123 [details] Fix by adding error handling and atomic functions

The proposed bugfix builds on the fix from #46215 (by Thomas Binder), using atomic functions for increasing/decreasing busyness. I added proper error handling for unreachable workers (in this case the post_req function was never called).
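As an illustration of the atomic part of that approach, here is a minimal sketch using APR's atomic API; the helper names are hypothetical, and it assumes the busy counter can be treated as an apr_uint32_t, which may not match the actual proxy_worker_stat layout in 2.2.x.

    #include "apr_atomic.h"

    /* Hypothetical helpers: adjust the shared busy counter without races
     * between threads/processes updating it concurrently. */
    static void busy_inc_sketch(volatile apr_uint32_t *busy)
    {
        apr_atomic_inc32(busy);
    }

    static void busy_dec_sketch(volatile apr_uint32_t *busy)
    {
        /* Avoid underflow if a decrement runs without a matching increment;
         * note the read-then-decrement pair is not itself atomic and is
         * only meant to illustrate the API. */
        if (apr_atomic_read32(busy) > 0) {
            apr_atomic_dec32(busy);
        }
    }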
I confirm the bug cannot be reproduced with 2.2.17. Thanks!
Has this patch been applied to the 2.2.17 release? I'm confused because I don't see anything in the changelog to reflect a fix.
Hello Gents, Please ignore my previous comment ("I confirm the bug cannot be reproduced with 2.2.17"). The problem still exists in 2.2.17. It can easily be reproduced: stop one of the nodes, watch the dashboard (Status=Err), then restart the node; the status goes back to OK but Elected does not evolve. The only way to fix it is to shut down and restart Apache (not gracefully). Regards, Olivier
I'm having the same issue and would be happy to help diagnose the problem if it's unclear what is going on.
The original report was on 2.2.14 on Solaris, but the same behavior is happening for me with 2.2.17 on CentOS release 5.5 (Final), so it's more current than originally expressed, and doesn't seem to be platform specific.
Behaviour can be reproduced with Apache 2.2.19 on Solaris
This can be easily reproduced. The "busy" counter is not decreased when the worker tries to send a request to a node which is down, for example when it receives "connection refused" from the backend.

To reproduce: set up a balancer with 2 members and keep one member down. Send requests concurrently for some period of time (more than 60s, because of the "retry" timeout applied to a worker disabled due to errors). Then start the server that was down and continue sending requests. The restarted server only begins to receive requests once the "busy" counter of the second node has grown high enough that the balancer selects the first node as less busy. The balancer thinks the first node is still busy handling the requests that failed with an error, because the "busy" value is not handled properly on the error path.
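To make the asymmetry concrete, here is a hedged sketch of the flow described above; the function names and the connect_and_send() helper are illustrative, not the actual httpd call graph.

    #include "mod_proxy.h"

    /* Hypothetical backend call, not implemented here. */
    static int connect_and_send(proxy_worker *worker, request_rec *r);

    /* Request accounting as described above: busy is raised before the
     * request is forwarded, but only lowered on the success path. */
    static int pre_request_sketch(proxy_worker *worker)
    {
        worker->s->busy++;                       /* counted as "in flight" */
        return OK;
    }

    static int post_request_sketch(proxy_worker *worker)
    {
        if (worker->s->busy) {
            worker->s->busy--;                   /* success path only */
        }
        return OK;
    }

    static int forward_sketch(proxy_worker *worker, request_rec *r)
    {
        int rv = connect_and_send(worker, r);
        if (rv != OK) {
            /* e.g. "connection refused" while the node is down: the request
             * is aborted here and, in the affected versions, the
             * post_request step never runs, so busy stays raised. */
            return rv;
        }
        return post_request_sketch(worker);
    }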
Created attachment 27900 [details] Cleanup of counters when a disabled worker becomes usable again

Also added busy and lbstatus to the balancer-manager page.
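For readers without the attachment, a minimal sketch of how that cleanup is understood here (an interpretation of the description above, not the attachment itself): when a member that had been marked in error is put back into rotation after its retry interval, the counters that bias bybusyness selection are cleared as well.

    #include "mod_proxy.h"

    /* Hypothetical helper, assumed to run at the point where a previously
     * failed member becomes usable again: clear the counters so the
     * recovered member does not keep looking "busy" from requests that
     * actually failed. */
    static void reset_counters_on_recovery_sketch(proxy_worker *worker)
    {
        worker->s->busy = 0;
        worker->s->lbstatus = 0;
    }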
I didn't see anything about these fixes being included in any of the recent releases (since 2.2.17). I'm still easily able to replicate the behavior under Apache 2.2.21 under Windows. There doesn't seem to be any way to get traffic routed to a restarted back-end instance without forcing a restart of Apache. The issue only happens when the load balancing method is set to bybusyness.
I've confirmed that the patch for mod_proxy_balancer.c (attachment by Amada C at 2011-11-04 16:57 UTC) successfully fixed this bug when applied to httpd 2.2.20 on RHEL4 in our production environment. It also provides extra details on the balancer manager page, which is way cool!
I also confirm it works perfectly when applied to Apache 2.2.21 (Linux / SunOS hosts in production). Thanks a lot for this fix.
Can we have that in the next release? PLEASE. Patching every Apache version to make it stable is not fun.
I also can reproduce this error with 2.4.2
Thanks for your update, Zisis. I'm working on a patch for trunk/2.4.x and will update the bug when it is testable.
Hello, I've just checked the patch on a Solaris 10 / i386 platform with the Apache 2.2.21 release and it doesn't work for me. After the failing backend server comes back online, it is no longer elected. In the LB manager status, I notice that the priority counter for this server keeps increasing, while the priority of the other server keeps decreasing. I don't know if this is the reason. Regards, Christophe
(In reply to comment #16)
> I've just checked the patch on a Solaris 10 / i386 platform with the
> Apache 2.2.21 release and it doesn't work for me. [...]

Was the patch applied manually? The symptoms are exactly like the ones without the patch.
Yes, I confirm. Here are some details.

I stopped node1. After it had successfully stopped, the balancer manager showed the following (the priority for node1 is 172):

Load Balancer Manager for eicixzl034

Server Version: Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/0.9.8k DAV/2
Server Built: May 23 2012 09:27:11

LoadBalancer Status for balancer://platosws

StickySession          Timeout  FailoverAttempts  Method
JSESSIONID|jsessionid  0        1                 bybusyness

Worker URL                            Route  RouteRedir  Priority  Factor  Set  Status  Busyness  Elected  To    From
http://platoswsdv-node1.appsrv:54000  node1              172       1       0    Err     1         65       34K   69K
http://platoswsdv-node2.appsrv:54010  node2              -170      1       0    Ok      0         406      227K  203K

Then I restarted node1, and after a successful restart and some requests on the web site I have the following:

Load Balancer Manager for eicixzl034

Server Version: Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/0.9.8k DAV/2
Server Built: May 23 2012 09:27:11

LoadBalancer Status for balancer://platosws

StickySession          Timeout  FailoverAttempts  Method
JSESSIONID|jsessionid  0        1                 bybusyness

Worker URL                            Route  RouteRedir  Priority  Factor  Set  Status  Busyness  Elected  To    From
http://platoswsdv-node1.appsrv:54000  node1              429       1       0    Ok      1         70       36K   70K
http://platoswsdv-node2.appsrv:54010  node2              -427      1       0    Ok      0         669      354K  669K

node1's priority keeps increasing (429) and it is no longer elected.
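The diverging Priority values look like a consequence of the stuck busy counter rather than a separate problem: on each election every member's lbstatus is raised by its lbfactor, and the elected member then has the total factor subtracted, so a member that is never elected drifts upward while its peer drifts downward. A rough sketch of that bookkeeping, assuming two members with lbfactor 1 (not the actual httpd code):

    #include "mod_proxy.h"

    /* Rough sketch of the per-election lbstatus ("Priority") bookkeeping
     * for two members with lbfactor = 1. */
    static void elect_once_sketch(proxy_worker *members[2], proxy_worker *chosen)
    {
        int total_factor = 0;
        int i;

        for (i = 0; i < 2; i++) {
            members[i]->s->lbstatus += members[i]->s->lbfactor;  /* +1 each */
            total_factor += members[i]->s->lbfactor;             /* total = 2 */
        }
        chosen->s->lbstatus -= total_factor;    /* elected member: net -1 */

        /* If node1 is never elected because its busy count is stuck at 1,
         * its lbstatus climbs by 1 per election while node2's falls by 1,
         * which matches the direction of the 429 / -427 drift shown above. */
    }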
Hi, I've just reproduced the same problem as above on Solaris 10 - Sparc architecture.
A fix has been committed to trunk (http://svn.apache.org/viewvc?view=revision&revision=1366344) and proposed for 2.4.x. (2.2.x will follow that.) The handling of the busy flag is different than either proposal here. Feel free to comment on the viability of that. Note that this exact fix has thus far been tested only with trunk and 2.4.x. Potentially something different would be necessary for 2.2.x. Additionally, changes to use atomic operations or augment the balancer manager have not been considered. I suggest tracking those with different bugs.
> Additionally, changes to use atomic operations or augment the
> balancer manager have not been considered. I suggest tracking
> those with different bugs.

a. atomic operations: I just opened Bug 53618 - "proxy_worker_shared fields not maintained in thread-safe manner" - for the thread-safe handling of busy.

b. balancer manager display: httpd trunk and 2.4 already display the extra information.
Applying this patch from httpd trunk/2.4.x fixes the issue for me: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/proxy/mod_proxy_balancer.c?r1=1366344&r2=1366343&pathrev=1366344 That's what I'll propose for the 2.2.x branch. Can anyone reproduce the problem with this new patch applied?
Applied to 2.4 in r1374299. Released with 2.4.3. Applied to 2.2 in r1373355. Not yet released there.
Just tested 2.4.3 - issue seems to be fixed now. Thanks!
I've just tried the new patch on Apache 2.2.22. There is one error during compilation:

mod_proxy_balancer.c: In function `force_recovery':
mod_proxy_balancer.c:420: error: structure has no member named `forcerecovery'
*** Error code 1

Has mod_proxy.h also been modified? If so, can you add it to the patch? Thanks
You need at least http://svn.apache.org/viewvc?view=revision&revision=1373320 but I suggest trying the candidate for 2.2.23 instead, because there might be more requirements not included in 2.2.22. Version 2.2.23 already contains the patch for this bug, no need for additional patches as far as we currently know. http://httpd.apache.org/dev/dist/ Note that 2.2.23 is not yet officially released(!!!), so it is only adequate for testing purposes. The official release should not be far away though.
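For reference, the `forcerecovery' compile error two comments up suggests the 2.2.x backport also touches mod_proxy.h. A hedged sketch of the kind of addition assumed to come with r1373320 (the field name is taken from the error message; the exact placement and type in the real header may differ):

    /* Sketch only: the balancer definition in mod_proxy.h is assumed to
     * gain a flag along these lines, which the force_recovery() function
     * named in the compile error above is assumed to consult when deciding
     * whether to put failed members back into rotation. */
    struct proxy_balancer {
        /* ... existing members unchanged ... */
        int forcerecovery;    /* assumed new field referenced by the patch */
    };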
I've just tested in 2.2.23. Seems to be working now. Just a remark on the balancer-manager web interface: the Priority and Busyness fields are not displayed, although they appeared in previous 2.2 releases when the patch was installed. Is that normal? Thanks.
The balancer manager changes were not backported.
> I've just tested in 2.2.23. Seems to be working now.