A worker's shared parameters (e.g. route) are invalid after a graceful restart. For example, my config is like this:

    ProxyPass /test balancer://mycluster/test stickysession=abc
    <Proxy balancer://mycluster>
        BalancerMember http://10.1.1.10 route=r10
        BalancerMember http://10.1.1.20 route=r20
    </Proxy>

On balancer-manager:

    StickySession  Timeout  FailoverAttempts  Method
    abc            0        1                 byrequests

    Worker URL        Route  RouteRedir  Factor  Set  Status  Elected  To  From
    http://10.1.1.10  r10                1       0    Ok      0        0   0
    http://10.1.1.20  r20                1       0    Ok      0        0   0

Then I change the first worker in my config like this:

    ProxyPass /test balancer://mycluster/test stickysession=abc
    <Proxy balancer://mycluster>
        BalancerMember http://10.1.1.99 route=r99
        BalancerMember http://10.1.1.20 route=r20
    </Proxy>

After a graceful restart, balancer-manager shows:

    StickySession  Timeout  FailoverAttempts  Method
    abc            0        1                 byrequests

    Worker URL        Route  RouteRedir  Factor  Set  Status  Elected  To  From
    http://10.1.1.99  r10                1       0    Ok      0        0   0
    http://10.1.1.20  r20                1       0    Ok      0        0   0

The first worker's route is wrong: it still shows r10, but r99 is expected.

This bug is caused by the shared area (scoreboard) not being cleared on graceful restart. On a normal restart the shared area is cleared in the pre_mpm hook.

The attached patch is against 2.2.9 and fixes this bug by clearing the shared area. The patch may also fix Bug#39811 and Bug#44736.
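To make the failure mode in the report above concrete, here is a minimal C sketch. It is not httpd's code: `struct slot`, `init_worker` and `clear_shared` are hypothetical names, and the "fill the slot only if it is not yet initialized" rule is an assumption used to model a shared area that survives a graceful restart.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 4
#define ROUTE_LEN 16

/* Hypothetical stand-in for one worker's entry in the shared area. */
struct slot {
    int  initialized;        /* set once the slot has been filled */
    char route[ROUTE_LEN];   /* shared "route" parameter */
};

/* Stand-in for shared memory that survives a graceful restart. */
static struct slot shared[MAX_SLOTS];

/* Fill slot `idx` from the configuration unless it is already marked
 * initialized (assumed behaviour). Returns the route actually in use. */
const char *init_worker(int idx, const char *configured_route)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    struct slot *s = &shared[idx];
    if (!s->initialized) {
        snprintf(s->route, sizeof s->route, "%s", configured_route);
        s->initialized = 1;
    }
    return s->route;
}

/* Models a normal restart: the shared area is wiped before setup. */
void clear_shared(void)
{
    memset(shared, 0, sizeof shared);
}
```

In this model, calling `init_worker(0, "r10")`, then `init_worker(0, "r99")` without `clear_shared()` in between (the graceful case) still yields `r10`, which matches the balancer-manager output in the report. Calling `clear_shared()` first yields `r99`.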
Created attachment 22671 [details] Clearing the LB-scoreboard patch
Thanks for the patch, but I think it has some undesired side effects. The problem is that during a graceful restart other processes still use the *old* data in the scoreboard, and we cannot simply delete it from beneath them. Solving this properly needs more thought about how to manage this shared configuration data in a better way.
Created attachment 24487 [details] fix graceful restart problem

I think this problem is caused by improper use of the worker index. In httpd-2.2.14, the worker id (member "id" of struct proxy_worker) is used as the search key for scoreboard data (struct proxy_worker_stat). But the worker id is an inappropriate search key for scoreboard data, because it may change after a graceful restart when the configured number of workers changes. (struct proxy_worker is not stored in the scoreboard, so its id is not saved.) The broken shared parameters after a graceful restart are caused by using the wrong index into the scoreboard data.

The attached patch introduces a fixed key for scoreboard data, generated from the server's hostname and the worker name. Since these never change across graceful restarts, the key is suitable as a search key for scoreboard data.
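The idea described in the comment above can be sketched in C as follows. This is an illustration only and not the attached patch: `struct slot`, `find_or_create_slot`, the `"hostname|worker"` key format and the linear search are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 8
#define KEY_LEN   128
#define ROUTE_LEN 16

/* Hypothetical shared slot, identified by a stable key rather than
 * by its position in the configuration. */
struct slot {
    int  used;
    char key[KEY_LEN];       /* "<server hostname>|<worker name>" */
    char route[ROUTE_LEN];
};

static struct slot shared[MAX_SLOTS];

/* Return the slot whose key matches this server/worker pair. If none
 * exists, claim a free slot and initialise it from the configuration.
 * Returns NULL if the key is too long or the table is full. */
struct slot *find_or_create_slot(const char *server_hostname,
                                 const char *worker_name,
                                 const char *configured_route)
{
    assert(server_hostname && worker_name && configured_route);

    char key[KEY_LEN];
    int n = snprintf(key, sizeof key, "%s|%s", server_hostname, worker_name);
    if (n < 0 || (size_t)n >= sizeof key)
        return NULL;

    struct slot *free_slot = NULL;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (shared[i].used && strcmp(shared[i].key, key) == 0)
            return &shared[i];           /* same worker as before */
        if (!shared[i].used && free_slot == NULL)
            free_slot = &shared[i];      /* remember first free slot */
    }
    if (free_slot == NULL)
        return NULL;

    free_slot->used = 1;
    memcpy(free_slot->key, key, (size_t)n + 1);
    snprintf(free_slot->route, sizeof free_slot->route, "%s", configured_route);
    return free_slot;
}
```

With this lookup, a worker renamed from http://10.1.1.10 to http://10.1.1.99 gets a fresh slot initialised with r99, while http://10.1.1.20 keeps its existing slot. The sketch never reclaims the slot of the removed worker; a real implementation would need some way to do that.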
Compiled and installed Apache 2.2.14 on SPARC Solaris 9. The problem remains: if you modify the configuration (at least by adding or removing one or more virtual hosts) and do a graceful restart, the scoreboard appears corrupted and load balancing does not keep routes, which causes problems for applications that are not stateless.

I tried the patch proposed by Satoshi Ebisawa, but I get continuous errors at startup:

    [notice] child pid ... exit signal Segmentation fault (11)

and the web server never works. Has anybody experienced the same problem and fixed it?

TIA, Olivier
This is a known issue. Do a hard restart of httpd (i.e. stop, or graceful stop, followed by start) when changing these configuration parameters.
This bug can be reproduced with Apache 2.2.19 on Solaris.
WONTFIX - 2.2.x does not guarantee that local changes made via balancer-manager are kept. This is a feature in 2.3/2.4.
Judging by the WONTFIX comment, this looks like another instance of the same bug being misread. The problem occurs when you actually change the configuration file on the filesystem and do a graceful restart: Apache comes back with the workers all corrupted. I believe this may be a duplicate of 44736.

I have recently tested on Oracle Linux 6 with httpd-2.2.15-15.0.1.el6.x86_64 from upstream and am seeing the same behavior, even on a site with little to no traffic. The suggestion to restart the server, esp. on a busy server, is not workable: it causes significant perceived downtime.

*** This bug has been marked as a duplicate of bug 44736 ***