45950 – Workers parameter is invalid after graceful restart

Summary: Workers parameter is invalid after graceful restart

Status:	RESOLVED DUPLICATE of bug 44736

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	mod_proxy_balancer
Version:	2.2.9
Hardware:	All All

Importance:	P2 normal with 4 votes
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-10-04 20:23 UTC by Tomokazu Harada
Modified:	2012-05-04 23:21 UTC
CC List:	1 user

Attachments
Clearing the LB-scoreboard patch (821 bytes, patch) 2008-10-04 20:25 UTC, Tomokazu Harada
fix graceful restart problem (6.40 KB, patch) 2009-11-05 02:28 UTC, Satoshi Ebisawa

Description Tomokazu Harada 2008-10-04 20:23:31 UTC

A worker's shared parameters (e.g. route) are invalid after a graceful restart.
For example, my config is like this:

ProxyPass /test balancer://mycluster/test stickysession=abc
<Proxy balancer://mycluster>
    BalancerMember http://10.1.1.10 route=r10
    BalancerMember http://10.1.1.20 route=r20
</Proxy>

On balancer-manager:

StickySession Timeout FailoverAttempts Method 
abc           0       1                byrequests 
Worker URL       Route RouteRedir Factor Set Status Elected To From
http://10.1.1.10 r10              1      0   Ok     0       0  0
http://10.1.1.20 r20              1      0   Ok     0       0  0

Then I change the first worker in my config like this:

ProxyPass /test balancer://mycluster/test stickysession=abc
<Proxy balancer://mycluster>
    BalancerMember http://10.1.1.99 route=r99
    BalancerMember http://10.1.1.20 route=r20
</Proxy>

After graceful restart, balancer-manager shows:

StickySession Timeout FailoverAttempts Method 
abc           0       1                byrequests 
Worker URL       Route RouteRedir Factor Set Status Elected To From
http://10.1.1.99 r10              1      0   Ok     0       0  0
http://10.1.1.20 r20              1      0   Ok     0       0  0

The first worker's route is wrong: r99 is expected, but r10 is shown.

This bug is caused by the shared area (scoreboard) not being cleared on a graceful restart.
On a normal restart, the shared area is cleared in the pre_mpm hook.

The attached patch is against 2.2.9 and fixes this bug by clearing the shared area.
The patch may also fix Bug#39811 and Bug#44736.
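
The behaviour described above can be reproduced in a toy model. This is only a sketch, not httpd code: it assumes shared slots are addressed by the worker's position in the config and that a slot already marked as initialized is not overwritten when the config is reloaded. The names apply_config, shared_slots and clear_first are hypothetical.

```python
# Toy model only: hypothetical names, not taken from the httpd source.
# Assumption: shared slots are indexed by worker position and an already
# initialized slot keeps its old values when the config is reloaded.

def apply_config(shared_slots, workers, clear_first=False):
    """Load (url, route) pairs into index-addressed shared slots.

    Returns the (url, route) pairs a status page would display.
    """
    if clear_first:
        shared_slots.clear()  # models the shared area being cleared
    for index, (_url, route) in enumerate(workers):
        slot = shared_slots.setdefault(index, {})
        if not slot.get("initialized"):
            slot["route"] = route
            slot["initialized"] = True
    return [(url, shared_slots[i]["route"])
            for i, (url, _route) in enumerate(workers)]


old_workers = [("http://10.1.1.10", "r10"), ("http://10.1.1.20", "r20")]
new_workers = [("http://10.1.1.99", "r99"), ("http://10.1.1.20", "r20")]

slots = {}
apply_config(slots, old_workers)

# Shared area not cleared: slot 0 still holds the old route.
after_graceful = apply_config(slots, new_workers)

# Shared area cleared first: slot 0 picks up the new route.
after_clear = apply_config(slots, new_workers, clear_first=True)

print(after_graceful)
print(after_clear)
```

In this model the uncleared reload pairs http://10.1.1.99 with the stale route r10, matching the balancer-manager output in the report, while the cleared reload shows r99. The model has a single process, so it says nothing about whether clearing is safe while other processes are still reading the shared area.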

Comment 1 Tomokazu Harada 2008-10-04 20:25:07 UTC

Created attachment 22671
Clearing the LB-scoreboard patch

Comment 2 Ruediger Pluem 2008-10-06 04:04:00 UTC

Thanks for the patch, but I think it has some undesired side effects. The problem is that during a graceful restart other processes still use the *old* data in the scoreboard, and we cannot simply delete it beneath them. Solving this properly requires more thought about how to manage this shared configuration data in a better way.

Comment 3 Satoshi Ebisawa 2009-11-05 02:28:16 UTC

Created attachment 24487
fix graceful restart problem

I think this problem is caused by improper use of the worker index.

In httpd-2.2.14, the worker id (member "id" of struct proxy_worker)
is used as the search key for scoreboard data (struct proxy_worker_stat).
But the worker id is an inappropriate search key for scoreboard data,
because it can change after a graceful restart
when the configured number of workers changes.
(struct proxy_worker is not stored in the scoreboard, so its id is not saved.)

The shared parameters break after a graceful restart because
the wrong index is used for the scoreboard data.

The attached patch introduces a fixed key for scoreboard data,
generated from the server's hostname and the worker name.
Since these never change across graceful restarts,
they are suitable as a search key for scoreboard data.
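
The idea of a fixed key can be sketched as follows. This is a toy model and not the attached patch: it assumes the shared data is a mapping keyed by (server hostname, worker name), and the names apply_config_keyed, keyed_slots and the hostname www.example.com are hypothetical.

```python
# Toy model only: hypothetical names, not the attached patch.
# Assumption: shared data is looked up by (server hostname, worker name)
# instead of by the worker's position in the config.

def apply_config_keyed(shared_slots, server_hostname, workers):
    """Load (worker_name, route) pairs into key-addressed shared slots.

    Returns the (worker_name, route) pairs a status page would display.
    """
    shown = []
    for worker_name, route in workers:
        key = (server_hostname, worker_name)
        # An existing slot is reused; a new worker gets a fresh slot.
        slot = shared_slots.setdefault(key, {"route": route})
        shown.append((worker_name, slot["route"]))
    return shown


old_workers = [("http://10.1.1.10", "r10"), ("http://10.1.1.20", "r20")]
new_workers = [("http://10.1.1.99", "r99"), ("http://10.1.1.20", "r20")]

keyed_slots = {}
apply_config_keyed(keyed_slots, "www.example.com", old_workers)
keyed_after_graceful = apply_config_keyed(
    keyed_slots, "www.example.com", new_workers)

print(keyed_after_graceful)
print(len(keyed_slots))
```

Here the replaced worker gets its own slot, so it shows r99 rather than inheriting r10. The slot for http://10.1.1.10 is left behind in the mapping, so a real implementation would also need some way to reclaim entries for workers that no longer exist.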

Comment 4 Olivier BOËL 2010-01-29 03:32:09 UTC

Compiled and installed Apache 2.2.14 on Sparc Solaris 9.

The problem remains: if you modify the configuration (at least by adding or removing one or more virtual hosts) and do a graceful restart, the scoreboard appears corrupted and load balancing does not keep routes, which causes problems for applications that are not stateless.

I tried the patch proposed by Satoshi Ebisawa, but I get continuous errors at startup ([notice] child pid ... exit signal Segmentation fault (11))
and the web server never works.

Has anybody experienced the same problem... and fixed it?

TIA,


Olivier

Comment 5 Ruediger Pluem 2010-01-29 06:15:22 UTC

This is a known issue. Do a hard restart of httpd (i.e. stop, or graceful stop, and then start) when changing these configuration parameters.

Comment 6 Olivier BOËL 2011-07-14 06:20:32 UTC

This bug can be reproduced with Apache 2.2.19 on Solaris.

Comment 7 Jim Jagielski 2011-08-04 14:50:16 UTC

WONTFIX - 2.2.x does not guarantee local changes via balancer-manager are kept

This is a feature in 2.3/2.4

Comment 8 tarun 2012-05-04 23:21:18 UTC

Judging by the WONTFIX comment, this is another instance of the same bug being misread. The problem occurs when you actually change the configuration file on the filesystem and do a graceful restart: Apache comes back with the workers all corrupted.

I believe that this may be a duplicate of 44736. I have recently tested on Oracle Linux 6 with httpd-2.2.15-15.0.1.el6.x86_64 from upstream and am seeing the same behavior, even on a site with little to no traffic.

The suggestion to restart the server, especially on a busy server, is completely broken. It causes a significant perceived downtime.

*** This bug has been marked as a duplicate of bug 44736 ***