Bug 44855 - irregular balancing with sticky-session after re-enabling worker or lbset-changes
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_balancer
Version: 2.2-HEAD
Hardware: All All
Importance: P2 normal with 8 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Keywords: MassUpdate
Depends on:
Reported: 2008-04-22 10:51 UTC by Marcus Heilmann
Modified: 2018-11-07 21:08 UTC


Description Marcus Heilmann 2008-04-22 10:51:42 UTC

I have an issue to discuss regarding sticky sessions, which can result in very high load on the backend servers.

After changing the lbset of a worker to a lower one (e.g. from 1 to 0), that worker will receive a large number of requests. This is because all requests (with and without sticky session) are counted the same way when lbstatus is updated.
This would also happen when a worker leaves an unusable state (not tested, but it's the same principle).
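
To make the counting effect concrete, here is a minimal C sketch of a request-counting balancer. It assumes the bookkeeping of a "by requests" method (every worker gains its lbfactor on each request, and the worker that serves the request pays the sum of all factors). The struct and function names are hypothetical and are not taken from mod_proxy_balancer.c; usable/error states are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified worker; not the real proxy_worker struct. */
struct sim_worker {
    int lbfactor;   /* configured weight */
    int lbstatus;   /* running balance used to pick the next worker */
};

/* Account for one request served by workers[chosen].
 * Assumed model: every worker gains its lbfactor and the chosen one
 * pays the sum of all factors, so the lbstatus values always sum to 0. */
static void sim_count_request(struct sim_worker *workers, size_t n, size_t chosen)
{
    int total_factor = 0;
    for (size_t i = 0; i < n; i++) {
        workers[i].lbstatus += workers[i].lbfactor;
        total_factor += workers[i].lbfactor;
    }
    workers[chosen].lbstatus -= total_factor;
}

/* Pick the worker with the highest lbstatus for a new (non-sticky) session. */
static size_t sim_pick_new(const struct sim_worker *workers, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (workers[i].lbstatus > workers[best].lbstatus)
            best = i;
    }
    return best;
}

/* Replay `rounds` rounds in which workers 0 and 1 each serve one sticky
 * request while worker 2 (just moved into the active lbset) serves none. */
static void sim_sticky_rounds(struct sim_worker *workers, int rounds)
{
    for (int r = 0; r < rounds; r++) {
        sim_count_request(workers, 3, 0);
        sim_count_request(workers, 3, 1);
    }
}
```

In this model each round of sticky traffic moves the idle worker's lbstatus up by 2 and the busy workers' down by 1 each, so the idle worker keeps winning every new session until it has caught up.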

You can find an example at the end of this posting.

As already noted in the source code (file: mod_proxy_balancer.c, function: proxy_balancer_pre_request), this is an open issue:
             * TODO: Abstract the below, since this is dependent
             *       on the LB implementation

Possible solutions for the sticky-session issue:

1) Configurable counting of lbstatus
   - New proxy-balancer option: lbmethod_counting=[all|new|(sticky?)]
     all: the current behaviour (default)
     new: only new sessions will modify the lbstatus
     sticky: only sticky sessions will modify the lbstatus <- useful?

   Changes needed:
     mod_proxy_balancer.c (proxy_balancer_pre_request, find_best_by* <- only for 'sticky')
     mod_proxy.h (struct proxy_balancer)
     and a few other files to read the new config-parameter
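
A minimal sketch of how the proposed option could be represented and checked, assuming the three values listed above. The enum, the function names and the `is_sticky` flag are hypothetical; nothing here exists in mod_proxy, and the real change would hook into proxy_balancer_pre_request and the configuration parser.

```c
#include <string.h>

/* Hypothetical values for the proposed lbmethod_counting option. */
enum lb_counting {
    LB_COUNT_ALL,     /* current behaviour: every request updates lbstatus */
    LB_COUNT_NEW,     /* only requests that start a new session */
    LB_COUNT_STICKY   /* only requests routed by an existing session */
};

/* Parse the option value; returns 0 on success, -1 on an unknown value. */
static int lb_counting_parse(const char *value, enum lb_counting *out)
{
    if (strcmp(value, "all") == 0) {
        *out = LB_COUNT_ALL;
    } else if (strcmp(value, "new") == 0) {
        *out = LB_COUNT_NEW;
    } else if (strcmp(value, "sticky") == 0) {
        *out = LB_COUNT_STICKY;
    } else {
        return -1;
    }
    return 0;
}

/* Decide whether a request should modify lbstatus.
 * is_sticky is non-zero when the request was routed by its session. */
static int lb_should_count(enum lb_counting mode, int is_sticky)
{
    switch (mode) {
    case LB_COUNT_NEW:
        return !is_sticky;
    case LB_COUNT_STICKY:
        return is_sticky != 0;
    case LB_COUNT_ALL:
    default:
        return 1;
    }
}
```

With `new`, sticky requests to the long-running workers would no longer push up the lbstatus of a worker that has just joined the lbset.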

2) Implementation of a "soft fade-in". After changes to lbset, error-states, ... a soft fade-in-flag will be set.
   For a defined period of time, the counting of lbstatus will be restricted to new sessions.
   - New proxy-balancer option: lbmethod_fadeintime=<seconds int>
     0 : the current behaviour (default)
     >0: fadeintime
   Changes needed:
     mod_proxy_balancer.c (proxy_balancer_pre_request)
     mod_proxy.h (struct proxy_balancer, struct proxy_worker_stat:+ apr_time_t softin_time)
     all places where lbset and error-states are modified
     and a few other files to read the new config-parameter
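
One possible reading of the fade-in idea, sketched in C. It assumes a timestamp that is set when the lbset or error state changes and a window of `fadeintime` seconds during which sticky requests are not counted. The type `sim_time_t` stands in for apr_time_t (microseconds), and all names are hypothetical.

```c
#include <stdint.h>

typedef int64_t sim_time_t;   /* microseconds, standing in for apr_time_t */

/* Hypothetical per-worker state for the proposed fade-in. */
struct fade_state {
    sim_time_t softin_time;   /* when the fade-in started; 0 = not fading in */
};

/* Returns non-zero while the fade-in window is still open.
 * fadeintime_sec == 0 disables the feature (current behaviour). */
static int fade_in_active(const struct fade_state *st, int fadeintime_sec,
                          sim_time_t now)
{
    if (fadeintime_sec <= 0 || st->softin_time == 0)
        return 0;
    return (now - st->softin_time) < (sim_time_t)fadeintime_sec * 1000000;
}

/* Decide whether a request should modify lbstatus:
 * during the fade-in only new sessions are counted. */
static int fade_should_count(const struct fade_state *st, int fadeintime_sec,
                             sim_time_t now, int is_sticky)
{
    if (is_sticky && fade_in_active(st, fadeintime_sec, now))
        return 0;
    return 1;
}
```

Compared with solution 1), this keeps the current counting in steady state and only changes it for a limited time after a worker comes back.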

My current favorite is solution 1), because there is less code to modify and it would meet my requirements.

While digging around in the source code I also found a small flaw within that TODO section:
lbsets aren't handled at all. So the lbstatus of all workers in all lbsets is modified, which results in skewed lbstatus values (e.g. -1.000.000 : -1.000.000 : +2.000.000)
This little patch could fix that flaw:
             * TODO: Abstract the below, since this is dependent
             *       on the LB implementation
-            if (PROXY_WORKER_IS_USABLE(workers)) {
+            if (PROXY_WORKER_IS_USABLE(workers) && workers->s->lbset == runtime->s->lbset) {
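
The effect of the extra lbset check can be illustrated with a small standalone model. This is a sketch under assumptions: it uses the same "gain lbfactor, chosen worker pays the total" bookkeeping as a request-counting method, ignores the usable/error state, and uses a hypothetical struct instead of the real proxy_worker.

```c
#include <stddef.h>

/* Hypothetical, simplified worker with an lbset field. */
struct set_worker {
    int lbset;      /* balancer set the worker belongs to */
    int lbfactor;   /* configured weight */
    int lbstatus;   /* running balance */
};

/* Account for one sticky request served by workers[runtime], touching
 * only the workers that share the runtime worker's lbset. */
static void set_count_sticky(struct set_worker *workers, size_t n, size_t runtime)
{
    int total_factor = 0;
    for (size_t i = 0; i < n; i++) {
        if (workers[i].lbset != workers[runtime].lbset)
            continue;   /* other lbsets keep their lbstatus untouched */
        workers[i].lbstatus += workers[i].lbfactor;
        total_factor += workers[i].lbfactor;
    }
    workers[runtime].lbstatus -= total_factor;
}
```

With workers a and b in lbset 0 and c in lbset 1, sticky requests to a only shift lbstatus between a and b, and c no longer accumulates a large positive value while it is in a different lbset.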

What do you think about it?

Ciao - Marcus

Example:
- 3 workers with sticky-session (a, b, c)
- 1 new session per second with 50(+1) requests over 300 seconds
- initial lbsets: a:0, b:0, c:1

After changing the lbset of worker c to 0, the lbstatus will be cleared (recalc_factors()). So we start at 0.

// all new sessions will go to c
Second  :    0
ReqDone : a=  0.00, b=  0.00, c=   0.00
lbstatus: a=  0.0, b= -0.0, c=  0.0
Sessions: a= 150.00, b= 150.00, c=   0.00

Second  :   60
ReqDone : a= 1378.58, b= 1367.42, c= 365.00
lbstatus: a= -1024.7, b= -991.3, c= 2016.0
Sessions: a= 120.50, b= 119.50, c=  60.00

Second  :  180
ReqDone : a= 3183.58, b= 3152.42, c= 2895.00
lbstatus: a= -319.7, b= -226.2, c= 546.0
Sessions: a=  60.50, b=  59.50, c= 180.00

// from now on all new sessions will go to a + b
// but we are also losing old sessions
Second  :  192
ReqDone : a= 3298.08, b= 3264.92, c= 3280.00
lbstatus: a= -51.2, b= 48.2, c=  3.0
Sessions: a=  54.50, b=  53.50, c= 192.00

// reached break-even point: c loses sessions
Second  :  301
ReqDone : a= 4316.83, b= 4317.33, c= 6767.83
lbstatus: a= 2451.5, b= 2450.0, c= -4901.5
Sessions: a=  54.00, b=  55.00, c= 191.00

// until c has 0 active sessions
Second  :  575
ReqDone : a= 9792.00, b= 9792.00, c= 9792.00
lbstatus: a=  0.0, b= -0.0, c=  0.0
Sessions: a= 150.00, b= 150.00, c=   0.00

// now we have a similar situation to second 0
Comment 1 Alex 2011-02-11 21:50:25 UTC
Hi Marcus

I have the same problem as you reported a long time ago.

BalancerMembers with loadfactor=1 and lbset=0

I use lbset=1 to stop sending new sessions to a node. (node maintenance)

A node with lbset=1 receives no new traffic (only sticky sessions), but when it is set back to lbset=0 the node receives a lot of traffic instead of continuing with round-robin balancing. It seems mod_proxy tries to bring it to the same number of requests as the other nodes that were running the whole time. In the end the node is overloaded by the traffic while the other nodes are idle.

I think option 1 would be great for me:

- New proxy-balancer option: lbmethod_counting=[all|new|(sticky?)]

Did you fix the problem?

Please let me know


Comment 2 William A. Rowe Jr. 2018-11-07 21:08:32 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.