Bug 41680

Summary: Troubles with mod_proxy_balancer
Product: Apache httpd-2 Reporter: bondaron <andriju>
Component: mod_proxy_balancerAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED DUPLICATE    
Severity: major CC: andriju
Priority: P3    
Version: 2.2.4   
Target Milestone: ---   
Hardware: PC   
OS: Solaris   

Description bondaron 2007-02-22 09:01:45 UTC
We have such problem situation and it is very important for us to find a
very fast and effective solution:

We use mod_proxy_balancer with mod_proxy_ajp in httpd-2.2.4.Tomcat (5.5.20) is
used as the backend. We use stickysessions based on cookies in the balancer
(CookieName=VotingRoute). Here is a part of httpd configuration taken from our
configuration files

---apache.conf
<Proxy balancer://tomcat_voting/>
BalancerMember ajp://xxx.xxx.xxx.135:8009 route=VOTING1_CURRENT loadfactor=1
BalancerMember ajp://xxx.xxx.xxx.136:8009 route=VOTING2_CURRENT loadfactor=1
ProxySet stickysession=VotingRoute nofailover=Off
</Proxy>

RewriteRule /servlet/survey.Voting balancer://tomcat_voting%{REQUEST_URI} [P,L]
---/apache.conf

Below you can see a sample of the httpd server log (with LogLevel=debug
option),when the site looks working. As you can see from the log below
queries with Cookies received before get to the right tomcat from the
balancer

---debug
[Tue Feb 20 14:09:36 2007] [debug] mod_proxy_balancer.c(41): proxy: BALANCER:
canonicalising URL //tomcat_voting/survey/112333/4c39d6cf/
[Tue Feb 20 14:09:36 2007] [debug] mod_proxy_balancer.c(247): proxy: BALANCER:
Found value x.VOTING1_CURRENT for stickysession VotingRoute
[Tue Feb 20 14:09:36 2007] [debug] mod_proxy_balancer.c(257): proxy: BALANCER:
Found route VOTING1_CURRENT
[Tue Feb 20 14:09:36 2007] [debug] mod_proxy_balancer.c(488): proxy: BALANCER
(balancer://tomcat_voting) worker (ajp://xxx.xxx.xxx.135:8009) rewritten to
ajp://xxx.xxx.xxx.135:8009/survey/112333/4c39d6cf/
[Tue Feb 20 14:09:36 2007] [debug] mod_proxy.c(777): Running scheme balancer
handler (attempt 0)
[Tue Feb 20 14:09:36 2007] [debug] mod_proxy_http.c(1652): proxy: HTTP:
declining URL ajp://xxx.xxx.xxx.135:8009/survey/112333/4c39d6cf/
[Tue Feb 20 14:09:36 2007] [debug] mod_proxy_ajp.c(507): proxy: AJP: serving URL
ajp://xxx.xxx.xxx.135:8009/survey/112333/4c39d6cf/
[Tue Feb 20 14:09:36 2007] [debug] proxy_util.c(1798): proxy: AJP: has acquired
connection for (xxx.xxx.xxx.135)
[Tue Feb 20 14:09:36 2007] [debug] proxy_util.c(1859): proxy: connecting
ajp://xxx.xxx.xxx.135:8009/survey/112333/4c39d6cf/ to xxx.xxx.xxx.135:8009
[Tue Feb 20 14:09:36 2007] [debug] proxy_util.c(1955): proxy: connected
/survey/112333/4c39d6cf/ to xxx.xxx.xxx.135:8009
---/debug

But sometimes we strange things happen. Unexpectedly balancer chooses a
tomcat that has different 'route' option that found in the Cookies. The query
goes to the wrong tomcat and receives another Cookie from it with
another 'route' in the Cookie, then the situation repeats, in other words
the balancer send the query to a wrong tomcat. And it happens to every next
query.Here is an example from the log files of httpd server with sending a query
to a wrong Tomcat:

---debug
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy_balancer.c(41): proxy: BALANCER:
canonicalising URL //tomcat_voting/servlet/survey.VotingSurvey
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy_balancer.c(247): proxy: BALANCER:
Found value x.VOTING2_CURRENT for stickysession VotingRoute
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy_balancer.c(257): proxy: BALANCER:
Found route VOTING2_CURRENT
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy_balancer.c(922): proxy: Entering
byrequests for BALANCER (balancer://tomcat_voting)
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy_balancer.c(488): proxy: BALANCER
(balancer://tomcat_voting) worker (ajp://xxx.xxx.xxx.135:8009) rewritten to
ajp://xxx.xxx.xxx.135:8009/servlet/survey.VotingSurvey?i_n_f=survey112333_pg0_totpg2_rid9392985_lqid2020313%23errQue
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy.c(777): Running scheme balancer
handler (attempt 0)
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy_http.c(1652): proxy: HTTP:
declining URL
ajp://xxx.xxx.xxx.135:8009/servlet/survey.VotingSurvey?i_n_f=survey112333_pg0_totpg2_rid9392985_lqid2020313%23errQue
[Tue Feb 20 11:24:17 2007] [debug] mod_proxy_ajp.c(507): proxy: AJP: serving URL
ajp://xxx.xxx.xxx.135:8009/servlet/survey.VotingSurvey?i_n_f=survey112333_pg0_totpg2_rid9392985_lqid2020313%23errQue
[Tue Feb 20 11:24:17 2007] [debug] proxy_util.c(1798): proxy: AJP: has acquired
connection for (xxx.xxx.xxx.135)
[Tue Feb 20 11:24:17 2007] [debug] proxy_util.c(1859): proxy: connecting
ajp://xxx.xxx.xxx.135:8009/servlet/survey.VotingSurvey?i_n_f=survey112333_pg0_totpg2_rid9392985_lqid2020313%23errQue
to xxx.xxx.xxx.135:8009
[Tue Feb 20 11:24:17 2007] [debug] proxy_util.c(1955): proxy: connected
/servlet/survey.VotingSurvey?i_n_f=survey112333_pg0_totpg2_rid9392985_lqid2020313%23errQue
to xxx.xxx.xxx.135:8009
---/debug

We reloaded tomcats one by one but it did not help to make the balancer work
correctly. Execution of 'apachectl graceful' command does not help either. Only
if we reload the httpd server balancer works ok (stickysessions work). It is
very hard to reproduce the problem but from time to time balancer
works as described above and we have to reload the httpd server. httpd
server reloading is a bad solution for us. 

Please recommend what we can do to solve this problem.

Thanks.
Comment 1 Ruediger Pluem 2007-02-22 14:10:46 UTC
From what I can see here is that the balancer switches to another Tomcat because
the Tomcat with the correct route set is not available. As you have set
nofailover=Off this works as designed. If you would have set nofailover=On there
should be the following error message in the log in such situations:

All workers are in error state for route.

Please have a look in your error log for lines that contain the words 'proxy'
and 'failed'.
To check the status of single workers please enable the balancer manager
(http://httpd.apache.org/docs/2.2/en/mod/mod_proxy_balancer.html#enable) and
have a look on its status page.
Comment 2 bondaron 2007-06-15 11:54:17 UTC
Thank you. Balancer-manager turned out to be a very helpful tool for analyzing
of failures.

We seem to have found the reasons why we had issues with stickysessions. At
least we believe that these were the reasons. 

If there are a few back-ends in the balancer description, then if a back-end is
either deleted or added from/to the beginning of the list, then Route values of
the following back-ends are changed.

As an example here is a part of our test configuration: 
...
<Proxy balancer://app/>
BalancerMember ajp://app1:8009 route=T1 loadfactor=1
BalancerMember ajp://app2:8009 route=T2 loadfactor=1
BalancerMember ajp://app3:8009 route=T3 loadfactor=1
BalancerMember ajp://app4:8009 route=T4 loadfactor=1
ProxySet stickysession=SessionRoute nofailover=Off
</Proxy>
...

After rebooting the httpd server we can see the following on the
balancer-manager page 

...
Worker URL	Route	RouteRedir	Factor	Set	Status	Elected	To	From
ajp://app1:8009	T1		1	0	Ok	325	1.3K	1.3M
ajp://app2:8009	T2		1	0	Ok	271	587 	1.2M
ajp://app3:8009	T3		1	0	Ok	150	19K	1.2M
ajp://app4:8009	T4		1	0	Ok	137	38K	1.3M
...

However, balancer works fine. But if a 'ajp://app2:8009'  back-end is deleted
and then 'apachectl graceful' command is run, the following will be seen on the
balancer-manager page:

...
Worker URL	Route	RouteRedir	Factor	Set	Status	Elected	To	From
ajp://app1:8009	T1		1	0	Ok	325	1.3K	1.3M
ajp://app3:8009	T2		1	0	Ok	271	587 	1.2M
ajp://app4:8009	T3		1	0	Ok	150	19K	1.2M
...

Routes 'T2' &#1080; 'T3 correspond to ‘app3’ and ‘app4’ back-ends '. As a result
StickySessions work non-properly. Adding back-ends to the list also results in
bad referencing for the ‘Route’ options even though they are defined by the
configuration.
 
Also, we noticed that deleting or adding back-ends affects properties of
back-ends for other balancers as well, even back-ends for other VirtualHost
sections.

Of course after changing balancers descriptions we can reboot the httpd server
or correct Route options with the help of balancer-manager. But the first
approach is not fine for us as we use the server 24x7, and the latter is not
very good for us either as there are many balancers in the server’s configuration.
Comment 3 Ruediger Pluem 2007-06-16 08:56:14 UTC
You describe the really nasty effects of the root cause for PR42621. Please see
my comment there.

*** This bug has been marked as a duplicate of 42621 ***