Bug 51417

Summary: Apache mod_jk worker gets stuck in OK/BUSY.
Product: Tomcat Connectors
Reporter: Samuel Mendenhall <smendenh>
Component: Common
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 1.2.31   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: busy.patch created by Mladen Turk and verified independently to fix this issue

Description Samuel Mendenhall 2011-06-22 15:10:41 UTC
Created attachment 27193 [details]
busy.patch created by Mladen Turk and verified independently to fix this issue

An AJP worker can get stuck in the OK/BUSY state and, in the case of a stateless web service, stop handling any further requests.

In jk_lb_worker.c's service(), a worker is set to busy if it can't provide a free endpoint. If a request is finished and releases one of the worker's endpoints then the busy state is cleared. 

The problem is a race: while one thread performs its final jk_sleep() in jk_ajp_common's do_ajp_service(), all endpoints for the worker may be released by the other threads.

The worker is then marked busy although no requests are in flight, so no request will subsequently complete and clear the busy state. The worker is stuck indefinitely.
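
The interleaving described above can be replayed in a small, single-threaded C model. This is only a sketch under stated assumptions: model_worker, acquire_endpoint(), release_endpoint(), dispatch() and simulate_stuck_busy() are hypothetical names and not mod_jk code, the waiting thread is assumed not to re-check the pool after its final sleep, and a busy worker is assumed not to be offered new stateless requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one AJP worker. These are NOT the
 * real mod_jk structures; they only mirror the behaviour described in
 * the report. */
typedef struct {
    int  free_endpoints; /* endpoints currently available in the pool */
    int  in_flight;      /* requests that currently hold an endpoint  */
    bool busy;           /* models the OK/BUSY state                   */
} model_worker;

/* Try to take an endpoint from the pool; false if none is free. */
bool acquire_endpoint(model_worker *w)
{
    if (w->free_endpoints == 0)
        return false;
    w->free_endpoints--;
    w->in_flight++;
    return true;
}

/* A finished request returns its endpoint and clears the busy state. */
void release_endpoint(model_worker *w)
{
    w->free_endpoints++;
    w->in_flight--;
    w->busy = false;
}

/* Assumption: a busy worker is not offered new stateless requests. */
bool dispatch(model_worker *w)
{
    if (w->busy)
        return false;
    return acquire_endpoint(w);
}

/* Replays the interleaving step by step in one thread, so the outcome
 * is deterministic. Returns the final state of the worker. */
model_worker simulate_stuck_busy(void)
{
    model_worker w = { 1, 0, false };    /* connection_pool_size=1 */

    bool a_got = acquire_endpoint(&w);   /* request A takes the only endpoint */
    bool b_got = acquire_endpoint(&w);   /* request B's last attempt fails    */
    assert(a_got && !b_got);

    /* B is now in its final sleep. Meanwhile A completes: */
    release_endpoint(&w);                /* pool is free, busy is cleared     */

    /* B wakes up and gives up without looking at the pool again, so the
     * worker is marked busy based on a stale result. */
    w.busy = true;
    return w;
}
```

In this model the returned state has busy set, one free endpoint and no request in flight, so release_endpoint() is never called again and every later dispatch() is refused. Calling simulate_stuck_busy() from a main() and printing the fields is enough to observe it.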

The problem is fairly easy to reproduce as follows: 

- Configure an lb worker with a few AJP member workers. 
- Configure one of the AJP workers with connection_pool_size=1. 
- Run only the application servers corresponding to the worker with connection_pool_size=1. 
- Deploy a servlet that sleeps for 200ms (matching the default of 2 retries * 100ms sleep). 
- Invoke the servlet twice in parallel, via Apache and mod_jk. 

The above is a pathological setup intended only for testing; however, the problem has been encountered in a specific use case with a web service.
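
For reference, the lb worker from the reproduction steps could be configured roughly as in the workers.properties fragment below. It is a sketch only: the worker names (lb, node1, node2, node3), hosts and ports are hypothetical and were not part of the original setup; only connection_pool_size=1 on one member comes from the steps above.

```properties
# Hypothetical names, hosts and ports; adjust to the local setup.
worker.list=lb

# Load balancer with a few AJP member workers.
worker.lb.type=lb
worker.lb.balance_workers=node1,node2,node3

# The member under test: a single endpoint in its connection pool.
# Only the application server behind this worker is started.
worker.node1.type=ajp13
worker.node1.host=localhost
worker.node1.port=8009
worker.node1.connection_pool_size=1

# Remaining members; their application servers stay down for the test.
worker.node2.type=ajp13
worker.node2.host=localhost
worker.node2.port=8109

worker.node3.type=ajp13
worker.node3.host=localhost
worker.node3.port=8209
```

With such a setup, two parallel requests to the sleeping servlet compete for the single endpoint of node1.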
Comment 1 Mladen Turk 2011-06-22 16:37:40 UTC
The fix was applied to the trunk as r1137160
http://svn.apache.org/viewvc?view=revision&revision=1137160

Thanks for filing the issue with the better explanation of the symptoms.
I tested the patch and it applies cleanly to all mod_jk versions from
1.2.27 up, so anyone affected by the bug can use the attached patch
until 1.2.32 is released.
Comment 2 Mladen Turk 2011-06-22 16:40:31 UTC
Oops, I gave the wrong SVN reference. The correct one is r1137200
http://svn.apache.org/viewvc?rev=1137200&view=rev