Created attachment 27193
busy.patch created by Mladen Turk and verified independently to fix this issue

An AJP worker can get stuck in the OK/BUSY state and - in the case of a stateless web service - not handle any further requests.

In jk_lb_worker.c's service(), a worker is set to busy if it can't provide a free endpoint. If a request is finished and releases one of the worker's endpoints then the busy state is cleared.

The problem is that as one thread performs the final jk_sleep() in jk_ajp_common's do_ajp_service(), all endpoints for the worker may be released. Now the worker is marked busy, but no requests will complete subsequently which would clear the busy state. The worker is stuck, indefinitely.

The problem is fairly easy to reproduce as follows:

- Configure an lb worker with a few AJP member workers.
- Configure one of the AJP workers with connection_pool_size=1.
- Run only the application servers corresponding to the worker with connection_pool_size=1.
- Deploy a servlet that sleeps for 200ms (default 2 retries * 100ms sleep).
- Invoke the servlet twice in parallel, via Apache and mod_jk.

The above is a pathological setup and just for testing, however it has been encountered in a specific use case with a web service.
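The interleaving described above can be sketched as a small single-threaded model. This is not mod_jk code: the struct, its fields, and every function name below are hypothetical, and the real logic in jk_lb_worker.c and jk_ajp_common is more involved. The sketch only assumes what the report states, namely that a failed endpoint request leads to the worker being marked busy and that releasing an endpoint clears the flag.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one AJP worker; not mod_jk's real structs. */
struct worker_model {
    int  pool_size;  /* stands in for connection_pool_size */
    int  in_use;     /* endpoints currently handed out */
    bool busy;       /* set by the balancer, cleared on endpoint release */
};

/* Try to take an endpoint; returns false when the pool is exhausted. */
static bool acquire_endpoint(struct worker_model *w)
{
    if (w->in_use >= w->pool_size)
        return false;
    w->in_use++;
    return true;
}

/* Finishing a request returns its endpoint and clears the busy flag. */
static void release_endpoint(struct worker_model *w)
{
    w->in_use--;
    w->busy = false;
}

/* True when the flag is set but no request is in flight to clear it. */
static bool is_stuck(const struct worker_model *w)
{
    return w->busy && w->in_use == 0;
}

/* Replays the problematic ordering of events for two parallel requests. */
static struct worker_model simulate_race(void)
{
    struct worker_model w = { .pool_size = 1, .in_use = 0, .busy = false };

    bool a_ok = acquire_endpoint(&w);  /* request A takes the only endpoint */
    bool b_ok = acquire_endpoint(&w);  /* request B's last attempt fails */

    /* B is now in its final sleep; A completes inside that window. */
    if (a_ok)
        release_endpoint(&w);

    /* B wakes up and gives up, so the balancer marks the worker busy. */
    if (!b_ok)
        w.busy = true;

    return w;
}
```

In this model, the state returned by simulate_race() has busy set while in_use is 0, so is_stuck() reports true: the flag was raised after the last release, and nothing remains in flight to clear it.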
The fix was applied to the trunk as r1137160 http://svn.apache.org/viewvc?view=revision&revision=1137160 Thanks for filing the issue with the better explanation of symptoms. I tested the patch and it applies cleanly to all mod_jk versions from 1.2.27 up, so anyone affected by the bug can use the attached patch until 1.2.32 gets released.
Oops. Gave a wrong SVN reference. It's r1137200 http://svn.apache.org/viewvc?rev=1137200&view=rev