During a graceful shutdown of a worker process, the listener thread cannot obtain a worker to process connections that are in an asynchronous state, because get_worker() does not set *have_idle_worker_p.

This causes a deadlock: the listener thread cannot get idle workers to process asynchronous connections (e.g. CONN_STATE_WRITE_COMPLETION) because get_worker() does not report have_idle_worker, even though workers are already waiting in ap_queue_pop_something().

The cause is that, after ap_queue_info_term(), ap_queue_info_wait_for_idler() returns APR_EOF while still allocating an idle worker (that is, decrementing worker_queue_info->idlers), but get_worker() does not set *have_idle_worker_p in the APR_EOF case. As a result, listener_thread() calls get_worker() repeatedly to process the waiting connections, and worker_queue_info->idlers goes to pt_zero.

---
diff -r -u httpd-2.4.10.orig//server/mpm/event/event.c httpd-2.4.10//server/mpm/event/event.c
--- httpd-2.4.10.orig//server/mpm/event/event.c Thu Jun 26 07:01:31 2014
+++ httpd-2.4.10//server/mpm/event/event.c Thu Sep 11 19:04:52 2014
@@ -1271,13 +1271,13 @@
     else
         rc = ap_queue_info_try_get_idler(worker_queue_info);
 
-    if (rc == APR_SUCCESS) {
+    if (rc == APR_SUCCESS || APR_STATUS_IS_EOF(rc)) {
         *have_idle_worker_p = 1;
     }
     else if (!blocking && rc == APR_EAGAIN) {
         *all_busy = 1;
     }
-    else if (!APR_STATUS_IS_EOF(rc)) {
+    else {
         ap_log_error(APLOG_MARK, APLOG_ERR, rc, ap_server_conf, APLOGNO(00472)
                      "ap_queue_info_wait_for_idler failed. "
                      "Attempting to shutdown process gracefully");
Created attachment 32071: patch for Bug 56960
How to reproduce:
1) Enable mod_status with extended status (to check the status).
2) Set a small value for MaxConnectionsPerChild, e.g. 1000 (to make it easy to reproduce).
3) Create a large dummy file (>100M) under DocumentRoot.
4) Generate a heavy load against the large file, with many (>1000) concurrent connections.

After several tens of minutes, you can find some deadlocked httpd processes. These processes have the following status:
- Connections total is not 0, listen 'no'.
- Busy and idle threads are 0.
- There are some Async connections.

This issue is CRITICAL because it happens not only on an administrative graceful shutdown but also whenever a worker process is restarted (e.g. after exceeding MaxConnectionsPerChild).
add keyword 'PatchAvailable' ...
This patch seems to work properly with httpd-2.4.x and fixes the mentioned bug for me, but for some reason it fails to fix it in httpd-trunk. Although it improves the situation there, I still see some worker deadlocks (though fewer than without the patch). Before committing this patch, I will try to find out what is different between trunk and 2.4.x in the event MPM that causes this.
Possible duplicate of Bug 49504 (active children during graceful restart may trigger it)?
So the remaining issue I see with event MPM is caused by http://svn.apache.org/r1605328 (according to svn-bisect)
Probably workers_were_busy and have_idle_worker should be declared back as before. They both have a meaningful value during the whole life of the listener thread...
(In reply to Yann Ylavic from comment #7)
> Probably workers_were_busy and have_idle_worker should be declared back as
> before.
> They both have a meaningful value during the whole life of the listener
> thread...

In fact, only have_idle_worker is affected; workers_were_busy must be reset to 0 on each iteration of the main loop.
Yes, I just wanted to write here that have_idle_worker must be preserved across iterations. I'm going to commit that and the patch from this bug to trunk.
Attached patch for the original issue committed in r1629577, patch for the issue I found while fixing this bug committed in r1629576. Proposed for 2.4.x.
Backported to 2.4.11 in r1634526.