After running ab for some time with concurrency equal to or greater than MaxRequestWorkers, Apache no longer kills idle processes. The following is logged every second:

[Mon Oct 11 15:17:29.416964 2021] [mpm_event:trace5] [pid 71:tid 140381265582400] event.c(2834): Not shutting down child: total daemons 16 / active limit 16 / ServerLimit 16

server-status (the single busy worker, W, is serving server-status itself; there are no other requests):

BusyWorkers: 1
IdleWorkers: 1023
Processes: 16
Stopping: 0
BusyWorkers: 1
IdleWorkers: 1023
ConnsTotal: 0
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 0
ConnsAsyncClosing: 0
Scoreboard: ________________________________________________________________________________________________________________________________________________________________________________W_______________________________________________________________________________________________________________________________________________________________

MPM conf:

<IfModule mpm_event_module>
    StartServers 2
    ServerLimit 16
    ThreadsPerChild 64
    MaxRequestWorkers 1024
    MinSpareThreads 32
    MaxSpareThreads 96
</IfModule>

ApacheBench command requesting a 700 KB static file:

$ ab -n 10000 -k -c 1100

I can reproduce it after every run of ab with -c greater than MaxRequestWorkers.
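To make the numbers in this report easier to follow, here is a minimal sketch of the arithmetic behind them. The values are copied from the configuration and server-status output above; the function names are hypothetical, and this is not a model of the actual decision logic in event.c.

```python
# Values taken from the report above; helper names are hypothetical.

def max_worker_slots(server_limit, threads_per_child):
    """Upper bound on worker threads across all child processes."""
    return server_limit * threads_per_child

def excess_idle_threads(idle_workers, max_spare_threads):
    """Idle threads beyond the configured MaxSpareThreads (never negative)."""
    return max(0, idle_workers - max_spare_threads)

SERVER_LIMIT = 16
THREADS_PER_CHILD = 64
MAX_SPARE_THREADS = 96
IDLE_WORKERS = 1023  # from server-status after the ab run

slots = max_worker_slots(SERVER_LIMIT, THREADS_PER_CHILD)
excess = excess_idle_threads(IDLE_WORKERS, MAX_SPARE_THREADS)
# Whole children's worth of threads sitting idle above MaxSpareThreads.
surplus_children = excess // THREADS_PER_CHILD

print(f"worker slots: {slots}")
print(f"idle threads above MaxSpareThreads: {excess}")
print(f"approx. surplus children: {surplus_children}")
```

With these inputs all 16 children stay alive although nearly every thread is idle, which is the symptom described above.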
Created attachment 38060: event mod log
This issue has been discussed on dev@ and we arrived at a patch in r1894285. Could you please check whether it works for you?
Unfortunately it still doesn't work. The logs look the same, except that the line number emitting the message is different now:

[Sun Oct 24 23:40:27.324323 2021] [mpm_event:trace5] [pid 209173:tid 140277577320256] event.c(2899): Not shutting down child: total daemons 16 / active limit 16 / ServerLimit 16
(In reply to Greg Voronov from comment #3)
> Unfortunately it still doesn't work. The logs look the same, except that the
> line number emitting the message is different now:
>
> [Sun Oct 24 23:40:27.324323 2021] [mpm_event:trace5] [pid 209173:tid
> 140277577320256] event.c(2899): Not shutting down child: total daemons 16 /
> active limit 16 / ServerLimit 16

This does not necessarily mean that the patch does not work. The log message still looks the same as in your original report. Did you apply only r1894285? If so, please apply r1894286 as well to get more logging information.
I apologize, it was my mistake. I tried to check out revision 1894285 from the 2.4 repo instead of trunk, and the svn client didn't return any errors. I have now applied the patches for both r1894285 and r1894286 on top of httpd-2.4.37, which we still use, and reran several tests. Every time, the server was able to recover after high load and kill child processes with excess idle threads.
Created attachment 38077: event mod log after r1894286
Is there any expectation as to which 2.4 release will include this patch?
r1894285 + r1894286 have been proposed for backport to 2.4, once/if accepted they'll be in the next release.
Backported to 2.4.x (r1895871), will be in the next release.
Apparently in 2.4.52, but I'm still seeing the behaviour.
(In reply to Mark Nottingham from comment #10)
> Apparently in 2.4.52, but I'm still seeing the behaviour.

Does patch [1] help?

[1] http://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/server/mpm/event/event.c?r1=1897149&r2=1897148&pathrev=1897149&view=patch
(In reply to Mark Nottingham from comment #10)
> Apparently in 2.4.52, but I'm still seeing the behaviour.

Mark, a few questions. What are your configuration and type of workload for reproducing this on 2.4.52? Do you also observe IdleWorkers == MaxRequestWorkers while the number of connections is below MaxSpareThreads? Or is it that httpd stops accepting connections, as in bug 65769? Does doubling or tripling ServerLimit help? Are there any "scoreboard is full" messages in the error_log?
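For anyone wanting to check the IdleWorkers question mechanically, the following is a rough sketch that parses "Key: value" status text and compares IdleWorkers against MaxSpareThreads. It assumes the status output has one field per line (as in the machine-readable form of server-status); the sample values come from the original report in this bug, not from Mark's setup, and the function names are hypothetical.

```python
def parse_status(text):
    """Parse "Key: value" lines; the first occurrence of a key wins."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields

def idle_surplus(fields, max_spare_threads):
    """IdleWorkers minus MaxSpareThreads; positive means too many idle threads."""
    return int(fields["IdleWorkers"]) - max_spare_threads

# Sample copied from the original report (assumed one field per line).
sample = """BusyWorkers: 1
IdleWorkers: 1023
Processes: 16
Stopping: 0
ConnsTotal: 0
"""

fields = parse_status(sample)
surplus = idle_surplus(fields, max_spare_threads=96)
print(f"processes={fields['Processes']} idle surplus={surplus}")
```

A large positive surplus that persists with ConnsTotal at 0 would match the behaviour originally reported here.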
Updated httpd from 2.4.51 to 2.4.53. After several reloads, child processes disappeared one by one with the following message:

[Mon Apr 11 14:19:49.422912 2022] [mpm_event:debug] [pid 20759:tid 140355756119808] event.c(576): wake up listener

All child processes disappeared and only the parent process survived. The following is logged every second:

[Mon Apr 11 14:29:25.195671 2022] [mpm_event:info] [pid 28925:tid 140356948518720] AH00486: server seems busy, (you may need to increase StartServers, ThreadsPerChild or Min/MaxSpareThreads), spawning 0 children, there are around 64 idle threads, 14 active children, and 14 children that are shutting down
[Mon Apr 11 14:29:26.196818 2022] [mpm_event:info] [pid 28925:tid 140356948518720] AH00486: server seems busy, (you may need to increase StartServers, ThreadsPerChild or Min/MaxSpareThreads), spawning 0 children, there are around 64 idle threads, 14 active children, and 14 children that are shutting down
[Mon Apr 11 14:29:27.197926 2022] [mpm_event:info] [pid 28925:tid 140356948518720] AH00486: server seems busy, (you may need to increase StartServers, ThreadsPerChild or Min/MaxSpareThreads), spawning 0 children, there are around 64 idle threads, 14 active children, and 14 children that are shutting down

Reloading or stopping/starting httpd recovers from this situation. This can be reproduced not only in the production environment but also in a test environment with no web access to httpd. Could you please investigate and fix this?
/////////////////////////// commands (RHEL)
systemctl stop httpd
systemctl start httpd
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
(wait about 30 minutes)

/////////////////////////// httpd-mpm.conf
<IfModule mpm_event_module>
    ServerLimit 10
    StartServers 4
    MinSpareThreads 75
    MaxSpareThreads 250
    ThreadsPerChild 64
    MaxRequestWorkers 640
    MaxConnectionsPerChild 300
</IfModule>
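When collecting data for a report like this, it can help to pull the counters out of the repeating AH00486 lines. The sketch below is only an illustration: the regular expression is written against the message text quoted in this comment, and the function name is hypothetical.

```python
import re

# Pattern written against the AH00486 text quoted above (an assumption
# about the message layout, not taken from the httpd source).
AH00486 = re.compile(
    r"AH00486: .*?spawning (?P<spawning>\d+) children, "
    r"there are around (?P<idle_threads>\d+) idle threads, "
    r"(?P<active_children>\d+) active children, "
    r"and (?P<stopping_children>\d+) children that are shutting down"
)

def parse_ah00486(line):
    """Return the four counters as ints, or None if the line does not match."""
    match = AH00486.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

line = (
    "[Mon Apr 11 14:29:25.195671 2022] [mpm_event:info] "
    "[pid 28925:tid 140356948518720] AH00486: server seems busy, "
    "(you may need to increase StartServers, ThreadsPerChild or "
    "Min/MaxSpareThreads), spawning 0 children, there are around 64 idle "
    "threads, 14 active children, and 14 children that are shutting down"
)

counters = parse_ah00486(line)
print(counters)
```

Tracking these counters over time makes it easier to see whether the number of children that are shutting down ever drops.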
(In reply to Shun F from comment #13) Could you please open a new ticket? Also in this new ticket, please attach the error_log with "LogLevel mpm_event:trace5" when reproducing from your testing environment.
(In reply to Yann Ylavic from comment #14)
> (In reply to Shun F from comment #13)
>
> Could you please open a new ticket?
> Also in this new ticket, please attach the error_log with "LogLevel
> mpm_event:trace5" when reproducing from your testing environment.

Hi Yann,

Thank you for your quick reply.
OK, I will open a new ticket.

Regards,
Shun F
(In reply to Shun F from comment #15)
> OK, I will open a new ticket.

Followed up in bug 66004