Bug 65626 - MPM Event doesn't shutdown idle children after working under high load
Summary: MPM Event doesn't shutdown idle children after working under high load
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mpm_event (show other bugs)
Version: 2.4.37
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: FixedInTrunk
Depends on:
Blocks:
 
Reported: 2021-10-11 12:31 UTC by Greg Voronov
Modified: 2022-04-12 13:21 UTC (History)
1 user (show)



Attachments
event mod log (41.29 KB, text/plain)
2021-10-11 12:42 UTC, Greg Voronov
Details
event mod log after r1894286 (24.18 KB, application/x-gzip)
2021-10-26 00:26 UTC, Greg Voronov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Greg Voronov 2021-10-11 12:31:40 UTC
After running ab with concurrency equal or more than MaxRequestWorkers for some time, apache doesn't kill idle processes any longer. Following is logged every second:

[Mon Oct 11 15:17:29.416964 2021] [mpm_event:trace5] [pid 71:tid 140381265582400] event.c(2834): Not shutting down child: total daemons 16 / active limit 16 / ServerLimit 16

server-status (W busy worker is for server-status itself, no any other requests):

BusyWorkers: 1
IdleWorkers: 1023
Processes: 16
Stopping: 0
BusyWorkers: 1
IdleWorkers: 1023
ConnsTotal: 0
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 0
ConnsAsyncClosing: 0
Scoreboard: ________________________________________________________________________________________________________________________________________________________________________________W_______________________________________________________________________________________________________________________________________________________________

MPM conf:

<IfModule mpm_event_module>
    StartServers            2
    ServerLimit            16
    ThreadsPerChild        64
    MaxRequestWorkers    1024
    MinSpareThreads        32
    MaxSpareThreads        96
</IfModule>

ApacheBenchmark command requesting 700kb static file:
$ ab -n 10000 -k -c 1100

I can reproduce it after every run of ab with -c more than MaxRequestWorkers.
Comment 1 Greg Voronov 2021-10-11 12:42:46 UTC
Created attachment 38060 [details]
event mod log
Comment 2 Yann Ylavic 2021-10-15 10:34:42 UTC
This issue has been discussed on dev@ and we came to a patch in r1894285.
Could you please try that it works for you?
Comment 3 Greg Voronov 2021-10-24 23:48:52 UTC
Unfortunately it stil doesn't work, logs looks the same except for the line # emittig the message is different now:

[Sun Oct 24 23:40:27.324323 2021] [mpm_event:trace5] [pid 209173:tid 140277577320256] event.c(2899): Not shutting down child: total daemons 16 / active limit 16 / ServerLimit 16
Comment 4 Ruediger Pluem 2021-10-25 07:17:40 UTC
(In reply to Greg Voronov from comment #3)
> Unfortunately it stil doesn't work, logs looks the same except for the line
> # emittig the message is different now:
> 
> [Sun Oct 24 23:40:27.324323 2021] [mpm_event:trace5] [pid 209173:tid
> 140277577320256] event.c(2899): Not shutting down child: total daemons 16 /
> active limit 16 / ServerLimit 16

This does not directly mean that the patch does not work. The error message still looks the same as in your original report. Did you just apply r1894285? If yes please apply r	1894286 as well for getting more logging information.
Comment 5 Greg Voronov 2021-10-26 00:24:21 UTC
I apologize, it was my bad. I tried to checkout revision 1894285 from 2.4 repo instead of trunk and svn client didn't return any errors.
Now I applied patches for both 1894285 and 1894286 on the top of httpd-2.4.37 that we use yet and rerun several tests, every time server was able to recover after high load and kill children processes with excessive idle threads.
Comment 6 Greg Voronov 2021-10-26 00:26:01 UTC
Created attachment 38077 [details]
event mod log after r1894286
Comment 7 Greg Voronov 2021-11-23 18:13:17 UTC
Is there any expectations on which release build of 2.4 would include this patch?
Comment 8 Yann Ylavic 2021-11-25 13:30:57 UTC
r1894285 + r1894286 have been proposed for backport to 2.4, once/if accepted they'll be in the next release.
Comment 9 Yann Ylavic 2021-12-15 14:29:53 UTC
Backported to 2.4.x (r1895871), will be in the next release.
Comment 10 Mark Nottingham 2022-01-19 23:33:18 UTC
Apparently in 2.4.52, but I'm still seeing the behaviour.
Comment 11 Yann Ylavic 2022-01-19 23:43:11 UTC
(In reply to Mark Nottingham from comment #10)
> Apparently in 2.4.52, but I'm still seeing the behaviour.

Does patch [1] help?

[1] http://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/server/mpm/event/event.c?r1=1897149&r2=1897148&pathrev=1897149&view=patch
Comment 12 Yann Ylavic 2022-01-21 10:51:54 UTC
(In reply to Mark Nottingham from comment #10)
> Apparently in 2.4.52, but I'm still seeing the behaviour.

Mark, a few quetions.

What is your configuration and type of workload for reproducing this on 2.4.52?

Do you observe IdleWorkers == MaxRequestWorkers too whereas the number of connections is below MaxSpareThreads?
Or is it that htpd stops accepting connections like in bug 65769?

Does doubling or tripling ServerLimit help? Some "scoreboard is full" messages in the error_log?
Comment 13 Shun F 2022-04-11 07:34:37 UTC
Updated httpd to 2.4.53 from 2.4.51.

After several time reloads, child processes disappeared one by one with following:

[Mon Apr 11 14:19:49.422912 2022] [mpm_event:debug] [pid 20759:tid 140355756119808] event.c(576): wake up listener


All child processes disappeared and only parent process survive, following is logged every second:

[Mon Apr 11 14:29:25.195671 2022] [mpm_event:info] [pid 28925:tid 140356948518720] AH00486: server seems busy, (you may need to increase StartServers, ThreadsPerChild or Min/MaxSpareThreads), spawning 0 children, there are around 64 idle threads, 14 active children, and 14 children that are shutting down
[Mon Apr 11 14:29:26.196818 2022] [mpm_event:info] [pid 28925:tid 140356948518720] AH00486: server seems busy, (you may need to increase StartServers, ThreadsPerChild or Min/MaxSpareThreads), spawning 0 children, there are around 64 idle threads, 14 active children, and 14 children that are shutting down
[Mon Apr 11 14:29:27.197926 2022] [mpm_event:info] [pid 28925:tid 140356948518720] AH00486: server seems busy, (you may need to increase StartServers, ThreadsPerChild or Min/MaxSpareThreads), spawning 0 children, there are around 64 idle threads, 14 active children, and 14 children that are shutting down

Reloading or stopping/starting httpd recovers from this situation.

This can be reproduce not only at production environment but also at the test environment with no web accesses to httpd.

Can you please investigate and fix this.

///////////////////////////commands(RHEL)
systemctl stop httpd
systemctl start httpd
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
(wait about 30 minutes)

///////////////////////////httpd-mpm.conf
<IfModule mpm_event_module>
    ServerLimit             10
    StartServers               4
    MinSpareThreads           75
    MaxSpareThreads          250
    ThreadsPerChild           64
    MaxRequestWorkers        640
    MaxConnectionsPerChild   300
</IfModule>
Comment 14 Yann Ylavic 2022-04-11 09:24:08 UTC
(In reply to Shun F from comment #13)

Could you please open a new ticket?
Also in this new ticket, please attach the error_log with "LogLevel mpm_event:trace5" when reproducing from your testing environment.
Comment 15 Shun F 2022-04-11 09:41:54 UTC
(In reply to Yann Ylavic from comment #14)
> (In reply to Shun F from comment #13)
> 
> Could you please open a new ticket?
> Also in this new ticket, please attach the error_log with "LogLevel
> mpm_event:trace5" when reproducing from your testing environment.

Hi Yann,

Thank you for your quick reply.
OK, I will open a new ticket.

Regards, Shun F
Comment 16 Yann Ylavic 2022-04-12 13:21:07 UTC
(In reply to Shun F from comment #15)
> OK, I will open a new ticket.

Followed up in bug 66004