Summary: | Child processes disappears one by one with no web access. | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Shun F <uni.roy> |
Component: | mpm_event | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | Keywords: | FixedInTrunk |
Priority: | P2 | ||
Version: | 2.4.53 | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: |
error log
Fix active/total daemons accounting on non-graceful restart error log with "LogLevel mpm_event:trace6" after patching |
Description
Shun F
2022-04-11 09:49:06 UTC
It would be useful to run the reproducer with "LogLevel mpm_event:trace6" and attach the error log here. (In reply to Yann Ylavic from comment #1) > It would be useful to run the reproducer with "LogLevel mpm_event:trace6" > and attach the error log here. OK. I'm trying to reproduce now. Please wait. Created attachment 38243 [details]
error log
I reproduced with "LogLevel mpm_event:trace6" and uploaded error log. Can you please investigae. ///////////////////////////// # systemctl start httpd # /usr/bin/systemctl reload httpd.service # /usr/bin/systemctl reload httpd.service # /usr/bin/systemctl reload httpd.service # /usr/bin/systemctl reload httpd.service # /usr/bin/systemctl reload httpd.service # systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: active (running) since 月 2022-04-11 19:03:14 JST; 37s ago Process: 15865 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k graceful-stop (code=exited, status=0/SUCCESS) Process: 17327 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS) Process: 17325 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited, status=0/SUCCESS) Process: 15929 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start (code=exited, status=0/SUCCESS) Main PID: 15930 (httpd) CGroup: /system.slice/httpd.service tq15930 /opt/app/httpd/httpd-2/bin/httpd -k start tq17329 /opt/app/httpd/httpd-2/bin/httpd -k start tq17331 /opt/app/httpd/httpd-2/bin/httpd -k start mq17336 /opt/app/httpd/httpd-2/bin/httpd -k start 33 minutes later... # systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: active (running) since 月 2022-04-11 19:03:14 JST; 33min ago Process: 15865 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k graceful-stop (code=exited, status=0/SUCCESS) Process: 17327 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS) Process: 17325 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited, status=0/SUCCESS) Process: 15929 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start (code=exited, status=0/SUCCESS) Main PID: 15930 (httpd) CGroup: /system.slice/httpd.service tq15930 /opt/app/httpd/httpd-2/bin/httpd -k start mq17329 /opt/app/httpd/httpd-2/bin/httpd -k start ///////////////////////////// Created attachment 38244 [details]
Fix active/total daemons accounting on non-graceful restart
Could you please try this patch?
(In reply to Yann Ylavic from comment #5) > Created attachment 38244 [details] > Fix active/total daemons accounting on non-graceful restart > > Could you please try this patch? Thank you for your quick response. What was wrong with 2.4.53? I will try and report the result of the trial. (In reply to Shun F from comment #6) > What was wrong with 2.4.53? The issue was with the non-accounting of children processes stopped by reloads (ungraceful restarts). 2.4.53 only accounts for processes started and stopped during runtime maintenance (namely perform_idle_server_maintenance() which honors Min/MaxSpareThreads and restarts children exited after MaxRequestsPerChild), while reloads also kill processes outside this mechanism thus outside the accounting (note that graceful restarts don't use this mechanism so are not affected by this issue). At some point, with bad counters, httpd thinks it can't create new children even though none is actually running (besides the main process). The fix consists in updating the counters for started/stopped children in a common function called by both runtime maintenance and reloads. Created attachment 38245 [details]
error log with "LogLevel mpm_event:trace6" after patching
The patch trial completed. I tried to reproduce twice and the phenomenon didn't reproduce. The fix will be included in the next httpd release (2.4.54)? /////////////////////////////error log uploaded (as error_log_after_patching0412) systemctl start httpd /usr/bin/systemctl reload httpd.service /usr/bin/systemctl reload httpd.service /usr/bin/systemctl reload httpd.service /usr/bin/systemctl reload httpd.service /usr/bin/systemctl reload httpd.service systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: active (running) since 火 2022-04-12 18:00:36 JST; 58s ago Process: 21597 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k graceful-stop (code=exited, status=0/SUCCESS) Process: 23190 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS) Process: 23188 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited, status=0/SUCCESS) Process: 21713 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start (code=exited, status=0/SUCCESS) Main PID: 21714 (httpd) CGroup: /system.slice/httpd.service tq21714 /opt/app/httpd/httpd-2/bin/httpd -k start tq23192 /opt/app/httpd/httpd-2/bin/httpd -k start tq23193 /opt/app/httpd/httpd-2/bin/httpd -k start mq23248 /opt/app/httpd/httpd-2/bin/httpd -k start About an hour later... 2 child processes still exists. systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: active (running) since 火 2022-04-12 18:00:36 JST; 1h 2min ago Process: 21597 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k graceful-stop (code=exited, status=0/SUCCESS) Process: 23190 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS) Process: 23188 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited, status=0/SUCCESS) Process: 21713 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start (code=exited, status=0/SUCCESS) Main PID: 21714 (httpd) CGroup: /system.slice/httpd.service tq21714 /opt/app/httpd/httpd-2/bin/httpd -k start tq23192 /opt/app/httpd/httpd-2/bin/httpd -k start mq26723 /opt/app/httpd/httpd-2/bin/httpd -k start ///////////////////////////// Thank you Thanks for testing, checked in r1899777. The change will be proposed for backport to 2.4, let's how it goes with others' reviews.. (In reply to Yann Ylavic from comment #7) > (In reply to Shun F from comment #6) > > What was wrong with 2.4.53? > > The issue was with the non-accounting of children processes stopped by > reloads (ungraceful restarts). 2.4.53 only accounts for processes started > and stopped during runtime maintenance (namely > perform_idle_server_maintenance() which honors Min/MaxSpareThreads and > restarts children exited after MaxRequestsPerChild), while reloads also kill > processes outside this mechanism thus outside the accounting (note that > graceful restarts don't use this mechanism so are not affected by this > issue). > At some point, with bad counters, httpd thinks it can't create new children > even though none is actually running (besides the main process). > > The fix consists in updating the counters for started/stopped children in a > common function called by both runtime maintenance and reloads. Thank you for telling me. I got the reason. (In reply to Yann Ylavic from comment #10) > Thanks for testing, checked in r1899777. > > The change will be proposed for backport to 2.4, let's how it goes with > others' reviews.. Thanks so much. I got it. I hope your fix will be included in the next release. If MaxConnectionsPerChild is set to a non-zero value, The phenomenon seems to reproduce. I'm testing now. I will report the results later. There is a fix (follow up) to r1899777, namely r1899812. A full patch for 2.4 including all the changes is available there: https://patch-diff.githubusercontent.com/raw/apache/httpd/pull/313.diff (In reply to Yann Ylavic from comment #14) > There is a fix (follow up) to r1899777, namely r1899812. > > A full patch for 2.4 including all the changes is available there: > https://patch-diff.githubusercontent.com/raw/apache/httpd/pull/313.diff Is r1899777 an incomplete fix? I'd better test the r1899812, right? (In reply to Shun F from comment #13) > If MaxConnectionsPerChild is set to a non-zero value, The phenomenon seems > to reproduce. > > I'm testing now. > I will report the results later. If MaxConnectionsPerChild is 0, the phenomenon doesn't reproduce under unpatched 2.4.53. The reproducibility conditions are: [1]Using httpd 2.4.53 (2.4.52 is probably the same) [2]Using event MPM [3]Set a non-zero value to MaxConnectionsPerChild [4]Reloaded regularly, such as daily (e.g. by /etc/logrotate.d/httpd) > Is r1899777 an incomplete fix? > I'd better test the r1899812, right? Yes, r1899777 alone is not enough, you can test r1899777+r1899812 (both patches) which is the same as 313.diff. (In reply to Yann Ylavic from comment #17) > > Is r1899777 an incomplete fix? > > I'd better test the r1899812, right? > > Yes, r1899777 alone is not enough, you can test r1899777+r1899812 (both > patches) which is the same as 313.diff. I've applied 313.diff to original 2.4.53 and the patch trial completed. I've tried to reproduce twice and the phenomenon didn't reproduce. How does it goes with others' reviews? The fix will be backport to 2.4? Backported to 2.4 (r1901199), will be in the next release. (In reply to Yann Ylavic from comment #20) > Backported to 2.4 (r1901199), will be in the next release. Good news! Thanks a million. |