Bug 48949

Summary: fcgid processes never get killed after graceful restart
Product: Apache httpd-2 Reporter: zlygis
Component: mod_fcgid Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED LATER    
Severity: critical CC: felix.schwarz, issues.apache.org.sites, JBlond, panayot
Priority: P2 Keywords: MassUpdate, PatchAvailable
Version: 2.2.16   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Patch to fix graceful restart/stop bug in Windows

Description zlygis 2010-03-20 12:47:33 UTC
Any fcgid processes that were active or sleeping before a graceful Apache httpd restart never get killed after the restart. They are killed only by a hard Apache restart. As a result, fcgid processes "pile up" after each graceful restart, consuming memory and eventually making the whole system unusable.

error_log:

[Wed Mar 17 14:02:57 2010] [notice] Graceful restart requested, doing restart
[Wed Mar 17 14:02:57 2010] [emerg] [client 82.135.207.33] (43)Identifier removed: mod_fcgid: can't get pipe mutex, referer: h$
[Wed Mar 17 14:02:57 2010] [emerg] [client 78.61.82.52] (43)Identifier removed: mod_fcgid: can't get pipe mutex, referer: htt$
[Wed Mar 17 14:02:57 2010] [emerg] [client 91.121.87.87] (22)Invalid argument: mod_fcgid: can't lock process table in pid 170$
[Wed Mar 17 14:02:57 2010] [emerg] [client 91.121.88.99] (22)Invalid argument: mod_fcgid: can't lock process table in pid 160$
[Wed Mar 17 14:02:57 2010] [emerg] mod_fcgid: server is restarted, pid 27466 must exit
[Wed Mar 17 14:02:57 2010] [emerg] (22)Invalid argument: mod_fcgid: can't lock process table in PM, pid 27466
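
To check how often this happens on a given server, the error_log can be scanned for the messages above. What follows is only a minimal sketch, not part of the original report: the function name and the regular expressions are assumptions based on the log lines quoted in this bug, and other httpd or mod_fcgid versions may word the messages differently.

```python
import re

# Patterns are assumptions taken from the log excerpts in this report.
RESTART = re.compile(r"Graceful restart requested|Doing graceful restart")
FCGID_EMERG = re.compile(
    r"\[emerg\].*mod_fcgid: (can't get pipe mutex|can't lock process table)"
)


def fcgid_errors_after_graceful(lines):
    """Return one count per graceful restart found in `lines`.

    Each count is the number of mod_fcgid [emerg] mutex/lock errors
    logged between that restart notice and the next one.
    """
    counts = []
    for line in lines:
        if RESTART.search(line):
            counts.append(0)          # start a new bucket for this restart
        elif counts and FCGID_EMERG.search(line):
            counts[-1] += 1           # error belongs to the latest restart
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python check_fcgid_log.py /path/to/error_log
    with open(sys.argv[1], errors="replace") as log:
        print(fcgid_errors_after_graceful(log))
```

A non-zero count after a restart suggests the old fcgid processes lost access to their mutexes, which matches the symptoms described here. It does not prove that processes were left behind; that still has to be confirmed with the process list.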


Server Version: Apache/2.2.15 (Unix) mod_ssl/2.2.15 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4 mod_fcgid/2.3.5
Server Built: Mar 11 2010 13:51:44
Server loaded APR Version: 1.4.2
Compiled with APR Version: 1.4.2
Server loaded APU Version: 1.3.9
Compiled with APU Version: 1.3.9
Module Magic Number: 20051115:24
Hostname/port: xxxxxxx:80
Timeouts: connection: 30    keep-alive: 1
MPM Name: Prefork
MPM Information: Max Daemons: 300 Threaded: no Forked: yes
Server Architecture: 64-bit
Server Root: /usr/local/apache
Config File: /usr/local/apache/conf/includes/pre_main_global.conf
Server Built With:  -D APACHE_MPM_DIR="server/mpm/prefork"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D HTTPD_ROOT="/usr/local/apache"
 -D SUEXEC_BIN="/usr/local/apache/bin/suexec"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"
Comment 1 pioklo 2010-05-27 13:08:39 UTC
Hello !

Please try using the worker MPM instead of prefork.
I am using that MPM, and the processes get killed normally after a graceful restart.

Piotr
Comment 2 Laurent Declercq 2010-08-23 20:43:08 UTC
I can confirm this issue with worker:

[Mon Aug 23 23:29:44 2010] [notice] SIGUSR1 received.  Doing graceful restart
[Mon Aug 23 23:29:46 2010] [emerg] [client 192.168.0.130] (22)Invalid argument: mod_fcgid: can't lock process table in pid 2302, referer: http://admin.nuxwin.com/reseller/users.php
[Mon Aug 23 23:29:46 2010] [emerg] [client 192.168.0.130] (22)Invalid argument: mod_fcgid: can't lock process table in pid 2303, referer: http://admin.nuxwin.com/reseller/users.php
[Mon Aug 23 23:29:47 2010] [notice] Apache/2.2.16 (Debian) mod_fcgid/2.3.5 configured -- resuming normal operations


root@ispcp:~# apache2ctl -V
Server version: Apache/2.2.16 (Debian)
Server built:   Jul 24 2010 20:24:16
Server's Module Magic Number: 20051115:24
Server loaded:  APR 1.4.2, APR-Util 1.3.9
Compiled using: APR 1.4.2, APR-Util 1.3.9
Architecture:   32-bit
Server MPM:     Worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/worker"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=128
 -D HTTPD_ROOT="/etc/apache2"
 -D SUEXEC_BIN="/usr/lib/apache2/suexec"
 -D DEFAULT_PIDLOG="/var/run/apache2.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="mime.types"
 -D SERVER_CONFIG_FILE="apache2.conf"
Comment 3 Laurent Declercq 2010-08-23 20:43:41 UTC
Updated to 2.2.16
Comment 4 Gregg L. Smith 2010-08-24 19:00:46 UTC
Created attachment 25933
Patch to fix graceful restart/stop bug in Windows
Comment 5 Gregg L. Smith 2010-08-24 19:07:37 UTC
Confirmed on Windows XP

This is a long-standing bug going back to Apache/2.2.8 and mod_fcgid/2.2.
The effect is just as bad on Windows, except that Apache's death is a little more sudden.

If you start Apache at the console, use PHP via mod_fcgid, and then restart with Ctrl+ScrLk, you get this:

Event Type:	Information
Event Source:	DrWatson

Description:
The application, H:\Apache23\bin\httpd.exe, generated an application error The error occurred on 08/24/2010 @ 15:13:13.046 The exception generated was c0000005 at address 46434CB2 (mod_fcgid!wakeup_thread)

Opening Task Manager and looking at the processes, you see a lone httpd process hanging around. It holds a lock on port 80 and on the logs, so you see nothing in the error log. Apache is still answering requests at this point, but it shows as stopped and has released the console.

If Apache is running as a service on your production server and you restart the service, you see the following in rapid succession in the event log, in order of appearance:

The Apache service named reported the following error:
>>> (OS 10048)Only one usage of each socket address (protocol/network address/port) is normally permitted. : make_sock: could not bind to address 0.0.0.0:80 .

The Apache service named reported the following error:
>>> no listening sockets available, shutting down .

The Apache service named reported the following error:
>>> Unable to open logs

This is apples and oranges to some degree, since the first event log entry comes from Apache 2.3.8-rc and the following entries are from, I think, Apache 2.2.15 at the time.

In all cases a Stop/Shutdown does not kill the lone wolf.

This makes mod_fcgid useless, because if mod_fcgid happens to crash, Apache shuts down and the lone wolf continues to answer requests until it either gets old and dies a good death, or fcgid crashes again later down the road and kills the lone wolf. Now your server is not answering requests, and it usually happens 10 minutes after going to bed (Murphy's Law), so you do not notice it until noon the next day.

IIRC this was in the Issue Tracker at Sourceforge and included patches to deal with this on Windows. Why it did not get picked up I do not know. The old tracker seems to have jumped off a cliff into extinction. However, there are still breadcrumbs to be found. The attached patch against trunk is based on one of these breadcrumbs.

Greetings and all credit to Tom.

I believe the caller of procmgr_child_init expects a return value, or odd things happen later down the road, but I will not swear to it. I can't see it hurting, because after the patch the function literally does nothing but return APR_SUCCESS. A good explanation of why the bug exists, along with the original patches, is in this thread:

http://www.mail-archive.com/mod-fcgid-users@lists.sourceforge.net/msg00223.html
Comment 6 Jeff Trawick 2010-11-21 19:12:42 UTC
track the Windows crash with bug 50309, recently opened
Comment 7 Lefty 2010-11-23 10:39:00 UTC
I faced similar problems. On graceful stop, Apache should release the logs and close the ports, and then let the workers finish (or at least I hope so). However, the worker + fcgi combination blocks the server until all children quit.

Graceful reload works for me, but the spawn scores rise quite high (lots of terminated fcgi apps), so mod_fcgid refuses to start new workers for quite a long time because of FcgidSpawnScoreUpLimit.
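
To illustrate why the wait can be long, here is a rough sketch of the spawn score arithmetic. It is a simplified model, not mod_fcgid's actual code. It assumes the documented defaults as I recall them (FcgidTerminationScore 2, FcgidTimeScore 1 per second, FcgidSpawnScoreUpLimit 10) and that a spawn is refused while the score is at or above the limit. The function name is hypothetical, and the real module tracks scores per process class.

```python
def seconds_until_spawn_allowed(terminated,
                                termination_score=2,   # assumed FcgidTerminationScore default
                                time_score=1,          # assumed FcgidTimeScore default (per second)
                                up_limit=10):          # assumed FcgidSpawnScoreUpLimit default
    """Estimate how long spawning stays blocked after `terminated`
    processes exit at once, in a simplified single-class model."""
    score = terminated * termination_score
    seconds = 0
    # Assumption: spawning is refused while score >= up_limit,
    # and the score decays by time_score each second.
    while score >= up_limit:
        score -= time_score
        seconds += 1
    return seconds


if __name__ == "__main__":
    for n in (5, 50, 300):
        print(n, "terminated ->", seconds_until_spawn_allowed(n), "s blocked")
```

Under these assumptions, a graceful reload that terminates many fcgi processes at once pushes the score far above the limit, and it then takes on the order of minutes to decay. Raising FcgidSpawnScoreUpLimit or lowering FcgidTerminationScore would shorten that window in this model.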
Comment 8 William A. Rowe Jr. 2010-11-29 23:05:25 UTC
Apache cannot close/release logs until things have been logged.

The solution is a mutex that needs again to be refactored between httpd and apr.
Comment 9 E.D. 2011-04-13 10:39:48 UTC
Getting the bug here as well on:

apache2-mpm-worker                  2.2.16-6+squeeze1
libapache2-mod-fcgid                1:2.3.6-1
Comment 10 Michał Grzędzicki 2011-09-26 12:42:39 UTC
try setting

GracefulShutdownTimeout 4
Comment 11 Gregg L. Smith 2011-11-28 20:07:30 UTC
Comment on attachment 25933
Patch to fix graceful restart/stop bug in Windows

patch obsolete, current form of similar patch by Mario Brandt in PR 50309
Comment 12 Mario 2013-02-27 20:13:47 UTC
Hasn't this been fixed in PR 50309?
Comment 13 William A. Rowe Jr. 2018-11-07 21:09:57 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.