With the latest version of Apache (2.0.47) we are seeing occasional defunct/zombie CGIs. The CGIs can't be killed (they won't die with kill -9 pid) but do die if we kill the parent httpd process. This seems to me to be identical to bug report 21737, but with the 2.0.47 version (21737 was for the 1... version). Since their fix was to alloc.c, and I don't see an alloc.c to compare to, I'm seeking help on fixing this problem. The zombies are not taking up CPU cycles, of course, but they do tend to deplete the process count pool. We've counted as high as 60 zombies in one situation. Last night there were 8.
suexec or not? mod_cgi or mod_cgid? probably doesn't matter, but which MPM?
configure:12836: checking whether to enable mod_suexec
configure:12888: result: no
config.log:MPM_LIB='server/mpm/prefork/libprefork.la'
config.log:MPM_NAME='prefork'
config.log:MPM_SUBDIR_NAME='prefork'
config.log:#define APACHE_MPM_DIR "server/mpm/prefork"
./httpd -l
Compiled in modules:
  core.c
  mod_access.c
  mod_auth.c
  mod_include.c
  mod_log_config.c
  mod_env.c
  mod_setenvif.c
  mod_ssl.c
  prefork.c
  http_core.c
  mod_mime.c
  mod_status.c
  mod_autoindex.c
  mod_asis.c
  mod_cgi.c
  mod_negotiation.c
  mod_dir.c
  mod_imap.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_rewrite.c
  mod_so.c
Does this happen even for simple CGIs such as printenv (in the cgi-bin dir of the default install), or only for setuid binaries, or what? Also, can you get a truss of a CGI request, including both the web server child handling the request and the CGI itself? Start the server like this:
  # truss -o outfile -f ./httpd -DONE_PROCESS
and run a couple of CGI requests, then use ps to see whether or not the zombie problem occurs, then interrupt truss+httpd. If this run exhibited the zombie problem, send in the truss. If not, you may need to start the server normally, run truss against one of the children (truss -o outfile -f -p PID) and keep doing CGI requests until the truss-ed process handles one and we can see the trace.
It is not specific to any cgi. It is difficult for us to reproduce this because we can't predict when it will happen, and these are public/commercial servers that we don't have the luxury of playing with. Is there something I can do once I get zombies? The zombies usually belong to one or two parents. If there is information that I can get from that parent for you that would be useful, let me know (just killed 27 zombies in fact).
I don't know what the next step is, unfortunately. I've been testing 2.0.47 with default config (prefork, mod_cgi, no suexec) this afternoon and using printenv as the example cgi. No long-term zombies. printenv goes through zombie state temporarily but Apache cleans it up very soon after.
I'm curious about how you can tell it isn't specific to some cgi. All I see from ps for zombies is
  trawick 6872 29703 0 0:00 <defunct>
Is it possible that the zombie represents a child process that the CGI script created, and not the CGI script itself?
  Apache parent -> Apache child process -> CGI script -> some command invoked by the CGI
Maybe there is some infrequent condition where the Apache child process terminates the CGI script before it has reaped status from the command it runs, and then the Apache child process becomes the parent of the command invoked by the CGI. Since the Apache child process doesn't call waitpid() to collect status from arbitrary processes, the zombie never gets cleaned up. Apache will terminate the CGI script with SIGTERM (and later SIGKILL) if the CGI script keeps running for a while after the client connection drops.
>Is there something I can do once I get zombies?
nothing easy that I know of...
I wasn't able to recreate any zombies in this scenario
  Apache parent -> Apache child process -> CGI script -> some command invoked by the CGI
when the CGI script exited without reaping status from its child. (just the way Unix works I guess)
If you set MaxRequestsPerChild relatively low, won't that take care of zombies?
Another VERY stray thought is to write a simple module that calls waitpid(-1,,) to try to reap status from any stray child process remaining for any reason. Since this is prefork, it shouldn't interfere with any other requests.
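The waitpid(-1,,) idea above can be sketched outside Apache. What follows is a minimal standalone illustration in Python, not an Apache module: the function name reap_stray_children is hypothetical, and os.waitpid(-1, os.WNOHANG) stands in for the C call waitpid(-1, &status, WNOHANG). It shows only the non-blocking reap loop; where such a loop would be hooked into httpd is not addressed.

```python
import os


def reap_stray_children():
    """Collect exit status from every child that has already exited.

    Hypothetical helper for illustration only. Returns a list of
    (pid, status) pairs and never blocks.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return at once.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: this process has no children left at all.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped.append((pid, status))
    return reaped
```

Called periodically, this clears any child that is sitting in the defunct state and returns immediately when there is none. It also consumes the status of children the caller meant to wait for explicitly, so a real module would have to take care not to steal those.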
Is this reproducible in 2.0.54?
I run Debian stable with apache2 2.0.54. I can confirm that this version leaves defuncts every now and then. It happens every few days: the defuncts lock up all the apache processes, the server becomes unresponsive, and it has to be restarted.
Does this affect 2.2.x?
Nearly three years in NEEDINFO, closing old 2.0 report. If it's not fixed in 2.0.latest, it won't get fixed in 2.0.any.
Still there in 2.2.13-1fc11. I have isolated it: it can be simply reproduced with a cgi containing
  sleep 9999 >/dev/null &
and fixed by redirecting the stderr of the child:
  sleep 9999 >/dev/null 2> /dev/null &
(the stdout redir is needed anyway for the HTTP request to complete)
So it boils down to: CGI exits with the stderr dup'ed over to a lingering child. I assume this is linked to Apache's capture of CGI's stderrs (for error_log), not expecting their lifecycle to be decoupled from the CGI process's.
Apologies if this is not the proper place to reopen. Spank me in that case :)
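The pipe behaviour suspected above can be observed without Apache. This is a rough sketch in Python, not Apache's code: the helper name seconds_until_stderr_eof is hypothetical, and sleep 2 replaces sleep 9999 so the run finishes quickly. It shows only that a reader of a stderr pipe sees no end-of-file while a lingering background child still holds the write end. Whether this is what leaves the defunct entry under httpd remains the assumption stated above.

```python
import subprocess
import time


def seconds_until_stderr_eof(script):
    """Run `script` under sh and time how long its stderr pipe stays open.

    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        ["sh", "-c", script],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    # read() returns only once every process holding the write end of the
    # pipe has closed it, not merely when sh itself has exited.
    proc.stderr.read()
    proc.wait()
    return time.monotonic() - start


# Background child inherits stderr: the pipe stays open until sleep exits.
LEAKY = "sleep 2 >/dev/null &"
# Background child has stderr redirected: the pipe closes when sh exits.
FIXED = "sleep 2 >/dev/null 2>/dev/null &"
```

Calling seconds_until_stderr_eof(LEAKY) should take roughly as long as the sleep, while seconds_until_stderr_eof(FIXED) should return almost at once, which matches the difference between the two one-line CGIs above.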