Hi, After digging the internet for something usefull on this problem I solved posting this bug report with all I discovered about this "bug" (probably). When Apache starts, it starts all piped log programs "instances" correctly, but, when I stop Apache, all piped log programs instances remain running "sleepy" (not zombie). Starting Apache again, it forks another set of instances of these piped log programs leaving a lot of them running on the system. Initialy I tested with rotatelogs and cronologs, but also this problem can happen with any piped program, dealing or not with the logs (even a simple grep as piped log program). I started debugging Apache by attaching strace to the main root Apache process and stoping Apache to see the error messages. I discovered that Apache logs on error_log these messages: piped log program '/usr/local/sbin/cronolog /usr/local/sites/default/logs/access.%Y-%m-%d.log' failed unexpectedly piped log program '/usr/local/sbin/cronolog /usr/local/sites/portal/logs/access.%Y-%m-%d.log' failed unexpectedly The interesting thing is that when I use strace this way and stops Apache, all instances of the piped log programs stops as nothing had happened wrong. Here are some of the lines strace shows me: ============================================================================== wait4(-1, 0xbffff900, WNOHANG|WUNTRACED, NULL) = 0 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout) wait4(-1, 0xbffff900, WNOHANG|WUNTRACED, NULL) = 0 select(0, NULL, NULL, NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted) --- SIGTERM (Terminated) @ 0 (0) --- sigreturn() = ? (mask now [RTMIN]) getpgrp() = 28370 kill(-28370, SIGTERM) = 0 --- SIGTERM (Terminated) @ 0 (0) --- sigreturn() = ? (mask now [RTMIN]) --- SIGCHLD (Child exited) @ 0 (0) --- select(0, NULL, NULL, NULL, {0, 16384}) = 0 (Timeout) wait4(28373, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL) = 28373 wait4(28374, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL) = 28374 wait4(28375, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL) = 28375 wait4(28376, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL) = 28376 wait4(28377, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL) = 28377 wait4(28372, [WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM], WNOHANG, NULL) = 28372 write(2, "piped log program \'/usr/local/sb"..., 115) = 115 pipe([17, 18]) = 0 dup2(7, 17) = 17 dup2(11, 18) = 18 access("/usr/local/sbin/cronolog", R_OK|X_OK) = 0 fork() = 28428 close(17) = 0 wait4(28371, [WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM], WNOHANG, NULL) = 28371 write(2, "piped log program \'/usr/local/sb"..., 114) = 114 pipe([17, 19]) = 0 dup2(5, 17) = 17 dup2(8, 19) = 19 access("/usr/local/sbin/cronolog", R_OK|X_OK) = 0 fork() = 28429 close(17) = 0 unlink("/usr/local/httpd-2.0.48/logs/httpd.pid") = 0 getpid() = 28370 gettimeofday({1069192645, 791003}, NULL) = 0 write(13, "[Tue Nov 18 19:57:25 2003] [noti"..., 66) = 66 ============================================================================== Then I started digging Apaches source code (without understanding too much) and I built apache with -DNEED_WAITPID but this did not worked. I thought on some patch for the part that Apache kills all piped logs with apr funcions but that seams too much complicated for me. Here are some details of boot systems that I'm having this problem: OS.: Slackware 9.1 (gcc 3.2.3, glibc 2.3.2, kernel 2.4.22) ldd output from httpd: libssl.so.0 => /usr/lib/libssl.so.0 (0x4001f000) libcrypto.so.0 => /usr/lib/libcrypto.so.0 (0x40051000) libaprutil-0.so.0 => /usr/local/httpd-2.0.48/lib/libaprutil-0.so.0 (0x4014f000) libgdbm.so.2 => /usr/lib/libgdbm.so.2 (0x40167000) libdb-3.3.so => /lib/libdb-3.3.so (0x4016d000) libexpat.so.0 => /usr/lib/libexpat.so.0 (0x401f6000) libapr-0.so.0 => /usr/local/httpd-2.0.48/lib/libapr-0.so.0 (0x40215000) librt.so.1 => /lib/librt.so.1 (0x40239000) libm.so.6 => /lib/libm.so.6 (0x4024b000) libcrypt.so.1 => /lib/libcrypt.so.1 (0x4026e000) libnsl.so.1 => /lib/libnsl.so.1 (0x4029b000) libdl.so.2 => /lib/libdl.so.2 (0x402b0000) libpthread.so.0 => /lib/libpthread.so.0 (0x402b3000) libc.so.6 => /lib/libc.so.6 (0x40305000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) and the configure command I used to build apache (without -DNEED_WAITPID): ./configure \ --prefix=/usr/local/httpd-2.0.48 \ --enable-module=ALL \ --enable-mime-magic \ --enable-headers \ --enable-expires \ --enable-cgi \ --enable-info \ --enable-rewrite \ --enable-ssl \ --enable-so Anything else ???
Correction: After stoping Apache, the pipe log programs stops too, but Apache forks another set of them before stoping itself. # apachectl start # ps axuww USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 18785 1.0 0.7 22832 7732 ? S 21:52 0:00 /usr/local/httpd-2.0.48/bin/httpd -k start apache 18788 0.0 0.7 22856 7772 ? S 21:52 0:00 /usr/local/httpd-2.0.48/bin/httpd -k start apache 18789 0.0 0.7 22856 7772 ? S 21:52 0:00 /usr/local/httpd-2.0.48/bin/httpd -k start apache 18790 0.0 0.7 22856 7772 ? S 21:52 0:00 /usr/local/httpd-2.0.48/bin/httpd -k start apache 18791 0.0 0.7 22856 7772 ? S 21:52 0:00 /usr/local/httpd-2.0.48/bin/httpd -k start apache 18792 0.0 0.7 22856 7772 ? S 21:52 0:00 /usr/local/httpd-2.0.48/bin/httpd -k start root 18786 0.0 0.0 1480 268 ? S 21:52 0:00 /usr/local/sbin/cronolog /usr/local/sites/portal/logs/access.%Y-%m-%d.log root 18787 0.0 0.0 1480 268 ? S 21:52 0:00 /usr/local/sbin/cronolog /usr/local/sites/default/logs/access.%Y-%m-%d.log # apachectl stop # ps axuww root 18819 0.0 0.0 1480 268 ? S 21:54 0:00 /usr/local/sbin/cronolog /usr/local/sites/default/logs/access.%Y-%m-%d.log root 18820 0.0 0.0 1480 268 ? S 21:54 0:00 /usr/local/sbin/cronolog /usr/local/sites/portal/logs/access.%Y-%m-%d.log
I created a small C program that captures all signals and logs to a file which signal it received. This program only stops with SIGKILL. I put it as a piped log program on httpd.conf to test what apache is doing with them. After running apachectl start, apache runs twice each piped log. I've put two custom logs with piped logs on httpd.conf and apache started 4 piped log instances. here are the logs after apache starts: pid 9146 started at Thu Nov 20 11:58:51 2003 pid 9146 received signal 15 pid 9147 started at Thu Nov 20 11:58:51 2003 pid 9147 received signal 15 pid 9149 started at Thu Nov 20 11:58:52 2003 pid 9150 started at Thu Nov 20 11:58:52 2003 Apache starts a set of these piped log programs and send SIGTERM to them. After, apache starts them again. After calling apachectl stop, here are the logs: pid 9146 started at Thu Nov 20 11:58:51 2003 pid 9146 received signal 15 pid 9147 started at Thu Nov 20 11:58:51 2003 pid 9147 received signal 15 pid 9149 started at Thu Nov 20 11:58:52 2003 pid 9149 received signal 15 pid 9150 started at Thu Nov 20 11:58:52 2003 pid 9150 received signal 15 Apache sent SIGTERM to the last two instances of my piped log programs. I think that the bug is related to how Apache treats piped log programs crashes. When calling apachectl stop, apache kills all piped log programs with SIGTERM but the crash-safe code of piped log programs starts all piped log programs again, leaving them running before apache really stops. Please! I need help here! Nobody is helping! I'm starting to consider not testing Apache 2 anymore if nobody helps me on this problem.
I am sorry you are frustrated, and I am very empathetic. Please note that something is happening with this PR, but it is not at all obvious to the casual observer. The piped log program respawning issue appears to be related to a mod_cgid daemon respawning issue that has prevented the cgid daemon respawning logic to be merged into stable. What we're working on is a way for other child logic such as piped log management and cgid daemon management to be able to query whether or not the server is terminating. That info is needed to avoid re-spawning the child at termination. See the thread "ap_get_server_state()" on dev@httpd. Respawning logic is just one use for ap_get_server_state(), so there probably isn't any explicit text describing the relationship in that thread.
another note in case certain developers see this and think I am ignoring their comments :) replace "ap_get_server_state()" references in my previous update to this PR with "new flavor of ap_mpm_query(), or something else that gives the state of the server"
you didn't mention what MPM you're using... I added some general support to prefork thus far to fix this type of problem, and also changed the piped log support to use this logic... changes for worker and other MPMs will come later I'll attach a patch for 2.0.48 and prefork MPM in a moment... if you're using prefork, please try it out... the symptoms I get seem to vary from platform to platform (sometimes I can't see any symptom, sometimes Apache tries to start the piped log program but something has been corrupted and prevents it from starting, etc.)
Created attachment 9220 [details] keep apache from trying to restart piped loggers at termination
I don't know what MPM is. Can you tell me, so I can tell you what MPM I'm using. I found that all this problem doesn't happen on a IBM PPC 7046-B50 PowerPC 604r with SuSE 7.3 PPC.
If you don't know which MPM, that probably means you're using the default (prefork). You can display the MPM via "apachectl -V". Here is how prefork looks: $ /usr/local/apache2047/bin/apachectl -V | grep MPM -D APACHE_MPM_DIR="server/mpm/prefork" Here is how worker looks: $ ph/2.0.47/built/bin/apachectl -V | grep MPM -D APACHE_MPM_DIR="server/mpm/worker" Incidentally, I too had a system that had no apparent ill symptom (RedHat 6.1 on a Pentium III 500MHz). I suspect that many (most?) people aren't noticing any such problem with piped loggers, at least on a regular basis.
Yes... It's prefork. If I can switch this to another thing, what do you recommend ? I pretend to try to see if this problem is solved for now. here is my apachectl -V output: Server version: Apache/2.0.48 Server built: Nov 18 2003 21:28:26 Server's Module Magic Number: 20020903:4 Architecture: 32-bit Server compiled with.... -D APACHE_MPM_DIR="server/mpm/prefork" -D APR_HAS_SENDFILE -D APR_HAS_MMAP -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled) -D APR_USE_SYSVSEM_SERIALIZE -D APR_USE_PTHREAD_SERIALIZE -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT -D APR_HAS_OTHER_CHILD -D AP_HAVE_RELIABLE_PIPED_LOGS -D HTTPD_ROOT="/usr/local/httpd-2.0.48" -D SUEXEC_BIN="/usr/local/httpd-2.0.48/bin/suexec" -D DEFAULT_PIDLOG="logs/httpd.pid" -D DEFAULT_SCOREBOARD="logs/apache_runtime_status" -D DEFAULT_LOCKFILE="logs/accept.lock" -D DEFAULT_ERRORLOG="logs/error_log" -D AP_TYPES_CONFIG_FILE="conf/mime.types" -D SERVER_CONFIG_FILE="conf/httpd.conf" This are the hardware configuration of both machines I've experienced problems: Intel P4 Xeon 2.2 Ghz - 2GB RAM Intel P4 1.6 - 512MB RAM Both are running optimized kernels for P4. I was thinking on something... but I cannot see if it's absurd... Both machines are very fast, and both are using Slack 9.1 As Apache 2 have multithread support, can the speed affect Apache ou some system lib for multithread ?? Tks!
Also: I need to use PHP within these servers. So, I going to have problems on changing the MPM to worker or another else ??
I'm not suggesting you switch to worker MPM ;) I originally didn't know which MPM you were using and wanted to point out that the patch doesn't fix the worker scenario, only the prefork scenario. Now we know you're using prefork, so keep your configuration, apply the patch, and do make && make install and hopefully this problem will be resolved for you.
I see. I tested with MPM worker and PHP+ZTS and It worked correctly. No respawning piped log programs. This setup it stable ?? Can I leave that ?
The PHP folks recommend to NOT use PHP with a threaded Apache 2 MPM such as worker, because they offer interfaces to a lot of 3rd party libraries and cannot ensure that such calls are thread-safe. I don't know how to evaluate whether or not your PHP usage is subject to such problems. Unless the PHP folks can help you evaluate your situation, I'd recommend using the prefork MPM.
Thanks for you help. I'm getting back to prefork with your patch applied. But I'm not giving up. If you need anything to help debug this problem, as I have it, please, tell me. And thanks for the patch!
*** Bug 19035 has been marked as a duplicate of this bug. ***
*** Bug 19981 has been marked as a duplicate of this bug. ***
*** Bug 23109 has been marked as a duplicate of this bug. ***
Fix committed to 2.1-dev. Hopefully it can be committed to stable branch before too terribly long.
*** Bug 19846 has been marked as a duplicate of this bug. ***
Just want to confirm that this patch does appear to fix the bug for me, I no longer have stale cronolog processes sitting around after stopping or restarting Apache.
My problem was the same David Rees wrote about, I can also confirm that it is solved now. Please push it into 2.0.49
The patch solved my problem too. Thanks! I recommend this onto the next Apache release.
Thanks for the reports, folks! I have indicated with the merge request that it was tested successfully by our users.