Bug 24805

Summary:	piped log programs respawning again after Apache is stopped
Product:	Apache httpd-2	Reporter:	Herbert G. Fischer <herbert>
Component:	All	Assignee:	Apache HTTPD Bugs Mailing List <bugs>
Status:	CLOSED FIXED
Severity:	critical	CC:	an, c.rueedi, drees76, kachan, ltregouet, pb
Priority:	P3
Version:	2.0.48
Target Milestone:	---
Hardware:	PC
OS:	Linux
Attachments:	keep apache from trying to restart piped loggers at termination

Description Herbert G. Fischer 2003-11-18 23:23:14 UTC

Hi,

After digging the internet for something usefull on this problem I solved
posting this bug report with all I discovered about this "bug" (probably).

When Apache starts, it starts all piped log programs "instances"
correctly, but, when I stop Apache, all piped log programs instances remain
running "sleepy" (not zombie). Starting Apache again, it forks another set of
instances of these piped log programs leaving a lot of them running on the system.

Initialy I tested with rotatelogs and cronologs, but also this problem can
happen with any piped program, dealing or not with the logs (even a simple grep
as piped log program).

I started debugging Apache by attaching strace to the main root Apache process
and stoping Apache to see the error messages. I discovered that Apache logs on
error_log these messages:

piped log program '/usr/local/sbin/cronolog
/usr/local/sites/default/logs/access.%Y-%m-%d.log' failed unexpectedly
piped log program '/usr/local/sbin/cronolog
/usr/local/sites/portal/logs/access.%Y-%m-%d.log' failed unexpectedly

The interesting thing is that when I use strace this way and stops Apache, all
instances of the piped log programs stops as nothing had happened wrong.

Here are some of the lines strace shows me:

==============================================================================
wait4(-1, 0xbffff900, WNOHANG|WUNTRACED, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
wait4(-1, 0xbffff900, WNOHANG|WUNTRACED, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0})     = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn()                             = ? (mask now [RTMIN])
getpgrp()                               = 28370
kill(-28370, SIGTERM)                   = 0
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn()                             = ? (mask now [RTMIN])
--- SIGCHLD (Child exited) @ 0 (0) ---
select(0, NULL, NULL, NULL, {0, 16384}) = 0 (Timeout)
wait4(28373, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL)
= 28373
wait4(28374, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL)
= 28374
wait4(28375, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL)
= 28375
wait4(28376, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL)
= 28376
wait4(28377, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL)
= 28377
wait4(28372, [WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM], WNOHANG, NULL) =
28372
write(2, "piped log program \'/usr/local/sb"..., 115) = 115
pipe([17, 18])                          = 0
dup2(7, 17)                             = 17
dup2(11, 18)                            = 18
access("/usr/local/sbin/cronolog", R_OK|X_OK) = 0
fork()                                  = 28428
close(17)                               = 0
wait4(28371, [WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM], WNOHANG, NULL) =
28371
write(2, "piped log program \'/usr/local/sb"..., 114) = 114
pipe([17, 19])                          = 0
dup2(5, 17)                             = 17
dup2(8, 19)                             = 19
access("/usr/local/sbin/cronolog", R_OK|X_OK) = 0
fork()                                  = 28429
close(17)                               = 0
unlink("/usr/local/httpd-2.0.48/logs/httpd.pid") = 0
getpid()                                = 28370
gettimeofday({1069192645, 791003}, NULL) = 0
write(13, "[Tue Nov 18 19:57:25 2003] [noti"..., 66) = 66
==============================================================================

Then I started digging Apaches source code (without understanding too much) and
I built apache with -DNEED_WAITPID but this did not worked. I thought on some
patch for the part that Apache kills all piped logs with apr funcions but that
seams too much complicated for me.

Here are some details of boot systems that I'm having this problem:

OS.: Slackware 9.1 (gcc 3.2.3, glibc 2.3.2, kernel 2.4.22)

ldd output from httpd:
libssl.so.0 => /usr/lib/libssl.so.0 (0x4001f000)
libcrypto.so.0 => /usr/lib/libcrypto.so.0 (0x40051000)
libaprutil-0.so.0 => /usr/local/httpd-2.0.48/lib/libaprutil-0.so.0 (0x4014f000)
libgdbm.so.2 => /usr/lib/libgdbm.so.2 (0x40167000)
libdb-3.3.so => /lib/libdb-3.3.so (0x4016d000)
libexpat.so.0 => /usr/lib/libexpat.so.0 (0x401f6000)
libapr-0.so.0 => /usr/local/httpd-2.0.48/lib/libapr-0.so.0 (0x40215000)
librt.so.1 => /lib/librt.so.1 (0x40239000)
libm.so.6 => /lib/libm.so.6 (0x4024b000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0x4026e000)
libnsl.so.1 => /lib/libnsl.so.1 (0x4029b000)
libdl.so.2 => /lib/libdl.so.2 (0x402b0000)
libpthread.so.0 => /lib/libpthread.so.0 (0x402b3000)
libc.so.6 => /lib/libc.so.6 (0x40305000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

and the configure command I used to build apache (without -DNEED_WAITPID):
./configure \
        --prefix=/usr/local/httpd-2.0.48 \
        --enable-module=ALL \
        --enable-mime-magic \
        --enable-headers \
        --enable-expires \
        --enable-cgi \
        --enable-info \
        --enable-rewrite \
        --enable-ssl \
        --enable-so

Anything else ???

Comment 1 Herbert G. Fischer 2003-11-18 23:56:33 UTC

Correction:

After stoping Apache, the pipe log programs stops too, but Apache forks
another set of them
before stoping itself.

# apachectl start
# ps axuww

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root     18785  1.0  0.7 22832 7732 ?        S    21:52   0:00
/usr/local/httpd-2.0.48/bin/httpd -k start
apache   18788  0.0  0.7 22856 7772 ?        S    21:52   0:00
/usr/local/httpd-2.0.48/bin/httpd -k start
apache   18789  0.0  0.7 22856 7772 ?        S    21:52   0:00
/usr/local/httpd-2.0.48/bin/httpd -k start
apache   18790  0.0  0.7 22856 7772 ?        S    21:52   0:00
/usr/local/httpd-2.0.48/bin/httpd -k start
apache   18791  0.0  0.7 22856 7772 ?        S    21:52   0:00
/usr/local/httpd-2.0.48/bin/httpd -k start
apache   18792  0.0  0.7 22856 7772 ?        S    21:52   0:00
/usr/local/httpd-2.0.48/bin/httpd -k start
root     18786  0.0  0.0  1480  268 ?        S    21:52   0:00
/usr/local/sbin/cronolog /usr/local/sites/portal/logs/access.%Y-%m-%d.log
root     18787  0.0  0.0  1480  268 ?        S    21:52   0:00
/usr/local/sbin/cronolog /usr/local/sites/default/logs/access.%Y-%m-%d.log

# apachectl stop
# ps axuww

root     18819  0.0  0.0  1480  268 ?        S    21:54   0:00
/usr/local/sbin/cronolog /usr/local/sites/default/logs/access.%Y-%m-%d.log
root     18820  0.0  0.0  1480  268 ?        S    21:54   0:00
/usr/local/sbin/cronolog /usr/local/sites/portal/logs/access.%Y-%m-%d.log

Comment 2 Herbert G. Fischer 2003-11-20 14:07:35 UTC

I created a small C program that captures all signals and logs to a file which
signal it received. This program only stops with SIGKILL.

I put it as a piped log program on httpd.conf to test what apache is doing with
them.

After running apachectl start, apache runs twice each piped log. I've put two
custom logs with piped logs on httpd.conf and apache started 4 piped log instances.

here are the logs after apache starts:

pid 9146 started at Thu Nov 20 11:58:51 2003
pid 9146 received signal 15
pid 9147 started at Thu Nov 20 11:58:51 2003
pid 9147 received signal 15
pid 9149 started at Thu Nov 20 11:58:52 2003
pid 9150 started at Thu Nov 20 11:58:52 2003

Apache starts a set of these piped log programs and send SIGTERM to them. After,
apache starts them again.

After calling apachectl stop, here are the logs:

pid 9146 started at Thu Nov 20 11:58:51 2003
pid 9146 received signal 15
pid 9147 started at Thu Nov 20 11:58:51 2003
pid 9147 received signal 15
pid 9149 started at Thu Nov 20 11:58:52 2003
pid 9149 received signal 15
pid 9150 started at Thu Nov 20 11:58:52 2003
pid 9150 received signal 15

Apache sent SIGTERM to the last two instances of my piped log programs.

I think that the bug is related to how Apache treats piped log programs crashes.
 When calling apachectl stop, apache kills all piped log programs with SIGTERM
but the crash-safe code of piped log programs starts all piped log programs
again, leaving them running before apache really stops.

Please! I need help here! Nobody is helping! I'm starting to consider not
testing Apache 2 anymore if nobody helps me on this problem.

Comment 3 Jeff Trawick 2003-11-20 15:47:07 UTC

I am sorry you are frustrated, and I am very empathetic.  Please note that
something is happening with this PR, but it is not at all obvious to the casual
observer.

The piped log program respawning issue appears to be related to a mod_cgid
daemon respawning issue that has prevented the cgid daemon respawning logic to
be merged into stable.

What we're working on is a way for other child logic such as piped log
management and cgid daemon management to be able to query whether or not the
server is terminating.  That info is needed to avoid re-spawning the child at
termination.  See the thread "ap_get_server_state()" on dev@httpd.  Respawning
logic is just one use for ap_get_server_state(), so there probably isn't any
explicit text describing the relationship in that thread.

Comment 4 Jeff Trawick 2003-11-20 15:50:24 UTC

another note in case certain developers see this and think I am ignoring their
comments :)

replace "ap_get_server_state()" references in my previous update to this PR with
"new flavor of ap_mpm_query(), or something else that gives the state of the server"

Comment 5 Jeff Trawick 2003-11-20 21:57:26 UTC

you didn't mention what MPM you're using...  I added some general support to
prefork thus far to fix this type of problem, and also changed the piped log
support to use this logic...  changes for worker and other MPMs will come later

I'll attach a patch for 2.0.48 and prefork MPM in a moment...  if you're using
prefork, please try it out...  the symptoms I get seem to vary from platform to
platform (sometimes I can't see any symptom, sometimes Apache tries to start the
piped log program but something has been corrupted and prevents it from
starting, etc.)

Comment 6 Jeff Trawick 2003-11-20 21:58:53 UTC

Created attachment 9220 [details]
keep apache from trying to restart piped loggers at termination

Comment 7 Herbert G. Fischer 2003-11-20 22:31:08 UTC

I don't know what MPM is. Can you tell me, so I can tell you what MPM I'm using.

I found that all this problem doesn't happen on a IBM PPC 7046-B50 PowerPC 604r
with SuSE 7.3 PPC.

Comment 8 Jeff Trawick 2003-11-21 00:05:59 UTC

If you don't know which MPM, that probably means you're using the default
(prefork).  You can display the MPM via "apachectl -V".  Here is how prefork looks:

$ /usr/local/apache2047/bin/apachectl -V | grep MPM
 -D APACHE_MPM_DIR="server/mpm/prefork"

Here is how worker looks:

$ ph/2.0.47/built/bin/apachectl -V | grep MPM
 -D APACHE_MPM_DIR="server/mpm/worker"

Incidentally, I too had a system that had no apparent ill symptom (RedHat 6.1 on
a Pentium III 500MHz).  I suspect that many (most?) people aren't noticing any
such problem with piped loggers, at least on a regular basis.

Comment 9 Herbert G. Fischer 2003-11-21 02:18:47 UTC

Yes... It's prefork. If I can switch this to another thing, what do you
recommend ? I pretend to try to see if this problem is solved for now.

here is my apachectl -V output:

Server version: Apache/2.0.48
Server built:   Nov 18 2003 21:28:26
Server's Module Magic Number: 20020903:4
Architecture:   32-bit
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/prefork"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D HTTPD_ROOT="/usr/local/httpd-2.0.48"
 -D SUEXEC_BIN="/usr/local/httpd-2.0.48/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_LOCKFILE="logs/accept.lock"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"


This are the hardware configuration of both machines I've experienced problems:

Intel P4 Xeon 2.2 Ghz - 2GB RAM
Intel P4 1.6 - 512MB RAM

Both are running optimized kernels for P4.

I was thinking on something... but I cannot see if it's absurd... 
Both machines are very fast, and both are using Slack 9.1
As Apache 2 have multithread support, can the speed affect Apache ou some system
lib for multithread ??

Tks!

Comment 10 Herbert G. Fischer 2003-11-21 02:23:21 UTC

Also:

I need to use PHP within these servers. So, I going to have problems on changing
the MPM to worker or another else ??

Comment 11 Jeff Trawick 2003-11-21 02:35:36 UTC

I'm not suggesting you switch to worker MPM ;)  I originally didn't know which
MPM you were using and wanted to point out that the patch doesn't fix the worker
scenario, only the prefork scenario.  Now we know you're using prefork, so keep
your configuration, apply the patch, and do make && make install and hopefully
this problem will be resolved for you.

Comment 12 Herbert G. Fischer 2003-11-21 12:06:19 UTC

I see.

I tested with MPM worker and PHP+ZTS and It worked correctly. No respawning
piped log programs.

This setup it stable ?? Can I leave that ?

Comment 13 Jeff Trawick 2003-11-21 12:20:49 UTC

The PHP folks recommend to NOT use PHP with a threaded Apache 2 MPM such as
worker, because they offer interfaces to a lot of 3rd party libraries and cannot
ensure that such calls are thread-safe.

I don't know how to evaluate whether or not your PHP usage is subject to such
problems.  Unless the PHP folks can help you evaluate your situation, I'd
recommend using the prefork MPM.

Comment 14 Herbert G. Fischer 2003-11-21 12:33:34 UTC

Thanks for you help. I'm getting back to prefork with your patch applied. But
I'm not giving up.

If you need anything to help debug this problem, as I have it, please, tell me.

And thanks for the patch!

Comment 15 Jeff Trawick 2003-11-21 14:57:22 UTC

*** Bug 19035 has been marked as a duplicate of this bug. ***

Comment 16 Jeff Trawick 2003-11-22 02:29:04 UTC

*** Bug 19981 has been marked as a duplicate of this bug. ***

Comment 17 Jeff Trawick 2003-12-13 22:02:45 UTC

*** Bug 23109 has been marked as a duplicate of this bug. ***

Comment 18 Jeff Trawick 2003-12-13 22:22:04 UTC

Fix committed to 2.1-dev.  Hopefully it can be committed to stable branch before
too terribly long.

Comment 19 Jeff Trawick 2003-12-14 01:22:15 UTC

*** Bug 19846 has been marked as a duplicate of this bug. ***

Comment 20 David Rees 2003-12-16 02:37:30 UTC

Just want to confirm that this patch does appear to fix the bug for me, I no
longer have stale cronolog processes sitting around after stopping or restarting
Apache.

Comment 21 Peter Bieringer 2003-12-17 15:23:53 UTC

My problem was the same David Rees wrote about, I can also confirm that it is
solved now.

Please push it into 2.0.49

Comment 22 Herbert G. Fischer 2003-12-17 17:03:25 UTC

The patch solved my problem too. Thanks!
I recommend this onto the next Apache release.

Comment 23 Jeff Trawick 2003-12-17 17:50:10 UTC

Thanks for the reports, folks!  I have indicated with the merge request that it
was tested successfully by our users.