I use Apache 2.2.3 on Solaris. All Logfiles are configured to use rotatelogs. Whenever I restart Apache either with "restart" or with "graceful" The first piped logger, which cares about the global error log, is left over in the process table. To be more precise: For all loggers new processes get started, and all apart from the described old one terminate. The problem is this unnecessary old process that doesn't die. One description of a reason is, that Apache 2.2 uses APR to do the fork and APR uses "/bin/sh -c" to do that. Unfortunately /bin/sh on Solaris doesn an additional fork when called with "-c" and we end up with a process tree of depth three: Apache-/bin/sh-rotatelogs. It looks like the termination mechanism and the signal handling of Apache is not made for this situation. I described a related problem in Bugzilla http://issues.apache.org/bugzilla/show_bug.cgi?id=38989 I didn't really trace the signal problem down to the details for Apache 2.2/APR but again the same solution does work: If I use another Shell in SHELL_PATH, which does not fork for "-c", the problem goes away. Changing the SHELL_PATH in apr_arch_threadproc.h from "/bin/sh" to "/usr/xpg4/bin/sh" worked. That's a XPG4 compliant shell which exists on Solaris since Solaris 2.5, so for about 10 years. The shell is similar to ksh (see "man ksh") and does *not* fork when called with "-c". It would be nice to at least document the behaviour. Really nice would be to make SHELL_PATH changeable via configure and to be able to change it via Apaches configure (and if that would be documented). If it's really necessary, I could pin down, why the problem only applies for the first error logger. On the other hand, I also would like to get rid of the additional /bin/sh processes, so using another shell would solve both problems.
(In reply to comment #0) > I use Apache 2.2.3 on Solaris. All Logfiles are configured to use rotatelogs. > Whenever I restart Apache either with "restart" or with "graceful" The first > piped logger, which cares about the global error log, is left over in the > process table. To be more precise: For all loggers new processes get started, > and all apart from the described old one terminate. The problem is this > unnecessary old process that doesn't die. Some questions: 1. Does the respective /bin/sh -c die? 2. Does a new rotatelog process get forked after the restart? 3. What is the parent process of the hanging logrotate process 4. What version of Solaris do you use? 5. Could you please truss /bin/sh -c and logrotate during the restart with options -faedl -vall -rall -wall? > It would be nice to at least document the behaviour. Really nice would be to > make SHELL_PATH changeable via configure and to be able to change it via Apaches > configure (and if that would be documented). Sounds reasonable to me.
Hi Rüdiger, Some questions: 1. Does the respective /bin/sh -c die? Yes. All of them die. 2. Does a new rotatelog process get forked after the restart? Yes. All needed new ones get started. 3. What is the parent process of the hanging logrotate process PID 1 4. What version of Solaris do you use? The behaviour was observed on SunOS 5.9 Generic_118558-05 sun4u sparc. I could try with Solaris 8 and 10 to, but that would have to wait until tuesday. 5. Could you please truss /bin/sh -c and logrotate during the restart with options -faedl -vall -rall -wall? Yes, I'll attach them in a minute. I didn't do the "rall", because I don't want the whole config to be in the truss. I'm confident, that the reads are not that important. Otherwise I need to redo the thing with a test configuration. Also I'll attach three ps excerpts, one directly after start, the second after graceful and the third after restart. The processes I'm talking about are: ps1.out: root 11820 11818 0 22:58:45 ? 0:00 /bin/sh -c /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 8 ps1.out: root 11822 11820 0 22:58:45 ? 0:00 /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 86400 !ps2.out: root 11822 1 0 22:58:45 ? 0:00 /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 86400 ps2.out: root 11926 11818 0 22:59:32 ? 0:00 /bin/sh -c /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 8 ps2.out: root 11929 11926 0 22:59:32 ? 0:00 /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 86400 !ps3.out: root 11822 1 0 22:58:45 ? 0:00 /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 86400 !ps3.out: root 11929 1 0 22:59:32 ? 0:00 /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 86400 ps3.out: root 12034 11818 0 23:00:04 ? 0:00 /bin/sh -c /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 8 ps3.out: root 12037 12034 0 23:00:04 ? 0:00 /usr/local/apache22/bin/rotatelogs /var/apache/logs/error_log_main 86400
Created attachment 18944 [details] gzipped truss
Created attachment 18945 [details] Process table after start
Created attachment 18946 [details] Process table after apachectl graceful
Created attachment 18947 [details] Process table after apachectl restart
Hi Rainer, many thanks for the quick feedback. I think I know why this problem only happens to the main error logrotater: httpd closes the writing side of the pipe to the logrotater. This auses the logrotater to exit and thus /bin/sh -c to exit. There is no need to send a SIGTERM to either /bin/sh -c or logrotate in this case. This does not work this way with the main error logrotater since the writing side fd has been previously copied (dup2ed) to stderr. After a start of the new main error log logrotater its wrting side should be dup2ed again to stderr. This should cause the old stderr file descriptor to be closed and thus causing the logrotator to exit as this is the last open writing side of the pipe. This does not seem to work on Solaris (libc on Solaris seems to do a fcntl(13, F_DUP2FD, 0x00000002) instead of dup2(13,2)). I did not have the time to do further tests on this issue. As you may notice the logrotater dies after the main httpd process has died as the OS does now seem to close the file descriptor correctly. Since I personally do not use logrotation on the main error log file for years (and this works perfectly if you do everything else in virtual hosts and nothing in the main server) I am not quite sure anymore, but I think I stopped logrotating the main error log file because log messages did not arrive there during restart / graceful restart. Nevertheless it makes still sense to me to stop hardcoding /bin/sh in APR. Additionally to be able to set SHELL_PATH via configure it might be even better if you can set the shell path via some APR function and APR only uses SHELL_PATH if nothing was set.
Hi Rüdiger, I'm impressed! I had a look at the truss output to, but got stuck exactly around the dup2 fcntl. You comments about main error logger are interesting. In fact it would be better not to pipe that one to be sure, that errors can be written at any time. And usually everything else happens in vhosts, so not rotating that one is not a huge problem. But nevertheless, freeing up half of the log child processes by using another shell to invoke still would be nice. The process table looks a lot cleaner after dropping the "/bin/sh -c" processes. I'm not very experienced with the auto(conf|make|header) stuff, although I did some changes to the configure process in mod_jk. If time permits, I could have a look at it for APR next sunday, but if you or anyone else knows easily how to include this feature I would be happy.
Created attachment 18949 [details] Small test program for dup2 Attached a small quick and dirty test program. If dup2 really does not work on Solaris it should print out: Child: Hello World! and NOT return to shell. If it returns to the shell dup2 works as expected. On my Solaris 9 testbox dup2 seems to work as expected. Odd!
If it turns out not to be a Solaris bug with dup2, can you please do the following: 1. Do a truss again as previously (-vall -wall -faedl). 2. Do a graceful restart. 3. Issue pfiles <pid of remaining logrotater> 4. Issue pfiles <pid of httpd main process> 5. Attach outputs of pfiles and truss and let me know the pids of the httpd main process and the remaining logrotater.
This is reproducible on Linux by using, e.g.: ErrorLog "| true; /path/to/rotatelogs etc" which forces the sh process to hang around.
This is a fun bug! After the graceful restart, the *new* piped logger for the main error log is running with its fd 2 pointing to the pipe to the *old* piped logger which it is replacing. That is the only place that fd is left open, and it's what is preventing the old piped logger from exiting. The use of the "sh" process is just what is needed to make this problem apparent. Otherwise the rotatelogs will get SIGTERMed by the parent directly on restart. (courtesy of piped_log_cleanup)
Created attachment 18950 [details] proof-of-concept fix This should prevent the orphaned piped loggers, not sure if this is the optimal solution though.
Re-assigning this to httpd.
(In reply to comment #12) > This is a fun bug! Indeed yes > > After the graceful restart, the *new* piped logger for the main error log is > running with its fd 2 pointing to the pipe to the *old* piped logger which it is > replacing. That is the only place that fd is left open, and it's what is > preventing the old piped logger from exiting. Excellent analysis many thanks.
(In reply to comment #13) > Created an attachment (id=18950) [edit] > proof-of-concept fix > > This should prevent the orphaned piped loggers, not sure if this is the optimal > solution though. I am a little worried that nobody reads at the other end of the pipe. What if the logger writes a huge amount of data to its stderr? Wouldn't it get blocked? What about using the stdout of the main process (should be /dev/null) for the logger? I modified your patch to do this. A quick test shows that it works. Comments welcome.
Created attachment 18952 [details] Adjust proof-of-concept fix to use stdout of main process
Hi Rainer, as your specific problem has to be fixed inside of httpd I propose that you open a new report for the enhancement of APR's configure to make SHELL_PATH configurable.
This raises a deeper question though. Should a graceful restart even recycle the logger processes in httpd? Or should they be persistent and continue to run (short circuit the open logs to keep the existing files *and processes* around?)
Created attachment 18954 [details] Updated version of the stdout version of the fix, which fixes a fd leak
Thanks Ruediger, that's definitely a better approach. I've tweaked that, and also realised that this needs to be done *only* for the logger for the main server. Piped loggers for vhosts should continue to inherit stderr from the main server as before. Committed as http://svn.apache.org/viewvc?view=rev&revision=452431
(In reply to comment #21) > Thanks Ruediger, that's definitely a better approach. > > I've tweaked that, and also realised that this needs to be done *only* for the Thanks for tweaking and committing. > logger for the main server. Piped loggers for vhosts should continue to inherit > stderr from the main server as before. Just for my clarification: This works because the error log of the main server is the first logger that gets "restarted" during a graceful / restart, right? So the other loggers inherit a stderr that is the writing side of the pipe to the *restarted* main server error logger, correct?
Yes, exactly correct. ap_open_logs handles it: opens the main server log, does the dup2-to-stderr trick, then loops through all the vhosts and opens their logs.
(In reply to comment #23) > Yes, exactly correct. ap_open_logs handles it: opens the main server log, does > the dup2-to-stderr trick, then loops through all the vhosts and opens their logs. Thanks for clarification. Proposed for backport to 2.2.x as r452457 (http://svn.apache.org/viewvc?view=rev&rev=452457).
The error log still spawns the process using rc = apr_procattr_cmdtype_set(procattr, APR_SHELLCMD_ENV)) == APR_SUCCESS) instead of rc = apr_procattr_cmdtype_set(procattr, APR_PROGRAM_ENV)) == APR_SUCCESS) as documented http://issues.apache.org/bugzilla/show_bug.cgi?id=16761 Typically this is an issue with Windows env . Any chance if the same patch proposal from 16761 can be applied here as well?
Backported to 2.2.4
(In reply to comment #25) > > Typically this is an issue with Windows env . Any chance if the same patch Have you really checked that there is an issue on windows? Anyway the original bug described here is now fixed thus I will close the report again. Feel free to open a new one if there is really a problem on windows with this.