Bug 58243 - rotatelogs goes infinite at startup
Summary: rotatelogs goes infinite at startup
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: support
Version: 2.4.12
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Reported: 2015-08-14 00:07 UTC by Ryan Guilbault
Modified: 2015-08-14 00:09 UTC (History)

Attachments
rotatelogs cpu spike process tress (39.60 KB, image/png)
2015-08-14 00:07 UTC, Ryan Guilbault

Description Ryan Guilbault 2015-08-14 00:07:19 UTC
Created attachment 33000
rotatelogs cpu spike process tress

originally reported at ApacheLounge: http://www.apachelounge.com/viewtopic.php?t=6707

when using the -p parameter, e.g.:

ErrorLog "|bin/rotatelogs.exe -p maint/MaintainLogs.bat -l logs/error.%Y%m%d.log 86400"

CustomLog "|bin/rotatelogs.exe -p maint/MaintainLogs.bat -l logs/access.%Y%m%d.log 86400" access 

one instance of rotatelogs.exe spikes the CPU, caught in an infinite loop. after whittling it down, I've traced the cause to rotatelogs.c::post_rotate, here:

    /* Collect any zombies from a previous run, but don't wait. */
    while (apr_proc_wait_all_procs(&proc, NULL, NULL, APR_NOWAIT, pool) == APR_CHILD_DONE)
        /* noop */;

looking at the implementation of said function, we see:

        if (waithow != APR_WAIT) {
            if (nChilds && nChilds == nActive) {
                /* All child processes are running */
                rv = APR_CHILD_NOTDONE;
                proc->pid = -1;
            }
            else {
                /* proc->pid contains the pid of the
                 * exited processes
                 */
                rv = APR_CHILD_DONE;
            }
        }
        if (nActive == 0) {
            rv = APR_CHILD_DONE;
            proc->pid = -1;
        }
        return rv;

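I would expect nActive == 0 on an initial startup, in which case the code above returns APR_CHILD_DONE on every call and the while loop never exits. so I am confused why the loop checks == APR_CHILD_DONE instead of != APR_CHILD_DONE, i.e. loop again only while children exist.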

with the original code, I got something like this:

httpd 
  |- rotatelogs (error) *cpu spike*
  |- rotatelogs (access) no spike
  |- httpd
      |- rotatelogs (error) no spike

switching the code to != check yields:

httpd 
  |- rotatelogs (error) no spike
  |- rotatelogs (access) no spike
  |- httpd
      |- rotatelogs (error) *cpu spike*
      |- rotatelogs (access) no spike

so something is still amiss. I did add a trap to verify that, with the != check, the return value was APR_CHILD_NOTDONE and not some OS error.

I am not set up to compile the APR runtime, so I cannot trap out the values of nChilds and nActive, but the only way we should be able to get APR_CHILD_NOTDONE is if nChilds > 0.

I've attached a screenshot of the process tree; it is perhaps worth noting that a conhost child process exists.

we don't appear to do anything with the list of processes, so I'm not entirely sure what this call to apr_proc_wait_all_procs is meant to accomplish, i.e. why it's in a while{} rather than an if{} or some such thing.
Comment 1 Ryan Guilbault 2015-08-14 00:09:09 UTC
note: this was first noticed on a Windows 2008 R2 server, then on a Windows 7 PC. originally, it was not a problem on my Windows 8.1 PC, but I can now reproduce it there as well.