Bug 41743

Summary: Graceful restarts don't affect children in keepalive until they exit
Product: Apache httpd-2
Reporter: Frank T. Lofaro Jr. <ftlofaro>
Component: mpm_prefork
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED FIXED
Severity: normal
CC: andrew.punch, apache, lavr
Priority: P2
Keywords: PatchAvailable
Version: 2.2.4
Target Milestone: ---
Hardware: Sun
OS: Solaris
Attachments:
   keepalive.py - Python script to keep an apache process alive indefinitely by using the keepalive issue
   Patch to solve this bug - based on head of 2.2.x subversion branch
   Patch to solve this bug - based on head of 2.2.x subversion branch, including correctly updating mpm_state
   Patch to solve this bug - based on head of 2.2.x subversion branch, including correctly updating mpm_state
   Patch to solve this bug - based on head of trunk (r1066631), including correctly updating mpm_state
   tweaked patch for 2.2.x

Description Frank T. Lofaro Jr. 2007-03-01 15:20:07 UTC
If I do a graceful restart, any children in keepalive will not exit until they
exit due to keepalive timeout/maxrequests or are killed.

I tested it, and a child in keepalive will keep running after the graceful until
killed manually, or the keep alive timeout or maxrequests limits are hit.

Graceful should avoid killing a current request, but keepalive connections may
be killed at any time when inactive; it should kill a child when it is not
currently servicing a request.

If I make a change to httpd.conf and do a graceful, and test in my browser, I
get the behavior specified by the old version of httpd.conf until the child
exits - this makes debugging difficult and means I need to wait an excessive
amount of time before changes take effect. Depending on what I am doing on the
server - I might have to wait before proceeding to prevent users from getting
bad/missing content, etc.
Comment 1 Dmytro Fedonin 2007-06-13 08:25:07 UTC
(In reply to comment #0)

> Graceful should avoid killing a current request, but keepalive connections may
> be killed at any time when inactive; it should kill a child when it is not
> currently servicing a request.
> 
That is not true. Some modules keep their state with the connection; see
bug 41109 for an example.
BTW, if you do a graceful restart in production, you need to wait anyway.
Comment 2 Eric Covener 2011-01-24 21:19:37 UTC
The prefork version of ap_graceful_stop_signalled is always false:

int ap_graceful_stop_signalled(void)
{
    /* not ever called anymore... */
    return 0;
}

The worker MPM, by contrast, overloads it to mean that any kind of graceful exit is in progress. The core in 2.2.x uses this callback to decide whether to offer keepalive before committing the headers.

This appears to be resolved in trunk by using another API.
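
For illustration, a minimal self-contained model of the dependency described above. This is not the httpd source: the function names, the callback type, and the reduced set of conditions are all assumptions made for the sketch. It only shows why a callback that always returns 0 means keepalive keeps being offered during a graceful exit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MPM-provided callback; not httpd source. */
typedef int (*stop_signalled_fn)(void);

/* Prefork-style behaviour as described above: the answer is always "no". */
static int prefork_style_stop_signalled(void)
{
    return 0;
}

/* Worker-style behaviour: reports whether a graceful exit is in progress. */
static int worker_style_exiting = 0;

static int worker_style_stop_signalled(void)
{
    return worker_style_exiting;
}

/*
 * Simplified keepalive decision (assumption: the real check has more
 * conditions). Keepalive is offered only if the client allows it, the
 * request limit has not been reached, and no graceful stop is signalled.
 */
static bool should_offer_keepalive(bool client_allows_keepalive,
                                   bool request_limit_reached,
                                   stop_signalled_fn stop_signalled)
{
    return client_allows_keepalive
        && !request_limit_reached
        && !stop_signalled();
}
```

In this model, setting worker_style_exiting to 1 makes the worker-style check refuse keepalive, while the prefork-style check keeps offering it no matter what the parent has requested.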
Comment 3 Eric Covener 2011-01-25 08:35:12 UTC
Per a report on users@, this may not be fixed in trunk prefork; needs testing.
Comment 4 Andrew 2011-01-26 20:02:18 UTC
Steps are:
1. Configure apache to use:
   - prefork mpm
   - KeepAlive On
   - KeepAliveTimeout 60
   - MaxKeepAliveRequests 0

2. Save ps output. e.g: ( while true; do date; ps -Hfg `cat httpd.pid`; sleep 1 ; done ) > ps.log

3. Run script: keepalive.py <hostname>

4. Send USR1 to the parent process e.g. sudo kill -USR1 `cat httpd.pid`; date

5. Observe in ps.log that all the child processes exit, except for one. New
child processes will start

6. sudo netstat -tp will indicate that the python script is connected to the
one child process that did not exit

7. Leave the system for 15 minutes or longer

8. The one child process will still not exit (check ps.log and netstat -tp)

9. Stop keepalive.py e.g. using ctrl+c

10. Observe that the one child process will exit once keepalive.py disconnects
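
The attached keepalive.py is not reproduced in this report. As a rough sketch only, the kind of request a client would have to keep sending on a single connection to hold a child in keepalive might be built as below. The function name, the use of HEAD, and the request path are assumptions for illustration, not taken from the attachment.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats one HTTP/1.1 request that asks the server
 * to keep the connection open. Returns the request length, or -1 if the
 * buffer is too small or formatting fails.
 */
static int build_keepalive_request(char *buf, size_t size, const char *host)
{
    int n = snprintf(buf, size,
                     "HEAD / HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: keep-alive\r\n"
                     "\r\n",
                     host);
    if (n < 0 || (size_t)n >= size) {
        return -1; /* error or truncated output */
    }
    return n;
}
```

A client reproducing steps 3 to 8 would write this request to the same socket repeatedly, at an interval shorter than the KeepAliveTimeout of 60 configured in step 1, so the connection never goes idle long enough to time out.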
Comment 5 Andrew 2011-01-26 20:05:42 UTC
Created attachment 26556 [details]
keepalive.py - Python script to keep an apache process alive indefinitely by using the keepalive issue
Comment 6 Andrew 2011-01-26 20:07:50 UTC
Observed on Redhat Enterprise Linux 5.5
Comment 7 Andrew 2011-01-30 23:03:40 UTC
ap_graceful_stop_signalled() in http_core.c still calls ap_graceful_stop_signalled() in 2.2.X trunk.
Comment 8 Andrew 2011-01-30 23:04:08 UTC
ap_process_http_async_connection() in http_core.c still calls ap_graceful_stop_signalled() in 2.2.X trunk.
Comment 9 Andrew 2011-01-31 20:08:10 UTC
Created attachment 26584 [details]
Patch to solve this bug - based on head of 2.2.x subversion branch
Comment 10 Joe Orton 2011-02-01 11:27:47 UTC
The fix for this on the trunk was r645434, which replaced use of ap_graceful_stop_signalled() with ap_mpm_query().

This does look insufficient to fix the bug for prefork, since the prefork signal handler does not change mpm_state (prefork.c:sig_term).
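
A minimal model of the gap described above, assuming a state variable that a query function reports and a signal handler that may or may not update it. All names here are hypothetical and the code is not taken from prefork.c; it only illustrates why a query-based check cannot see a shutdown if the handler never touches the state.

```c
#include <signal.h>

/* Hypothetical state values for this sketch; not the httpd constants. */
enum { MODEL_STATE_RUNNING = 0, MODEL_STATE_STOPPING = 1 };

static volatile sig_atomic_t model_mpm_state = MODEL_STATE_RUNNING;
static volatile sig_atomic_t model_shutdown_pending = 0;

/* Handler that only records the shutdown request (the reported gap). */
static void model_sig_term_unfixed(int sig)
{
    (void)sig;
    model_shutdown_pending = 1;
}

/* Handler that also updates the state that queries report. */
static void model_sig_term_fixed(int sig)
{
    (void)sig;
    model_mpm_state = MODEL_STATE_STOPPING;
    model_shutdown_pending = 1;
}

/* Query-style accessor: callers see only the state variable. */
static int model_query_state(void)
{
    return model_mpm_state;
}
```

With the first handler, model_query_state() still reports MODEL_STATE_RUNNING after the signal, so code that relies on the query would continue to offer keepalive.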
Comment 11 Andrew 2011-02-01 18:57:24 UTC
Which of these alternatives do you prefer:
1. Move ahead with my current patch
2. I modify prefork.c so the signal handler changes mpm_state
Comment 12 Andrew 2011-02-02 20:45:50 UTC
Created attachment 26598 [details]
Patch to solve this bug - based on head of 2.2.x subversion branch, including correctly updating mpm_state
Comment 13 Andrew 2011-02-02 20:51:06 UTC
Created attachment 26599 [details]
Patch to solve this bug - based on head of 2.2.x subversion branch, including correctly updating mpm_state
Comment 14 Andrew 2011-02-02 20:52:11 UTC
Created attachment 26600 [details]
Patch to solve this bug - based on head of trunk (r1066631), including correctly updating mpm_state
Comment 15 Andrew 2011-02-02 21:11:01 UTC
The two patches that have been attached to this bug use the approaches outlined below.

TRUNK
=====

1. Set the mpm_state to AP_MPMQ_STOPPING


2.2.x BRANCH
============

1. Set the mpm_state to AP_MPMQ_STOPPING
2. Return the correct value from ap_graceful_stop_signalled()

I considered rewriting http_core.c and http_protocol.c in 2.2.x to use ap_mpm_query(). However, third-party modules may use ap_graceful_stop_signalled(), so it needed to be fixed anyway, and I didn't want to risk breaking code in http_core and http_protocol that was working well.

*All* other MPMs support ap_graceful_stop_signalled() in 2.2.x, except Netware.
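
As a sketch of the 2.2.x approach, assuming a single state variable behind both interfaces: the legacy callback is derived from the same state that the query reports, so callers of either get a consistent answer. The names below are hypothetical and this is not the committed patch.

```c
/* Hypothetical state values for illustration; not the httpd constants. */
enum model_state { MODEL_RUNNING, MODEL_STOPPING };

static enum model_state current_state = MODEL_RUNNING;

/* Query-style interface: reports the state directly. */
static enum model_state model_query(void)
{
    return current_state;
}

/* Legacy-style callback: derived from the same state, no longer fixed at 0. */
static int model_graceful_stop_signalled(void)
{
    return current_state == MODEL_STOPPING;
}

/* What the exit path would do once a graceful stop has been requested. */
static void model_begin_stop(void)
{
    current_state = MODEL_STOPPING;
}
```

Keeping the legacy callback working, rather than removing its callers, is what lets existing code that still uses it pick up the fix without modification.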


OTHER MPMS
==========

I had a quick check through the other 2.2.x MPMs and noticed that the Netware MPM appears to have the same issue as prefork. I am not a Netware expert, so it might be good to have someone check that out.
Comment 16 Joe Orton 2011-02-08 08:48:32 UTC
Thanks a lot for the patches, Andrew.

I tweaked the trunk patch slightly -

 static void just_die(int sig)
 {
+    mpm_state = AP_MPMQ_STOPPING;
     clean_child_exit(0);

was redundant since clean_child_exit sets mpm_state anyway.

Committed to trunk in r1068389
Comment 17 Joe Orton 2011-02-08 10:33:04 UTC
Created attachment 26623 [details]
tweaked patch for 2.2.x

Slightly tweaked version of 2.2.x patch for review.
Comment 18 Andrew 2011-02-08 23:10:56 UTC
My quick testing confirms that the latest trunk (r1068671), which includes the patch, fixes the problem.

As mentioned by Joe, the 2.2.x branch patch is still waiting for review.
Comment 19 Joe Orton 2011-02-10 11:19:10 UTC
2.2.x patch committed in r1069428
Comment 20 Andrew 2011-02-14 21:50:06 UTC
I can confirm that the 2.2.x patch is now in the 2.2.x branch, and my testing indicates that the 2.2.x branch no longer has the problem.

Thanks Joe and thanks to my colleague James "Gerbs" Byrne who diagnosed this problem.
Comment 21 Andrew 2011-02-15 22:31:04 UTC
*** Bug 38994 has been marked as a duplicate of this bug. ***
Comment 22 Eric Covener 2014-01-19 19:10:58 UTC
*** Bug 47635 has been marked as a duplicate of this bug. ***