Bug 45497 - Scoreboard slot leaked using MaxRequestsPerChild and thread(s) do not exit gracefully
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mpm_winnt
Version: 2.2.9
Hardware: PC Windows Vista
Importance: P3 normal with 7 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Keywords: MassUpdate
Depends on:
Reported: 2008-07-29 09:55 UTC by Jason Riffel
Modified: 2018-11-07 21:09 UTC

Patch to make win32 MaxRequestsPerChild restart processes correctly after a thread terminate timeout (7.06 KB, patch)
2008-07-29 09:55 UTC, Jason Riffel

Description Jason Riffel 2008-07-29 09:55:10 UTC
Created attachment 22326
Patch to make win32 MaxRequestsPerChild restart processes correctly after a thread terminate timeout

When MaxRequestsPerChild is used to force Apache to cycle its child process after a certain number of requests, a worker thread that is still serving a response and takes longer than Timeout (default of 300 seconds) to finish will be killed with TerminateThread.

However, the scoreboard slot(s) for the threads terminated the hard way are leaked (not marked as dead).  The child which has been started to replace the terminating child will not be able to start all 'ap_threads_per_child' threads because of the leaked scoreboard slot, and therefore never leaves the starting state.  The effect is that the child process stops cycling, so the MaxRequestsPerChild setting is no longer honored.
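
As a toy model of the failure described above (this is not the mpm_winnt code; every name below is hypothetical), a replacement child can only claim slots that the old child marked dead, so a single leaked slot keeps it from ever reaching its full thread count:

```c
#include <assert.h>

/* Hypothetical stand-in for ap_threads_per_child. */
#define TOY_THREADS_PER_CHILD 4

enum toy_status { TOY_DEAD, TOY_BUSY };

/* One startup pass of a replacement child: claim every slot the previous
 * child has released and report how many threads could be started. */
static int toy_start_threads(enum toy_status slots[], int nslots)
{
    int i;
    int started = 0;

    assert(nslots > 0);
    for (i = 0; i < nslots; i++) {
        if (slots[i] == TOY_DEAD) {
            slots[i] = TOY_BUSY;   /* a new worker thread takes the slot */
            started++;
        }
        /* A slot still marked busy by a killed thread is skipped. */
    }
    return started;
}

/* The child only leaves its startup state once every thread exists. */
static int toy_startup_complete(int started, int nslots)
{
    return started == nslots;
}
```

With four free slots the pass starts four threads and startup completes. If one slot is still marked busy by a thread that was killed the hard way, only three threads start and `toy_startup_complete` stays false no matter how often the pass is repeated.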

After review of the code I believe the intention was to use 'apr_hash_make' to set up a hash of child thread handles and scoreboard slots, used on hard termination to find the slot of the aborted thread and mark it dead.  However, in practice this hash lookup fails to return the slot, returns NULL instead, and finally throws an exception.

I tried to debug why this was happening and was not successful.  In general, a review of the code shows a rather sloppy implementation using global variables which are lazily initialized in the child thread, never really closed, shared with worker threads through the global scope, and never really cleaned up well.  Further, the fact that the child is restarting probably indicates a lot of handle leaking, since they are never closed.  I believe the problem is related to the global scope of the 'pchild' context used with 'apr_pool_create', which is definitely in contention between the child threads which exist at the same time during the shutdown of one child and the start of the new one.  All in all, I think this whole file could use a rewrite, but that is beyond my capabilities at this time.

As a solution/workaround/hack I removed the use of the apr hash lookup for mapping thread handles to slots, which was only used in the main child thread.  I replaced it with a simpler mapping using the variable sb_assignments, identical to the child_handles array, except it is never truncated as threads die.  Then on hard termination sb_assignments is used to find the handle of the thread being killed.  The offset of the thread handle in sb_assignments is the same offset as the scoreboard location for the thread - and thus can be cleaned up reliably.
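
The mapping described above can be sketched roughly as follows. This is not the attached patch: only the name sb_assignments comes from the description, while the handle type, the struct, the helper functions and the fixed thread count are assumptions made for illustration (the real code works with Win32 HANDLE values and the httpd scoreboard).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a Win32 HANDLE and ap_threads_per_child. */
typedef void *thread_handle;
#define THREADS_PER_CHILD 4

enum slot_state { SLOT_DEAD, SLOT_BUSY };

struct child_tables {
    /* sb_assignments[i] holds the handle of the thread that owns
     * scoreboard slot i. Entries are never shifted or truncated, so the
     * array index always equals the slot number. */
    thread_handle sb_assignments[THREADS_PER_CHILD];
    enum slot_state scoreboard[THREADS_PER_CHILD];
};

/* Record that thread `h` now owns scoreboard slot `slot`. */
static void assign_slot(struct child_tables *t, int slot, thread_handle h)
{
    assert(slot >= 0 && slot < THREADS_PER_CHILD);
    t->sb_assignments[slot] = h;
    t->scoreboard[slot] = SLOT_BUSY;
}

/* Return the slot owned by `h`, or -1 if the handle is unknown. */
static int find_slot(const struct child_tables *t, thread_handle h)
{
    int i;

    if (h == NULL) {
        return -1;
    }
    for (i = 0; i < THREADS_PER_CHILD; i++) {
        if (t->sb_assignments[i] == h) {
            return i;
        }
    }
    return -1;
}

/* After a thread has been killed the hard way, mark its slot dead so the
 * replacement child can reuse it. Returns the slot, or -1 if not found. */
static int release_killed_thread(struct child_tables *t, thread_handle h)
{
    int slot = find_slot(t, h);

    if (slot >= 0) {
        t->scoreboard[slot] = SLOT_DEAD;
        t->sb_assignments[slot] = NULL;
    }
    return slot;
}
```

A linear scan is enough here because the table has only one entry per worker thread and the lookup runs once per killed thread during shutdown.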

This has been tested by myself, and is working in production - properly cycling threads and not throwing exceptions, even in the case of worker threads timing out and being killed the hard way.

Patch is attached - Sorry for the lengthy description.
Comment 1 Jo Schulze 2009-08-12 03:56:43 UTC
There have been 4 new releases of Apache 2.2 in the last year, but this bug still exists. There isn't even a discussion about the patch.

This is IMHO a very grave bug since it leaves Apache unusable for production deployment if e.g. mod_php is used and there are lots of HTTP requests.

With the prefork MPM, memory leaks are not a big issue, since the worker process will be terminated at some point and its memory freed. This doesn't happen with the winnt MPM because after the request has been processed, only the thread will terminate, so memory leaks accumulate over time. I have seen Apache processes growing over 1.5 GB!

Limiting MaxRequestsPerChild would be a workaround, but this bug prevents its use because there is a good chance that the worker process won't be restarted.

Ok you could blame mod_php (or any other apache module) for its memleaks, however that's not the point:

1) We all know there will be memleaks now and then
2) The Apache directive MaxRequestsPerChild does not work as documented with the winnt MPM
Comment 2 Dan Poirier 2009-08-12 04:54:06 UTC
I don't follow this new comment - the original problem description said nothing about memory leaks in mod_php.  Was it supposed to go on a different bug report?
Comment 3 Jo Schulze 2009-10-15 09:09:26 UTC
No, it was not supposed to go on a different bug report.

Mentioning mod_php was only to explain the severity of this bug, using a real-life setup.
Comment 4 Jason Riffel 2009-10-15 09:25:40 UTC
I am also surprised that this has not been considered for inclusion in the official release.  If you run Apache on Windows and use modules, sometimes those modules leak memory.  Sure, I would like to fix those modules, but that is not always practical, and that is why Apache implements MaxRequestsPerChild: to provide a workaround to this widespread issue by recycling processes to free leaked memory transparently.

The current implementation in production for Win32 works as long as the threads terminate gracefully - however, if a child thread becomes locked or takes too long to shut down, the thread is terminated but its thread slot is never recovered.

The lack of recovery of all thread slots means that the next process never leaves the startup state because it requires 100% of its threads to be started.  It will continue to serve requests while in the startup state but it will never recycle processes again, thus allowing the memory leaks to accumulate to a point of failure.

IMHO - On Win32 if you are using Apache modules and are using MaxRequestsPerChild your system is at risk of failure due to this defect.  I have attached a patch to fix this which has been running flawlessly in our environment for well over a year now.

It is possible that this pattern is replicated in other multi-processing modules (MPMs) - I did not check that myself.
Comment 5 wonlay 2009-12-16 06:37:39 UTC
Does this bug really show only on winnt platforms?
I'm encountering a similar problem on a Debian Linux system with the worker mpm.

Some children die properly after MaxRequestsPerChild, but some won't die until you send them a kill -9 (a plain kill cannot stop them), even though the new children are started.
So, after a period of time, there will be a lot of processes consuming lots of memory.

Someone has the same problem on AIX and has posted it on the users list:
Comment 6 matty 2010-05-24 08:28:32 UTC

Can anyone tell me how to reproduce this problem on a Unix system?

It would be helpful if someone could tell me the exact steps followed to reproduce the problem.

Comment 7 Jeff Trawick 2010-05-24 11:02:26 UTC
> Can anyone tell me how to reproduce this problem on a Unix system?
No; this bug report is for a specific problem in Windows-specific code.

Others may have problem symptoms related to graceful restart/MaxRequestsPerChild, but unless it is this same issue (scoreboard slot leaked) on Windows then they should look for a better match in the bug db or, failing that, open a new one.
Comment 8 William A. Rowe Jr. 2018-11-07 21:09:21 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.