Bug 54502

Summary: Apache deadlock on epoll_ctl error (1000 process limit)
Product: Apache httpd-2
Reporter: Etienne CHAMPETIER <etienne.champetier>
Component: mpm_prefork
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Severity: normal
CC: etienne.champetier
Priority: P2
Keywords: MassUpdate
Version: 2.2.15   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Etienne CHAMPETIER 2013-01-29 10:47:03 UTC

With kernels 3.2.9 (inclusive) through 3.2.17 (exclusive), there was an arbitrary limit of 1000 on epoll paths, which causes Apache to deadlock when running 1001 or more processes. The first patch, http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=28d82dc1c4edbc352129f97f4ca22624d1fe61de, introduced the limit of 1000. The second patch, http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=93dc6107a76daed81c07f50215fa6ae77691634f, no longer limits epoll for non-nested paths (so Apache works again).

This limitation exposes a bug in Apache that leads to a deadlock: if an httpd process gets an error from epoll_ctl, it continues to run. If it then acquires the accept_mutex, epoll_wait returns 0 because the earlier epoll_ctl calls failed and no listening socket is registered, so Apache is blocked.
Here is a short strace of the 1001st process:
-epoll_create1(O_CLOEXEC)    = 39
-epoll_ctl(39, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=1010443880, u64=140193037952616}}) = -1 EINVAL (Invalid argument)
-epoll_ctl(39, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=1010443880, u64=140193037952616}}) = -1 EINVAL (Invalid argument)
-semop(14385470, {{0, -1, SEM_UNDO}}, 1 <unfinished ...>
<... semop resumed> )       = 0
-epoll_wait(39,  <unfinished ...>
<... epoll_wait resumed> {}, 2, 10000) = 0
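
To illustrate the last two lines of the strace, here is a minimal standalone sketch (not httpd code) of what epoll_wait does when the descriptor was never registered. It assumes a Linux system; the function name ready_count and the use of a pipe in place of a listening socket are hypothetical choices made for the demonstration.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo helper: make a pipe readable, optionally register its
 * read end with a fresh epoll instance, and return what epoll_wait() reports.
 * register_fd == 0 mimics the child whose epoll_ctl() calls failed.
 * Returns -1 if the setup itself fails. */
int ready_count(int register_fd, int timeout_ms)
{
    int pipefd[2];
    int epfd, n = -1;
    struct epoll_event ev = { 0 };
    struct epoll_event out[2];

    assert(timeout_ms >= 0);

    if (pipe(pipefd) != 0)
        return -1;

    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
        goto close_pipe;

    /* Data is pending, so the read end is readable from now on. */
    if (write(pipefd[1], "x", 1) != 1)
        goto close_all;

    if (register_fd) {
        ev.events = EPOLLIN;
        ev.data.fd = pipefd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
            goto close_all;
    }

    /* With nothing registered this can only time out and return 0. */
    n = epoll_wait(epfd, out, 2, timeout_ms);

close_all:
    close(epfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

Calling ready_count(0, 100) should return 0 after the timeout even though data is pending, which matches the strace; ready_count(1, 100) should return 1.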

To reproduce:
-get a kernel with the limitation (3.2.9 to 3.2.16 for the 3.2 branch)
-configure httpd to listen on at least 2 ports (80 and 81) so that it uses the accept_mutex
-configure httpd with "StartServers 1001"
-start it with strace -f -o ~/debug.log /etc/init.d/httpd start
-make a lot of requests until it stops responding

The httpd process that fails epoll_ctl should kill itself or retry epoll_ctl.
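
A rough sketch of the "check and bail out" variant, written against raw epoll rather than APR. This is only an illustration of the idea, not a patch for prefork.c; the helper names add_listener_checked and register_listeners are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical helper: register fd for read events.
 * Returns 0 on success, or the errno reported by epoll_ctl(). */
int add_listener_checked(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
        int err = errno;
        fprintf(stderr, "epoll_ctl(ADD, fd=%d) failed: %s\n",
                fd, strerror(err));
        return err;
    }
    return 0;
}

/* Hypothetical child setup step: register every listener and stop at the
 * first failure. A non-zero return means the child must not take the
 * accept mutex; it should exit (or retry) instead of entering the loop. */
int register_listeners(int epfd, const int *fds, int nfds)
{
    int i;

    assert(fds != NULL && nfds >= 0);
    for (i = 0; i < nfds; i++) {
        int rc = add_listener_checked(epfd, fds[i]);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

A child would call register_listeners() once during setup and exit with an error status on a non-zero result, so that the parent can notice the failure instead of one child silently holding the mutex.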

This bug was uncovered on CentOS 6.3 with httpd 2.2.15 and a 3.2.13 kernel, but I've read other threads about the 1000 httpd process limit on Ubuntu...
https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/1028470 (so still present in 2.2.22 for sure)

I've set the severity to normal because updating the kernel makes Apache work again.
Comment 1 Mike Rumph 2013-01-30 22:37:17 UTC
In the latest Apache 2.2.x code, the child_main() function in prefork.c does not check the status code after calling apr_pollset_add().

Here is an excerpt:

    for (lr = ap_listeners, i = num_listensocks; i--; lr = lr->next) {
        apr_pollfd_t pfd = { 0 };

        pfd.desc_type = APR_POLL_SOCKET;
        pfd.desc.s = lr->sd;
        pfd.reqevents = APR_POLLIN;
        pfd.client_data = lr;

        /* ### check the status */
        (void) apr_pollset_add(pollset, &pfd);

This code has been improved in Apache 2.4.x.
svn blame shows the following revisions:

101799     gstein     for (lr = ap_listeners, i = num_listensocks; i--; lr = lr->next) {
101799     gstein         apr_pollfd_t pfd = { 0 };
101799     gstein 
101799     gstein         pfd.desc_type = APR_POLL_SOCKET;
101799     gstein         pfd.desc.s = lr->sd;
101799     gstein         pfd.reqevents = APR_POLLIN;
101799     gstein         pfd.client_data = lr;
101799     gstein 
804764     rpluem         status = apr_pollset_add(pollset, &pfd);
804764     rpluem         if (status != APR_SUCCESS) {
1393382     jorton             /* If the child processed a SIGWINCH before setting up the
1393382     jorton              * pollset, this error path is expected and harmless,
1393382     jorton              * since the listener fd was already closed; so don't
1393382     jorton              * pollute the logs in that case. */
1393382     jorton             if (!die_now) {
1393382     jorton                 ap_log_error(APLOG_MARK, APLOG_EMERG, status, ap_server_conf, APLOGNO(00157)
1393382     jorton                              "Couldn't add listener to pollset; check system or user limits");
1393382     jorton                 clean_child_exit(APEXIT_CHILDSICK);
1393382     jorton             }
1393382     jorton             clean_child_exit(0);
804764     rpluem         }
757853    trawick 
757853    trawick         lr->accept_func = ap_unixd_accept;
 96102        rbb     }
Comment 2 William A. Rowe Jr. 2018-11-07 21:09:13 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.