Bug 50902

Summary: on major load on the server, poll() hangs
Product: Apache httpd-2 Reporter: tal.yalon
Component: mpm_worker Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED LATER    
Severity: major CC: tal.yalon
Priority: P2 Keywords: MassUpdate
Version: 2.2.16   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description tal.yalon 2011-03-09 09:34:35 UTC
Hi there,

I thought this bug was related to bug #50247, but per Eric's suggestion I'm opening a new one.

The httpd in question is 2.2.16 running on EC2.

We see that after ~10 minutes of significant load, some of the worker processes hang in poll().

By "hang" I mean that the poll() call did not return for more than 30 seconds.

This is the stack trace from one of the processes in this state:

#0  0x00007f76fa70c748 in poll () from /lib64/libc.so.6
#1  0x00007f76fabe2822 in apr_wait_for_io_or_timeout () from /usr/lib64/libapr-1.so.0
#2  0x00007f76fabdd2da in apr_socket_recv () from /usr/lib64/libapr-1.so.0
#3  0x00007f76fc0dfddd in ap_lingering_close ()
#4  0x00007f76fc0eaa7e in ?? ()
#5  0x00007f76fc0ead1a in ?? ()
#6  0x00007f76fc0eadd0 in ?? ()
#7  0x00007f76fc0eb908 in ap_mpm_run ()
#8  0x00007f76fc0c54fb in main ()

Please let me know if there's any more information I can provide - this problem
is obviously a major concern to us.

Thanks,
Tal
Comment 1 Jeff Trawick 2011-03-09 10:06:52 UTC
We have to see the parameters passed to poll() (perhaps from strace, perhaps from a backtrace on a debug build, perhaps from somebody understanding the assembly code with the same build you got the backtraces from).

The timeout is hard-coded to 2 seconds, so it isn't supposed to block longer than that.
A FIN has already been sent on the connection, so poll() could wake up on socket activity before "too long."

If server-status is enabled, a thread with this backtrace should show up as "C" (closing).
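
As a rough illustration of that check, the sketch below counts scoreboard slots in the "C" state. It assumes mod_status is enabled and that the text comes from the machine-readable status page (the "?auto" form), which includes a "Scoreboard:" line; the helper name count_closing and the sample text are made up for illustration and are not from the reporter's server.

```python
def count_closing(status_text):
    """Return the number of 'C' (closing connection) slots in the scoreboard."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            scoreboard = line.split(":", 1)[1].strip()
            return scoreboard.count("C")
    raise ValueError("no Scoreboard line found")


# Hypothetical sample output, not real data from this report.
SAMPLE = "BusyWorkers: 5\nIdleWorkers: 3\nScoreboard: _W_CC_KC...\n"

print(count_closing(SAMPLE))
```

A count that stays high while the load test runs would be consistent with threads sitting in lingering close.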
Comment 2 Ruediger Pluem 2011-03-09 11:47:33 UTC
(In reply to comment #1)
> We have to see the parameters passed to poll() (perhaps from strace, perhaps
> from a backtrace on a debug build, perhaps from somebody understanding the
> assembly code with the same build you got the backtraces from).
> 
> The timeout is hard-coded to 2 seconds, so it isn't supposed to block longer
> than that.

If the remote partner (in this case the client) does not close the socket, we can return to the poll call over and over again for the next 30 seconds. We stay in a single poll call for at most 2 seconds, but may make up to 15 calls to poll.
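
To make the arithmetic concrete, here is a small simulation of the pattern described in this comment. It is only a sketch, not the httpd source: the two constants and the function name simulate_linger are assumptions taken from the figures above, and it models the worst case in which every poll call waits for its full timeout before the loop re-enters.

```python
PER_CALL_TIMEOUT_S = 2   # assumed timeout of a single poll() call
MAX_LINGER_S = 30        # assumed overall limit for lingering close


def simulate_linger(per_call_timeout_s=PER_CALL_TIMEOUT_S,
                    max_linger_s=MAX_LINGER_S):
    """Return (poll_calls, elapsed_seconds) when the peer never closes."""
    elapsed = 0
    calls = 0
    while elapsed < max_linger_s:
        calls += 1
        # Worst case: each call waits for its full timeout.
        elapsed += per_call_timeout_s
    return calls, elapsed


print(simulate_linger())  # (15, 30)
```

Under these assumptions a thread can show poll() at the top of its backtrace for about 30 seconds even though no single call blocks for more than 2 seconds.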
Comment 3 Jeff Trawick 2011-03-09 12:01:57 UTC
Thanks, Ruediger.
We need to distinguish between blocking in a single poll() call for a long time and repeatedly calling poll(). strace would make that clear; gdb could as well, if used properly.
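
One possible way to tell the two cases apart from an strace capture is sketched below. It assumes the trace was taken with per-call timing (strace -T), so each line ends with the time spent in the call in angle brackets; the sample lines, the 0.5 second slack, and the function names are hypothetical and chosen only for illustration.

```python
import re

# Matches e.g.: poll([{fd=12, events=POLLIN}], 1, 2000) = 0 (Timeout) <2.002113>
POLL_RE = re.compile(r"poll\(.*?,\s*(-?\d+)\)\s*=\s*(-?\d+).*<(\d+\.\d+)>")


def parse_poll_calls(strace_text):
    """Return a list of (timeout_ms, duration_s) for each poll() line."""
    calls = []
    for line in strace_text.splitlines():
        match = POLL_RE.search(line)
        if match:
            calls.append((int(match.group(1)), float(match.group(3))))
    return calls


def classify(strace_text, slack_s=0.5):
    """Say whether one poll() overran its timeout or poll() was called repeatedly."""
    calls = parse_poll_calls(strace_text)
    overlong = [c for c in calls if c[0] >= 0 and c[1] > c[0] / 1000 + slack_s]
    if overlong:
        return "single poll() blocked past its timeout"
    if len(calls) > 1:
        return "repeated poll() calls"
    return "inconclusive"


# Hypothetical traces, not captured from the reporter's system.
REPEATED = (
    "poll([{fd=12, events=POLLIN}], 1, 2000) = 0 (Timeout) <2.002113>\n"
    "poll([{fd=12, events=POLLIN}], 1, 2000) = 0 (Timeout) <2.001874>\n"
    "poll([{fd=12, events=POLLIN}], 1, 2000) = 0 (Timeout) <2.002040>\n"
)
STUCK = "poll([{fd=12, events=POLLIN}], 1, 2000) = 0 (Timeout) <31.504221>\n"

print(classify(REPEATED))
print(classify(STUCK))
```

The first result would match the lingering-close loop described in comment 2; the second would point at poll() itself not honouring its timeout.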
Comment 4 William A. Rowe Jr. 2018-11-07 21:08:50 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.