Bug 61979 - some listener thread failures do no wake workers
Summary: some listener thread failures do no wake workers
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mpm_event (show other bugs)
Version: 2.5-HEAD
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: FixedInTrunk
Depends on:
Blocks:
 
Reported: 2018-01-09 16:59 UTC by Eric Covener
Modified: 2018-03-24 12:07 UTC (History)
0 users



Attachments
Listener resource shortage (7.15 KB, patch)
2018-01-09 22:09 UTC, Yann Ylavic
Details | Diff
Listener resource shortage (v2) (10.08 KB, patch)
2018-01-10 15:30 UTC, Yann Ylavic
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Covener 2018-01-09 16:59:17 UTC
it appears some listener thread failures do no wake workers, for example ptrans allocation failures.  

signal_threads(ST_GRACEFUL) expects close_listeners to wake everyone up, but sometimes this is not called within listener_thread() and instead NULL is returned.
Comment 1 Eric Covener 2018-01-09 17:08:46 UTC
maybe just s/return NULL/break for the ptrans failure
Comment 2 Eric Covener 2018-01-09 17:15:57 UTC
(In reply to Eric Covener from comment #1)
> maybe just s/return NULL/break for the ptrans failure

Trickier, we're iterating over the pollset results and doing a graceful shutdown. Currently "return NULL" ignores the other sockets with activity, but break would continue tryign the other sockets w/ likely OOM scenario.

break/goto to the outer for(;;) listener loop would at least wake up the
workers, but would still ignore any addl socket activity (later in the result loop or subsequent during the graceful shutdown)
Comment 3 Yann Ylavic 2018-01-09 22:09:00 UTC
Created attachment 35665 [details]
Listener resource shortage

Maybe something like this could work.

The patch moves the disabling (poll) of the listen sockets from close_listeners to wakeup_listener, where listener_may_exit is set already along with anything to start the listener graceful stop, but mainly a callee of signal_threads.
Thus after signal_threads is called, the listener can continue its loop(s) without creating new connections or pools (i.e. maintenance job only, though the workers may still encounter OOM...).

In case of allocation failure, the resource_shortage flag is also set to avoid creating a new child too quickly, that'll possibly help recovery.

Finally the patch also changes the init_pollset failure to be ungraceful, there is nothing to wait for at this point IMO, with resource_shortage=1 still.

Note: I had to make ap_child_slot global to avoid forwarding the local process_slot var/arg too much...

Wouldn't this "work"?
Comment 4 Yann Ylavic 2018-01-10 15:30:55 UTC
Created attachment 35671 [details]
Listener resource shortage (v2)

Same with a better handling of {en,dis}able_listensocks() concurrency (atomics), since disable_listensocks() can now be called by both the listener and workers via signal_threads().
Comment 5 Eric Covener 2018-01-18 19:23:28 UTC
(In reply to Yann Ylavic from comment #4)
> Created attachment 35671 [details]
> Listener resource shortage (v2)
> 
> Same with a better handling of {en,dis}able_listensocks() concurrency
> (atomics), since disable_listensocks() can now be called by both the
> listener and workers via signal_threads().

Thanks Yann -- I was only able to test the guts w/ simulate ptrans error,
but it looked good.
Comment 6 Yann Ylavic 2018-01-19 01:12:25 UTC
Committed in r1821558, and proposed for backport (incremental).
Comment 7 Christophe JAILLET 2018-03-24 12:07:29 UTC
Backported in 2.4.x in r1823644

This is part of 2.4.30