Bug 65832 - purge keepalive connections before 0 idlers?
Summary: purge keepalive connections before 0 idlers?
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mpm_event
Version: 2.5-HEAD
Hardware: PC Mac OS X 10.1
Importance: P2 enhancement
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-21 11:18 UTC by Eric Covener
Modified: 2022-01-21 16:37 UTC
Description Eric Covener 2022-01-21 11:18:34 UTC
When event hits zero idlers, a keepalive connection that becomes active can be abruptly closed in response to the new request being readable.


                    /* If we don't get a worker immediately (nonblocking), we
                     * close the connection; the client can re-connect to a
                     * different process for keepalive, and for lingering close
                     * the connection will be shutdown so the choice is to favor
                     * incoming/alive connections.
                     */
                    get_worker(&have_idle_worker, blocking,
                               &workers_were_busy);
                    if (!have_idle_worker) {
                        shutdown_connection(cs);
                    }

I don't know whether normal browsers will transparently retry idempotent requests, but something like XHR will require error handling to do it.

It seems like, if we are "approaching" zero idlers (and know that we have an AsyncRequestWorkerFactor > 1), it would be better to shed some or all keepalive connections earlier, before we have to shut down ones that have just been used.
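
Roughly the kind of thing I have in mind, as a standalone sketch with made-up names and simplified types (not the real event.c structures or an actual patch):

    #include <stddef.h>

    typedef struct ka_conn {            /* stand-in for a keepalive connection */
        int fd;
        struct ka_conn *next;
    } ka_conn;

    /* stand-in for the real shutdown_connection()/lingering-close path */
    static void close_keepalive(ka_conn *c)
    {
        (void)c;
    }

    /* Shed up to max_to_shed connections from the keepalive list, but only
     * once the supply of idle workers drops to a low watermark.  These
     * connections have no request in flight, so to the client this looks
     * like an ordinary keepalive timeout rather than a close right after
     * it reused the socket.
     */
    static ka_conn *shed_keepalives(ka_conn *ka_list, int num_idlers,
                                    int low_watermark, int max_to_shed)
    {
        if (num_idlers > low_watermark) {
            return ka_list;             /* still plenty of idlers */
        }
        while (ka_list != NULL && max_to_shed-- > 0) {
            ka_conn *c = ka_list;
            ka_list = c->next;
            close_keepalive(c);
        }
        return ka_list;                 /* remaining keepalive connections */
    }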
Comment 1 Ruediger Pluem 2022-01-21 13:03:10 UTC
Confused now. My understanding is that only connections in keepalive state will be stopped, not ones that are in use. A client using a keepalive connection should be prepared for it to be closed. This cannot always be handled gracefully, since a non-idempotent request, once it has been sent, cannot simply be resent. How would closing keepalives earlier improve the situation?
Comment 2 Eric Covener 2022-01-21 13:11:30 UTC
(In reply to Ruediger Pluem from comment #1)
> Confused now. My understanding is that only connections in keepalive state
> will be stopped, not ones that are in use. A client using a keepalive
> connection should be prepared for it to be closed. This cannot always be
> handled gracefully, since a non-idempotent request, once it has been sent,
> cannot simply be resent. How would closing keepalives earlier improve the
> situation?

The one being closed in the snippet is closed shortly after the client has reused it. IOW, it's not simultaneous or a race; we're responding to activity on the socket.

Closing them in advance (when the process has zero idlers) would be susceptible to races / simultaneous close (of more sockets, assuming the server was going to recover in time), but this kind of race is present on every timeout.
Comment 3 Yann Ylavic 2022-01-21 15:22:38 UTC
I think that get_worker() should never block; it does not in this case, and there is no other choice than closing the connection (which is what we should address IMHO).
We could try to be preventive here and kill kept-alive connections earlier (when num_idlers < some threshold, or the like?) as you propose, but this does not address the blocking cases for WRITE_COMPLETION and PT_ACCEPT. If we had a solution for those, I think the case of a keepalive connection becoming active is not very different from the PT_ACCEPT case.

What we could possibly do in the listener, when get_worker() (all nonblocking) finds no worker, is to (rough sketch below):
1. disable_listensocks() so as not to make the situation worse
2. push those connections/sockets onto a pending_q which workers would pop before idling (like the defer_linger_chain)
3. maintain the pending_q alongside the other queues so the connections get killed at some point if no worker comes up soon enough (pathologically?)
4. teach should_enable_listensocks() about the pending_q so it does not recover too early
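
A standalone sketch of the pending_q idea, with simplified/assumed types and names rather than the real event.c code:

    #include <pthread.h>
    #include <stddef.h>

    typedef struct pending_conn {       /* stand-in for a parked connection */
        int fd;
        struct pending_conn *next;
    } pending_conn;

    static pending_conn *pending_q = NULL;
    static pthread_mutex_t pending_q_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Listener side (step 2): no idle worker available, so park the
     * connection instead of shutting it down; the real code would also
     * call disable_listensocks() here (step 1).
     */
    static void pending_q_push(pending_conn *c)
    {
        pthread_mutex_lock(&pending_q_lock);
        c->next = pending_q;
        pending_q = c;
        pthread_mutex_unlock(&pending_q_lock);
    }

    /* Worker side: before going back to idle, drain a parked connection if
     * there is one (like the defer_linger_chain).  NULL means nothing is
     * pending and the worker can idle.
     */
    static pending_conn *pending_q_pop(void)
    {
        pending_conn *c;

        pthread_mutex_lock(&pending_q_lock);
        c = pending_q;
        if (c != NULL) {
            pending_q = c->next;
        }
        pthread_mutex_unlock(&pending_q_lock);
        return c;
    }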

This sounds (to me) like a more robust alternative, good/bad/off-topic idea?
Comment 4 Eric Covener 2022-01-21 16:37:04 UTC
> This sounds (to me) like a more robust alternative, good/bad/off-topic idea?

I think it's a good idea. But I wonder whether we would still want the preventive aspect; otherwise we encourage things to be delayed in the pending_q, which might be tricky for debugging/observability.

Maybe the preventive part can be done without actively flushing idle keepalive connections, for example with a low but non-zero supply of idlers considered in connections_above_limit() to avoid adding fuel to the fire. But I am also not excited about driving more flipping of disable_listensocks().
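
As a very rough sketch of that last idea (all names and the reserve value here are made up for illustration, not the actual connections_above_limit() logic):

    /* Treat the process as "above limit" while the idler supply is at or
     * below a small reserve, so it stops taking on more connections before
     * it actually hits zero idlers.
     */
    static int connections_above_limit_sketch(int num_connections,
                                              int connection_limit,
                                              int num_idlers,
                                              int idler_reserve)
    {
        if (num_idlers <= idler_reserve) {
            return 1;
        }
        return num_connections >= connection_limit;
    }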