Bug 63328 - Apparent race condition causes undeserved 500 / connection reset by peer errors
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_fcgid
Version: 2.4.25
Hardware: PC Linux
Importance: P2 major
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Depends on:
Reported: 2019-04-09 03:15 UTC by tlhackque
Modified: 2019-04-09 03:15 UTC
CC List: 0 users


Description tlhackque 2019-04-09 03:15:40 UTC
There appears to be a race condition in mod_fcgid.

Here's what I see:

A Perl (CGI::Fast) application decides (actually is told) to exit.

In response to a POST, it issues a (303) redirect to its status page and exit()s - expecting that a new instance will be started to service the redirect.

HTTPD reports:

[Sun Apr 07 13:24:13.991499 2019] [fcgid:warn] [pid 17236] (104)Connection reset by peer: [client ...] mod_fcgid: error reading data from FastCGI server, referer: ...
[Sun Apr 07 13:24:13.994622 2019] [core:error] [pid 17236] [...] End of script output before headers: notice.fcgi, referer: https://.../fastbrowser

Modsec's helpful audit log says:
Apache-Error: [file "fcgid_proc_unix.c"] [line 627] [level 4] [status 104] mod_fcgid: error reading data from FastCGI server
Apache-Error: [file "util_script.c"] [line 500] [level 3] %s: %s
Apache-Handler: fcgid-script

And the web browser gets a 500 error from httpd.

What's interesting is that if the server generates a 200 response, the error doesn't happen.

Further, if the application generates the 303 without doing anything else, the 500 isn't generated; the redirect works.

The failure seems to be timing-sensitive.  My working theory is this:

The server (the Perl application) exits, closing its FastCGI connection.  If a 200 is sent before the exit, all goes as expected.  A redirect takes time: the browser sends the follow-up GET some time later.  If the GET arrives much later, it hits a new server instance.  If it arrives at just the right time, it starts to be sent to the (now exiting) server; the connection close is noticed, and the request is lost to the 500.

This reproduces consistently with a real application.  I've tried to cut it down to a reproducer, but failed.

I tried various ways to prevent this - including sending 'LastCall' - none work in the real application.

httpd 2.4.25, mod_fcgid 2.3.9, CGI::Fast 2.15, FCGI 0.78.

Here is my attempt at a small reproducer.  While I haven't found the right magic to reproduce the problem, it clearly illustrates the failing application structure. (For simplicity, this is all done with GET, but that shouldn't matter.)


Set up shutdown.fcgi to run as a script at, say, /test.fcgi.
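For anyone trying to reproduce this, a mapping along the following lines should expose the script at /test.fcgi.  This is only a sketch: the directory /srv/fcgi is a made-up placeholder, it assumes mod_fcgid and mod_alias are already loaded, and it is not necessarily the configuration used on the affected host.

```apache
# Hypothetical filesystem location of the script; adjust to the local layout.
Alias /test.fcgi /srv/fcgi/shutdown.fcgi

<Directory /srv/fcgi>
    Options +ExecCGI
    Require all granted
</Directory>

# Hand requests under /test.fcgi (including /test.fcgi/shutdown) to mod_fcgid.
<Location /test.fcgi>
    SetHandler fcgid-script
</Location>
```

With a mapping like this, the trailing /shutdown part of the URL should reach the script as PATH_INFO, which is what the reproducer tests.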

Browse to /test.fcgi and hit refresh; you will see the "Requests served" counter increment.

Now browse to /test.fcgi/shutdown - the server issues a redirect and exits.  You will see that the response has a new PID, the requests served count goes back to 1, and the URL in the address bar is now /test.fcgi/LoopExit.

Alternatively (change the if( 01 ) to if( 0 )), it invokes LastCall, which explicitly tells the library not to accept more requests, and then falls out of the loop synchronously to exit.  The GET triggered by the redirect should start a new server; in the real application you instead get the 500 error, and it is 100% reproducible.  I haven't found the right timing to make the reproducer fail, and if I did, I suspect that timing would not be portable to other machines.

What I expect is that once the server exits (and especially once LastCall has been invoked), mod_fcgid will pass incoming requests to another server instance, starting a new one if necessary.  (In the real app, it is guaranteed that there is only one server at this time.)  If one can't be found or started, the response should be something like "no servers available", not "Internal error" with logging that blames the server.

Here's the (very small) almost-reproducer.  The structure is the same as the real application.


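use warnings;
use strict;

require CGI::Fast;

my $n;
my $q;
while( ( $q = CGI::Fast->new ) ) {
    # Variable work here
    $n++;    # Assumed: the "Requests served" counter described above

    if( $ENV{PATH_INFO} eq '/shutdown' ) {
        if( 01 ) {
            print( <<"xx" );
Status: 303 See other
Location: /test.fcgi/LoopExit

Server $$ shutdown after $n requests
xx
            exit;
        }

        # LastCall variant (reached when the test above is changed to 0).
        # Assumed: the redirect target is the same as in the branch above.
        print( <<"xx" );
Status: 303 See other
Location: /test.fcgi/LoopExit

Server $$ shutdown after $n requests
xx
        no warnings 'once';
        # Assumed: the FCGI request object that CGI::Fast keeps internally.
        $CGI::Fast::Ext_Request->LastCall;
        next;    # CGI::Fast->new returns undef on the next pass
    }

    print( <<"XX" );
Status: 200 OK
Content-Type: text/plain

Server $$, Requests served: $n
XX
}

# Here when CGI::Fast returns undef to shut down.
print STDERR ( "ERR: Server $$ shutdown after $n requests\n" ) if( 0 );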


Finally, my workaround is to send a buffer page - it waits 15 seconds and then does a JavaScript redirect.  This works every time - but is a horrible user experience...
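For illustration, the buffer page could be produced along these lines.  This is a minimal sketch rather than the code from the real application: buffer_page is a hypothetical helper name, the page text is invented, and the target URL simply reuses the one from the reproducer.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the interim "buffer" page as a complete CGI
# response.  $target is the URL to load after $delay_s seconds.
sub buffer_page {
    my( $target, $delay_s ) = @_;
    my $ms = $delay_s * 1000;

    # Hex-escape characters that could break out of the JavaScript string
    # literal or the enclosing <script> element.
    ( my $js_target = $target ) =~ s/([\\'<>&])/sprintf( '\\x%02x', ord( $1 ) )/ge;

    return <<"PAGE";
Status: 200 OK
Content-Type: text/html

<!DOCTYPE html>
<html><head><title>Restarting</title></head>
<body>
<p>The server is restarting; this page will continue automatically.</p>
<script>
setTimeout( function() { window.location.replace( '$js_target' ); }, $ms );
</script>
</body></html>
PAGE
}

# Send the page in place of the 303, then exit as before.
print buffer_page( '/test.fcgi/LoopExit', 15 );
```

Because the response is a 200, the exiting server finishes it normally, and the follow-up GET is only issued after the delay, by which time the old process is long gone.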