Bug 23528

Summary: httpd hangs for 300 sec with apr_bucket_read failed.
Product: Apache httpd-2 Reporter: Alan Lirette <alirette>
Component: mod_cgiAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED WONTFIX    
Severity: major    
Priority: P3    
Version: 2.0.52   
Target Milestone: ---   
Hardware: PC   
OS: Windows 2000   

Description Alan Lirette 2003-09-30 18:12:58 UTC
Support, 
 The browser hangs for 300sec.waiting for httpd to respond and ends up timing 
out.
When I decrease the amount of content I dont see this problem. I originally had 
this
with 2.0.40. Download and build the latest and get the same hang.  Heres the 
error_log:

-------------------- Start Error Log ---------------- 

Thu Sep 25 15:28:31 2003] [notice] Apache/2.0.47 (Unix) configured -- resuming 
normal operations

Thu Sep 25 15:28:31 2003] [info] Server built: Sep 25 2003 14:17:17
[Thu Sep 25 15:28:31 2003] 
[debug] prefork.c(1037): AcceptMutex: sysvsem (default: sysvsem)

[Thu Sep 25 15:37:41 2003] [error] [client 192.168.8.158] (70007)The timeout 
specified has expired: ap_content_length_filter: apr_bucket_read() failed, 
referer: http://192.168.8.158/cgi-bin/editBuildTags-branch3.cgi

-------------------- End -----------------------------
Comment 1 Jeff Trawick 2003-09-30 18:21:01 UTC
let me guess... your cgi script is writing a bunch of stuff to stderr?  change
that and see if it still occurs
Comment 2 Alan Lirette 2003-09-30 21:32:46 UTC
Im not writing anything to stderr, only to stdout with print statements and
writing/reading files. 
Comment 3 Alan Lirette 2003-10-02 18:29:50 UTC
Jeff, Any other ideas?  Thanks.
Comment 4 Jeff Trawick 2003-10-06 09:01:16 UTC
If it wasn't a CGI request, I dunno :(  Can you reproduce it with a certain type
of request?  Have you looked at the access log to see what type of request it is?
Comment 5 Alan Lirette 2003-10-06 11:53:48 UTC
Heres the access.log entry...

localhost.localdomain - - [03/Oct/2003:15:16:12 -0400] "GET /cgi-
bin/editBuildTags-branch4.cgi HTTP/1.1" 200 6265 "-" "Mozilla/5.0 (X11; U; 
Linux i686; en-US; rv:1.0.1) Gecko/20021003"
localhost.localdomain - - [03/Oct/2003:15:16:42 -0400] "POST /cgi-
bin/editBuildTags-branch4.cgi HTTP/1.1" 200 

I can execute the Perl script outside of httpd without any errors, if thats 
what your asking me.
Comment 6 Jeff Trawick 2003-10-10 12:35:38 UTC
Reproduce the problem, and during the 300 second hang, attach to the CGI process
with strace and see what syscall it is in.

If you have a simple testcase I can use to reproduce it on my system, please
post it.
Comment 7 Alan Lirette 2003-10-14 17:37:19 UTC
Unfortunately I cant get a simple testcase cuase everything I try works. The 
application itself will work 10-15% of the time.  Is there a debugger I need to 
attatch to the process when it hangs? Im not familar how to do this?
Comment 8 Co-Advisor 2003-12-12 17:31:09 UTC
This bug is probably the same as bug #20866.
Comment 9 Alan Lirette 2004-01-05 21:11:30 UTC
Has this  ever been resolved?
Comment 10 Jeff Trawick 2004-01-05 21:37:48 UTC
need more information to resolve it...  we don't know what code is hanging... 
info requested on 10/10 is expected to help...
Comment 11 fil 2004-02-04 20:35:20 UTC
I have several apache 2.0.48 servers exhibiting the
"ap_content_length_filter:  apr_bucket_read() failed" problem with little
bit of poking around I found that this happens avery time apache is
waiting for a large amount of data to be returned to it from the cgi
script (eg. large and time consuming database read/write or extensive
writing to stderr). This happens on both 32 and 64 bit (Linux and Solaris)
platform but only in the preforked mpm model, the worker (threaded) mpm
seems to handle this problem better.

I did couple of straces to my cgi script and it always gets stuck in the
write system call and at the same time the corresponding http server is
stuck in poll system call. this lasts for (5 min) 'Timeout 300' and the
error is displayed inside the error_log file

(70007)The timeout specified has expired: ap_content_length_filter:
apr_bucket_read() failed

and the cgi script either dies or becomes a zombie process.

If I am correct error is coming from httpd-2.0.48/server/protocol.c file,
(snippet included below) and has to do with BLOCK_READ vs. NONBLOCK_READ.
I might be totally wrong about this guess but....

Hope you have a fix for this bug or maybe an idea how I could work around
it.

thanks
fil

    /* Loop through this set of buckets to compute their length
     */
    e = APR_BRIGADE_FIRST(b);
    while (e != APR_BRIGADE_SENTINEL(b)) {
        if (APR_BUCKET_IS_EOS(e)) {
            eos = 1;
            break;
        }
        if (e->length == (apr_size_t)-1) {
            apr_size_t len;
            const char *ignored;
            apr_status_t rv;

            /* This is probably a pipe bucket.  Send everything
             * prior to this, and then read the data for this bucket.
             */
            rv = apr_bucket_read(e, &ignored, &len, eblock);
            if (rv == APR_SUCCESS) {
                /* Attempt a nonblocking read next time through */
                eblock = APR_NONBLOCK_READ;
                r->bytes_sent += len;
            }
            else if (APR_STATUS_IS_EAGAIN(rv)) {
                /* Output everything prior to this bucket, and then
                 * do a blocking read on the next batch.
                 */
                if (e != APR_BRIGADE_FIRST(b)) {
                    apr_bucket_brigade *split = apr_brigade_split(b, e);
                    apr_bucket *flush =
apr_bucket_flush_create(r->connection->bucket_alloc);

                    APR_BRIGADE_INSERT_TAIL(b, flush);
                    rv = ap_pass_brigade(f->next, b);
                    if (rv != APR_SUCCESS || f->c->aborted) {
                        apr_brigade_destroy(split);
                        return rv;
                    }
                    b = split;
                    e = APR_BRIGADE_FIRST(b);

                    ctx->data_sent = 1;
                }
                eblock = APR_BLOCK_READ;
                continue;
            }
            else {
                ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
                              "ap_content_length_filter: "
                              "apr_bucket_read() failed");
                return rv;
            }
        }
        else {
            r->bytes_sent += e->length;
        }
        e = APR_BUCKET_NEXT(e);
    }

Comment 12 Sorin Mocanu 2004-03-31 14:39:57 UTC
Regarding the 2003-10-10 question from Jeff Trawick, I attached to the process
using strace and it shows 

write(2, "Use of uninitialized value in co"..., 126

That would be a warning printed by a Perl script that has a lot of undefined
variables embeded in strings and runs with -w. (Use of uninitialized value in
concatenation (.) or string at xxx.pl line YYY) Development server, duh :)

After the 300 seconds I got from strace this:

--- SIGPIPE (Broken pipe) @ 0 (0) ---
write(2, "Use of uninitialized value in co"..., 126) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGTERM (Terminated) @ 0 (0) ---

Regards
Sorin Mocanu
Comment 13 Jeff Trawick 2004-03-31 14:57:04 UTC
Sorin,

>Regarding the 2003-10-10 question from Jeff Trawick, I attached 
>to the process using strace and it shows 
>
>write(2, "Use of uninitialized value in co"..., 126

The script is hung writing to stderr.  You're suffering from PR 22030.
Comment 14 Co-Advisor 2004-07-13 19:40:01 UTC
Looks like a possibly related bug #20866 may have been fixed in version 2.0.50.
Folks should retry with that version.
Comment 15 Joe Orton 2004-07-13 19:45:20 UTC
To confirm whether this is related to stderr handling in mod_cgi please try to
reproduce the problem using 2.0.50.
Comment 16 Joe Orton 2004-08-24 20:41:20 UTC
Let's presume this is bug 22030 and hence fixed in 2.0.50. lacking evidence to
the contrary.

*** This bug has been marked as a duplicate of 22030 ***
Comment 17 Kevin Field 2005-08-09 19:54:16 UTC
I'm running a script on our intranet (apache_2.0.52-win32-x86-no_ssl,
ActivePerl-5.8.6.811-MSWin32-x86-122208) querying our MSSQL 2000 server, all
running on a Win2K server, and I'm getting the same error:

[Tue Aug 09 12:56:32 2005] [error] [client 10.0.0.110] (70007)The timeout
specified has expired: ap_content_length_filter: apr_bucket_read() failed,
referer: http://intranet/reports/order_me.pl

It normally takes about 5 seconds to run, if the results are small (less than 50
rows as far as I've experienced.)  In that case, there is no error, and the page
loads and completes just fine.  However, when I modify the script such that 1000
rows are returned, it always hangs in the browser for five minutes (as the
original poster described) after outputting about 41 kb of HTML.  It must be
some kind of buffer problem because it'll stop in mid-print-statement, like
after the '&nb' of an '&nbsp;' in an empty cell.  Worse than the original poster
described, however, is the fact that the results get cut short at that point and
the rest are never displayed.

Thanks,
Kev
Comment 18 Kevin Field 2005-08-22 22:00:09 UTC
We're now on the latest, ActivePerl-5.8.7.813-MSWin32 and apache_2.0.54-win32,
and still the exact same thing happens.

I ran a test using a simple query that's fast and returns a lot of data.  It ran
fine, returning 365 kb of data.

However, I tested the other query (the one that's causing problems) using a
query analyzer, and it took 9 seconds (expected, it's complex) but not any 300
seconds.

I would guess it's timing out waiting for the query results and something
strange is happening that prevents it from getting the rest, because it starts
returning results after only 2 seconds, gives one burst of data, and that's it.

Is nobody else still having things happen like this?  I see David Trusty
reopened one of the ones this was supposedly a duplicate of as well.

If you need any more info from me just let me know.

Thanks,
Kev
Comment 19 Kevin Field 2005-08-22 22:07:58 UTC
For now, anyway, Eric Pearce's suggestion works very well for me:

'I changed the begining of the script from "perl -w" to "perl -X" and all
problems ceased...'

Thanks,
Kev
Comment 20 Joe Orton 2005-10-28 17:35:31 UTC
The only remaining bug here is that Win32 can't poll() across non-socket fds so
mod_cgi's CGI bucket stuff can't be used on that platform.  This needs to be
fixed at APR or somewhere below mod_cgi, marking WONTFIX until it is.  (perhaps
"CANTFIX" is more appropriate)