Bug 20866 - cgi script just hangs at end, then apr_bucket_read error
Summary: cgi script just hangs at end, then apr_bucket_read error
Status: RESOLVED DUPLICATE of bug 22030
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_cgi
Version: 2.0.45
Hardware: Sun Solaris
Importance: P1 major with 7 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-06-18 11:53 UTC by Jon
Modified: 2004-11-16 19:05 UTC
CC List: 2 users



Description Jon 2003-06-18 11:53:36 UTC
This was very weird.

Absolutely no scripts or files have changed on my website in the last 4 weeks.
Apache 2.0.45 was installed on April 23 (two months ago).  As of 1.5 weeks ago,
my logged-in users were still being delivered the page successfully, but the
script would hang at the end for exactly 5 minutes and then finish with this
error written to the error log file:

(70007)The timeout specified has expired: ap_content_length_filter: 
apr_bucket_read() failed

Basically the page would be delivered normally, but the web browser would
appear to continue loading the page even though it was already fully
delivered.  After 5 minutes of this, the webserver would finally say "done"
and the browser would stop.

Because my database data changes very frequently, I first thought the data was
referencing a broken image or something else that would make the browser hang.
Nope.  Then I thought perhaps my perl cgi script was broken even though it
hadn't changed.  But if a perl script actually pauses for minutes, I would
expect ps to show it still running, and it did not - ps showed no perl scripts
running while the browser was waiting.

So I focused on Apache.  I read somewhere on bugzilla that the apr_bucket_read
error occurs when a script does not finish before the KeepAliveTimeout expires.
But I *know* my perl script never takes more than 3 seconds to run.  Something
in Apache was timing out after 5 minutes, because the access log shows every
access to the script taking 303 seconds (3 for it to run, plus a 300-second
timeout somewhere).  But why was Apache hanging on to the script for 5
minutes??

A google search turned up one person who had a problem with Apache 2 and
non-buffered output.  So I made just TWO changes to my entire perl script: I
removed the $|++ command, turning auto-flush off and output buffering back on.
I also removed a line that simply warns me of an error:

warn "weird";

Now the problem went away!

Hmm, I never noticed the warn command's output in any of my apache error logs,
but this perl script was being called from within another perl script.  I
wonder if a warn message (or a simple STDERR message) from a perl script called
within another perl script could cause this problem?

I have an example of this problem here.

A page that does not have the problem:
http://www.hotdlz.com/apachebug1.pl

A page that DOES have the problem:
http://www.hotdlz.com/apachebug/

Both scripts do a require "/path/other-script.pl" inside.  The only difference
between the two scripts above is that they call a different
/path/other-script.pl.  The only difference between the two
/path/other-script.pl's is the following:

$|++; is at the top, and
warn "something weird happened"; is inside a foreach loop over the data,
triggered if something weird happens.

The broken script has these two lines, while the fixed script has these two 
lines taken out.

I hope this information helps in identifying the bug, or at least helps point 
someone else in the right direction to fixing the problem on their site.
Comment 1 Jon 2003-06-18 14:27:12 UTC
By the way, for security reasons, those URL's shown in the bug report will
cease to function in 3 days (on or after Saturday, June 21).
Comment 2 Jeff Trawick 2003-06-21 02:38:54 UTC
If the perl script is not running but Apache is still hanging on a pipe read
that is supposedly output from the script, maybe another process inherited the
descriptor and has not closed it?

Can you try lsof to see if another process has a descriptor to that same pipe?

(Yeah, this is probably an impractical thing to test on your real site :( )

A common problem is a script that writes a lot to stderr and ends up hanging
on the write because Apache hasn't started reading that pipe yet.  (This may
depend on the ScriptLog setting...  can't recall at the moment.)  Does the
script write a lot to stderr?  This contradicts your statement that ps doesn't
show the script still running, but this is a fairly common problem so I have to
ask.

If you can figure out a small-ish script to allow us to duplicate the problem,
we're in business.
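A minimal standalone sketch of that stderr failure mode (an illustration with
a made-up file name, not code from this report): the child plays the CGI
script, the parent plays a server that delays reading the pipe, and the child
blocks in write() once the kernel pipe buffer fills.

----- pipe-hang-demo.pl (illustration) -----
#!/usr/bin/perl
# The child writes more to a pipe than the kernel buffer holds while the
# parent is not yet reading, so the child's write() blocks -- just like a
# CGI script stuck writing to stderr before Apache drains the pipe.
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";
my $pid = fork();
defined $pid or die "fork: $!";

if ($pid == 0) {                 # child: plays the CGI script
    close $reader;
    print {$writer} "warning line $_\n" for 1 .. 10_000;  # blocks when full
    close $writer;
    exit 0;
}

close $writer;                   # parent: plays a server that delays reading
sleep 5;                         # the child sits blocked in write() meanwhile
my $bytes = 0;
$bytes += length while <$reader>;  # drain the pipe; the child can now finish
waitpid($pid, 0);
print "child finished only after the parent drained $bytes bytes\n";
----- END OF SKETCH -----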
Comment 3 Sander Holthaus 2003-06-22 20:13:06 UTC
I think this bug is related to bug 18348 and bug 20111. 
Comment 4 Jon 2003-06-22 21:12:02 UTC
I left the above-mentioned two URL's online, but I made a few changes to them.  
This is interesting.

When I adjust the mysql SQL query to return only 5 or 10 rows, the bug goes
away for the "bug" script.  But when I adjust it to return 71 rows, the bug is
back.

Basically, I adjusted my "bugged" script:
http://www.hotdlz.com/apachebug/

again, that is a script that does a perl "require" of another script that
actually does the sql and html output.  That require'd script is now set to
return 71 rows from the mysql query, the foreach loop has a 'warn "foo"'
inside each iteration, and the bugged script has $|=1 at the top too.  The
other script,

http://www.hotdlz.com/apachebug1.pl

is identical, except that its require'd perl script has no $|=1 and no
warn "foo" in the foreach.  Both do return 71 rows.

So the problem script would end up writing "foo" to STDERR over 70 times while 
the non-problem script would write nothing.  I wonder if that has something to 
do with the problem?

Note that I disabled the output of the SQL queries, so if you go to those URL's 
you won't see much, but you will see apache hang for 5 minutes on 
www.hotdlz.com/apachebug/

Now for some comments:

> Can you try lsof to see if another process has a
> descriptor to that same pipe?

I'm not sure how to do this, but I'll try:

I ran http://www.hotdlz.com/apachebug/, let it do its pause, then ran this
command as root:
lsof | grep IPv4 | grep 'my.ip.address'

and I found one line:
httpd     19860   nobody   37u  IPv4   59112398   TCP server.ip.address:http->my.client.ip.address:61975 (ESTABLISHED)

In fact that lsof line stays there for the entire 5 minutes.  The moment the 5
minute timeout occurs, Apache writes the
ap_content_length_filter: apr_bucket_read() failed
line to the error log, and the lsof line disappears.
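(A hedged aside, not from the original exchange: that lsof invocation only
matches TCP sockets.  Pointing lsof at the httpd process itself, e.g.
lsof -p 19860 | grep FIFO, should also list the pipe descriptors that comment
2 asked about.)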

> A commmon problem is a script that writes a bunch
> to stderr and ends up hanging on the write because
> Apache hasn't started reading that pipe yet.  
> (This may be depending on scriptlog setting...
> can't recall at the moment.)

I have no scriptlog setting in my httpd.conf

> Does the script write a bunch to stderr?

yes, as I mentioned above, the problem script will write a bunch of non-
buffered ($|=1) output to STDERR.

> This contradicts your statement that ps doesn't
> show the script still running

Actually, I believe I was mistaken.  ps does indeed show the perl script still
running.

Obviously the first question would be: does the script in fact take 5 minutes
to run?  No - if I run the script on the command line with the same
environment variables as Apache, it runs and terminates normally in about 2
seconds.

> If you can figure out a small-ish script to
> allow us to duplicate the problem, we're in business.

I tried to make a small script that shows this problem, but I had no luck.
Comment 5 Jon 2003-06-22 21:32:03 UTC
Yeah, I think the problem requires at least 70 or 80 lines of output to
STDERR.

I was able to make a small script that shows the problem.

http://www.hotdlz.com/bug/

here is the source code, there are two files index.cgi and bug.pl:

----- index.cgi -----
#!/usr/bin/perl

print "Content-type: text/html\n";
print "Expires: Sun, 22 Jun 2003 20:27:21 GMT\n";
print "Last-Modified: Sun, 22 Jun 2003 20:27:21 GMT\n";
print "Set-Cookie: user=0\n";
print "\n";

if (1) {
  require "/my/www/path/bug/bug.pl";
}

----- bug.pl -----
#!/usr/bin/perl

$|++;  # non-buffered; it does not matter if you comment this line out, the bug still exists

print "<HTML><BODY>\n";

for ($i=1; $i <= 100; $i++) {
  warn "foo";
  print "$i<BR>";
}

----- END OF SCRIPTS -----

If you adjust the $i <= 100 to something smaller, like 50, there is no 5 minute 
pause.
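The 50-versus-100 threshold suggests the kernel pipe buffer capacity is the
dividing line.  A small probe (an illustration with a made-up file name, not
part of the original report) can estimate that capacity by writing to a
never-read pipe in non-blocking mode:

----- pipe-size-probe.pl (illustration) -----
#!/usr/bin/perl
# Estimate the kernel pipe buffer size: switch the write end of an unread
# pipe to non-blocking mode and count how many bytes fit before EAGAIN.
use strict;
use warnings;
use Fcntl;   # exports F_GETFL, F_SETFL, O_NONBLOCK

pipe(my $r, my $w) or die "pipe: $!";
my $flags = fcntl($w, F_GETFL, 0);
defined $flags or die "fcntl F_GETFL: $!";
fcntl($w, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";

my $total = 0;
while (1) {
    my $n = syswrite($w, 'x' x 1024);
    last unless defined $n && $n > 0;   # undef (EAGAIN) once the pipe is full
    $total += $n;
}
print "this system's pipe buffer holds about $total bytes\n";
----- END OF SKETCH -----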

When I run these scripts through the URL, I do see them in ps and lsof for 5 
minutes until apache times it out with the apr_bucket_read error in its error 
log.

I hope this helps.
Comment 6 Jon 2003-06-22 21:58:56 UTC
More information...  With $i <= 50 I see exactly 50 "foo" messages in my
apache error log.  With $i <= 100, I noticed there are exactly 200 "foo"
messages in my apache error log, not the expected 100.  Very weird.  No
messages are written to the log during the 5 minute pause, but once the 5
minutes are up, there are double the expected messages in my apache error log.

I also noticed that once the 5 minutes are up, the "ap_content_length_filter:
apr_bucket_read() failed" message does not appear right away; it shows up at
odd times... sometimes a minute after I kick off the URL, sometimes a minute
after the 5 minute pause has completed.

The reason perl's output to STDERR is important to me is that when I program I
like to have debug output available, sometimes a lot of it, especially when I
am trying to solve a problem with my code.  Having a lot of STDERR messages
written to apache's log is very important to me.
Comment 7 Rabi 2003-06-26 16:21:11 UTC
I have developed a web-based tool which uses an Oracle database and Perl DBI.
For the last few days, one of the pages, where I query the database and print
out the page, has been hanging.  It keeps saying it is getting data, but it
just times out.  The same script runs fine on the development server (even when
I point it at the live database from the development server).  The only
difference between the two servers is that the development server is running
Apache 1.2.37 and the production server is running Apache 2.  In the log file
on the production server I am getting the apr_bucket_read() failed error.

I am using the following CGI modules -

use CGI qw(:standard);
use DBI;

my $in = new CGI;
print $in->header();

and then I print the html using print qq~ < html stuff > ~;

I have gone through the posts here and want to say: if something works fine on
a lower version, then by normal convention it should work in the newer release
as well.
Comment 8 Jeff Trawick 2003-07-11 11:19:52 UTC
Can the original reporter verify that the script works fine when the stderr
output is omitted?

Rabi, is this the situation with your problem as well?

A hang when the script writes enough data to stderr to fill the kernel pipe
buffer is a known bug that is slowly being addressed.
Comment 9 Toan Pham 2003-08-24 17:07:25 UTC
I got this problem when transferring a CGI website from a Unix-based OS (Alpha
processor) to a Linux-based OS (dual Xeon processors).
Comment 10 Eric Pearce 2003-10-29 23:10:41 UTC
I had the same problem going from 1.3.26 to 2.0.47.  I have a perl CGI script
that talks to MySQL -- queries that returned a lot of data started hanging
after moving from Apache 1.x to 2.x.  I changed the beginning of the script
from "perl -w" to "perl -X" and all problems ceased...
Comment 11 Co-Advisor 2003-12-12 17:25:29 UTC
This bug is probably the same as bug #23528.

I can confirm that the bug happens for CGI programs that do not write
much to stderr and do not fork anything. The bug may be related to
the size of the request. The bug is only present in Apache 2.x and
has been there for many months.

Comment 12 Joe Orton 2004-07-13 09:31:26 UTC
Can those who can reproduce this bug please try 2.0.50? 
Comment 13 Co-Advisor 2004-07-13 19:28:12 UTC
Works for me with httpd 2.0.50 on FreeBSD 4.7.
I cannot try on Solaris, but that's big progress already!
Thank you.
Comment 14 Joe Orton 2004-07-13 19:34:53 UTC
Let's presume that the issues here are all problems related to stderr handling
in mod_cgi.  Please reopen if anyone can reproduce problems using 2.0.50.

*** This bug has been marked as a duplicate of 22030 ***