Summary:   Apache misbehaves if log file reaches 2GB
Product:   Apache httpd-2
Component: Core
Severity:  normal
Reporter:  Dave Dyer <ddyer>
Assignee:  Apache HTTPD Bugs Mailing List <bugs>
CC:        apache-bugzilla, greg, sebastian, tdolan
URL:       no url available
Description Dave Dyer 2002-10-10 19:38:05 UTC
If the error log can't be written, it appears that any process that tries to write to it is quietly terminated. This can cause serious errors where none existed. Suggestions: 1) the "quiet termination" should be noisier, to attract attention; 2) there should be some sort of overall status information that could be routinely monitored for "alarm" conditions such as the inability to make log file entries.

Here are some details from the event that led to this report. The symptom was that some scripted emails arrived blank. There was no other indication of a problem, but once someone complained about the blank emails, it was noticed that the error_log was 2GB in size and had stopped growing due to the maximum file size. The web site was otherwise operating normally, except that errors weren't being logged. Further investigation showed that a CGI subprocess involved in sending the email was routinely emitting a warning message, which was logged, and the inability to log the warning caused the process to terminate. Since the main thread emitting HTML didn't fail, the user's screen didn't indicate any problem.
Comment 1 Joshua Slive 2002-10-16 16:58:56 UTC
Hmmm... I'm not really an expert in this, but it seems it is the CGI script that needs to guard against the out-of-space condition, not Apache. What could Apache do to prevent this?
Comment 2 Dave Dyer 2002-10-16 17:18:52 UTC
No, the CGI is not even aware that a log file is being written; it is merely emitting some text on stderr. For its pains, it gets terminated by apache.
Comment 3 Joshua Slive 2002-10-17 01:35:13 UTC
Again, I'm not an expert in this, but Apache is just attaching the stderr pipe to the error log. If this pipe can't be written, the OS is probably just sending SIGPIPE to your CGI. If your CGI doesn't handle that, it will die. I don't see anything that Apache could do about this.
Comment 4 Dave Dyer 2002-10-17 05:27:54 UTC
Apache can't be doing anything as simpleminded as you suggest, because it is running many concurrent CGI threads. So the fate of this particular thread could be better. Secondarily, as I suggested, Apache must know that, globally speaking, its log file is not working. There should be some alert procedure - it shouldn't just allow shit to happen until someone notices the mess and deduces the cause and cure.
Comment 5 Joshua Slive 2002-10-17 13:30:42 UTC
I don't understand your first paragraph at all. How do multiple CGI processes have anything to do with it? They are all handled identically. Regarding behaving better when the error log is full, that seems kind of absurd to me. Apache writes errors to the error log. If it can't do that, what is it supposed to do? Write to another log file? What if that log can't be written to? Send a message to your pager? Walk down the hall and tap you on the shoulder? ;-) At some point it needs to be the administrator's responsibility to maintain available system resources.
Comment 6 William A. Rowe Jr. 2002-10-17 14:52:17 UTC
In this case, the minimum requirement is that a 500 error (with no additional 'helpful information') should be sent to the client of this terminated script.
Comment 7 Dave Dyer 2002-10-17 18:08:01 UTC
Re: multiple CGI processes - if their output were tee'd directly from stderr to a log file, the output from multiple processes would be chaotically interspersed. That in itself would be a problem. The output from STDOUT is clearly handled individually, so why not STDERR?

For the moment, let's separate the question of what happened to this particular CGI from what happens in general. 1) For this particular CGI, a warning message caused the process to terminate, effectively turning a friendly glitch audit trail into a black hole. Clearly not a desirable outcome. 2) From the viewpoint of Apache as a whole, if log files are not being maintained and processes are being killed randomly, the system as a whole is in serious jeopardy. Perhaps Apache should shut down (which would surely get someone's attention), but at the very least there should be an emergency channel of some sort through which Apache reports serious internal problems. The standard log files are not a good place for this, because Apache's (rare, we hope) internal problems would be lost in mountains of routine log data.

Consider the current case: Apache knows exactly what is wrong, and can/should tell the sysop "error log can't be written". Now consider the current behavior as a debugging problem: "my email arrived blank". Which would you rather deal with?
Comment 8 Jeff Trawick 2003-02-14 21:41:32 UTC
This is the 2GB log file problem :( 32-bit builds of Apache (1.3 or 2.0) don't currently handle "large" files. Apache child processes will die trying to write logs which have reached 2GB in size. The obvious place to report the problem is -- you guessed it -- the log file that we can't write to. The current message to users is that they need to use log rotation to keep the size well under 2GB. Yes this sucks, yes it will eventually get fixed, yes there are issues that make it problematic to enable on some platforms (including Linux). Sorry :(
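The log-rotation workaround Jeff mentions can be done with Apache's bundled rotatelogs program via piped logging; a sketch (paths illustrative) that starts a new error log daily, keeping each file well under 2GB:

```apacheconf
# httpd.conf -- pipe the error log through rotatelogs,
# starting a new file every 86400 seconds (one day)
ErrorLog "|/usr/local/apache2/bin/rotatelogs /var/log/httpd/error_log.%Y-%m-%d 86400"
```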
Comment 9 Jeff Trawick 2003-05-22 18:39:46 UTC
*** Bug 20160 has been marked as a duplicate of this bug. ***
Comment 10 Joe Orton 2003-05-22 19:56:59 UTC
Yeah, you can improve the failure mode for this problem by setting the handler for SIGXFSZ (the signal which kills the process when it write()s past 2GB) to SIG_IGN. It's a trivial change - I'll submit a patch.
Comment 11 Joe Orton 2004-05-06 10:53:07 UTC
Fixed in HEAD by allowing >2GB log files on platforms which have this limit; will be proposed for backport for the next 2.0 release.
Comment 12 Joe Orton 2004-05-14 11:06:17 UTC
*** Bug 28968 has been marked as a duplicate of this bug. ***
Comment 13 Joe Orton 2004-08-24 20:37:16 UTC
*** Bug 30170 has been marked as a duplicate of this bug. ***