APLOG_ERROR and APLOG_WARNING are used for some very important log messages, but in order to squelch "file does not exist" log messages, the admin must set LogLevel crit. This is suboptimal. A "file not found" type of condition is a perfectly normal, if somewhat notable, operation of any Web server and should not be logged at an error level on par with internal Web server problems. I would suggest APLOG_NOTICE for these log messages (server/core.c), as that preserves the obvious code intent of making these more notable conditions, but moves the priority below messages that may require admin user intervention.
Created attachment 15693 [details] Patch implementing APLOG_NOTICE for these messages
APLOG_NOTICE messages are *always* written, irrespective of log level.
Ah, in that case, a different level is needed -- in which case I go back to suggesting APLOG_INFO. In any case, I found that there are more places logging exactly the same message at APLOG_ERR (sigh). I'm attaching a new diff fixing all the places I can find. (In context, in case it helps this be recognized as a real bug: Mozilla, IE, et al., look for "/favicon.ico" everywhere, and rather few sites on my server have such an icon file. So error_log gets filled up amazingly fast with "File does not exist" errors -- eclipsing any *real* errors that may require admin intervention.)
Created attachment 15696 [details] Patch implementing APLOG_INFO for these messages
I disagree. If it's generating an HTTP error, as is the case in all the instances in your patch, then it's an error and should be logged as such.
OK, so a missing static content file (an *obvious* fix simply by looking at the files in the tree) is just as bad of an error as an Apache-internal problem or major dynamic content issue? I think there's a question of priorities here, which is why this bug was submitted in the first place. Certainly I can't be the only site admin seeing bazillions of logged messages simply because of missing static content, which flood out all the other truly important errors. That's just Wrong.
I agree with Nick. It is an error and from an admin perspective I want to see it at the error level and I do not want to be forced to increase the loglevel verbosity for doing this. If I do not want to see these "file not found messages" for further analysis I filter them out with grep -v. Furthermore it is important to keep this message at the error level for backwards compatibility to avoid breaking error analysis scripts.
Which is why I originally tried to find some sort of log level that meant "notable but not a fatal error", but APLOG_NOTICE does not function the same way as syslog's "NOTICE" level (which in syslog is the one between INFO and ERR for precisely this purpose). *shrug* I tried. I give up.
Let me rephrase this problem: the fact that this error setting works this way is an opening for a denial of service attack on the disk space of the server. When you are dealing with a high traffic site it is very important that nothing an end user can do can cause disk space to get used in proportion to activities they perform. On my server, I get tens of thousands of 404s every day, and I'm not even a "large" website (I only have a couple million users). If a user wanted to mess with me, they'd just start requesting 404s off my server until I run out of disk space. I go to great lengths to make certain I know what every single write that is occurring due to my traffic is, from what database entries are stored to what lines are logged. To that end, I've turned off almost all of the access logs (having switched to much more flexible database logging being done by my application), and have carefully stared at all of the message and error logs on this box. The only thing that I can't account for is why Apache insists on storing these "File does not exist" events as if they were "errors" Yes, I understand that this is "generating an HTTP error". However, there is a major difference in HTTP between a Client Error (4xx) and a Server Error (5xx). Lumping them all into the same error category makes no sense, on neither a technical nor an intuitive level. There is a reason why these two different types of "error" have fundamentally different codes in HTTP, and it is weird that the most popular (and most flexible, and all around otherwise awesomest) webserver doesn't honor the same distinction. In all seriousness: I can't do anything to fix or help a 404. If someone decides to go to a random URL that is either not supported anymore or never existed in the first place, I'm not going to wait anxiously at the logs and see if there's a problem. I also can't do anything to handle 403 errors (some user typed the wrong password) or 400 errors (someone's using a broken web browser). In essence, any error that causes a 4xx is completely useless to me, and isn't an /error/: it's nothing more than an interesting statistical anomaly. The server errors, though, are important to me (and are hopefully important to everyone else ;P): these are things I need to fix. More to the point: these are things I have some hope of fixing at all, and that for which it is soely my responsibility to minimize the occurrence. In fact, the example error listed in the documentation, "premature end of script headers", is so fundamentally more important than "file does not exist" that the point should already be clear: the first one is an error caused by me, and the second one is a mistake caused by a user. So, thinking about it from this perspective, we can ask whose fault it is that a line appears in the error log, or better: whose fault it is that 100 bytes of space just got eaten from the server's disk. There really needs to be a setting where things caused by me are logged and things caused by a user aren't. In fact, this is so fundamental that it almost seems like an orthogonal problem to error levels. The example warning is even more interesting to me than "file does not exist": "child process 1234 did not exit, sending another SIGHUP". While not terribly important, I'm actually kind of curious why my server instances are being blocked so badly that they aren't reloading cleanly. To take the fault angle: this is an event that I caused, and that I might have some interest in fixing. Maybe there needs to be a way to say "don't print errors to the logs if the errors would cause normal HTTP responses" or "are caused by the user, not by the administrator"? Regardless, the "file does not exist" spam needs to stop. :( Doing Google searches finds a bunch of cases of people being burned by this, from some guy who ended up with a 90GB log file from Google reindexing content he had taken offline to companies having to reassure users that these "errors" they are seeing in their log files since upgrading their product are harmless. I'm actually curious; to the people who insist that these entries end up in the error log: do you actually pay attention to them, and if you do, what do you do when you see them? What is your tactical response to seeing a 404? Does anyone out there actually want this in their logs? :(
Fixed in trunk in r1180328
fixed in 2.4.1
Re Jay Freeman/#9: > If a user wanted to mess with me, they'd just start requesting 404s off my server until I run out of disk space. How exactly does the requested patch deny them the ability to do so? Best case scenario, it slows them down by roughly 30%, unless you omit all 404 requests from the access log as well. > In all seriousness: I can't do anything to fix or help a 404. You're approaching this thing from a very narrow minded point of view. For you, 404 errors are apparently forced on your server from someone on the outside, possibly even with ill intent. From my perspective, 404 errors happen when I link to files that don't exist, and so the huge majority of 404 errors I'm seeing are caused by *me*, until I fix my site. I'd agree that the ratio is likely to have changed over time because all kinds of clients request all kinds of files nowadays that may or may not exist, in order to provide browser, platform, and crawler dependent content. > do you actually pay attention to them, and if you do, what do you do when you see them? What is your tactical response to seeing a 404? Does anyone out there actually want this in their logs? :( Yes, I absolutely do. Like I said, most of these errors are caused because *I* screwed up (or "my" web developers/designers etcetera), and thus *I* need to fix them by correcting broken links/image tags/script sources/background image urls. 404 errors that are caused by outside parties (apple-touch-icon foo and so on), well, let me just ignore them. IMHO, a better solution would have been to introduce new log levels, while making sure that the default behavior remains unchanged; maybe by splitting errors into error and http_error or some such. I guess now that the change has been implemented (5 years ago, I know... but I just came across this for the first time), there's little hope of getting the previous behavior restored. Anyway, the lack of documentation just cost me 3 hours of my life, so I'd have appreciated if this had been reflected in the 2.4 documentation - at first I honestly thought my recent Debian 8 upgrade had somehow broken Apache's error logging in a very weird way and checked all kinds of stuff before searching the web. Bug report regarding the documentation: [1] [1] https://bz.apache.org/bugzilla/show_bug.cgi?id=60247
(In reply to ysdev from comment #12) > > In all seriousness: I can't do anything to fix or help a 404. > > You're approaching this thing from a very narrow minded point of view. For > you, 404 errors are apparently forced on your server from someone on the > outside, possibly even with ill intent. > > From my perspective, 404 errors happen when I link to files that don't > exist, and so the huge majority of 404 errors I'm seeing are caused by *me*, > until I fix my site. And you have a way to get them anywhere you want, without them being logged to error_log at the same level as major server problems: mod_log_config. You can even use CustomLog's expression support to log only 404s to a file of their own, if you prefer. And you get to choose how they're logged, including translated path and everything else you could need. Or just set the error log level to info if you want all content related errors (there are more than just not-found errors logged at this level, and by the same argument above, you may want to see those too). The problem fixed by this bug is that major server errors and content-missing errors had the same log level -- so the former could be drowned out by the latter, or the former would disappear completely if the error log level were tuned to avoid warning level messages.
(In reply to Todd Vierling from comment #13) > (In reply to ysdev from comment #12) > > > In all seriousness: I can't do anything to fix or help a 404. > > > > You're approaching this thing from a very narrow minded point of view. For > > you, 404 errors are apparently forced on your server from someone on the > > outside, possibly even with ill intent. > > > > From my perspective, 404 errors happen when I link to files that don't > > exist, and so the huge majority of 404 errors I'm seeing are caused by *me*, > > until I fix my site. > > And you have a way to get them anywhere you want, without them being logged > to error_log at the same level as major server problems: mod_log_config. How would that be done?
Thank you for this report and for the work on it. I experienced this annoyance several times. That being said, I am not satisfied by the new situation neither. While warn must have been excessive, I doubt info is sufficient. 404-s seem to fit the definition of notice pretty well: > Normal but significant condition. I'm afraid this highlights a relatively false dichotomy between access_log and error_log. nginx logs 404-s in its error log, and features a simple boolean directive to control that behavior: http://nginx.org/en/docs/http/ngx_http_core_module.html#log_not_found I believe httpd should either offer more sophisticated logging or make configuring this behavior as easy as nginx.