Bug 35768 - Missing file logs at far too high of log level
Summary: Missing file logs at far too high of log level
Status: REOPENED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core (show other bugs)
Version: 2.5-HEAD
Hardware: All All
: P2 normal with 3 votes (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: FixedInTrunk, PatchAvailable
Depends on:
Blocks:
 
Reported: 2005-07-17 03:16 UTC by Todd Vierling
Modified: 2021-07-16 19:13 UTC (History)
1 user (show)



Attachments
Patch implementing APLOG_NOTICE for these messages (922 bytes, patch)
2005-07-17 03:18 UTC, Todd Vierling
Details | Diff
Patch implementing APLOG_INFO for these messages (1.75 KB, patch)
2005-07-18 15:03 UTC, Todd Vierling
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Todd Vierling 2005-07-17 03:16:31 UTC
APLOG_ERROR and APLOG_WARNING are used for some very important log messages, but
in order to squelch "file does not exist" log messages, the admin must set
LogLevel crit.  This is suboptimal.  A "file not found" type of condition is a
perfectly normal, if somewhat notable, operation of any Web server and should
not be logged at an error level on par with internal Web server problems.

I would suggest APLOG_NOTICE for these log messages (server/core.c), as that
preserves the obvious code intent of making these more notable conditions, but
moves the priority below messages that may require admin user intervention.
Comment 1 Todd Vierling 2005-07-17 03:18:20 UTC
Created attachment 15693 [details]
Patch implementing APLOG_NOTICE for these messages
Comment 2 Jeff Trawick 2005-07-18 14:36:04 UTC
APLOG_NOTICE messages are *always* written, irrespective of log level.
Comment 3 Todd Vierling 2005-07-18 15:02:48 UTC
Ah, in that case, a different level is needed -- in which case I go back to
suggesting APLOG_INFO.

In any case, I found that there are more places logging exactly the same message
at APLOG_ERR (sigh).  I'm attaching a new diff fixing all the places I can find.

(In context, in case it helps this be recognized as a real bug:  Mozilla, IE, et
al., look for "/favicon.ico" everywhere, and rather few sites on my server have
such an icon file.  So error_log gets filled up amazingly fast with "File does
not exist" errors -- eclipsing any *real* errors that may require admin
intervention.)
Comment 4 Todd Vierling 2005-07-18 15:03:37 UTC
Created attachment 15696 [details]
Patch implementing APLOG_INFO for these messages
Comment 5 Nick Kew 2007-07-19 17:45:06 UTC
I disagree.  If it's generating an HTTP error, as is the case in all the
instances in your patch, then it's an error and should be logged as such.
Comment 6 Todd Vierling 2007-07-23 13:18:04 UTC
OK, so a missing static content file (an *obvious* fix simply by looking at the
files in the tree) is just as bad of an error as an Apache-internal problem or
major dynamic content issue?  I think there's a question of priorities here,
which is why this bug was submitted in the first place.

Certainly I can't be the only site admin seeing bazillions of logged messages
simply because of missing static content, which flood out all the other truly
important errors.  That's just Wrong.
Comment 7 Ruediger Pluem 2007-07-23 13:36:12 UTC
I agree with Nick. It is an error and from an admin perspective I want to see it
at the error level and I do not want to be forced to increase the loglevel
verbosity for doing this. If I do not want to see these "file not found
messages" for further analysis I filter them out with grep -v.
Furthermore it is important to keep this message at the error level for
backwards compatibility to avoid breaking error analysis scripts.
Comment 8 Todd Vierling 2007-07-23 13:39:02 UTC
Which is why I originally tried to find some sort of log level that meant
"notable but not a fatal error", but APLOG_NOTICE does not function the same way
as syslog's "NOTICE" level (which in syslog is the one between INFO and ERR for
precisely this purpose).

*shrug* I tried.  I give up.
Comment 9 Jay Freeman 2009-04-16 02:07:37 UTC
Let me rephrase this problem: the fact that this error setting works this way is an opening for a denial of service attack on the disk space of the server. When you are dealing with a high traffic site it is very important that nothing an end user can do can cause disk space to get used in proportion to activities they perform.

On my server, I get tens of thousands of 404s every day, and I'm not even a "large" website (I only have a couple million users). If a user wanted to mess with me, they'd just start requesting 404s off my server until I run out of disk space. I go to great lengths to make certain I know what every single write that is occurring due to my traffic is, from what database entries are stored to what lines are logged.

To that end, I've turned off almost all of the access logs (having switched to much more flexible database logging being done by my application), and have carefully stared at all of the message and error logs on this box. The only thing that I can't account for is why Apache insists on storing these "File does not exist" events as if they were "errors"

Yes, I understand that this is "generating an HTTP error". However, there is a major difference in HTTP between a Client Error (4xx) and a Server Error (5xx). Lumping them all into the same error category makes no sense, on neither a technical nor an intuitive level. There is a reason why these two different types of "error" have fundamentally different codes in HTTP, and it is weird that the most popular (and most flexible, and all around otherwise awesomest) webserver doesn't honor the same distinction.

In all seriousness: I can't do anything to fix or help a 404. If someone decides to go to a random URL that is either not supported anymore or never existed in the first place, I'm not going to wait anxiously at the logs and see if there's  a problem. I also can't do anything to handle 403 errors (some user typed the wrong password) or 400 errors (someone's using a broken web browser).

In essence, any error that causes a 4xx is completely useless to me, and isn't an /error/: it's nothing more than an interesting statistical anomaly. The server errors, though, are important to me (and are hopefully important to everyone else ;P): these are things I need to fix. More to the point: these are things I have some hope of fixing at all, and that for which it is soely my responsibility to minimize the occurrence.

In fact, the example error listed in the documentation, "premature end of script headers", is so fundamentally more important than "file does not exist" that the point should already be clear: the first one is an error caused by me, and the second one is a mistake caused by a user.

So, thinking about it from this perspective, we can ask whose fault it is that a line appears in the error log, or better: whose fault it is that 100 bytes of space just got eaten from the server's disk. There really needs to be a setting where things caused by me are logged and things caused by a user aren't.

In fact, this is so fundamental that it almost seems like an orthogonal problem to error levels. The example warning is even more interesting to me than "file does not exist": "child process 1234 did not exit, sending another SIGHUP". While not terribly important, I'm actually kind of curious why my server instances are being blocked so badly that they aren't reloading cleanly.

To take the fault angle: this is an event that I caused, and that I might have some interest in fixing. Maybe there needs to be a way to say "don't print errors to the logs if the errors would cause normal HTTP responses" or "are caused by the user, not by the administrator"?

Regardless, the "file does not exist" spam needs to stop. :( Doing Google searches finds a bunch of cases of people being burned by this, from some guy who ended up with a 90GB log file from Google reindexing content he had taken offline to companies having to reassure users that these "errors" they are seeing in their log files since upgrading their product are harmless.

I'm actually curious; to the people who insist that these entries end up in the error log: do you actually pay attention to them, and if you do, what do you do when you see them? What is your tactical response to seeing a 404? Does anyone out there actually want this in their logs? :(
Comment 10 Stefan Fritsch 2011-10-08 07:37:49 UTC
Fixed in trunk in r1180328
Comment 11 Stefan Fritsch 2012-02-26 16:36:46 UTC
fixed in 2.4.1
Comment 12 ysdev 2016-10-12 22:55:41 UTC
Re Jay Freeman/#9:

> If a user wanted to mess with me, they'd just start requesting 404s off my server until I run out of disk space.

How exactly does the requested patch deny them the ability to do so? Best case scenario, it slows them down by roughly 30%, unless you omit all 404 requests from the access log as well.

> In all seriousness: I can't do anything to fix or help a 404.

You're approaching this thing from a very narrow minded point of view. For you, 404 errors are apparently forced on your server from someone on the outside, possibly even with ill intent.

From my perspective, 404 errors happen when I link to files that don't exist, and so the huge majority of 404 errors I'm seeing are caused by *me*, until I fix my site.

I'd agree that the ratio is likely to have changed over time because all kinds of clients request all kinds of files nowadays that may or may not exist, in order to provide browser, platform, and crawler dependent content.

> do you actually pay attention to them, and if you do, what do you do when you see them? What is your tactical response to seeing a 404? Does anyone out there actually want this in their logs? :(

Yes, I absolutely do.

Like I said, most of these errors are caused because *I* screwed up (or "my" web developers/designers etcetera), and thus *I* need to fix them by correcting broken links/image tags/script sources/background image urls.

404 errors that are caused by outside parties (apple-touch-icon foo and so on), well, let me just ignore them.

IMHO, a better solution would have been to introduce new log levels, while making sure that the default behavior remains unchanged; maybe by splitting errors into error and http_error or some such.

I guess now that the change has been implemented (5 years ago, I know... but I just came across this for the first time), there's little hope of getting the previous behavior restored.

Anyway, the lack of documentation just cost me 3 hours of my life, so I'd have appreciated if this had been reflected in the 2.4 documentation - at first I honestly thought my recent Debian 8 upgrade had somehow broken Apache's error logging in a very weird way and checked all kinds of stuff before searching the web.

Bug report regarding the documentation: [1]

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=60247
Comment 13 Todd Vierling 2016-10-13 00:39:59 UTC
(In reply to ysdev from comment #12)
> > In all seriousness: I can't do anything to fix or help a 404.
> 
> You're approaching this thing from a very narrow minded point of view. For
> you, 404 errors are apparently forced on your server from someone on the
> outside, possibly even with ill intent.
> 
> From my perspective, 404 errors happen when I link to files that don't
> exist, and so the huge majority of 404 errors I'm seeing are caused by *me*,
> until I fix my site.

And you have a way to get them anywhere you want, without them being logged to error_log at the same level as major server problems: mod_log_config. You can even use CustomLog's expression support to log only 404s to a file of their own, if you prefer. And you get to choose how they're logged, including translated path and everything else you could need.

Or just set the error log level to info if you want all content related errors (there are more than just not-found errors logged at this level, and by the same argument above, you may want to see those too).

The problem fixed by this bug is that major server errors and content-missing errors had the same log level -- so the former could be drowned out by the latter, or the former would disappear completely if the error log level were tuned to avoid warning level messages.
Comment 14 Philippe Cloutier 2021-07-16 18:21:24 UTC
(In reply to Todd Vierling from comment #13)
> (In reply to ysdev from comment #12)
> > > In all seriousness: I can't do anything to fix or help a 404.
> > 
> > You're approaching this thing from a very narrow minded point of view. For
> > you, 404 errors are apparently forced on your server from someone on the
> > outside, possibly even with ill intent.
> > 
> > From my perspective, 404 errors happen when I link to files that don't
> > exist, and so the huge majority of 404 errors I'm seeing are caused by *me*,
> > until I fix my site.
> 
> And you have a way to get them anywhere you want, without them being logged
> to error_log at the same level as major server problems: mod_log_config.

How would that be done?
Comment 15 Philippe Cloutier 2021-07-16 18:32:48 UTC
Thank you for this report and for the work on it. I experienced this annoyance several times.

That being said, I am not satisfied by the new situation neither. While warn must have been excessive, I doubt info is sufficient. 404-s seem to fit the definition of notice pretty well:
> Normal but significant condition.

I'm afraid this highlights a relatively false dichotomy between access_log and error_log.

nginx logs 404-s in its error log, and features a simple boolean directive to control that behavior: http://nginx.org/en/docs/http/ngx_http_core_module.html#log_not_found
I believe httpd should either offer more sophisticated logging or make configuring this behavior as easy as nginx.