Bug 46688 - Child segfault when mmaped file truncated
Summary: Child segfault when mmaped file truncated
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core (show other bugs)
Version: 2.5-HEAD
Hardware: All All
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-10 07:53 UTC by Dan Poirier
Modified: 2009-09-14 06:18 UTC (History)
1 user (show)



Attachments
A test module to force file truncation before a request is handled (3.49 KB, text/plain)
2009-02-10 07:53 UTC, Dan Poirier
Details
Patch for documentation to clarify that the crash problem isn't dependent on NFS file systems (808 bytes, patch)
2009-02-12 05:38 UTC, Dan Poirier
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Poirier 2009-02-10 07:53:01 UTC
Created attachment 23246 [details]
A test module to force file truncation before a request is handled

I'm seeing a child process segfault when:

1) mmap enabled
2) an output filter is looking at the response body (e.g. mod_deflate)
3) the file being served is truncated between the initial stat() and the handler running

Apache does an mmap for the original file length and when the output filter tries to read the mapped memory past where the current file ends, the child seg faults.

The enableMMap documentation warns about possible seg faults when a mapped NFS file is truncated, but I'm seeing this on a local file system.

I've reproduced this on Linux and z/OS, not sure about other platforms but it seems likely to affect them too.

If nothing is actually looking at the response body before we send it, then there's not a seg fault.  The memory is passed to writev() and it seems to catch the problem and return an error.  Maybe this is the part that only fails on NFS?

A file getting truncated in the middle of a request is unlikely to happen often by chance, I suppose.  I'll attach a test module from Jeff Trawick that forces the truncation and makes this easy to reproduce.

I've thought about this some but haven't come up with a good approach to avoid this problem.  We could stat() the file again to see if it has shrunk, but there will always be a window where it could be truncated between when we stat() it and when we actually look at the data.  

The consequences of not fixing it aren't too bad anyway -- Apache just logs the child process failure and starts a new one.  

Maybe the best we can do is add to the warning in the enableMMap documentation.
Comment 1 Joe Orton 2009-02-12 00:56:28 UTC
I've no idea where the comment about NFS filesystems came from in the docs.  Yes, this is expected behaviour for any filesystem.

Regardless of whether EnableMMAP is enabled, you generally don't want to be modifying files in-place, since that may result in broken responses to active clients.
Comment 2 Nick Kew 2009-02-12 02:02:17 UTC
Of course if you modify files in place you should expect undefined behaviour.  But that's not the same as segfaulting, which is what the OP describes!
Comment 3 Joe Orton 2009-02-12 04:46:45 UTC
Well, shall we play component ping-pong?  Here is my shocking thesis:

If you mmap a file, truncate it, then try to read from a portion of the mapped memory beyond the offset to which the file was truncated, then the kernel will SIGBUS or SIGSEGV the process.

This is:

a) documented behaviour, see mmap(2) on any Unix, the mmap definition in POSIX, or any book on Unix e.g. Stevens, and
b) observable behaviour, see comment 1.

and therefore, it is expected behaviour, and should be documented.
Comment 4 Dan Poirier 2009-02-12 05:38:54 UTC
Created attachment 23252 [details]
Patch for documentation to clarify that the crash problem isn't dependent on NFS file systems

I agree, if the file is modified while we're trying to serve it, there's not much we can do for the current client.

Here's a patch to remove the mention of NFS from the doc, so the warning applies regardless of file system type.
Comment 5 Dan Poirier 2009-02-17 06:49:53 UTC
Does the proposed doc change seem reasonable?  I wish we could avoid the seg fault, but it doesn't appear that we have that option.  At least we can make the warning more comprehensive.
Comment 6 Greg Ames 2009-03-31 08:00:04 UTC
I believe this and other bugs we've seen with truncated files could be solved by switching from stat() followed by open() to open() followed by fstat().  That lets us work with the same version of the file throughout a request, and makes it easy to detect a zero length file and abort before any harm is done.  It also saves the kernel from having to do a path walk for the stat(), a potential performance/scalability improvement.

Unfortunately it's not clear how to implement this change in httpd because we use the finfo early and depend on the handlers to open the file.
Comment 7 Jeff Trawick 2009-03-31 09:22:00 UTC
I think it is helpful to chop this issue in two, by the mechanism used to update files under document root.

1) truncate old file, then write new contents (fopen() and Java equivalent do this, so many programs act this way)

this will segfault whether we stat() then open() or open() then fstat(), because there is always a race condition between files truncated and us touching the pages we've previously mapped

so short of passing control back to mainline from a signal handler, this simply isn't a valid way to update served files; and while mmap is (hopefully) the only write mechanism that segfaults, the others still fail (e.g., httpd will hit EOF on sendfile before it expects)

2) put new contents into new/temporary file then rename/unlink

there is a timing window that would be fixed by using open()+fstat(), as Greg suggests; this timing window is very narrow compared to the truncate issue, so we don't hear about it much

--/--

So:

Users simply can't update served files using a mechanism that truncates, or bogosity will occur (hopefully the mmap segfault is by far the worst remaining bogosity).

(I hope your mileage doesn't vary; corrections most appreciated.)
Comment 8 Dan Poirier 2009-08-25 06:24:22 UTC
Updated warning in trunk doc.

Revision: 807609
Comment 9 Dan Poirier 2009-09-14 06:18:56 UTC
Backported doc change to 2.2.x in r814629.