After load on our production mod_dav server increased, we saw DAV requests (PROPFIND especially) sporadically stop working, while GET requests continued to work. From that point on, each DAV request would block a child. This would sometimes lead to an effective DoS, because the continuous stream of new incoming requests blocked more and more children. After long investigation of the problem, I am able to reproduce the bug, but only using low-level flowreplay (an application-layer tcpdump replay tool, part of tcpreplay). I think the bug is caused by a blocking poll() call (similar to bug 22030) after a writev() call, due to missing TCP ACKs (or some other network problem). The PROPFIND that triggers it requests a large amount of data, so poll() blocks several times for [Timeout] seconds each. Before blocking, this child has acquired a filesystem lock

10:53:01.096934 [ffffe410] fcntl64(9, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000010>

on /tmp/DavLock.dir, which is the file the other children (new DAV requests) also need to lock, so they block in the lock call:

10:53:01.370153 [ffffe410] fcntl64(9, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <716.870032>

GET requests still work (as long as enough non-blocked children are available) since they do not require the /tmp/DavLock.dir fs-lock. The part where the offending child blocks in poll() is straced here:

[...]
10:53:01.216048 [ffffe410] open("/srv/www/webdav/..........", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
10:53:01.216220 [ffffe410] writev(8, [{"1f5e\r\n", 6}, {"<insert large amount of data here>"}, {"\r\n", 2}], 4) = 2487 <0.000011>
10:53:01.216489 [ffffe410] poll([{fd=8, events=POLLOUT}], 1, 300000) = 0 <299.956071>
10:58:01.172688 [ffffe410] open("/srv/www/webdav/..........", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000023>
[...]
10:58:01.174614 [ffffe410] writev(8, [{"1f60\r\n", 6}, {"<insert large amount of data here>"}, {"\r\n", 2}], 4) = -1 EAGAIN <0.000013>
10:58:01.174957 [ffffe410] poll([{fd=8, events=POLLOUT}], 1, 300000) = 0 <299.844216>
11:03:01.019320 [ffffe410] open("/src/www/webdav/..........", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000015>
[...]
11:03:01.021422 [ffffe410] writev(8, [{"1f67\r\n", 6}, {"<insert large amount of data here>"}, {"\r\n", 2}], 4) = -1 EAGAIN <0.000012>
11:03:01.021711 [ffffe410] poll([{fd=8, events=POLLOUT, revents=POLLERR|POLLHUP}], 1, 300000) = 1 <113.248743>
11:04:54.270509 [ffffe410] writev(8, [{"1f67\r\n", 6}, {"<insert large amount of data here>"}, {"\r\n", 2}], 4) = -1 EPIPE (Broken pipe) <0.000016>
11:04:54.270812 [ffffe410] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[...]

After this last blocking poll(), all remaining writev()s are repeated like the last one ("1f67"...), resulting in EPIPE, and the server starts working again.

A backtrace on this process during a blocking poll() gives:

#0  0xffffe410 in ?? ()
#1  0xbfffc7d8 in ?? ()
#2  0x000493e0 in ?? ()
#3  0x00000001 in ?? ()
#4  0x404207bd in poll () from /lib/tls/libc.so.6
#5  0x402e9aad in apr_poll (aprset=0xbfffc860, num=1, nsds=0xbfffc85c, timeout=Variable "timeout" is not available.) at poll.c:130
#6  0x402e9e9f in apr_wait_for_io_or_timeout (f=Variable "f" is not available.) at waitio.c:54
#7  0x402dfbee in apr_socket_sendv (sock=0x81bdb80, vec=0xbfffea58, nvec=3, len=0xbfffc908) at sendrecv.c:208
#8  0x402dfcf6 in apr_sendv (sock=0x81bdb80, vec=0xbfffea58, nvec=3, len=0xbfffc908) at sendrecv.c:991
#9  0x0807850a in writev_it_all (s=0x81bdb80, vec=0x493e1, nvec=1, len=8038, nbytes=0xbfffc988) at core.c:2914
#10 0x08078fed in core_output_filter (f=0x81bdff8, b=0x81c7c58) at core.c:4247
#11 0x0807236b in ap_pass_brigade (next=0x493e1, bb=0x1) at util_filter.c:512
#12 0x08060ca8 in chunk_filter (f=0x81d9a90, b=0x81c7c58) at http_core.c:218
#13 0x0807236b in ap_pass_brigade (next=0x493e1, bb=0x1) at util_filter.c:512
#14 0x08075653 in ap_content_length_filter (f=0x81cdc60, b=0x81c7c58) at protocol.c:1232
#15 0x0807236b in ap_pass_brigade (next=0x493e1, bb=0x1) at util_filter.c:512
#16 0x4002f7bb in apr_brigade_write (b=0x81c7c58, flush=0x80723a0 <ap_filter_flush>, ctx=0x81cdc60, str=0x82580a8 "<lp1:getetag>\"2229f-1e5e-3f81de40\"</lp1:getetag>\n", nbyte=49) at apr_brigade.c:393
#17 0x4002fa28 in apr_brigade_puts (bb=0x81c7c58, flush=0x80723a0 <ap_filter_flush>, ctx=0x81cdc60, str=0x82580a8 "<lp1:getetag>\"2229f-1e5e-3f81de40\"</lp1:getetag>\n") at apr_brigade.c:569
#18 0x404c2be4 in dav_send_one_response (response=Variable "response" is not available.) at mod_dav.c:464
#19 0x404c2d73 in dav_stream_response (wres=Variable "wres" is not available.) at mod_dav.c:1100
#20 0x404c439a in dav_propfind_walker (wres=0xbfffeeb4, calltype=1) at mod_dav.c:1917
#21 0x404d7cb8 in dav_fs_walker (fsctx=Variable "fsctx" is not available.) at repos.c:1485
#22 0x404d826e in dav_fs_internal_walk (params=Variable "params" is not available.) at repos.c:1729
#23 0x404c7dd2 in dav_handler (r=0x81ccff0) at mod_dav.c:2045
#24 0x080674eb in ap_run_handler (r=0x81ccff0) at config.c:152
#25 0x08069f45 in ap_invoke_handler (r=0x81ccff0) at config.c:364
#26 0x080656ef in ap_process_request (r=0x81ccff0) at http_request.c:249
#27 0x08060e79 in ap_process_http_connection (c=0x81bdc58) at http_core.c:251
#28 0x080702fb in ap_run_process_connection (c=0x81bdc58) at connection.c:43
#29 0x080665f8 in child_main (child_num_arg=Variable "child_num_arg" is not available.) at prefork.c:610
#30 0x08066781 in make_child (s=Variable "s" is not available.) at prefork.c:704
#31 0x08066811 in startup_children (number_to_start=3) at prefork.c:722
#32 0x08066edd in ap_mpm_run (_pconf=0x809c0a8, plog=0x80c6150, s=0x809de58) at prefork.c:941
#33 0x0806bffa in main (argc=3, argv=0xbffff504) at main.c:618

Please let me know if you need more information. I am sorry I cannot provide an easy way to reproduce the bug, since it seems specific to network effects. I can provide the .pcap dumps and part of the data used in the DAV requests, or, alternatively, test the bug with patched httpd versions. Thank you for your efforts!

Regards,
Arne Voigtlaender

System-Info:
------------
Tested on: Linux 2.6.7, 2.6.13 (Distro: SuSE SLES 9)
           httpd 2.0.49, 2.0.53, 2.0.54
Hardware:  2x Xeon 3.4GHz, 5GB RAM, 8GB HDD, 2x Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (IBM Blade)

References:
-----------
http://archives.free.net.ph/message/20050909.014740.64bfea69.en.html
http://issues.apache.org/bugzilla/show_bug.cgi?id=22030
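To make the pattern in the strace explicit, here is a minimal, simplified sketch of what the send path appears to be doing on the non-blocking socket (the helper name is mine and the real code also advances the iovec after a short write; timeout_ms corresponds to the httpd Timeout setting):

    #include <errno.h>
    #include <poll.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Simplified: write as much as possible; on EAGAIN wait for POLLOUT,
     * blocking for up to timeout_ms -- this is the poll() that sits there
     * for the full 300 seconds once the client stops ACKing. */
    static ssize_t writev_with_timeout(int fd, struct iovec *vec, int nvec,
                                       int timeout_ms)
    {
        for (;;) {
            ssize_t n = writev(fd, vec, nvec);
            if (n >= 0 || errno != EAGAIN)
                return n;                       /* progress, or a hard error */

            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, timeout_ms) <= 0)
                return -1;                      /* timed out: give up the write */
        }
    }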
The bug is similar/identical to bug 34221.
Created attachment 16394 [details]: stop sending for aborted connections

The problem is that mod_dav never checks whether the connection has been aborted and continues blindly trying to send. Can you try this patch?
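The attachment itself is not reproduced here; as a rough illustration of the idea (not the actual patch), the check amounts to something like the following, using only the public request/connection fields (the function name is made up):

    #include "httpd.h"
    #include "http_protocol.h"

    /* Illustrative only -- not the attached patch.  The point is simply to
     * test r->connection->aborted before producing/sending more of the
     * (potentially huge) PROPFIND response. */
    static int send_more_propfind_output(request_rec *r,
                                         const char *buf, apr_size_t len)
    {
        if (r->connection->aborted) {
            /* Client is gone: stop generating output instead of letting
             * every further write run into the full Timeout. */
            return DONE;
        }
        ap_rwrite(buf, (int)len, r);
        return OK;
    }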
Thank you for your quick response! The patch does not fix the problem completely. It does change the behaviour somewhat and reduces the total delay a bit by attempting fewer writev()s, but the child still hangs for around 600 seconds (with Timeout 300) when trying to write data to the socket:

[...]
19:02:19.105921 [ffffe410] writev(8, [{"1f6a\r\n", 6}, {"<insert data>"}, 76}, {"\r\n", 2}], 4) = 1777 <0.000012>
19:02:19.106186 [ffffe410] poll([{fd=8, events=POLLOUT, revents=POLLOUT}], 1, 300000) = 1 <0.010639>
19:02:19.116867 [ffffe410] writev(8, [{"<insert data>"}, , 76}, {"\r\n", 2}], 3) = 6273 <0.000013>
19:02:19.119128 [ffffe410] writev(8, [{"1f5e\r\n", 6}, {"<insert data>"}, 49}, {"\r\n", 2}], 4) = 2487 <0.000013>
19:02:19.119565 [ffffe410] poll([{fd=8, events=POLLOUT}], 1, 300000) = 0 <299.954921>
[...]
19:07:22.418883 [ffffe410] writev(8, [{"1a7\r\n", 5}, {"<insert some data>"}, {"\r\n", 2}, {"0\r\n\r\n", 5}], 4) = -1 EAGAIN <0.0000
19:07:22.419221 [ffffe410] poll([{fd=8, events=POLLOUT}], 1, 300000) = 0 <299.955070>
19:12:22.374387 [ffffe410] write(7, "172.17.225.123 - - [13/Sep/2005:19:02:18 +0200] \"PROPFIND /.......", 160) = 160 <0.000021>
19:12:22.374512 [ffffe410] close(8) = 0 <0.000016>
[...]

(The stack backtrace remains as previously stated.)

I observed a correlation between the number of CLOSE_WAIT connections and the behaviour described earlier. That is, once the poll() call locks up, CLOSE_WAIT connections accumulate up to 131 (with MaxClients set to 3). In the period before this happens, the server deals well with my test load (which is constant), but once it appears, CLOSE_WAIT connections add up. It looks like this is a problem in apr_poll() or deeper down - do you agree?

Thank you again!

Regards,
Arne Voigtlaender
Actually it is weird that you see mod_dav taking an exclusive lock on the lockdb for a PROPFIND, if that's really what this is ("p r->the_request" in gdb somewhere up the stack to confirm?). That should be a read lock AFAIK.

10:53:01.096934 [ffffe410] fcntl64(9, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000010>
It is a write lock, but setting it "read-only" does not fix the problem; I already tried that.

modules/dav/main/mod_dav.c:1300:

    /* open lock database, to report on supported lock properties */
    /* ### should open read-only */
    if ((err = dav_open_lockdb(r, 0, &lockdb)) != NULL) {

modules/dav/main/mod_dav.c:2005:

    /* ### should open read-only */
    if ((err = dav_open_lockdb(r, 0, &ctx.w.lockdb)) != NULL) {
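For reference, the read-only variant I tried is essentially the obvious one-character change at both call sites, assuming the second argument of dav_open_lockdb() is the read-only flag (which is how its other callers appear to use it):

    /* ### should open read-only */
    if ((err = dav_open_lockdb(r, 1 /* ro */, &lockdb)) != NULL) {

and analogously at line 2005 for &ctx.w.lockdb.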
I can confirm that it is in fact the PROPFIND:

#14 0x08075653 in ap_content_length_filter (f=0x11107ef0, b=0x105884f0) at protocol.c:1232
1232        return ap_pass_brigade(f->next, b);
(gdb) p r->the_request
$1 = 0x11107f48 "PROPFIND /webdav/prod-umgebung/data/datenaustausch/beratung/ HTTP/1.1"

The connection being served by this process is:

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        1  20440 172.17.223.146:80       172.17.225.123:43681    CLOSE_WAIT
Well, if mod_dav is changed to use only a read lock on the database, then it should not block other children from obtaining read access, and they should be able to process PROPFIND requests. What behaviour do you see in that case? Which system call do the other children block in? Can you attach the patch you used, for reference?
Created attachment 16402 [details]: abort connection handling and readonly db-locking for mod_dav

I tried this patch, and also just the "16394: stop sending for aborted connections" patch by itself; both cases still exhibit the described behaviour.
Regarding the other behaviour, I am still investigating. There are more read locks now, but it still acquires a write lock on the DavLockDB somewhere in the code; I'm trying to find the place right now. You are right that it decreases the chances that the behaviour will recur as described, but for one thing there are additional write locks on the DB (which might not be necessary either), and also: shouldn't writev_it_all _not block_, no matter how much data you push into it? As I see it, fixing mod_dav will reduce the chances that the problem comes about (ideally blocking only one child instead of all of them, if no write locks are required), but do you think it is possible to stop this one child from blocking?
I just went through the code in mod_dav.c and checked the cases in which the lockdb would be opened read/write. The cases I found for (MOVE|COPY|PUT|MKCOL) seem reasonable, so I checked my packet dump, and it seems the problem is caused by one PROPFIND (Depth: 1, which produces a 13MB response). This response is written into a broken socket (since ACKs do not come in), so poll() locks up; the rest of the children are then locked up in the write-lock request (as straced above) when doing MKCOLs/PUTs.
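For completeness, the serialization itself is just plain fcntl() record locking on the lock database; a tiny standalone illustration (names made up) of why one stuck writer blocks every other child:

    #include <fcntl.h>
    #include <unistd.h>

    /* F_SETLKW waits for the lock; this is the call the other children sit
     * in (for hundreds of seconds) while the PROPFIND child holds F_WRLCK
     * on the DavLockDB and is itself stuck in poll(). */
    static int lock_davlockdb(int fd, short type)   /* F_RDLCK or F_WRLCK */
    {
        struct flock fl = {
            .l_type   = type,
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,      /* 0 = lock the whole file */
        };
        return fcntl(fd, F_SETLKW, &fl);
    }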
OK, well, that sounds like the expected behaviour. To mitigate this you can reduce the Timeout setting to e.g. 10 or 20 seconds. It would also be useful to add checks for c->aborted in more places (e.g. core_output_filter) to prevent repeated write timeouts. Also, it's possible that using a different database library for the lockdb might help - you could experiment with using apr_dbm_open_ex instead of apr_dbm_open in dav/fs/lock.c and specifying the database type. But essentially this is a design limitation of the mod_dav_fs backend; everything has to serialize through a single lock database. Other mod_dav backends might not have this limitation, e.g. catacomb (http://www.webdav.org/catacomb/).
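A rough sketch of the apr_dbm_open_ex() experiment (the wrapper function and the "gdbm" driver name are just examples; the chosen driver has to be compiled into apr-util, and dav/fs/lock.c would need adjusting around its existing open call):

    #include "apr_dbm.h"

    /* Hypothetical replacement for the apr_dbm_open() call in dav/fs/lock.c:
     * pick an explicit database driver instead of the compiled-in default. */
    static apr_status_t open_lockdb_ex(apr_dbm_t **db, const char *path,
                                       int ro, apr_pool_t *p)
    {
        return apr_dbm_open_ex(db, "gdbm", path,
                               ro ? APR_DBM_READONLY : APR_DBM_RWCREATE,
                               APR_OS_DEFAULT, p);
    }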
Thank you so much for your help! I put a check for f->c->aborted in core.c:core_output_filter and thereby reduced the number of timed-out writes to one. But the core issue still remains. I keep sending this prepared PROPFIND, which causes a large response, and the client just stops sending ACKs in the middle of each transaction. In about 1 in 10 cases (or so), the behaviour changes and the poll() locks up - always with the same data to write and the exact same packets being sent by the client.

A "normal" situation looks like this:

20:52:09.407647 [ffffe410] writev(8, [...], 3) = 3686 <0.000013>
20:52:09.409004 [ffffe410] writev(8, [{"1f5e\r\n", 6}, ...], 4) = 5074 <0.000014>
20:52:09.409095 [ffffe410] poll([{fd=8, events=POLLOUT, revents=POLLERR|POLLHUP}], 1, 300000) = 1 <0.246947>
20:52:09.656085 [ffffe410] writev(8, [...], 3) = -1 EPIPE (Broken pipe) <0.000013>
20:52:09.656163 [ffffe410] --- SIGPIPE (Broken pipe) @ 0 (0) ---

This I can strace all the time, but occasionally it is this:

20:57:26.463493 [ffffe410] writev(8, [...], 4) = 1800 <0.000014>
20:57:26.463608 [ffffe410] poll([{fd=8, events=POLLOUT, revents=POLLOUT}], 1, 300000) = 1 <0.011306>
20:57:26.474956 [ffffe410] writev(8, [...], 3) = 6250 <0.000022>
20:57:26.476328 [ffffe410] writev(8, [{"1f5e\r\n", 6},...], 4) = 2510 <0.000012>
20:57:26.476424 [ffffe410] poll([{fd=8, events=POLLOUT}], 1, 300000) = 0 <299.851111>

In the latter case, poll() does not return the POLLERR. The connection is then in CLOSE_WAIT; after a while, it goes to LAST_ACK (while the Send-Q is 20k). I read something about this behaviour at http://www.ussg.iu.edu/hypermail/linux/kernel/0407.3/0495.html but I'm not sure if this is intended in this case. Is it the kernel or Apache that is losing track of the FD? Why is the kernel not signalling SIGPIPE / POLLERR?
Created attachment 16409 [details]: core_output_filter patch that destroys bucket brigades when the connection is aborted

Here is the mentioned patch for core_output_filter.
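The attachment is not inlined here; the simplified gist of it (not the literal diff) is an early-out at the top of the filter:

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    /* Simplified gist of the attached change (not the literal diff): if the
     * connection has been aborted, drop the pending buckets instead of
     * timing out on another write.  Whether the right return value is an
     * error or APR_SUCCESS is exactly the question raised below. */
    static apr_status_t drop_output_if_aborted(ap_filter_t *f,
                                               apr_bucket_brigade *b)
    {
        if (f->c->aborted) {
            apr_brigade_cleanup(b);
            return APR_ECONNABORTED;
        }
        return APR_SUCCESS;   /* caller continues with the normal send path */
    }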
SIGPIPE (and I guess POLLERR) only happens if the client sends an RST and the connection is forcibly reset. I'm not sure what *should* happen at LAST_ACK or CLOSE_WAIT in a poll for POLLOUT - maybe nothing until that last ACK is received. Maybe Jeff is listening; he's the one with the TCP state diagram above his desk ;)

W.r.t. the patch, there is some funky (and somewhat annoying) logic at the end of the core output filter which deliberately hides the write error from the caller, so I'm not sure whether that should be duplicated here and APR_SUCCESS returned instead.
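For what it's worth, one generic way for an application to notice the half-close while it is only waiting to write is to also poll for POLLIN and treat a zero-byte peek as "peer closed" - just a sketch of the mechanism, not a proposal for the core filter:

    #include <poll.h>
    #include <sys/socket.h>

    /* Returns 1 if the peer has reset or half-closed the connection.
     * After a FIN (socket in CLOSE_WAIT) the fd becomes readable and a
     * peek read returns 0, even though a poll for POLLOUT alone keeps
     * blocking once the send buffer is full. */
    static int peer_gone(int fd, int timeout_ms)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };
        char c;

        if (poll(&pfd, 1, timeout_ms) <= 0)
            return 0;                               /* timeout or poll error */
        if (pfd.revents & (POLLERR | POLLHUP))
            return 1;                               /* RST / fully closed */
        if ((pfd.revents & POLLIN) &&
            recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT) == 0)
            return 1;                               /* FIN received: CLOSE_WAIT */
        return 0;
    }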
Thanks for the remark on SIGPIPE (I know, I should have rtfm :). Anyway, I now block outgoing RST packets from my test machine so I can send my packet-dump replay without interference from the host TCP/IP stack (the replay uses a libndis userspace stack); the host stack therefore does not reply with an RST to the server's attempts to push data to the client [of the aborted, replayed PROPFIND request]. That made the problem reproducible in all cases. Since your patch to mod_dav (check for aborted) and my analogous patch for core_output_filter reduce the number of timed-out writes to one, and the read-only DavLockDB patch further reduces the "problem" to serialized write access, the remaining poll() call most likely conforms to the protocol. We are reducing the Timeout and looking into the proposed alternative backends for mod_dav. Thank you very much for your help in pinning down the problem! What are the chances that these patches (or similar, improved ones) make it into mainstream Apache?
The abort handling can be tracked by bug 39605 - the mod_dav database locking issue can be tracked here.
Hello, I am using httpd 2.0.52 and httpd 2.0.59 with mod_dav. Any news on a fix for this bug? Regards, Hernani
Created attachment 21173 [details]: changes dav_method_propfind to open the lockdb read-only

This changes dav_method_propfind to open the lockdb read-only instead of taking an exclusive lock (PROPFIND should be read-only). In our production environment we still see the database being locked exclusively when the PROPFIND method is used:

#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7cae443 in fcntl () from /lib/libpthread.so.0
#2  0xb7d3cbe9 in apr_file_lock (thefile=0x816e440, type=1) at flock.c:46
#3  0xb7d78bf1 in apr_sdbm_lock (db=0x816ef50, type=1) at sdbm_lock.c:50
#4  0xb7d789e4 in apr_sdbm_open (db=0xbfdc0c98, file=0x80eb380 "/tmp/DavLock", flags=1, perms=4095, p=0x814fb90) at sdbm.c:138
#5  0xb7d799d8 in vt_sdbm_open (pdb=0xbfdc0cf8, pathname=0x80eb380 "/tmp/DavLock", mode=1, perm=4095, pool=0x814fb90) at apr_dbm_sdbm.c:114
#6  0xb7d7929d in apr_dbm_open (pdb=0xbfdc0cf8, pathname=0x80eb380 "/tmp/DavLock", mode=1, perm=4095, pool=0x814fb90) at apr_dbm.c:86
#7  0xb7b10f66 in dav_dbm_open_direct (p=0x814fb90, pathname=0x80eb380 "/tmp/DavLock", ro=1, pdb=0x816e40c) at dbm.c:131
#8  0xb7b11540 in dav_fs_really_open_lockdb (lockdb=0x816e3f0) at lock.c:305
#9  0xb7b30bee in dav_get_resource_state (r=0x814fbc8, resource=0x816e3c8) at util_lock.c:715
#10 0xb7b2b844 in dav_handler (r=0x814fbc8) at mod_dav.c:904
#11 0x08068959 in ap_run_handler (r=0x814fbc8) at config.c:152
#12 0x0806b941 in ap_invoke_handler (r=0x814fbc8) at config.c:364
#13 0x08066428 in ap_process_request (r=0x814fbc8) at http_request.c:249
#14 0x08061138 in ap_process_http_connection (c=0x8143500) at http_core.c:253
#15 0x08072d29 in ap_run_process_connection (c=0x8143500) at connection.c:43
#16 0x0806731a in child_main (child_num_arg=<value optimized out>) at prefork.c:610
#17 0x080675a3 in make_child (s=0x80a1e88, slot=299) at prefork.c:704
#18 0x08067dd0 in ap_mpm_run (_pconf=0x80a00a8, plog=0x80cc158, s=0x80a1e88) at prefork.c:839
#19 0x0806e37e in main (argc=134865184, argv=0x812d280) at main.c:623

Many processes get stuck in "SendingReply" because one process has locked the DavLockDB exclusively and has not released it, which leads to DoS-like behaviour. This patch is Arne's simplified patch, taking care only of the DavLockDB. It was tested against Apache 2.0.59 and 2.0.61; the problem seems to affect 2.2.6 as well.
I can confirm this problem still exists in the 2.4.x releases and in 2.5/trunk (r1734315). A workaround is to use mod_buffer to buffer the entire PROPFIND response before sending it to the client:

<If "toupper(%{REQUEST_METHOD}) == 'PROPFIND'">
    SetOutputFilter BUFFER
</If>

Of course this increases memory use and request latency, but it does prevent a single client on a slow connection from locking up the entire DAV repository.
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.), we ask you to start by bringing your question to the User Support and Discussion mailing list; see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.
Confirmed in 2.4.x in comment 20. So re-opening and updating the Version to 2.4-HEAD