39605 – core_output_filter sends misaligned chunks after timeout

Bug 39605 - core_output_filter sends misaligned chunks after timeout

Summary: core_output_filter sends misaligned chunks after timeout

Status:	RESOLVED FIXED

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	Core (show other bugs)
Version:	2.2.2
Hardware:	Other other

Importance:	P2 normal (vote)
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-05-18 07:20 UTC by Scott Lamb
Modified:	2009-07-06 05:31 UTC (History)
CC List:	1 user (show)

Attachments
Fix for httpd-2.2.x branch (614 bytes, patch) 2006-05-18 07:22 UTC, Scott Lamb	Details \| Diff
Patch to mod_dav_svn to avoid this problem (512 bytes, patch) 2006-05-18 21:43 UTC, Garrett Rooney	Details \| Diff
Patch to mod_dav_svn to avoid this problem (570 bytes, patch) 2006-05-18 21:45 UTC, Garrett Rooney	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Scott Lamb 2006-05-18 07:20:56 UTC

Affects at least versions 2.0.52 and 2.2.x. Not sure about 2.3.x; it's pretty different.

I ran into bug 34332 with httpd 2.0.52 and apr 0.9.6. It was frustrating to diagnose because Apache 
just skipped the end of one chunk and went on to the next. My Subversion client said this:

    $ svn co http://example.com/svn/foo
    ...
    svn: REPORT request failed on '/svn/!svn/vcc/default'
    svn: REPORT of '/svn/!svn/vcc/default': Chunk delimiter was invalid (http://example.com)

Although bug 34332 has been fixed in APR, Apache's behavior still confounds diagnosis of such 
conditions. It continues to use the connection if a write has returned EAGAIN. In core_output_filter it 
logs this:

            ap_log_error(APLOG_MARK, APLOG_INFO, rv, c->base_server,
                         "core_output_filter: writing data to the network");

but that's a priority below the default LogLevel, so I didn't see it. This bit here:

            /* No need to check for SUCCESS, we did that above. */
            if (!APR_STATUS_IS_EAGAIN(rv)) {
                c->aborted = 1;
            }

            /* The client has aborted, but the request was successful. We
             * will report success, and leave it to the access and error
             * logs to note that the connection was aborted.
             */

is broken. The status should never be EAGAIN here, and if it is, aborting the connection would be better 
than ignoring it. The conditional just makes the comment wrong.

Comment 1 Scott Lamb 2006-05-18 07:22:19 UTC

Created attachment 18310 [details]
Fix for httpd-2.2.x branch

Comment 2 Scott Lamb 2006-05-18 15:53:36 UTC

Tested it this morning; my fix was not enough.

Stepping back a second, I see this on an strace of brokenness:

here it's trying to send a 130203-byte chunk. Only 27487 bytes get sent out, leaving 102716.

    writev(12, [{"1fc9b\r\n", 7}, {"5xwAAIagAJRlhdwPgIoCRoJ0lkWDEICH"..., 3611}, 
{"QtsAfKloOGu5EQW50evUC2wwSilrLzKE"..., 126592}, {"\r\n", 2}], 4) = 27494

Here it succeeds in sending 66608 bytes, leaving 36108.

    poll([{fd=12, events=POLLOUT, revents=POLLOUT}], 1, 120000) = 1
    writev(12, [{"IEBwkBEAAABQMCAwMIBwUJAgMEBREGBw"..., 102716}, {"\r\n", 2}], 2) = 66608

Here it succeeds in sending 33304 bytes, leaving 2804.

    poll([{fd=12, events=POLLOUT, revents=POLLOUT}], 1, 120000) = 1
    writev(12, [{"ePFoc8+VaNcmDB5ycecSbTE9/Ps2R7ch"..., 36108}, {"\r\n", 2}], 2) = 33304

...next write times out:

    poll([{fd=12, events=POLLOUT}], 1, 120000) = 0
    [...some reads from another descriptor...]

And here it goes on to the next chunk, completely forgetting about those 2804 bytes.

    writev(12, [{"12e8e\r\n", 7}, {"AACD4lSOD4OxYICVU1GVF41QlFGFRZRl"..., 2452}, {"hDB/
jOkpPxHqDQJh4NcLNujizjHb8wSl"..., 75002}, {"\r\n", 2}], 4) = 10136

With the patch below and another one that makes writev_it_all log everything (different run), I see this:

[Thu May 18 08:32:42 2006] [error] writev_it_all 130212 bytes: take 1: apr_sendv 130212 bytes -> 
27494
[Thu May 18 08:32:42 2006] [error] writev_it_all 130212 bytes: take 2: apr_sendv 102718 bytes -> 
66608
[Thu May 18 08:32:42 2006] [error] writev_it_all 130212 bytes: take 3: apr_sendv 36110 bytes -> 
33304
[Thu May 18 08:34:42 2006] [error] (70007)The timeout specified has expired: writev_it_all 130212 
bytes: take 4: apr_sendv 2806 bytes -> 0
[Thu May 18 08:34:42 2006] [info] (70007)The timeout specified has expired: core_output_filter: 
writing data to the network
[Thu May 18 08:34:42 2006] [error] writev_it_all 77463 bytes: take 1: apr_sendv 77463 bytes -> 10136

There are two problems here:

(1) APR_TIMEUP; our network's pretty messed up right now so this isn't that surprising.

(2) core_output_filter sees the error and gives up on data it has to send without aborting the 
connection, causing further chunks to be messed up and diagnosis to be frustrating.

I opened this bug for #2. Apparently setting c->aborted isn't enough, though it looks like a bunch of 
callers check it before writing stuff. I'll look around a bit more.

Comment 3 Ruediger Pluem 2006-05-18 20:26:34 UTC

If c->aborted gets set unconditionally as your patch proposes and things still
do not work I think this is a bug of the caller (seems to be subversion in your
case) as it should not call ap_pass_brigade again if c->aborted is set. But
maybe the following patch is a useful sanity check to ensure that no further
data gets sent:

Index: core_filters.c
===================================================================
--- core_filters.c      (Revision 406722)
+++ core_filters.c      (Arbeitskopie)
@@ -542,6 +542,20 @@
     apr_read_type_e eblock = APR_NONBLOCK_READ;
     apr_pool_t *input_pool = b->p;

+    /*
+     * Maybe we have been called by a buggy part in the lower filter chain
+     * or a buggy handler on a connection that has been already aborted.
+     * Just destroy the brigade and return with APR_SUCCESS.
+     */
+    if (c->aborted) {
+        ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c, "core_output_filter:"
+                      " Trying to write on a connection aborted by the"
+                      " client.");
+        if (b)
+            apr_brigade_destroy(b);
+        return APR_SUCCESS;
+    }
+
     if (ctx == NULL) {
         ctx = apr_pcalloc(c->pool, sizeof(*ctx));
         net->out_ctx = ctx;

Comment 4 Scott Lamb 2006-05-18 21:29:25 UTC

Yeah, that's helpful. I'm totally in favor of anything that makes stuff easier to diagnose.

Looks like the call stack is this:

window_handler
* svn_stream_write
** brigade_write_fn
*** apr_brigade_write
**** ap_filter_flush
***** ap_pass_brigade
****** ap_byterange_filter
******* ap_pass_brigade
******** ap_content_length_filter
********* ap_pass_brigade
********** chunk_filter
*********** ap_pass_brigade
************ core_output_filter <- sets aborted
...
** svn_stream_write again
...
************ core_output_filter again <- errors

Subversion's only use of aborted seems to be this in dav_svn__send_xml:

  /* ### check for an aborted connection, since the brigade functions
     don't appear to be return useful errors when the connection is
     dropped. */
  if (output->c->aborted)
    return svn_error_create(SVN_ERR_APMOD_CONNECTION_ABORTED, 0, NULL);

If the contract on brigade writing is that the caller should check c->aborted before and after, then 
perhaps this should be moved into brigade_write_fn. I will take it up on the Subversion mailing list and 
see what happens.

Comment 5 Garrett Rooney 2006-05-18 21:43:14 UTC

Created attachment 18319 [details]
Patch to mod_dav_svn to avoid this problem

Could you try this patch to mod_dav_svn and see if it fixes the problem?

Comment 6 Garrett Rooney 2006-05-18 21:45:45 UTC

Created attachment 18320 [details]
Patch to mod_dav_svn to avoid this problem

Oops, here's one that'll probably compile, and use the correct error
constant...

Comment 7 Scott Lamb 2006-05-18 22:18:37 UTC

With the mod_dav_svn patch, I end up in core_output_filter on an aborted connection with this call 
stack:

#3  0x00c526a8 in core_output_filter (f=0x9f97080, b=0xa013bb0) at /usr/src/debug/httpd-2.0.52/
server/core.c:3966
#4  0x00c49009 in ap_pass_brigade (next=0x9f97080, bb=0xa013bb0) at /usr/src/debug/
httpd-2.0.52/server/util_filter.c:511
#5  0x00c33423 in chunk_filter (f=0xa013ab0, b=0xa013bb0) at /usr/src/debug/httpd-2.0.52/
modules/http/http_core.c:217
#6  0x00c49009 in ap_pass_brigade (next=0xa013ab0, bb=0xa013bb0) at /usr/src/debug/
httpd-2.0.52/server/util_filter.c:511
#7  0x00c4bae5 in ap_content_length_filter (f=0x9fcce48, b=0xa013bb0) at /usr/src/debug/
httpd-2.0.52/server/protocol.c:1241
#8  0x00c49009 in ap_pass_brigade (next=0x9fcce48, bb=0xa013bb0) at /usr/src/debug/
httpd-2.0.52/server/util_filter.c:511
#9  0x00c4b640 in end_output_stream (r=0x9fcc078) at /usr/src/debug/httpd-2.0.52/server/
protocol.c:1032
#10 0x00c387de in ap_process_request (r=0x9fcc078) at /usr/src/debug/httpd-2.0.52/modules/http/
http_request.c:256
#11 0x00c3363f in ap_process_http_connection (c=0x9f96b80) at /usr/src/debug/httpd-2.0.52/
modules/http/http_core.c:250
#12 0x00c46349 in ap_run_process_connection (c=0x9f96b80) at /usr/src/debug/httpd-2.0.52/
server/connection.c:42
#13 0x00c466b4 in ap_process_connection (c=0x9f96b80, csd=0x0) at /usr/src/debug/httpd-2.0.52/
server/connection.c:175
#14 0x00c39901 in child_main (child_num_arg=Variable "child_num_arg" is not available.
) at /usr/src/debug/httpd-2.0.52/server/mpm/prefork/prefork.c:609
#15 0x00c39af8 in make_child (s=0x9eb8e88, slot=0) at /usr/src/debug/httpd-2.0.52/server/mpm/
prefork/prefork.c:649
#16 0x00c39c17 in startup_children (number_to_start=8) at /usr/src/debug/httpd-2.0.52/server/
mpm/prefork/prefork.c:721
#17 0x00c3a4db in ap_mpm_run (_pconf=0x9eb70a8, plog=0x9ee3158, s=0x9eb8e88) at /usr/src/
debug/httpd-2.0.52/server/mpm/prefork/prefork.c:940
#18 0x00c4134a in main (argc=2, argv=0xbfff3694) at /usr/src/debug/httpd-2.0.52/server/main.c:
618

which I guess means Subversion isn't the only such caller.

Is it just me or is checking c->aborted all the time a pretty awkward thing to expect callers to do? I'm 
not familiar enough with Apache's innards to know what returning failures from core_output_filter 
would do, but it does seem more inline with callers' usual expectations.

Comment 8 Joe Orton 2006-05-19 12:31:00 UTC

Now that bug 31759 is fixed there should be no need for the core output filter
to be doing this trick of returning APR_SUCCESS on failure AFAIK.

It should now be safe and correct to always just return the error on that error
path (I don't know why the EAGAIN check would be needed there either); and also
add the check per comment 3, which can also just return APR_ECONNABORTED rather
than hiding the failure.

Comment 9 Mark Phippard 2008-07-29 12:53:38 UTC

What is the status of this bug and what are possible solutions?

We have Subversion users running Apache 2.2.8, with mod_dav_svn 1.5.0 that still get this error pretty regularly, so I assume there is still some kind of fix needed.

Comment 10 Joe Orton 2009-07-06 05:31:18 UTC

This should be fixed on the 2.2.x branch now as of r791454.