Bug 64234

Summary: Fix random memory-corruption in proxy error-reading-response path for http2 worker setup
Product: Apache httpd-2 Reporter: Barnim Dzwillo <dzwillo>
Component: mod_proxyAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED FIXED    
Severity: normal Keywords: FixedInTrunk, PatchAvailable
Priority: P2    
Version: 2.4.41   
Target Milestone: ---   
Hardware: HP   
OS: Linux   
Attachments: Bugfix for above mod_proxy_http issue

Description Barnim Dzwillo 2020-03-17 16:14:55 UTC
Created attachment 37102 [details]
Bugfix for above mod_proxy_http issue

Hello,

in our company we run a busy httpd service with http2-connections to the client 
side and http-proxy-connections to the backend side. The httpd is configured 
with 64 worker-threads per process, and the machines use redhat7 linux.

Without the attached patch, we saw random memory corruptions, mostly related to
bucket-structs which got overridden with other data or strings.

- this often lead to busy threads, because one of the many brigade-loops in the 
  apache could not find its sentinel and looped forever.

- in other cases this simply segfaulted an apache-thread or triggered one of the 
  assertions like 'assert(b->length != (apr_size_t)-1)' in modules/http2/h2_stream.c:add_buffered_data(),
  where a bucket_type_socket appeared in a brigade where it did not belong.

What happens here?
- the proxy-http-thread runs in a different thread than the http2-server-side.

- in mod_proxy_http.c:ap_proxy_http_process_response() the response-data which is
  read from the backend-proxy-connection is stored in the temporary brigade 'bb'.
  In the normal datapath the reference-count of the heap-buckets in 'bb' is first
  increased, before the transformed brigade 'pass_bb' is passed to the output-filters
  of the http2-client-connection.

  mod_proxy_http.c:ap_proxy_http_process_response() the response-data which is
  -> rv = ap_get_brigade(backend->r->input_filters, bb, AP_MODE_READBYTES, ...);
     -> ap_proxy_buckets_lifetime_transform(r, bb, pass_bb);
        -> apr_bucket_read(e, &data, &bytes, APR_BLOCK_READ);
           -> increases refcount on HEAP-buckets
           new = apr_bucket_transient_create(data, bytes, bucket_alloc);
           -> stores buffer-pointer in simple 'transient' bucket
           APR_BRIGADE_INSERT_TAIL(to, new);
        ap_pass_brigade(r->output_filters, pass_bb);
        -> modules/http2/h2_task:h2_filter_slave_output()->slave_out()->send_out():
           -> h2_beam_send()->append_bucket()
              -> creates beam-bucket around HEAP-bucket and copies it into
                 beam->send_list without an extra refcount.
        apr_brigade_cleanup(pass_bb);
        apr_brigade_cleanup(bb);
        -> all HEAP-buckets with refcount 1 were freed() here
 
  The http2 module later works from another thread on these buckets and frees
  them up when finished. This is ok as long as the refcount on the HEAP-buckets
  was increased correctly. a.e:  
  -> h2_session.c:stream_data_cb()
     -> h2_stream_out_prepare()
        -> h2_beam_receive()
           -> while(bsender = H2_BLIST_FIRST(&beam->send_list))
                APR_BUCKET_REMOVE(bsender);
                [...]

- In the 'error-reading-response' path in ap_proxy_http_process_response() the 
  reference count of the heap-buckets in 'bb' is not increased, but the full brigade 
  is still passed to the http2-thread - with the impact of write-after-free operations.
  If the memory of these heap-buckets was already allocated again, a corruption happens.

I hope this will help.

Greetings,
Barnim
Comment 1 Ruediger Pluem 2020-03-17 19:49:08 UTC
Thanks for the patch an the analysis. Two comments:

1. We need to have a patch against trunk. This will later be backported to 2.4.x. I already saw that both code bases already differ in that area.
2. I cannot think of anything useful that can be contained in bb in case of an error before we add the the error bucket and the eoc bucket. So I think the better fix would be the following (against trunk):

Index: mod_proxy_http.c
===================================================================
--- mod_proxy_http.c	(revision 1875177)
+++ mod_proxy_http.c	(working copy)
@@ -1787,6 +1787,7 @@
                          * through a response, our only option is to
                          * disconnect the client too.
                          */
+                        apr_brigade_cleanup(bb);
                         e = ap_bucket_error_create(HTTP_GATEWAY_TIME_OUT, NULL,
                                 r->pool, c->bucket_alloc);
                         APR_BRIGADE_INSERT_TAIL(bb, e);
Comment 2 Ruediger Pluem 2020-03-17 19:52:08 UTC
Same for 2.4.x:

Index: mod_proxy_http.c
===================================================================
--- mod_proxy_http.c	(revision 1875177)
+++ mod_proxy_http.c	(working copy)
@@ -1764,6 +1764,7 @@
                          */
                         ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r, APLOGNO(01110)
                                       "error reading response");
+                        apr_brigade_cleanup(bb);
                         ap_proxy_backend_broke(r, bb);
                         ap_pass_brigade(r->output_filters, bb);
                         backend_broke = 1;


Can you please check if this patch fixes your issue as well?
Comment 3 Barnim Dzwillo 2020-03-18 09:59:51 UTC
Yes, this patch also seems to fix the issue.

I tested the patch on one of over servers last night, and the error did not trigger in the last 12 hours.

Greetings,
Barnim
Comment 4 Ruediger Pluem 2020-03-18 10:54:12 UTC
(In reply to Barnim Dzwillo from comment #3)
> Yes, this patch also seems to fix the issue.
> 
> I tested the patch on one of over servers last night, and the error did not
> trigger in the last 12 hours.
> 
> Greetings,
> Barnim

Thanks for feedback. Committed to trunk as r1875353.
Comment 5 Yann Ylavic 2021-03-09 17:27:28 UTC
Backported to 2.4.47 in r1887378.