| Summary: | segfault in apr_brigade_cleanup() under high load with slow clients | | |
|---|---|---|---|
| Product: | Apache httpd-2 | Reporter: | Tabby <s.tabacchiera> |
| Component: | mpm_worker | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
| Status: | RESOLVED FIXED | | |
| Severity: | critical | CC: | s.tabacchiera, voyageursp, ylavic.dev |
| Priority: | P2 | | |
| Version: | 2.4.16 | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Attachments: | cleanup mod_proxy_wstunnel's backend brigades (attachment 33184); Cleanup mod_proxy_wstunnel's backend brigades (attachment 33185) | | |
Description
Tabby
2015-10-16 20:23:09 UTC

*** Bug 58502 has been marked as a duplicate of this bug. ***

Yann Ylavic:
Created attachment 33184 [details]
cleanup mod_proxy_wstunnel's backend brigades

Can you please try this patch?

Yann Ylavic:
Created attachment 33185 [details]
Cleanup mod_proxy_wstunnel's backend brigades

Also cleans up the EPIPE case.

Tabby:
Hi, and thanks for your quick feedback, Yann! I applied your patch and ran some quick load tests, and everything seems to work smoothly. Next I'll try it in the real scenario and let you know. Fingers crossed for now. ST

Tabby:
After 2 days, no more segfaults. Still testing heavily, but I'm pretty confident the issue is gone. ST

Ruediger Pluem (comment 6):
While it is nice that the patch works for you, I suspect that it hides something else. Can you please do a

    print e
    print *e

in gdb in the thread that causes the segfault?

Ruediger Pluem (comment 7):
Actually, I think mod_proxy_wstunnel falls into the same pitfall mod_proxy_http was in, and it needs to do the same / something similar to what mod_proxy_http.c does with proxy_buckets_lifetime_transform. I just have no further time to investigate right now.

Yann Ylavic:
(In reply to Ruediger Pluem from comment #7)
> Actually I think mod_proxy_wstunnel falls into the same pitfall
> mod_proxy_http was in and it needs to do the same / something similar then
> mod_proxy_http.c with
>
> proxy_buckets_lifetime_transform

Yes, I agree; I just proposed a quick patch to determine whether the crash came from mod_proxy_wstunnel or from some failure in the core deferred/pipelined write logic. We need to either use a lifetime transform like the one in proxy_http, or, as I was thinking, modify all the input filters that create their buckets on f->r's or f->c's pool/bucket_alloc so that they instead use the bb->p and bb->bucket_alloc they are given. With the latter we wouldn't have to transform the lifetime at all; it would be determined by the caller. Discussion moved to the dev@ mailing list; I will report back here afterward.

Tabby (comment 10):
(In reply to Ruediger Pluem from comment #6)
> While it is nice that the patch works for you I suspect that it hides
> something else. Can you please do a
>
> print e
> print *e
>
> in gdb in the thread that causes the segfault?

Sorry, I've deleted the old core.
Tabby:
Consider this new one, with SSL enabled:

    #0  0x00007eff54952e17 in apr_brigade_cleanup (data=0x7eff2c01c2c8) at buckets/apr_brigade.c:44
    #1  0x00007eff5450caee in run_cleanups (pool=0x7eff2c01a2b8) at memory/unix/apr_pools.c:2352
    #2  apr_pool_destroy (pool=0x7eff2c01a2b8) at memory/unix/apr_pools.c:814
    #3  0x00007eff54952e2c in apr_brigade_cleanup (data=0x7eff2c0062a8) at buckets/apr_brigade.c:44
    #4  0x00007eff5165db0e in ssl_io_filter_output (f=0x7eff480567f8, bb=0x7eff2c0062a8) at ssl_engine_io.c:1612
    #5  0x00007eff5165e125 in ssl_io_filter_coalesce (f=0x7eff480567d0, bb=0x7eff2c0062a8) at ssl_engine_io.c:1599
    #6  0x0000000000463fa3 in ap_process_request_after_handler (r=<value optimized out>) at http_request.c:267
    #7  0x00000000004645af in ap_process_request (r=0x7eff2c01a330) at http_request.c:373
    #8  0x00000000004608ee in ap_process_http_sync_connection (c=0x7eff48056388) at http_core.c:210
    #9  ap_process_http_connection (c=0x7eff48056388) at http_core.c:251
    #10 0x0000000000457fc0 in ap_run_process_connection (c=0x7eff48056388) at connection.c:41
    #11 0x000000000046b468 in process_socket (thd=0xfa9928, dummy=<value optimized out>) at worker.c:631
    #12 worker_thread (thd=0xfa9928, dummy=<value optimized out>) at worker.c:990
    #13 0x0000003f3a807a51 in start_thread () from /lib64/libpthread.so.0
    #14 0x0000003f3a0e893d in clone () from /lib64/libc.so.6

    (gdb) p e
    $1 = (apr_bucket *) 0x7efee4018348
    (gdb) p *e
    $2 = {link = {next = 0x2c6c6c756e3a226e, prev = 0x4974616d726f6622}, type = 0x4f444e49223a2264,
      length = 3180190329187026254, start = 6229711307259471394, data = 0x2d223a2265756c61,
      free = 0x7b2c7d2235322e31, list = 0x6e6f697372657622}

Ruediger Pluem:
(In reply to Tabby from comment #10)
> Sorry, I've deleted the old core.
>
> (gdb) p e
> $1 = (apr_bucket *) 0x7efee4018348
> (gdb) p *e
> $2 = {link = {next = 0x2c6c6c756e3a226e, prev = 0x4974616d726f6622}, type =
> 0x4f444e49223a2264, length = 3180190329187026254, start =
> 6229711307259471394, data = 0x2d223a2265756c61, free = 0x7b2c7d2235322e31,
> list = 0x6e6f697372657622}

Thanks. What do

    print *(e->type)
    x/10x 0x2c6c6c756e3a226e
    x/10x 0x4974616d726f6622
    x/10x 0x7b2c7d2235322e31
    dump_brigade b
    dump_bucket e

deliver? Keep in mind that you need to have the macros from .gdbinit (top level of the source directory) loaded for the dump_* commands to work.

Tabby:

    (gdb) p *(e->type)
    Cannot access memory at address 0x4f444e49223a2264
    (gdb) x/10x 0x2c6c6c756e3a226e
    0x2c6c6c756e3a226e:     Cannot access memory at address 0x2c6c6c756e3a226e
    (gdb) x/10x 0x4974616d726f6622
    0x4974616d726f6622:     Cannot access memory at address 0x4974616d726f6622
    (gdb) x/10x 0x7b2c7d2235322e31
    0x7b2c7d2235322e31:     Cannot access memory at address 0x7b2c7d2235322e31
    (gdb) dump_brigade b
    dump of brigade 0x7eff2c01c2c8
       | type     (address)    | length | data addr | contents | rc
    --------------------------------------------------------------------------------
     0 | Cannot access memory at address 0x4f444e49223a2264
    (gdb) dump_bucket e
     bucket=Cannot access memory at address 0x4f444e49223a2264

Tabby:
After 1 month I can confirm the issue is gone. Will this patch be included in the upcoming release? Thanks, ST

Tabby:
Hi there, any news about this patch? It was not included in 2.4.18, so I'm wondering if/when it ever will be. Thanks! ST

Yann Ylavic (comment 15):
Thanks ST for this report and the follow-up. A different fix was backported to the upcoming 2.4.19 in r1732266, so I'm closing here. If you manage to test the final fix in your environment and have any issue with it, please reopen.

(In reply to Yann Ylavic from comment #15)
> Thanks ST for this report and follow up, a different fix was backported to
> upcoming 2.4.19 in r1732266, so I'm closing here.
>
> If you manage to test the final fix in your environment and have any issue
> with it, please reopen.

Here is a link to the patch (all in one) that was backported (and hence could be applied to 2.4.18): http://home.apache.org/~rpluem/patches/proxy_correctlifetime.diff