Summary: | Segment fault in brigade_consume caused by header file (APR_RING/APR_BRIGADE)/GCC optimization confusion - workaround is using gcc 4.5.1 | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Joel <j-comm> |
Component: | mod_ssl | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | RESOLVED DUPLICATE | ||
Severity: | major | CC: | j-comm, silversens |
Priority: | P2 | ||
Version: | 2.2.17 | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Linux |
Description
Joel
2010-12-24 14:53:12 UTC
Marked P1 because this just stops everything - no SSL capability on site at all.

Here is extra info. Note that "bb->list.next->type" is really bogus. It has a garbage name, garbage name_func, is_metadata is a mess, and the 'read' function is the value '0x58'. It looks like the data stored here makes no sense at all, and whatever caused that is the core problem.

This is in brigade_consume

(gdb) print *b
$4 = {link = {next = 0x8541bf8, prev = 0x85490c4}, type = 0x8541ad0, length = 139759840, start = -5190357751035555528, data = 0x808ca4c, free = 0x853d7e8, list = 0x854915c}
(gdb) print *(b->type)
$5 = {name = 0x853b7e0 "\250\020'\310\372S\370\361\063\254\020'\b\271S\b", num_func = 139704152, is_metadata = 139729632, destroy = 0x8541ab8, read = 0x58, setaside = 0x8541ad0, split = 0x8541ab8, copy = 0}
(gdb) print bb
$6 = (apr_bucket_brigade *) 0x85490c0
(gdb) print *bb
$7 = {p = 0x853d7e8, list = {next = 0x854913c, prev = 0x8541af0}, bucket_alloc = 0x8541ad0}
(gdb) print *(bb->list.next)
$8 = {link = {next = 0x8541bf8, prev = 0x85490c4}, type = 0x8541ad0, length = 139759840, start = -5190357751035555528, data = 0x808ca4c, free = 0x853d7e8, list = 0x854915c}
(gdb) print *(bb->list.next->type)
$9 = {name = 0x853b7e0 "\250\020'\310\372S\370\361\063\254\020'\b\271S\b", num_func = 139704152, is_metadata = 139729632, destroy = 0x8541ab8, read = 0x58, setaside = 0x8541ad0, split = 0x8541ab8, copy = 0}
(gdb) print *(bb->list.prev->type)
$10 = {name = 0xb7f94f40 "HEAP", num_func = 5, is_metadata = APR_BUCKET_DATA, destroy = 0xb7f817a0 <heap_bucket_destroy>, read = 0xb7f81780 <heap_bucket_read>, setaside = 0x808c96c <apr_bucket_setaside_noop@plt>, split = 0x808ca8c <apr_bucket_shared_split@plt>, copy = 0x808ce7c <apr_bucket_shared_copy@plt>}
(gdb) up
#2 bio_filter_in_read (bio=0x853f968, in=0x85515de "", inlen=79) at ssl_engine_io.c:534
534 inctx->rc = brigade_consume(inctx->bb, block, in, &inl);

I was able to work around this by:
1) Reverting to GCC 4.5.1
2) Rebuilding the entire 'webserver' toolchain (PHP, OpenSSL, HTTPD, APR, APR-util, etc.) compiling with "-O0".

Yes, I did an experiment by changing two things, but I needed to get the server up and running. In a few days, I will try to build with optimizations back on and GCC 4.5.1, meaning the only difference will be the compiler variant.

My guess is that this has uncovered an optimization bug in GCC 4.5.2. I have no idea exactly WHERE the failure is in the compiler, so I have NO IDEA how to report this problem. Nor do I know if the compiler is ok, and it's discovering a sloppy piece of code in the 'webserver toolchain' that was wrong, but wasn't creating an 'evident' problem before. I just know that as more people move to GCC 4.5.2, they will hit this.

How to best report this to the GCC folks? Help on how to proceed, please! :D

I've verified this as definitely a problem going from 4.5.1 to 4.5.2, as I returned all the optimization levels back to their defaults, and the bug happens with 4.5.2 compilation, not 4.5.1. Not sure if this is in OpenSSL or in HTTPD, but either way it's not Apache's fault (or the OpenSSL project). I am working to narrow this down to submit to the GCC team.

(In reply to comment #4)
> I've verified this as definitely a problem going from 4.5.1 to 4.5.2, as I
> returned all the optimization levels back to their defaults, and the bug
> happens with 4.5.2 compilation, not 4.5.1.

Can you please try if adding -fno-strict-aliasing to the CFLAGS fixes the problem even with 4.5.2 and optimization? If yes, this may be the same as bug 50190

This problem is indeed solved by adding -fno-strict-aliasing to CFLAGS.

(In reply to comment #5)
> (In reply to comment #4)
> > I've verified this as definitely a problem going from 4.5.1 to 4.5.2, as I
> > returned all the optimization levels back to their defaults, and the bug
> > happens with 4.5.2 compilation, not 4.5.1.
>
> Can you please try if adding -fno-strict-aliasing to the CFLAGS fixes the
> problem even with 4.5.2 and optimization? If yes, this may be the same as bug
> 50190

I tried this, and some web pages now worked, but later, I discovered others did not.

*** Bug 50564 has been marked as a duplicate of this bug. ***

(In reply to comment #8)
> *** Bug 50564 has been marked as a duplicate of this bug. ***

Downgrading to 4.5.1 fixed the problem for me too.

*** This bug has been marked as a duplicate of bug 50190 ***
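
Background sketch for the strict-aliasing angle raised in this thread. The summary points at the APR_RING/APR_BRIGADE headers, and apr_ring.h's APR_RING_SENTINEL macro computes a pointer that treats the ring head as if it were a full element. Accessing the head through the element's struct type is the kind of construct that type-based alias analysis can trip over, which may be why -fno-strict-aliasing changes the behaviour. The C below is not APR's code and is not a proposed patch. It is a minimal, hypothetical layout (every name in it is invented for illustration) in which the head and the elements share one link type, so the head is never accessed through the element type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; these are NOT APR's definitions. */
struct ring_link {
    struct ring_link *next;
    struct ring_link *prev;
};

struct item {
    struct ring_link link;   /* embedded link, as in an intrusive list */
    int value;
};

/* An empty ring is a head whose links point back at itself. */
static void ring_init(struct ring_link *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert n just before the head, i.e. at the tail of the ring. */
static void ring_insert_tail(struct ring_link *head, struct ring_link *n)
{
    assert(head != NULL && n != NULL);
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Recover the containing item from a link that is known NOT to be the head.
 * Only real elements are ever converted, so the head is never read or
 * written through struct item. */
static struct item *item_of(struct ring_link *l)
{
    return (struct item *)((char *)l - offsetof(struct item, link));
}

/* Walk the ring; the loop stops by comparing against the head's own
 * address instead of a fabricated "sentinel element" pointer. */
static int ring_sum(struct ring_link *head)
{
    int sum = 0;
    for (struct ring_link *l = head->next; l != head; l = l->next)
        sum += item_of(l)->value;
    return sum;
}
```

With two items holding 3 and 4 appended via ring_insert_tail, ring_sum returns 7. Whether the real APR macros can be restructured along these lines is outside what this report establishes; bug 50190 is the place to follow the actual analysis.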