Bug 50520

Summary: Segmentation fault in brigade_consume caused by header file (APR_RING/APR_BRIGADE)/GCC optimization confusion; workaround is using gcc 4.5.1
Product: Apache httpd-2 Reporter: Joel <j-comm>
Component: mod_ssl   Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED DUPLICATE    
Severity: major CC: j-comm, silversens
Priority: P2    
Version: 2.2.17   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Joel 2010-12-24 14:53:12 UTC
As mentioned in the summary:

glibc 2.12.2
OpenSSL: 1.0.0c
gcc 4.5.2
HTTPD: 2.2.17
APR: 1.4.2
APR-Util: 1.3.10

Trivially reproducible.

Please let me know if anything else is needed.
 
Stack trace:

(gdb) bt full
#0  0x00000058 in ?? ()
No symbol table info available.
#1  0x080c6a94 in brigade_consume (bio=0x853f938, in=0x85515ae "", inlen=79) at ssl_engine_io.c:419
        b = 0x854910c
        str = 0x806be8b "memmove"
        str_len = 3087003636
        consume = <value optimized out>
        actual = 0
        status = 0
#2  bio_filter_in_read (bio=0x853f938, in=0x85515ae "", inlen=79) at ssl_engine_io.c:534
        inl = 79
        inctx = 0x8547048
        block = APR_BLOCK_READ
#3  0x081122e4 in BIO_read ()
No symbol table info available.
#4  0x080f9a27 in ssl3_read_n ()
No symbol table info available.
#5  0x080fa72a in ssl3_read_bytes ()
No symbol table info available.
#6  0x080fbea4 in ssl3_get_message ()
No symbol table info available.
#7  0x080ec8c9 in ssl3_get_client_hello ()
No symbol table info available.
#8  0x080f0c39 in ssl3_accept ()
No symbol table info available.
#9  0x080e2acb in SSL_accept ()
No symbol table info available.
#10 0x080da461 in ssl23_get_client_hello ()
No symbol table info available.
#11 0x080da5ec in ssl23_accept ()
No symbol table info available.
#12 0x080e2acb in SSL_accept ()
No symbol table info available.
#13 0x080c5e8a in ssl_io_filter_connect (filter_ctx=0x853dea8) at ssl_engine_io.c:1111
        c = 0x853d990
        sslconn = 0x853de50
        sc = <value optimized out>
        cert = <value optimized out>
        n = <value optimized out>
        ssl_err = <value optimized out>
        verify_result = <value optimized out>
        server = 0x831fd38
#14 0x080c649f in ssl_io_filter_input (f=0x8549078, bb=0x854b010, mode=AP_MODE_GETLINE, block=APR_BLOCK_READ, readbytes=0)
    at ssl_engine_io.c:1357
        status = <value optimized out>
        inctx = 0x8547048
        len = 8192
        is_init = 0
#15 0x080959fb in ap_rgetline_core (s=0x854a0a8, n=8192, read=0xbffff3ac, r=0x854a090, fold=0, bb=0x854b010)
    at protocol.c:231
        rv = <value optimized out>
        e = <value optimized out>
        bytes_handled = 0
        current_alloc = 0
        pos = <value optimized out>
        last_char = 0x0
        do_alloc = 1
        saw_eos = 0
#16 0x080977d6 in read_request_line (conn=0x853d990) at protocol.c:596
        rv = <value optimized out>
        ll = <value optimized out>
        pro = <value optimized out>
        major = 1
        minor = 0
        http = "\350\363\377\277"
        len = 139712912
        num_blank_lines = 0
        max_blank_lines = 100
        uri = <value optimized out>
#17 ap_read_request (conn=0x853d990) at protocol.c:891
        r = 0x854a090
        p = 0x854a050
        expect = <value optimized out>
        access_status = <value optimized out>
        tmp_bb = 0x854b010
        csd = <value optimized out>
        cur_timeout = <value optimized out>
#18 0x081b7e35 in ap_process_http_connection (c=0x853d990) at http_core.c:183
        r = <value optimized out>
        csd = 0x0
#19 0x080aa876 in ap_run_process_connection (c=0x853d990) at connection.c:43
        pHook = <value optimized out>
        n = <value optimized out>
        rv = <value optimized out>
#20 0x081ed792 in child_main (child_num_arg=<value optimized out>) at prefork.c:662
        current_conn = <value optimized out>
        csd = 0x853d7f8
        ptrans = 0x853d7b8
        allocator = 0x853b728
        status = <value optimized out>
        i = <value optimized out>
        lr = <value optimized out>
        pollset = 0x853b858
        sbh = 0x853b850
        bucket_alloc = 0x8541aa0
        last_poll_idx = 1
#21 0x081eda9f in make_child (s=0x82758b0, slot=0) at prefork.c:707
        pid = <value optimized out>
#22 0x081ee2fc in ap_mpm_run (_pconf=0x82710a8, plog=0x82b71c0, s=0x82758b0) at prefork.c:983
        index = <value optimized out>
        remaining_children_to_start = <value optimized out>
        rv = <value optimized out>
#23 0x0808fb55 in main (argc=2, argv=0xbffff7d4) at main.c:739
        c = 88 'X'
        configtestonly = 0
        confname = 0x81f8267 "conf/httpd.conf"
        def_server_root = 0x81f8254 "/usr/local/apache2"
        temp_error_log = 0x0
        error = <value optimized out>
        process = 0x826f130
        server_conf = 0x82758b0
        pglobal = 0x826f0a0
        pconf = 0x82710a8
        plog = 0x82b71c0
        ptemp = 0x82790c8
        pcommands = 0x82730b0
        opt = 0x8273150
        rv = 0
        mod = <value optimized out>
        optarg = 0x0
        signal_server = <value optimized out>
Comment 1 Joel 2010-12-24 14:55:58 UTC
Marked P1 because this stops everything: there is no SSL capability on the site at all.
Comment 2 Joel 2010-12-24 15:30:08 UTC
Here is extra info. Note that bb->list.next->type is bogus: it has a garbage name and num_func, is_metadata is a mess, and the 'read' function pointer is 0x58 (the same address as frame #0 in the backtrace above, which suggests the crash is a call through this pointer). The data stored here makes no sense at all, and whatever caused that is the core problem.

This is in brigade_consume:


(gdb) print *b
$4 = {link = {next = 0x8541bf8, prev = 0x85490c4}, type = 0x8541ad0, length = 139759840, start = -5190357751035555528,
  data = 0x808ca4c, free = 0x853d7e8, list = 0x854915c}
(gdb) print *(b->type)
$5 = {name = 0x853b7e0 "\250\020'\310\372S\370\361\063\254\020'\b\271S\b", num_func = 139704152, is_metadata = 139729632,
  destroy = 0x8541ab8, read = 0x58, setaside = 0x8541ad0, split = 0x8541ab8, copy = 0}
(gdb) print bb
$6 = (apr_bucket_brigade *) 0x85490c0
(gdb) print *bb
$7 = {p = 0x853d7e8, list = {next = 0x854913c, prev = 0x8541af0}, bucket_alloc = 0x8541ad0}
(gdb) print *(bb->list.next)
$8 = {link = {next = 0x8541bf8, prev = 0x85490c4}, type = 0x8541ad0, length = 139759840, start = -5190357751035555528,
  data = 0x808ca4c, free = 0x853d7e8, list = 0x854915c}
(gdb) print *(bb->list.next->type)
$9 = {name = 0x853b7e0 "\250\020'\310\372S\370\361\063\254\020'\b\271S\b", num_func = 139704152, is_metadata = 139729632,
  destroy = 0x8541ab8, read = 0x58, setaside = 0x8541ad0, split = 0x8541ab8, copy = 0}
(gdb) print *(bb->list.prev->type)
$10 = {name = 0xb7f94f40 "HEAP", num_func = 5, is_metadata = APR_BUCKET_DATA, destroy = 0xb7f817a0 <heap_bucket_destroy>,
  read = 0xb7f81780 <heap_bucket_read>, setaside = 0x808c96c <apr_bucket_setaside_noop@plt>,
  split = 0x808ca8c <apr_bucket_shared_split@plt>, copy = 0x808ce7c <apr_bucket_shared_copy@plt>}
(gdb) up
#2  bio_filter_in_read (bio=0x853f968, in=0x85515de "", inlen=79) at ssl_engine_io.c:534
534         inctx->rc = brigade_consume(inctx->bb, block, in, &inl);
(gdb)
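For readers unfamiliar with the ring macros named in the summary (APR_RING/APR_BRIGADE), here is a minimal sketch of the sentinel-ring idea. The structs are hypothetical stand-ins that loosely mirror the field order gdb printed above; this is not APR's code. The brigade's list head holds only next/prev, and the "sentinel" is a bucket pointer fabricated by subtracting the offset of the bucket's link member from the head's address, so it must never be read as a real bucket. In the session above, bb = 0x85490c0 and p takes 4 bytes on this 32-bit build, so &bb->list = 0x85490c4; since link is the bucket's first member, that is also the sentinel address, and the first bucket's link.prev does hold 0x85490c4 even though its type and length fields are garbage. The sketch only compares against the sentinel and updates the head through its real type, so it stays clear of the aliasing question raised later in the thread.

```c
#include <stddef.h>

/* Hypothetical stand-ins loosely mirroring apr_bucket / apr_bucket_brigade
 * as printed by gdb; these are NOT the real APR definitions. */
struct bucket;
struct bucket_link { struct bucket *next, *prev; };

struct bucket {
    struct bucket_link link;   /* gdb shows link as the first member */
    const char *type_name;
};

struct brigade {
    void *p;                   /* pool pointer sits before the ring head */
    struct bucket_link list;   /* ring head: next/prev only, no payload */
    void *bucket_alloc;
};

/* Address a bucket would have if its link member overlaid bb->list.
 * Only the link part is backed by real storage, so it is never a bucket. */
static struct bucket *brigade_sentinel(struct brigade *bb)
{
    return (struct bucket *)((char *)&bb->list - offsetof(struct bucket, link));
}

static void brigade_init(struct brigade *bb)
{
    bb->list.next = brigade_sentinel(bb);   /* empty ring points at itself */
    bb->list.prev = brigade_sentinel(bb);
}

static int brigade_empty(struct brigade *bb)
{
    /* Pointer comparison only; the sentinel is not dereferenced. */
    return bb->list.next == brigade_sentinel(bb);
}

static void brigade_insert_tail(struct brigade *bb, struct bucket *e)
{
    struct bucket *last = bb->list.prev;
    e->link.next = brigade_sentinel(bb);
    e->link.prev = last;
    if (last == brigade_sentinel(bb))
        bb->list.next = e;      /* update the head through its real type */
    else
        last->link.next = e;
    bb->list.prev = e;
}

static size_t brigade_count(struct brigade *bb)
{
    size_t n = 0;
    /* Stop as soon as the cursor reaches the sentinel. */
    for (struct bucket *b = bb->list.next; b != brigade_sentinel(bb); b = b->link.next)
        n++;
    return n;
}
```

Walking stops when the cursor equals the sentinel, so no loop in this sketch ever reads a bucket field through the fabricated pointer.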
Comment 3 Joel 2010-12-24 20:03:36 UTC
I was able to work around this by:

1) Reverting to GCC 4.5.1
2) Rebuilding the entire 'webserver' toolchain (PHP, OpenSSL, HTTPD, APR, APR-util, etc.) with "-O0".

Yes, I changed two things at once, but I needed to get the server up and running.

In a few days, I will try building with optimizations back on and GCC 4.5.1, so that the only difference will be the compiler version.

My guess is that this has uncovered an optimization bug in GCC 4.5.2.

I have no idea exactly WHERE the failure is in the compiler, so I have NO IDEA how to report this problem. Nor do I know whether the compiler is fine and is instead exposing a sloppy piece of code in the 'webserver toolchain' that was always wrong but never caused an evident problem before.

I just know that as more people move to GCC 4.5.2, they will hit this.

How to best report this to the GCC folks? Help on how to proceed, please! :D
Comment 4 Joel 2010-12-25 09:49:31 UTC
I've verified this as definitely a problem going from 4.5.1 to 4.5.2, as I returned all the optimization levels back to their defaults, and the bug happens with 4.5.2 compilation, not 4.5.1.

Not sure whether this is in OpenSSL or in HTTPD, but either way it's not Apache's fault (or the OpenSSL project's). I am working to narrow this down to submit to the GCC team.
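Since Comment 4 isolates the compiler version as the only variable, it can help to confirm which compiler actually built a given module. The helpers below are a hypothetical sketch (the names are illustrative, not from httpd or APR) built on GCC's predefined __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ and __OPTIMIZE__ macros. Clang also defines __GNUC__ for compatibility, so it is excluded explicitly. The optional #warning fires only when GCC 4.5.2 is optimizing.

```c
#include <stddef.h>
#include <stdio.h>

/* Optional build-time nudge: active only for GCC 4.5.2 with optimization. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ == 4 && \
    __GNUC_MINOR__ == 5 && __GNUC_PATCHLEVEL__ == 2 && defined(__OPTIMIZE__)
#warning "Built by GCC 4.5.2 with optimization; see Apache bugs 50520/50190"
#endif

/* Hypothetical helper: nonzero if this file was compiled by exactly GCC major.minor.patch. */
static int built_with_gcc(int major, int minor, int patch)
{
#if defined(__GNUC__) && !defined(__clang__)
    return __GNUC__ == major && __GNUC_MINOR__ == minor &&
           __GNUC_PATCHLEVEL__ == patch;
#else
    (void)major; (void)minor; (void)patch;
    return 0;
#endif
}

/* Hypothetical helper: writes a short compiler description, returns snprintf's result. */
static int compiler_version(char *buf, size_t len)
{
#if defined(__clang__)
    return snprintf(buf, len, "clang %d.%d.%d",
                    __clang_major__, __clang_minor__, __clang_patchlevel__);
#elif defined(__GNUC__)
    return snprintf(buf, len, "gcc %d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    return snprintf(buf, len, "unknown compiler");
#endif
}
```

Logging compiler_version() once at module start-up would record which toolchain produced each binary in a mixed 4.5.1/4.5.2 rebuild like the one described here.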
Comment 5 Stefan Fritsch 2010-12-27 17:09:04 UTC
(In reply to comment #4)
> I've verified this as definitely a problem going from 4.5.1 to 4.5.2, as I
> returned all the optimization levels back to their defaults, and the bug
> happens with 4.5.2 compilation, not 4.5.1.

Can you please try if adding -fno-strict-aliasing to the CFLAGS fixes the problem even with 4.5.2 and optimization? If yes, this may be the same as bug 50190
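For context on the flag: -fno-strict-aliasing tells GCC not to assume that pointers to incompatible types never refer to the same memory. GCC enables -fstrict-aliasing at -O2 and above, so an -O0 build like the one in Comment 3 does not rely on that assumption at all. The snippet below is a generic illustration of the rule the flag relaxes, not code from APR or mod_ssl: reading a float through a uint32_t pointer breaks the rule, while copying the bytes with memcpy is well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Violates strict aliasing (a float object read through a uint32_t lvalue),
 * so an optimizer may assume the two accesses never overlap:
 *     uint32_t bits = *(uint32_t *)&f;
 */

/* Well-defined alternative: copy the object representation. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

With optimization on, GCC typically reduces the memcpy to a single register move, so the well-defined form costs nothing in practice.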
Comment 6 Joel 2010-12-30 10:47:19 UTC
This problem is indeed solved by adding -fno-strict-aliasing to CFLAGS.

(In reply to comment #5)
> Can you please try if adding -fno-strict-aliasing to the CFLAGS fixes the
> problem even with 4.5.2 and optimization? If yes, this may be the same as bug
> 50190
Comment 7 Joel 2010-12-30 11:11:00 UTC
I tried this and some web pages now worked, but later I discovered that others did not.

(In reply to comment #5)
> Can you please try if adding -fno-strict-aliasing to the CFLAGS fixes the
> problem even with 4.5.2 and optimization? If yes, this may be the same as bug
> 50190
Comment 8 Eric Covener 2011-01-10 18:49:37 UTC
*** Bug 50564 has been marked as a duplicate of this bug. ***
Comment 9 Sÿl 2011-01-11 02:52:41 UTC
(In reply to comment #8)
> *** Bug 50564 has been marked as a duplicate of this bug. ***

Downgrading to 4.5.1 fixed the problem for me too.
Comment 10 Joe Orton 2011-01-17 05:31:30 UTC

*** This bug has been marked as a duplicate of bug 50190 ***