Bug 39738

Summary: Random core dumps in crc32 and bndm
Product: Apache httpd-2 Reporter: Paul Lindner <lindner>
Component: AllAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P2    
Version: 2.2.2   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Paul Lindner 2006-06-06 19:30:05 UTC
I'm having a hard time tracking down random core dumps on a large
production site at work.  It appears to be a memory corruption problem
somewhere.  Checking the core files with gdb I find crc32() and bndm()
called with pointers to invalid memory locations.
 I've followed the steps in the debugging page
   http://httpd.apache.org/dev/debugging.html

But I'm unsure what to do with the core dumps I've found. The problem seems to
manifest itself in two ways.  In the first case crc32() calls
in the expires module get passed an invalid pointer. These are requests for
index.xml or index.rdf files.  The second case occurs in
SSI processing in the bndm() call.  Again, we have corrupted pointers into memory.
 Should I delve into the memory debugging here?
  http://httpd.apache.org/docs/2.0/developer/debugging.html

Or is there an easier way to trace this down?

Some details:

* Worker mpm, 5 servers, 32 threads each
* MaxRequestsPerChild  0
* Linux 2.6.latest - Fedora Core 4
* To prevent runaways we have 'ulimit -v 600000' to give us an upper limit on
  virtual memory use
* Nominal memory usage is less..  from /proc/XXXX/status
  * VmPeak:   402152 kB
    VmSize:   400068 kB
    VmHWM:     24968 kB
    VmRSS:     23016 kB
    VmData:   364080 kB* Core dumps are not larger than the , around 400MB each.
* Apache 2.2.2 compiled from source
  * various modules compiled in   * --enable-nonportable-atomics on
* Sample backtraces (many more available on request.)
 
Thread 29 (process 7660): #0  0x000000308582f5f7 in kill () from /lib64/libc.so.6
#1  0x000000000044f6a4 in sig_coredump (sig=7) at mpm_common.c:1170
#2  <signal handler called> #3  0x0000003085f0265b in crc32 () from
/usr/lib64/libz.so.1
#4  0x000000000046d268 in deflate_out_filter (f=0x150cd78, bb=0x150d368) at
mod_deflate.c:524 #5  0x0000000000450c88 in ap_pass_brigade (next=0x150cd78,
bb=0x150d368) at util_filter.c:526
#6  0x0000000000473c32 in expires_filter (f=0x150cf70, b=0x150d368) at
mod_expires.c:515 #7  0x0000000000450c88 in ap_pass_brigade (next=0x150cf70,
bb=0x150d368) at util_filter.c:526
#8  0x000000000043bbd0 in default_handler (r=0x2aaaaca62368) at core.c:3701 #9 
0x00000000004434d7 in ap_run_handler (r=0x2aaaaca62368) at config.c:157
#10 0x0000000000443d8c in ap_invoke_handler (r=0x2aaaaca62368) at config.c:371


Thread 28 (process 8782):
#0  0x000000308582f5f7 in kill () from /lib64/libc.so.6
#1  0x000000000044f6a4 in sig_coredump (sig=7) at mpm_common.c:1170
#2  <signal handler called>
#3  0x0000003085f0265b in crc32 () from /usr/lib64/libz.so.1
#4  0x000000000046d268 in deflate_out_filter (f=0xd5c690, bb=0xd5cc80) at
mod_deflate.c:524
#5  0x0000000000450c88 in ap_pass_brigade (next=0xd5c690, bb=0xd5cc80) at
util_filter.c:526
#6  0x0000000000473c32 in expires_filter (f=0xd5c888, b=0xd5cc80) at
mod_expires.c:515
#7  0x0000000000450c88 in ap_pass_brigade (next=0xd5c888, bb=0xd5cc80) at
util_filter.c:526
#8  0x000000000043bbd0 in default_handler (r=0x13e1fd8) at core.c:3701
#9  0x00000000004434d7 in ap_run_handler (r=0x13e1fd8) at config.c:157
...
#0  0x000000308582f5f7 in kill () from /lib64/libc.so.6
#1  0x000000000044f6a4 in sig_coredump (sig=7) at mpm_common.c:1170
#2  <signal handler called> #3  0x0000003085f0265b in crc32 () from
/usr/lib64/libz.so.1
#4  0x000000000046d268 in deflate_out_filter (f=0x2aaaabfd7688, bb=0x2aaaabfd7c80)
    at mod_deflate.c:524 #5  0x0000000000450c88 in ap_pass_brigade
(next=0x2aaaabfd7688, bb=0x2aaaabfd7c80)
    at util_filter.c:526 #6  0x0000000000473c32 in expires_filter
(f=0x2aaaabfd7880, b=0x2aaaabfd7c80) at mod_expires.c:515
#7  0x0000000000450c88 in ap_pass_brigade (next=0x2aaaabfd7880, bb=0x2aaaabfd7c80)
    at util_filter.c:526 #8  0x000000000043bbd0 in default_handler
(r=0x2aaaabff2688) at core.c:3701
... 
Thread 7 (process 7120): #0  0x000000308582f5f7 in kill () from /lib64/libc.so.6
#1  0x000000000044f6a4 in sig_coredump (sig=7) at mpm_common.c:1170 #2  <signal
handler called>
#3  0x0000000000466ef4 in bndm (t=0x9a4898,
    h=0x2aaaabc476ca <Address 0x2aaaabc476ca out of bounds>, hl=27993) at
mod_include.c:2473
#4  0x0000000000466fe7 in find_start_sequence (ctx=0x9a4798,
    data=0x2aaaabc476ca <Address 0x2aaaabc476ca out of bounds>, len=27993) at
mod_include.c:2514
#5  0x000000000046854d in send_parsed_content (f=0x9a4118, bb=0x9a4758) at
mod_include.c:3191
#6  0x0000000000469ce4 in includes_filter (f=0x9a4118, b=0x9a4758) at
mod_include.c:3601
#7  0x0000000000450c88 in ap_pass_brigade (next=0x9a4118, bb=0x9a4758) at
util_filter.c:526
#8  0x000000000043bbd0 in default_handler (r=0x95a0b8) at core.c:3701
#9  0x00000000004434d7 in ap_run_handler (r=0x95a0b8) at config.c:157
#10 0x0000000000443d8c in ap_invoke_handler (r=0x95a0b8) at config.c:371
#11 0x00000000004add20 in ap_process_request (r=0x95a0b8) at http_request.c:258
....
#0  0x0000000000466ef4 in bndm (t=0x152e730,
    h=0x2aaaaba6cf13 <Address 0x2aaaaba6cf13 out of bounds>, hl=8338) at
mod_include.c:2473
#1  0x0000000000466fe7 in find_start_sequence (ctx=0x152e630,
    data=0x2aaaaba6cf13 <Address 0x2aaaaba6cf13 out of bounds>, len=8338) at
mod_include.c:2514
#2  0x000000000046854d in send_parsed_content (f=0x152dfa8, bb=0x152e5f0) at
mod_include.c:3191
#3  0x0000000000469ce4 in includes_filter (f=0x152dfa8, b=0x152e5f0) at
mod_include.c:3601
....


#0  0x000000308582f5f7 in kill () from /lib64/libc.so.6
#1  0x000000000044f6a4 in sig_coredump (sig=7) at mpm_common.c:1170
#2  <signal handler called>
#3  0x0000003085f0265b in crc32 () from /usr/lib64/libz.so.1
#4  0x000000000046d268 in deflate_out_filter (f=0xc68108, bb=0xc68700) at
mod_deflate.c:524
#5  0x0000000000450c88 in ap_pass_brigade (next=0xc68108, bb=0xc68700) at
util_filter.c:526
#6  0x0000000000473c32 in expires_filter (f=0xc68300, b=0xc68700) at
mod_expires.c:515
#7  0x0000000000450c88 in ap_pass_brigade (next=0xc68300, bb=0xc68700) at
util_filter.c:526
#8  0x000000000043bbd0 in default_handler (r=0xa06698) at core.c:3701
....
Comment 1 Joe Orton 2006-06-07 09:15:32 UTC
This is on x86_64?  --enable-nonportable-atomics shouldn't make a difference on
that platform.  What compiler flags did you use? Does this go away if use
prefork rather than worker? You are sure that these segfaults are happening
*only* in those two functions? 

Some things to try:

- reconfigure and rebuild passing CFLAGS="-g -O0" to configure; makes debugging
more reliable, might rule out compiler issues
- try prefork
- try a fresh build with --enable-pool-debug

Comment 2 Paul Lindner 2006-06-07 12:47:35 UTC
Thanks for the response.  We don't use prefork since worker has done so well for
us over the years, and I'd like to avoid increasing the memory footprint on
these servers.

So far all the cores die in these two functions.  We get about 10-12 hits on
this per day.

CFLAGS is set to -g, (verified by checking config.status)  Our original build
used CFLAGS set by rpmbuild, which was more aggressive.  We can certainly try
-O0, but it seems to me that this is a memory corruption bug, not a gcc 4.1 error.

What do I look for when adding --enable-pool-debug ?  Log messages?


Here's the exact build recipe we use for Apache 2.2.2, pretty straightforward:

CFLAGS := -g

.....

httpd-2.2.2:
        test -f $@.tar.bz2 || wget -c
http://archive.apache.org/dist/httpd/$@.tar.bz2
        tar xjf $@.tar.bz2
        cd $@; env CFLAGS="$(CFLAGS)" ./configure  \
        --enable-nonportable-atomics --enable-access \
        --enable-actions             --enable-alias \
        --enable-asis                --enable-auth \
        --disable-autoindex          --disable-cgi \
        --disable-cgid               --enable-auth-digest\
        --disable-dav                --disable-dav-fs\
        --enable-deflate             --enable-dir\
        --enable-env                 --enable-headers\
        --disable-imap               --enable-include \
        --enable-log-config          --enable-logio\
        --enable-log-forensic        --enable-mime \
        --enable-negotiation         --enable-proxy \
        --enable-proxy-http          --enable-rewrite \
        --enable-setenvif            --enable-status \
        --disable-suexec             --disable-userdir\                   
        --disable-vhost-alias        --enable-ssl=static \
        --enable-so                  --enable-cache\
        --enable-file-cache          --enable-disk-cache\
        --enable-mem-cache           --enable-expires\
        --enable-exception-hook      --with-mpm=worker \
        --prefix=$(INSTDIR)\
        --with-berkeley-db \
        --with-dbm=db4\
        --with-program-name=httpd2
        cd $@; make
        cd $@; make install
Comment 3 Joe Orton 2006-06-07 13:00:29 UTC
I mention -O0 not only to help eliminate compiler optimisation issues but also
to make the core dumps easier to debug.

If you build with --enable-pool-debug you can then run with

 export MALLOC_CHECK_=2

to get glibc to use a more paranoid malloc - this is often useful for tracking
down memory corruption issues.
Comment 4 Joe Orton 2007-10-11 06:22:41 UTC
I recently worked with a user to track down an issue with identical backtraces -
the root cause was PR 36780; (optimistically?) marking as a duplicate.

*** This bug has been marked as a duplicate of 36780 ***