42086 – Possible bug in mpm/worker/fdqueue.c:ap_queue_push()


Status:	RESOLVED DUPLICATE of bug 44402

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	mpm_worker
Version:	2.2.2
Hardware:	Sun SunOS

Importance:	P2 normal
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-04-11 03:43 UTC by Michal Rousal
Modified:	2008-02-27 15:27 UTC
CC List:	0 users


Description Michal Rousal 2007-04-11 03:43:40 UTC

Hi,

We are getting occasional httpd coredumps when doing performance/stress testing
of our module (about 1 core per few hours of stress test). We are using apache
2.2.2 with the worker mpm. Based on the cores, it always crashes in apache code.
It is obviously some memory corruption: SIGSEGV and SIGBUS together account for
about 50% of the crashes (usually in something related to apr_buckets), and
SIGILL, the most common single signal, for about the other 50%, always with the
same backtrace as below.

(dbx) where
current thread: t@1
  [1] 0x6a2808(0x6a07c0, 0xfeebc008, 0x2f40c0, 0xfeebc008, 0x0, 0x19ce54), at
0x6a2808
  [2] apr_pool_destroy(0x19cce0, 0x0, 0x0, 0x0, 0x19cce0, 0x0), at 0xff0db8c8
  [3] child_main(0xa, 0xf0ca8, 0xf88f4, 0xebe90, 0xf0ce4, 0xf8990), at 0xb13f0
  [4] perform_idle_server_maintenance(0x109ca8, 0x0, 0xf0cb8, 0x0, 0xebe90,
0xf8918), at 0xb1e48
  [5] server_main_loop(0x0, 0xffffffff, 0x3, 0xfeac0050, 0xef968, 0xebe90), at
0xb226c
  [6] ap_mpm_run(0x11400, 0x0, 0xf8990, 0xda7d8, 0xebe90, 0xf8924), at 0xb25ec
  [7] main(0xee8a0, 0x0, 0xef9dc, 0xef9ec, 0x100498, 0xebe90), at 0x2c3cc

Actually, we thought (and still think) that it may be a bug in our module code,
but with some additional testing (using Parasoft Insure and an apache-2.2.2
build with debug info) we got an assert crash (SIGABRT) in httpd with the report
below, which matches the core backtrace. We got the same core 3 times during a
3-day test with the server permanently under 100% load.

...
"unknown", line unknown: Insure trapped signal: 6
  Stack trace where the error occurred:
                   __sigprocmask()
                   sigacthandler()
                          _sigon()
                      _thrp_kill()
                           raise()
                           abort()
                           abort()  (interface)
                   ap_log_assert()  log.c, 778
                   ap_queue_push() 
/tmp/apache-2.2.2.rousalm.build/httpd-2.2.2/server/mpm/worker/fdqueue.c, 294
                 listener_thread()  worker.c, 755
                    dummy_worker()  threadproc/unix/thread.c, 138
"unknown", line unknown: Insure trapped signal: 6
...

The problem seems to be in:

apr_status_t ap_queue_push(fd_queue_t *queue, apr_socket_t *sd, apr_pool_t *p)
{
...
    AP_DEBUG_ASSERT(!ap_queue_full(queue));

    elem = &queue->data[queue->nelts];
    elem->sd = sd;
    elem->p = p;
    queue->nelts++;
...
}

From the core I can see that 'queue->nelts' and 'queue->bounds' are both equal
to 10 (see the config below), so the queue is full, yet apache tries to add a
new connection to it. There is no non-debug error check in this function except
for mutex lock/unlock failure, so without the assert the write to
queue->data[queue->nelts] lands past the last valid element and will certainly
corrupt memory.
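
To illustrate the kind of non-debug check that is missing, here is a minimal
standalone sketch. It is not the httpd code and not a proposed patch: the
demo_* types, functions and return codes are hypothetical stand-ins, only the
field names (data, nelts, bounds, sd, p) follow the excerpt above, and locking
is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical return codes for this sketch only (not APR status values). */
#define DEMO_SUCCESS 0
#define DEMO_ENOMEM  1
#define DEMO_EFULL   2

/* Hypothetical stand-ins for the queue element and queue structures. */
typedef struct {
    void *sd;   /* stands in for the apr_socket_t pointer */
    void *p;    /* stands in for the apr_pool_t pointer */
} demo_elem_t;

typedef struct {
    demo_elem_t *data;      /* array of 'bounds' elements */
    unsigned int nelts;     /* number of elements currently stored */
    unsigned int bounds;    /* capacity of 'data' */
} demo_queue_t;

/* Allocate storage for 'capacity' elements and start with an empty queue. */
static int demo_queue_init(demo_queue_t *q, unsigned int capacity)
{
    assert(q != NULL && capacity > 0);
    q->data = calloc(capacity, sizeof(*q->data));
    if (q->data == NULL) {
        return DEMO_ENOMEM;
    }
    q->nelts = 0;
    q->bounds = capacity;
    return DEMO_SUCCESS;
}

/* Push with a check that stays active in non-debug builds: a full queue is
 * reported to the caller instead of writing past the end of 'data'. */
static int demo_queue_push(demo_queue_t *q, void *sd, void *p)
{
    demo_elem_t *elem;

    if (q->nelts >= q->bounds) {
        return DEMO_EFULL;
    }
    elem = &q->data[q->nelts];
    elem->sd = sd;
    elem->p = p;
    q->nelts++;
    return DEMO_SUCCESS;
}
```

With a capacity of 10, as in our configuration, the first ten calls to
demo_queue_push() succeed and the eleventh returns DEMO_EFULL while nelts stays
at 10. This only limits the damage; it does not explain why the listener
pushes onto a full queue in the first place.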

Our httpd.conf worker config looks like this:
...
<IfModule mpm_worker_module>
    StartServers          4
    MaxClients            150
    MinSpareThreads       40
    MaxSpareThreads       80
    ThreadsPerChild       10
    MaxRequestsPerChild   10000
</IfModule>
...

I have not had time to check all the code related to worker_queue access, so it
is still quite possible that this "bug" is a consequence of some earlier problem
caused by our module. As there is only an assert check, I expect that a full
queue at this point should not happen under normal circumstances.

Regards,
Michal

Comment 1 Nick Kew 2008-02-27 15:27:49 UTC

This looks like PR#44402 - closing as duplicate.
If the fix to that doesn't work, you can reopen.

*** This bug has been marked as a duplicate of bug 44402 ***