Bug 53609 - Apache hangs with terminated signal 6
Summary: Apache hangs with terminated signal 6
Status: NEW
Alias: None
Product: APR
Classification: Unclassified
Component: APR (show other bugs)
Version: HEAD
Hardware: Other Linux
: P2 critical (vote)
Target Milestone: ---
Assignee: Apache Portable Runtime bugs mailinglist
URL:
Keywords: FixedInTrunk
Depends on:
Blocks:
 
Reported: 2012-07-27 10:26 UTC by vinoth
Modified: 2019-01-06 12:53 UTC (History)
0 users



Attachments
Its a complete core dump (651.86 KB, text/plain)
2012-07-27 13:11 UTC, vinoth
Details
Error log of the patch apr_table.c (169.60 KB, text/plain)
2012-07-28 07:48 UTC, vinoth
Details

Note You need to log in before you can comment on or make changes to this bug.
Description vinoth 2012-07-27 10:26:28 UTC
We have an apache module which handles all the http request from the client and update those data into our mysql database.

All the request data be on plain/text and some of in binary dump. We did a performance test on the module by creating more POST request to hit the apache module.

The current test hits with 100 concurrent request to the apache all the time through out the day with a 1 min gap between the request.

Our MPM worker config is:

ThreadLimit 200
StartServers 3
MaxRequestWorkers 150
MinSpareThreads 50
MaxSpareThreads 200
ThreadsPerChild 150
MaxRequestsPerChild 0


In between the test apache get hangs into the state with <defunt > . From there it is not able to process the any of the request.

Core Dump Info:

Core was generated by `/usr/local/apache2/bin/httpd -k start'.
Program terminated with signal 6, Aborted.
[New process 8667]
[New process 8919]
[New process 8918]
[New process 8917]
[New process 8916]
[New process 8915]
[New process 8914]
[New process 8913]
[New process 8912]
[New process 8911]
[New process 8910]
[New process 8909]
[New process 8908]
[New process 8907]
[New process 8906]
[New process 8905]
[New process 8904]
[New process 8903]
[New process 8902]
[New process 8901]
[New process 8900]
[New process 8899]
[New process 8898]
[New process 8897]
[New process 8896]
[New process 8895]
[New process 8894]
[New process 8893]
[New process 8892]
[New process 8891]
[New process 8890]
[New process 8889]
[New process 8888]
[New process 8887]
[New process 8886]
[New process 8885]
[New process 8884]
[New process 8883]
[New process 8882]
[New process 8881]
[New process 8880]
[New process 8879]
[New process 8878]
[New process 8877]
[New process 8876]
[New process 8875]
[New process 8874]
[New process 8873]
[New process 8872]
[New process 8871]
[New process 8870]
[New process 8869]
[New process 8868]
[New process 8867]
[New process 8866]
[New process 8865]
[New process 8864]
[New process 8863]
[New process 8862]
[New process 8861]
[New process 8860]
[New process 8859]
[New process 8858]
[New process 8857]
[New process 8856]
[New process 8855]
[New process 8854]
[New process 8853]
[New process 8852]
[New process 8851]
[New process 8850]
[New process 8849]
[New process 8848]
[New process 8847]
[New process 8846]
[New process 8845]
[New process 8844]
[New process 8843]
[New process 8842]
[New process 8841]
[New process 8840]
[New process 8839]
[New process 8838]
[New process 8837]
[New process 8836]
[New process 8835]
[New process 8834]
[New process 8833]
[New process 8832]
[New process 8831]
[New process 8830]
[New process 8829]
[New process 8828]
[New process 8827]
[New process 8826]
[New process 8825]
[New process 8824]
[New process 8823]
[New process 8822]
[New process 8821]
[New process 8820]
[New process 8819]
[New process 8818]
[New process 8817]
[New process 8816]
[New process 8815]
[New process 8814]
[New process 8813]
[New process 8812]
[New process 8811]
[New process 8810]
[New process 8809]
[New process 8808]
[New process 8807]
[New process 8806]
[New process 8805]
[New process 8804]
[New process 8803]
[New process 8802]
[New process 8801]
[New process 8800]
[New process 8799]
[New process 8798]
[New process 8797]
[New process 8796]
[New process 8795]
[New process 8794]
[New process 8793]
[New process 8792]
[New process 8791]
[New process 8790]
[New process 8789]
[New process 8788]
[New process 8787]
[New process 8786]
[New process 8785]
[New process 8784]
[New process 8783]
[New process 8782]
[New process 8781]
[New process 8780]
[New process 8779]
[New process 8778]
[New process 8777]
[New process 8776]
[New process 8775]
[New process 8774]
[New process 8773]
[New process 8772]
[New process 8771]
[New process 8770]
[New process 8769]
[New process 8768]
[New process 8767]
[New process 8766]
[New process 8765]
[New process 8764]
[New process 8763]
[New process 8762]
[New process 8761]
[New process 8760]
[New process 8759]
[New process 8758]
[New process 8757]
[New process 8756]
[New process 8755]
[New process 8754]
[New process 8753]
[New process 8752]
[New process 8751]
[New process 8750]
[New process 8749]
[New process 8748]
[New process 8747]
[New process 8746]
[New process 8745]
[New process 8744]
[New process 8743]
[New process 8742]
[New process 8741]
[New process 8740]
[New process 8739]
[New process 8738]
[New process 8737]
[New process 8736]
[New process 8735]
[New process 8734]
[New process 8733]
[New process 8732]
[New process 8731]
[New process 8730]
[New process 8729]
[New process 8728]
[New process 8727]
[New process 8726]
[New process 8725]
[New process 8724]
[New process 8723]
[New process 8722]
[New process 8721]
[New process 8720]
[New process 8719]
[New process 8718]
[New process 8717]
[New process 8716]
[New process 8715]
[New process 8714]
[New process 8713]
[New process 8712]
[New process 8711]
[New process 8710]
[New process 8709]
[New process 8708]
[New process 8707]
[New process 8706]
[New process 8705]
[New process 8704]
[New process 8703]
[New process 8702]
[New process 8701]
[New process 8700]
[New process 8699]
[New process 8698]
[New process 8697]
[New process 8696]
[New process 8695]
[New process 8694]
[New process 8693]
[New process 8692]
[New process 8691]
[New process 8690]
[New process 8689]
[New process 8688]
[New process 8687]
[New process 8686]
[New process 8685]
[New process 8684]
[New process 8683]
[New process 8682]
[New process 8681]
[New process 8680]
[New process 8679]
[New process 8678]
[New process 8677]
[New process 8676]
[New process 8675]
[New process 8674]
[New process 8673]
[New process 8672]
[New process 8671]
[New process 8670]
[New process 8669]
#0  0x00000035b420d5cb in read () from /lib64/libpthread.so.0
(gdb) bt full
#0  0x00000035b420d5cb in read () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00002aaab590cb91 in ap_worker_pod_check (pod=0x72da960) at pod.c:57
        c = 7 '\a'
        fd = 7
        rc = 0
#2  0x00002aaab5908fa7 in child_main (child_num_arg=0) at worker.c:1332
        threads = (apr_thread_t **) 0x7350e90
        rv = 0
        ts = (thread_starter *) 0x7351670
        thread_attr = (apr_threadattr_t *) 0x73516a0
        start_thread_id = (apr_thread_t *) 0x7351720
#3  0x00002aaab5909149 in make_child (s=0x730f210, slot=0) at worker.c:1417
        pid = 0
#4  0x00002aaab590986a in perform_idle_server_maintenance () at worker.c:1618
        i = 0
        j = 250
        idle_thread_count = 0
        ws = (worker_score *) 0x2b5b7a806358
        ps = (process_score *) 0x2b5b7a7f7020
        free_length = 1
        totally_free_length = 1
        free_slots = {0, 0, 120648208, 0, 676492864, 32767, 4399419, 0, -1216176009, 10922, 65350085, -1732919508, -1283343424, 53, -1280786531, 53, 676492952, 32767, -1280520480, 53, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 676492864, 32767}
        last_non_dead = 0
        total_non_dead = 1
        active_thread_count = 0
#5  0x00002aaab5909ba4 in server_main_loop (remaining_children_to_start=0) at worker.c:1739
        old_gen = 119249696
        child_slot = 0
        exitwhy = 6
        status = 6
        processed_status = 0
        pid = {pid = -1, in = 0x7fff28529a96, out = 0x730f210, err = 0x71edcb0}
        i = 250
#6  0x00002aaab5909f78 in worker_run (_pconf=0x71b9b50, plog=0x71edcb0, s=0x730f210) at worker.c:1810
        remaining_children_to_start = 0
        rv = 0
---Type <return> to continue, or q <return> to quit---
#7  0x0000000000430d36 in ap_run_mpm (pconf=0x71b9b50, plog=0x71edcb0, s=0x730f210) at mpm_common.c:96
        pHook = (ap_LINK_mpm_t *) 0x7234930
        n = 0
        rv = -1
#8  0x0000000000429761 in main (argc=3, argv=0x7fff28527ad8) at main.c:777
        c = 0 '\0'
        showcompile = 0
        showdirectives = 0
        confname = 0x474073 "conf/httpd.conf"
        def_server_root = 0x474083 "/usr/local/apache2"
        temp_error_log = 0x0
        error = 0x0
        process = (process_rec *) 0x71b9b20
        pconf = (apr_pool_t *) 0x71b9b50
        plog = (apr_pool_t *) 0x71edcb0
        ptemp = (apr_pool_t *) 0x72c4040
        pcommands = (apr_pool_t *) 0x71d8680
        opt = (apr_getopt_t *) 0x71edc40
        rv = 0
        mod = (module **) 0x68fa18
        opt_arg = 0x2b5b796066a0 "ÖìA"
        signal_server = (apr_OFN_ap_signal_server_t *) 0x4604af <ap_signal_server>
Comment 1 Eric Covener 2012-07-27 11:26:46 UTC
What does your error log say?  What's the backtrace of the thread that crashed?

"thread apply all bt full" and post the suspicious one.
Comment 2 vinoth 2012-07-27 13:09:25 UTC
Apologize previous core info was with other pid. Here is the right one.

The Error Log says:
===================
[Thu Jul 26 22:38:55.033453 2012] [core:notice] [pid 30703:tid 47671878384272] AH00051: child pid 26692 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump
apr_table_mergen: key not in ancestor pool of t
[Thu Jul 26 22:43:42.038833 2012] [core:notice] [pid 30703:tid 47671878384272] AH00051: child pid 6351 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump
apr_table_mergen: key not in ancestor pool of t


Attached the core dump of thread apply all bt full.

In the dump we removed the sql statements and the client Ipaddr.
Comment 3 vinoth 2012-07-27 13:11:34 UTC
Created attachment 29121 [details]
Its a complete core dump
Comment 4 Eric Covener 2012-07-27 13:17:16 UTC
#5  0x00002b5b793d9d2e in apr_table_mergen (t=0x2aab180d29b0, key=0x48169c "Connection", val=0x481704 "Keep-Alive") at tables/apr_tables.c:739
	next_elt = <value optimized out>
	end_elt = <value optimized out>
	checksum = <value optimized out>
	hash = <value optimized out>
#6  0x0000000000468898 in ap_set_keepalive (r=0x2aab181b9a60) at http_protocol.c:249
Comment 5 Ruediger Pluem 2012-07-27 14:31:20 UTC
I guess this is a bug in APR:

Can you please test the following patch:

Index: tables/apr_tables.c
===================================================================
--- tables/apr_tables.c (revision 1366342)
+++ tables/apr_tables.c (working copy)
@@ -730,15 +730,18 @@
     apr_table_entry_t *next_elt;
     apr_table_entry_t *end_elt;
     apr_uint32_t checksum;
+    apr_pool_t *pool;
     int hash;

 #if APR_POOL_DEBUG
     {
-       if (!apr_pool_is_ancestor(apr_pool_find(key), t->a.pool)) {
+    pool = apr_pool_find(key);
+       if ((pool != key) && (!apr_pool_is_ancestor(pool, t->a.pool))) {
            fprintf(stderr, "apr_table_mergen: key not in ancestor pool of t\n");
            abort();
        }
-       if (!apr_pool_is_ancestor(apr_pool_find(val), t->a.pool)) {
+    pool = apr_pool_find(val);
+       if ((pool != val) && (!apr_pool_is_ancestor(pool, t->a.pool))) {
            fprintf(stderr, "apr_table_mergen: val not in ancestor pool of t\n");
            abort();
        }
Comment 6 vinoth 2012-07-28 05:08:39 UTC
Thanks for the patch ruediger. I started the test with the patch against our module.

I keep you posted if its gets solved our issue.
Comment 7 vinoth 2012-07-28 07:46:30 UTC
we again hit with the same error in the patch too while testing. It is not happened immediately. It occurs during one to two hours of our test and makes the apache to hang at one stage.

Please let me know whats the real issue with APR table mergen/addn functions. Why its is behave in that manner when he hit with higher concurrency (100).

Any help to overcome is greatly appreciated.
Comment 8 vinoth 2012-07-28 07:48:39 UTC
Created attachment 29123 [details]
Error log of the patch apr_table.c

Apache2 error logs
Comment 9 vinoth 2012-07-29 04:37:43 UTC
Found one interesting in the error log .apache2 gets restarted every time when the worker getting shutdown when it caught SIGTERM. While worker is getting shutting down we don't have any core dump associated with that.

I set the maxreqworker is 150. The concurrency hit to the apache is always less than the maxreqworker config.

Is this the expected behaviour?
Comment 10 Ruediger Pluem 2012-07-30 09:11:00 UTC
(In reply to comment #7)
> we again hit with the same error in the patch too while testing. It is not
> happened immediately. It occurs during one to two hours of our test and
> makes the apache to hang at one stage.

I cannot see that you hit the same error again. You still have segfaults / aborts, but you don't have the "key not in ancestor pool of " any longer in the log. This means that the patch works and resolved one of your issues.
Comment 11 Ruediger Pluem 2012-07-30 12:27:47 UTC
(In reply to comment #10)
> (In reply to comment #7)
> > we again hit with the same error in the patch too while testing. It is not
> > happened immediately. It occurs during one to two hours of our test and
> > makes the apache to hang at one stage.
> 
> I cannot see that you hit the same error again. You still have segfaults /
> aborts, but you don't have the "key not in ancestor pool of " any longer in
> the log. This means that the patch works and resolved one of your issues.

The patch was committed as r1367050.
Comment 12 vinoth 2012-07-30 16:31:52 UTC
We see those error "key not in ancestor pool of t"  in the attached Error logs

===============
[Fri Jul 27 21:36:59.029498 2012] [core:notice] [pid 449:tid 47538503809680] AH00094: Command line: '/usr/local/apache2/bin/httpd'
apr_table_mergen: key not in ancestor pool of t
[Fri Jul 27 22:29:38.067056 2012] [core:notice] [pid 449:tid 47538503809680] AH00051: child pid 1083 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump
apr_table_mergen: key not in ancestor pool of t
[Fri Jul 27 22:50:13.089987 2012] [core:notice] [pid 449:tid 47538503809680] AH00051: child pid 23412 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump
apr_table_addn: val not in ancestor pool of t
[Fri Jul 27 22:50:22.091015 2012] [core:notice] [pid 449:tid 47538503809680] AH00051: child pid 32280 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump
*** glibc detected *** /usr/local/apache2/bin/httpd: double free or corruption (!prev): 0x00002aab90008b30 ***
*** glibc detected *** /usr/local/apache2/bin/httpd: double free or corruption (!prev): 0x00002aab90008b30 ***
*** glibc detected *** /usr/local/apache2/bin/httpd: double free or corruption (!prev): 0x00002aab90008b30 ***
*** glibc detected *** /usr/local/apache2/bin/httpd: double free or corruption (!prev): 0x00002aab90008b30 ***
*** glibc detected *** /usr/local/apache2/bin/httpd: double free or corruption (!prev): 0x00002aab90008b30 ***
*** glibc detected *** /usr/local/apache2/bin/httpd: double free or corruption (!prev): 0x00002aab90008b30 ***
[Fri Jul 27 22:50:33.091944 2012] [core:notice] [pid 449:tid 47538503809680] AH00051: child pid 32614 exit signal Segmentation fault (11), possible coredump in /piroot/apache2-gdb-dump
apr_table_addn: val not in ancestor pool of t
[Fri Jul 27 22:50:54.093086 2012] [core:notice] [pid 449:tid 47538503809680] AH00051: child pid 525 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump
apr_table_mergen: key not in ancestor pool of t
[Fri Jul 27 23:19:04.121667 2012] [core:notice] [pid 449:tid 47538503809680] AH00051: child pid 959 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump
apr_table_addn: val not in ancestor pool of t
[Fri Jul 27 23:35:08.139140 2012] [core:notice] [pid 449:tid 47538503809680] AH00051: child pid 13036 exit signal Aborted (6), possible coredump in /piroot/apache2-gdb-dump

===============

Today we tried out the prefork mpm for the test instead of worker mpm. In that we dont see any aborts and seg faults in the test and also the error "key not in ancestor pool of t".

Is there is anything associated with the issue in worker mpm.

When we hit with the issue while in worker mpm the backtrace always say something that releated to worker.c file

Is there any other information that you need from my side.
Comment 13 Stefan Fritsch 2013-05-03 19:33:02 UTC
1.5 commit: r1368813
1.4 commit: r1368817