Bug 58491 - Core dumps appear when the load increases
Summary: Core dumps appear when the load increases
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_http
Version: 2.4.16
Hardware: Other
OS: Linux
Importance: P2 major
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-09 09:21 UTC by Sylvain Goulmy
Modified: 2015-11-13 09:24 UTC
CC: 3 users



Description Sylvain Goulmy 2015-10-09 09:21:29 UTC
Hi,

We are currently facing an issue with Apache since our 2.4 upgrade. When the load increases too much (around 250 hits/s), the httpd instance generates core files.

Here is the kind of core dump that we get:

Core was generated by `/tech/apache/runtime/httpd-2.4.16/bin/httpd -D inst-switch -d /tech/apache -f /'.
Program terminated with signal 11, Segmentation fault.
#0  apr_pool_cleanup_register (p=0x400000010, data=0x3181c78, plain_cleanup_fn=0x7f15bb4e7cc0 <thread_cond_cleanup>, child_cleanup_fn=0x427b10 <apr_pool_cleanup_null@plt>)
    at memory/unix/apr_pools.c:2213
2213	memory/unix/apr_pools.c: No such file or directory.
	in memory/unix/apr_pools.c

(gdb) bt
#0  apr_pool_cleanup_register (p=0x400000010, data=0x3181c78, plain_cleanup_fn=0x7f15bb4e7cc0 <thread_cond_cleanup>, child_cleanup_fn=0x427b10 <apr_pool_cleanup_null@plt>)
    at memory/unix/apr_pools.c:2213
#1  0x00007f15bb4e7de8 in apr_thread_cond_create (cond=0x3181c20, pool=0x3181168) at locks/unix/thread_cond.c:55
#2  0x00007f15bb93f5ec in apr_reslist_create (reslist=0x1e46300, min=<value optimized out>, smax=<value optimized out>, hmax=25, ttl=<value optimized out>, con=<value optimized out>, 
    de=0x7f15b745a3e0 <connection_destructor>, params=0x291d9e8, pool=0x3181168) at misc/apr_reslist.c:299
#3  0x00007f15b745af2d in ap_proxy_initialize_worker (worker=0x291d9e8, s=0x1eaaa38, p=0x1e46148) at proxy_util.c:1848
#4  0x00007f15b7455cac in proxy_handler (r=0x7f1570002970) at mod_proxy.c:1087
#5  0x000000000044b210 in ap_run_handler ()
#6  0x000000000044f38e in ap_invoke_handler ()
#7  0x000000000046201a in ap_process_async_request ()
#8  0x000000000046217f in ap_process_request ()
#9  0x000000000045e4ce in ap_process_http_connection ()
#10 0x0000000000456020 in ap_run_process_connection ()
#11 0x00007f15ba0c7f2a in process_socket (thd=0x2fce5f0, dummy=<value optimized out>) at worker.c:619
#12 worker_thread (thd=0x2fce5f0, dummy=<value optimized out>) at worker.c:978
#13 0x00007f15bac739d1 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f15ba9c08fd in clone () from /lib64/libc.so.6

This httpd instance is a reverse proxy with the following configuration at the server level:

	ProxyPreserveHost On
	BalancerPersist On
	RewriteRule /(.*)   balancer://tomcat7_80/$1  [proxy]


With the balancer defined at the server level:

<Proxy balancer://tomcat7_80>
        ProxySet lbmethod=bybusyness maxattempts=1

        BalancerMember http://127.0.6.49:80           lbset=0 smax=0 ttl=3 connectiontimeout=2 retry=60

        BalancerMember http://10.70.6.48:80           lbset=1 smax=0 ttl=3 connectiontimeout=2 retry=60

</Proxy>
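
For reference, the same URL mapping could also be expressed with mod_proxy alone, without mod_rewrite. A minimal sketch, assuming the balancer://tomcat7_80 definition above (ProxyPassReverse is optional and only matters if the backends emit redirects):

```apache
# Sketch only: equivalent of the RewriteRule ... [proxy] mapping above
ProxyPass        "/" "balancer://tomcat7_80/"
ProxyPassReverse "/" "balancer://tomcat7_80/"
```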

We have the same configuration in our live environment, under a much higher load, without facing any issue with Apache 2.2.

Details about our platform:

- OS : RHEL 6.4
- Apache 2.4.16 compiled with apr-1.5.2 and apr-util-1.5.4
- MPM_worker
Comment 1 Yann Ylavic 2015-10-16 21:38:39 UTC
Do you insert/update load-balancer members via the balancer-manager during runtime?
Comment 2 Sylvain Goulmy 2015-10-26 10:20:38 UTC
Sorry for this late answer.

The definitions of the balancer members are static; we don't add or remove any definitions during runtime.

The status can be modified during runtime by disabling/enabling a member with the balancer manager. However, I have checked in the log files: the status of the balancer members had NOT been modified when the core dumps appeared.

Besides that, we still get many core dumps on this httpd instance, but many of them are different from the one I submitted. They are still related to the mod_proxy module, but some of them appear during request processing and not during process launch.

Would you like me to add the new core dumps to this ticket, or should I open a ticket for each type of error?
Comment 3 Yann Ylavic 2015-10-26 23:23:22 UTC
(In reply to Sylvain Goulmy from comment #2)
> Would you like that i add the new core dumps to the current ticket or should
> i open a ticket for each type of error ?

Please use this same ticket, since the error may be memory corruption, which could be better diagnosed with additional backtraces (the previous one made little sense so far).
Comment 4 Sylvain Goulmy 2015-10-27 13:08:06 UTC
Please find two cores that appeared on the same httpd instance one minute apart:

Core was generated by `/tech/apache/runtime/httpd-2.4.16/bin/httpd -D inst-switch -d /tech/apache -f /'.
Program terminated with signal 11, Segmentation fault.
#0  run_cleanups (pool=0x7f15bbf7a018) at memory/unix/apr_pools.c:2351
2351	memory/unix/apr_pools.c: No such file or directory.
	in memory/unix/apr_pools.c
Missing separate debuginfos, use: debuginfo-install AF-httpd-2.4.16-0-el6.4.x86_64
(gdb) bt full
#0  run_cleanups (pool=0x7f15bbf7a018) at memory/unix/apr_pools.c:2351
        c = 0x2e362e302e373231
#1  apr_pool_destroy (pool=0x7f15bbf7a018) at memory/unix/apr_pools.c:804
        active = <value optimized out>
        allocator = <value optimized out>
#2  0x00007f15b745a3fb in connection_destructor (resource=<value optimized out>, params=<value optimized out>, pool=<value optimized out>) at proxy_util.c:1499
        conn = <value optimized out>
#3  0x00007f15bb93f315 in destroy_resource (reslist=0x31d9600, resource=0x7f15a7df1ac8) at misc/apr_reslist.c:135
No locals.
#4  apr_reslist_acquire (reslist=0x31d9600, resource=0x7f15a7df1ac8) at misc/apr_reslist.c:345
        rv = <value optimized out>
        res = 0x31d9638
        now = 1445846735860297
#5  0x00007f15b745a969 in ap_proxy_acquire_connection (proxy_function=0x7f15b724a1fe "HTTP", conn=0x7f15a7df1ac8, worker=0x31b77c8, s=0x2c8a4a8) at proxy_util.c:2125
        rv = <value optimized out>
#6  0x00007f15b7247879 in proxy_http_handler (r=0x7f155c006990, worker=0x31b77c8, conf=0x1edfd38, 
    url=0x7f155c003488 "http://127.0.6.48/UpdateMyAccountCustomerWS/ws/passenger/marketing/000443v05", proxyname=0x0, proxyport=0) at mod_proxy_http.c:1957
        status = <value optimized out>
        server_portstr = "\001\000\000\000\000\000\000\000\225\312h\270\025\177\000\000\000\000\000\000\000\000\000\000\344\217N\273\025\177\000"
        scheme = 0x7f155c003530 "http"
        proxy_function = 0x7f15b724a1fe "HTTP"
        u = <value optimized out>
        backend = 0x0
        is_ssl = 0
        c = 0x7f159c01f968
        retry = 0
        p = 0x7f155c006918
        uri = 0x7f155c0034d8

		
Core was generated by `/tech/apache/runtime/httpd-2.4.16/bin/httpd -D inst-switch -d /tech/apache -f /'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f15bb93efbd in get_container (reslist=0x31d9760) at misc/apr_reslist.c:97
97	misc/apr_reslist.c: No such file or directory.
	in misc/apr_reslist.c
Missing separate debuginfos, use: debuginfo-install AF-httpd-2.4.16-0-el6.4.x86_64
(gdb) bt full
#0  0x00007f15bb93efbd in get_container (reslist=0x31d9760) at misc/apr_reslist.c:97
        res = 0x31d9780
#1  0x00007f15bb93f02e in create_resource (reslist=0x31d9760, ret_res=0x7f15afffe868) at misc/apr_reslist.c:121
        rv = 52270976
        res = <value optimized out>
#2  0x00007f15bb93f416 in apr_reslist_acquire (reslist=0x31d9760, resource=0x7f15afffeac8) at misc/apr_reslist.c:395
        rv = <value optimized out>
        res = 0x81
        now = 1445846790718097
#3  0x00007f15b745a969 in ap_proxy_acquire_connection (proxy_function=0x7f15b724a1fe "HTTP", conn=0x7f15afffeac8, worker=0x31b77c8, s=0x2c8a4a8) at proxy_util.c:2125
        rv = <value optimized out>
#4  0x00007f15b7247879 in proxy_http_handler (r=0x7f158c002970, worker=0x31b77c8, conf=0x1edfd38, 
    url=0x7f158c0074a8 "http://127.0.6.48/UpdateMyAccountCustomerWS/ws/passenger/marketing/000443v05", proxyname=0x0, proxyport=0) at mod_proxy_http.c:1957
        status = <value optimized out>
        server_portstr = "p\032\276\002", '\000' <repeats 12 times>, "wh?\025\177\000\000\063\242N\273\025\177\000"
        scheme = 0x7f158c007550 "http"
        proxy_function = 0x7f15b724a1fe "HTTP"
        u = <value optimized out>
        backend = 0x0
        is_ssl = 0
        c = 0x7f15b0007368
        retry = 0
        p = 0x7f158c0028f8
        uri = 0x7f158c0074f8
#5  0x00007f15b7450f12 in proxy_run_scheme_handler (r=0x7f158c002970, worker=0x31b77c8, conf=0x1edfd38, 
    url=0x7f158c0074a8 "http://127.0.6.48/UpdateMyAccountCustomerWS/ws/passenger/marketing/000443v05", proxyhost=0x0, proxyport=0) at mod_proxy.c:2789
        pHook = <value optimized out>
        n = <value optimized out>
        rv = -1

		
Once again, I have checked that there was no configuration reload or balancer update at that time.

I have kept the core files in case they are needed.

Hope it helps.
Comment 5 Sylvain Goulmy 2015-10-30 14:39:26 UTC
Hi,

For your information, I have upgraded to Apache 2.4.17; since then, I have had two new cores:

The first one:

#0  0x00007f8531da63a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f8532a70076 in apr_reslist_maintain (reslist=0x17c9630) at misc/apr_reslist.c:184
        now = <value optimized out>
        rv = <value optimized out>
        res = <value optimized out>
        created_one = 0
#2  0x00007f8532a705fe in apr_reslist_create (reslist=0xc3f230, min=<value optimized out>, smax=<value optimized out>, hmax=25, ttl=<value optimized out>, con=<value optimized out>, 
    de=0x7f852e58a440 <connection_destructor>, params=0x12b6698, pool=0x17c92d8) at misc/apr_reslist.c:305
        rv = 0
        rl = 0x17c9630
#3  0x00007f852e58afdd in ap_proxy_initialize_worker (worker=0x12b6698, s=0xc34110, p=0xc3f0e8) at proxy_util.c:1854
        rv = <value optimized out>
        mpm_threads = 32645
#4  0x00007f852e585c73 in proxy_handler (r=0x7f8508002970) at mod_proxy.c:1088
        url = 0x7f8508007148 "http://127.0.6.65/daemon/ws/COMET32_AMADEUS"
        uri = 0x7f8508007086 "balancer://tomcat6_80/daemon/ws/COMET32_AMADEUS"
        scheme = 0x7f85080070e8 "balancer"
        p = <value optimized out>
        p2 = <value optimized out>
        sconf = <value optimized out>
        conf = 0xc3aef8
        proxies = 0xc3b028
        ents = 0xc361f8
        i = <value optimized out>
        access_status = 0
        direct_connect = <value optimized out>
        str = <value optimized out>
        maxfwd = <value optimized out>
        balancer = 0x12ad028
        worker = 0x12b6698
        attempts = 0
        max_attempts = 0
        list = <value optimized out>
        saved_status = <value optimized out>
#5  0x000000000044ce00 in ap_run_handler ()
No symbol table info available.
#6  0x0000000000450f7e in ap_invoke_handler ()
No symbol table info available.
#7  0x00000000004641ba in ap_process_async_request ()
No symbol table info available.
#8  0x000000000046431f in ap_process_request ()
No symbol table info available.
#9  0x000000000046065e in ap_process_http_connection ()
No symbol table info available.
#10 0x0000000000457d30 in ap_run_process_connection ()
No symbol table info available.

With the following messages in the error log:

[Fri Oct 30 06:18:20.094520 2015] [mpm_worker:emerg] [pid 1804:tid 140209616217856] (35)Resource deadlock avoided: AH00272: apr_proc_mutex_lock failed before this child process served any requests.
[Fri Oct 30 11:27:34.378987 2015] [core:notice] [pid 20080:tid 140210063746880] AH00051: child pid 6479 exit signal Segmentation fault (11), possible coredump in /tech/apache/var/logs/switch

And the second one:

#0  ap_proxy_connect_backend (proxy_function=0x7f286014c1fe "HTTP", conn=0x298d300, worker=0x2281ef8, s=0x1e69448) at proxy_util.c:2706
        rv = <value optimized out>
        connected = 0
        loglevel = <value optimized out>
        backend_addr = 0x612e6474732e3034
        local_addr = <value optimized out>
        newsock = 0x2997540
        sconf = <value optimized out>
        conf = 0x1c07ef8
#1  0x00007f28601499dc in proxy_http_handler (r=0x7f2834002970, worker=0x2281ef8, conf=0x1c07ef8, url=0x7f28340073e0 "http://127.0.6.48/TravelDbServices/ws/ProvideTravelDbPassengerList-v1", 
    proxyname=0x0, proxyport=0) at mod_proxy_http.c:1991
        locurl = 0x7f28340074e8 "/TravelDbServices/ws/ProvideTravelDbPassengerList-v1"
        status = 0
        server_portstr = "\000\342\370\002", '\000' <repeats 12 times>, "w\230\267c(\177\000\000\063\322>d(\177\000"
        scheme = <value optimized out>
        proxy_function = 0x7f286014c1fe "HTTP"
        u = <value optimized out>
        backend = 0x298d300
        is_ssl = 36183800
        c = 0x7f2844002d28
        retry = <value optimized out>
        p = 0x7f28340028f8
        uri = 0x7f2834007428
#2  0x00007f2860352ef2 in proxy_run_scheme_handler (r=0x7f2834002970, worker=0x2281ef8, conf=0x1c07ef8, 
    url=0x7f28340073e0 "http://127.0.6.48/TravelDbServices/ws/ProvideTravelDbPassengerList-v1", proxyhost=0x0, proxyport=0) at mod_proxy.c:2806
        pHook = <value optimized out>
        n = <value optimized out>
        rv = -1
#3  0x00007f2860357f77 in proxy_handler (r=0x7f2834002970) at mod_proxy.c:1162
        url = 0x7f28340073e0 "http://127.0.6.48/TravelDbServices/ws/ProvideTravelDbPassengerList-v1"
        uri = 0x7f28340072ce "balancer://tomcat7_80/TravelDbServices/ws/ProvideTravelDbPassengerList-v1"
        scheme = 0x7f2834007360 "balancer"
        p = <value optimized out>
        p2 = <value optimized out>
        sconf = <value optimized out>
        conf = 0x1c07ef8
        proxies = 0x1c08028
        ents = 0x1c031f8
        i = <value optimized out>
        access_status = 0
        direct_connect = <value optimized out>
        str = <value optimized out>
        maxfwd = <value optimized out>
        balancer = 0x227e128
        worker = 0x2281ef8
        attempts = 0
        max_attempts = 1
        list = <value optimized out>
        saved_status = <value optimized out>


With the following messages in the error log:

[Thu Oct 29 19:53:32.564666 2015] [proxy:error] [pid 31761:tid 139811349370624] (22)Invalid argument: AH00957: HTTP: attempt to connect to ?:21768 (127.0.6.48) failed
[Thu Oct 29 19:53:32.564661 2015] [proxy:error] [pid 31761:tid 139811359860480] (22)Invalid argument: AH00957: HTTP: attempt to connect to ?:21768 (127.0.6.48) failed
[Thu Oct 29 19:53:33.555683 2015] [core:notice] [pid 30780:tid 139811468363584] AH00051: child pid 31761 exit signal Bus error (7), possible coredump in /tech/apache/var/logs/switch
Comment 6 Sylvain Goulmy 2015-11-03 14:20:13 UTC
Hi,

Today I have had two more cores on my httpd instance, in the get_container function. These two cores were linked to the following error messages in the error log:

[Tue Nov 03 11:40:49.837710 2015] [proxy_http:error] [pid 18490:tid 139811338880768] [client 10.70.1.43:48219] AH01114: HTTP: failed to make connection to backend: 127.0.6.57
[Tue Nov 03 12:29:28.355655 2015] [proxy:error] [pid 13344:tid 139811349370624] (70007)The timeout specified has expired: AH00941: HTTP: failed to acquire connection for (127.0.6.57)
[Tue Nov 03 12:29:28.355723 2015] [proxy_http:error] [pid 13344:tid 139811359860480] [client 10.70.1.43:17359] AH01114: HTTP: failed to make connection to backend: 127.0.6.57
[Tue Nov 03 12:29:29.346693 2015] [core:notice] [pid 30780:tid 139811468363584] AH00051: child pid 13344 exit signal Segmentation fault (11), possible coredump in /tech/apache/var/logs/switch


[Tue Nov 03 12:59:42.492382 2015] [proxy:error] [pid 14803:tid 139811251812096] (70007)The timeout specified has expired: AH00941: HTTP: failed to acquire connection for (127.0.6.57)
[Tue Nov 03 12:59:42.492369 2015] [proxy:error] [pid 14803:tid 139811241322240] (70007)The timeout specified has expired: AH00941: HTTP: failed to acquire connection for (127.0.6.57)
[Tue Nov 03 12:59:42.492432 2015] [proxy_http:error] [pid 14371:tid 139811338880768] [client 10.70.1.43:6279] AH01114: HTTP: failed to make connection to backend: 127.0.6.57
[Tue Nov 03 12:59:42.492524 2015] [proxy:error] [pid 14803:tid 139811338880768] (70007)The timeout specified has expired: AH00941: HTTP: failed to acquire connection for (127.0.6.57)
[Tue Nov 03 12:59:48.489341 2015] [core:notice] [pid 30780:tid 139811468363584] AH00051: child pid 14803 exit signal Segmentation fault (11), possible coredump in /tech/apache/var/logs/switch
Comment 7 Sylvain Goulmy 2015-11-13 09:24:23 UTC
Hi,

After removing the BalancerPersist directive, no core dumps have appeared for four days, whereas we used to get one or several core files per day.

I'll give you a new update in a week to confirm that this directive is the root cause.
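
For reference, the workaround amounts to reverting the directive to its default; BalancerPersist defaults to Off in 2.4, so explicitly disabling it is equivalent to removing the line:

```apache
# Workaround applied: stop persisting balancer/slotmem state across restarts
# (equivalent to removing the directive, since Off is the default)
BalancerPersist Off
```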

Rgds.