Bug 62357 - Proxy hanging at reslist lock (pthread_mutex_lock / pthread_cond_signal)
Summary: Proxy hanging at reslist lock (pthread_mutex_lock / pthread_cond_signal)
Status: NEEDINFO
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy (show other bugs)
Version: 2.4.33
Hardware: Other Linux
: P2 major (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-07 09:17 UTC by Rainer Jung
Modified: 2019-10-09 07:43 UTC (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Rainer Jung 2018-05-07 09:17:02 UTC
Latest httpd 2.4.33 using APR 1.6.3, APU 1.6.1, event MPM and HTTP/2.

Plattform is CentOS 7.4.1708, kernel 3.10.0-693.21.1.el7.x86_64, glibc glibc-2.17-196.el7_4.2.

One child process hangs, worker threads are hanging while for the reslist lock for one proxied backend.

22 Threads in stack:

#0  0x00007fb714a1942d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fb714a14dcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007fb714a14c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fb7152884d9 in apr_thread_mutex_lock (mutex=<optimized out>) at locks/unix/thread_mutex.c:92
#4  0x00007fb7154bef71 in apr_reslist_acquire (reslist=0x7fb6c4032d80, resource=resource@entry=0x7fb70965dab0) at misc/apr_reslist.c:334
#5  0x00007fb7121e5295 in ap_proxy_acquire_connection (proxy_function=proxy_function@entry=0x7fb711fd0cdc "HTTP", conn=conn@entry=0x7fb70965dab0, worker=worker@entry=0x227a7e8, s=0x225d9c8) at proxy_util.c:2140
#6  0x00007fb711fce154 in proxy_http_handler (r=0x7fb6ac05f920, worker=0x227a7e8, conf=0x2202d78, url=0x7fb6ac061a60 "http://REDACTED_IP:REDACTED_PORT/", proxyname=0x0, proxyport=0) at mod_proxy_http.c:1908
#7  0x00007fb7121df244 in proxy_run_scheme_handler (r=r@entry=0x7fb6ac05f920, worker=0x227a7e8, conf=conf@entry=0x2202d78, url=0x7fb6ac061a60 "http://REDACTED_IP:REDACTED_PORT/", proxyhost=proxyhost@entry=0x0, proxyport=proxyport@entry=0) at mod_proxy.c:3046
#8  0x00007fb7121e034d in proxy_handler (r=0x7fb6ac05f920) at mod_proxy.c:1241
#9  0x0000000000453c00 in ap_run_handler (r=0x7fb6ac05f920) at config.c:170
#10 0x0000000000454149 in ap_invoke_handler (r=r@entry=0x7fb6ac05f920) at config.c:444
#11 0x0000000000469cda in ap_process_async_request (r=0x7fb6ac05f920) at http_request.c:436
#12 0x0000000000469f9f in ap_process_request (r=r@entry=0x7fb6ac05f920) at http_request.c:471
#13 0x00007fb7114d72eb in h2_task_process_request (c=0x7fb6ac0598f0, task=0x7fb6ac05d910) at h2_task.c:678
#14 h2_task_process_conn (c=0x7fb6ac0598f0) at h2_task.c:725
#15 0x000000000045d8e0 in ap_run_process_connection (c=0x7fb6ac0598f0) at connection.c:42
#16 0x00007fb7114d877c in h2_task_do (task=0x7fb6ac05d910, thread=thread@entry=0x2221868, worker_id=<optimized out>) at h2_task.c:635
#17 0x00007fb7114dc2c4 in slot_run (thread=0x2221868, wctx=0x234f4c8) at h2_workers.c:231
#18 0x00007fb714a12e25 in start_thread () from /lib64/libpthread.so.0
#19 0x00007fb71453c34d in clone () from /lib64/libc.so.6

One thread in:

#0  0x00007fb714a1942d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fb714a14dcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007fb714a14c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fb7152884d9 in apr_thread_mutex_lock (mutex=<optimized out>) at locks/unix/thread_mutex.c:92
#4  0x00007fb7154bf143 in apr_reslist_release (reslist=0x7fb6c4032d80, resource=resource@entry=0x7fb6c4034c30) at misc/apr_reslist.c:414
#5  0x00007fb7121e54ca in connection_cleanup (theconn=0x7fb6c4034c30) at proxy_util.c:1398
#6  ap_proxy_release_connection (proxy_function=<optimized out>, conn=conn@entry=0x7fb6c4034c30, s=<optimized out>) at proxy_util.c:2178
#7  0x00007fb711fcd2bd in ap_proxy_http_process_response (p=p@entry=0x7fb6d4027548, r=r@entry=0x7fb6d40275c0, backend_ptr=backend_ptr@entry=0x7fb6fe647ab0, server_portstr=server_portstr@entry=0x7fb6fe647b50 "", conf=0x2202d78, conf=0x2202d78, worker=<optimized out>) at mod_proxy_http.c:1764
#8  0x00007fb711fcea4f in proxy_http_handler (r=<optimized out>, worker=<optimized out>, conf=0x2202d78, url=0x7fb6d40297a8 "http://REDACTED_IP:REDACTED_PORT/REDACTED_URI", proxyname=0x0, proxyport=0) at mod_proxy_http.c:1986
#9  0x00007fb7121df244 in proxy_run_scheme_handler (r=r@entry=0x7fb6d40275c0, worker=0x227a7e8, conf=conf@entry=0x2202d78, url=0x7fb6d40297a8 "http://REDACTED_IP:REDACTED_PORT/REDACTED_URI", proxyhost=proxyhost@entry=0x0, proxyport=proxyport@entry=0) at mod_proxy.c:3046
#10 0x00007fb7121e034d in proxy_handler (r=0x7fb6d40275c0) at mod_proxy.c:1241
#11 0x0000000000453c00 in ap_run_handler (r=0x7fb6d40275c0) at config.c:170
#12 0x0000000000454149 in ap_invoke_handler (r=r@entry=0x7fb6d40275c0) at config.c:444
#13 0x0000000000469cda in ap_process_async_request (r=0x7fb6d40275c0) at http_request.c:436
#14 0x0000000000469f9f in ap_process_request (r=r@entry=0x7fb6d40275c0) at http_request.c:471
#15 0x00007fb7114d72eb in h2_task_process_request (c=0x7fb6d4016120, task=0x7fb6d4006980) at h2_task.c:678
#16 h2_task_process_conn (c=0x7fb6d4016120) at h2_task.c:725
#17 0x000000000045d8e0 in ap_run_process_connection (c=0x7fb6d4016120) at connection.c:42
#18 0x00007fb7114d877c in h2_task_do (task=0x7fb6d4006980, thread=thread@entry=0x2222af8, worker_id=<optimized out>) at h2_task.c:635
#19 0x00007fb7114dc2c4 in slot_run (thread=0x2222af8, wctx=0x234ef48) at h2_workers.c:231
#20 0x00007fb714a12e25 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fb71453c34d in clone () from /lib64/libc.so.6

And then there's one more thread, which I think is the culprit:

#0  0x00007fb714a1942d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fb714a1705d in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2  0x00007fb7154bf184 in apr_reslist_release (reslist=0x7fb6c4032d80, resource=resource@entry=0x7fb6e0016f90) at misc/apr_reslist.c:420
#3  0x00007fb7121e54ca in connection_cleanup (theconn=0x7fb6e0016f90) at proxy_util.c:1398
#4  ap_proxy_release_connection (proxy_function=<optimized out>, conn=conn@entry=0x7fb6e0016f90, s=<optimized out>) at proxy_util.c:2178
#5  0x00007fb711fcd2bd in ap_proxy_http_process_response (p=p@entry=0x7fb6e0012ef8, r=r@entry=0x7fb6e0012f70, backend_ptr=backend_ptr@entry=0x7fb6fee48ab0, server_portstr=server_portstr@entry=0x7fb6fee48b50 "", conf=0x2202d78, conf=0x2202d78, worker=<optimized out>) at mod_proxy_http.c:1764
#6  0x00007fb711fcea4f in proxy_http_handler (r=<optimized out>, worker=<optimized out>, conf=0x2202d78, url=0x7fb6e0015228 "http://REDACTED_IP:REDACTED_PORT/REDACTED_URI", proxyname=0x0, proxyport=0) at mod_proxy_http.c:1986
#7  0x00007fb7121df244 in proxy_run_scheme_handler (r=r@entry=0x7fb6e0012f70, worker=0x227a7e8, conf=conf@entry=0x2202d78, url=0x7fb6e0015228 "http://REDACTED_IP:REDACTED_PORT/REDACTED_URI", proxyhost=proxyhost@entry=0x0, proxyport=proxyport@entry=0) at mod_proxy.c:3046
#8  0x00007fb7121e034d in proxy_handler (r=0x7fb6e0012f70) at mod_proxy.c:1241
#9  0x0000000000453c00 in ap_run_handler (r=0x7fb6e0012f70) at config.c:170
#10 0x0000000000454149 in ap_invoke_handler (r=r@entry=0x7fb6e0012f70) at config.c:444
#11 0x0000000000469cda in ap_process_async_request (r=0x7fb6e0012f70) at http_request.c:436
#12 0x0000000000469f9f in ap_process_request (r=r@entry=0x7fb6e0012f70) at http_request.c:471
#13 0x00007fb7114d72eb in h2_task_process_request (c=0x7fb6e00018f0, task=0x7fb6e0005910) at h2_task.c:678
#14 h2_task_process_conn (c=0x7fb6e00018f0) at h2_task.c:725
#15 0x000000000045d8e0 in ap_run_process_connection (c=0x7fb6e00018f0) at connection.c:42
#16 0x00007fb7114d877c in h2_task_do (task=0x7fb6e0005910, thread=thread@entry=0x2222a20, worker_id=<optimized out>) at h2_task.c:635
#17 0x00007fb7114dc2c4 in slot_run (thread=0x2222a20, wctx=0x234ef88) at h2_workers.c:231
#18 0x00007fb714a12e25 in start_thread () from /lib64/libpthread.so.0
#19 0x00007fb71453c34d in clone () from /lib64/libc.so.6

I wonder, why the call pthread_cond_signal() issued from apr_reslist_release() at misc/apr_reslist.c:420 hangs in __lll_lock_wait(). What lock is that? If it where the reslist "listlock", that should already be locked by that signalling thread:

408 APU_DECLARE(apr_status_t) apr_reslist_release(apr_reslist_t *reslist,
409                                               void *resource)
410 {
411     apr_res_t *res;
412
413 #if APR_HAS_THREADS
414     apr_thread_mutex_lock(reslist->listlock);
415 #endif
416     res = get_container(reslist);
417     res->opaque = resource;
418     push_resource(reslist, res);
419 #if APR_HAS_THREADS
420     apr_thread_cond_signal(reslist->avail);
421     apr_thread_mutex_unlock(reslist->listlock);
422 #endif
423
424     return apr_reslist_maintain(reslist);
425 }

and from apr/locks/unix/thread_mutex.c:

 88 APR_DECLARE(apr_status_t) apr_thread_mutex_lock(apr_thread_mutex_t *mutex)
 89 {
 90     apr_status_t rv;
 91
 92     rv = pthread_mutex_lock(&mutex->mutex);
 93 #ifdef HAVE_ZOS_PTHREADS
 94     if (rv) {
 95         rv = errno;
 96     }
 97 #endif
 98
 99     return rv;
100 }

Unfortunately I can not reproduce, but I will add some logging to the reslist code for the case of errors during pthread_mutex_lock().

Regards,

Rainer
Comment 1 Antoine Prevosto 2018-06-25 12:14:06 UTC
We have faced several times the very same problem, since we upgraded our production servers from httpd-2.4.17 to httpd-2.4.29

Operating System : Red Hat Enterprise Linux Server release 6.6 (Santiago)
Kernel : 2.6.32-504.54.1.el6.x86_64
Apache NPM : worker

All threads are in pthread_mutex_lock() except on thread in pthread_cond_signal()
Comment 2 Servanator 2019-09-11 23:24:11 UTC
We are hitting exactly same issue from time to time on version 2.4.6 running on CentOS 7.

However the issue seemingly started happening after we removed `delayedcompress` option from Apache log rotation configuration, but looking at the stack trace, I don't see how it's related though.
Comment 3 Jim Jagielski 2019-09-24 19:15:49 UTC
Please confirm that the issue still exists with 2.4.41... there have been fixes to handle this exact problem.
Comment 4 Antoine Prevosto 2019-10-09 07:43:09 UTC
Hi,

We are currently using the 2.4.39 version, and we have not faced the problem for a while.

But we have moved now our HTTP routing to another infrastructure (F5), so the traffic on our Apache HTTP servers has dramatically decreased.

So I think this is no more really relevant.

Best regards.