While load testing our Apache web servers this week, I observed numerous requests in the '...reading...' state in the server-status output. In addition, during this load test the Apache web server was hitting the MaxClients setting, which is unusual for the load we were applying. I ran pstack on one of the httpd child processes and noticed numerous threads in a 'zombie' state. This is not crashing Apache, and I have not noticed an increase in CPU or memory utilization. The zombie threads, however, do not go away until the httpd process is stopped and started again. The pstack output is included below. The Apache version is 2.2.22, compiled using the Sun Studio 11 compiler. I can provide the config, a core dump, etc. if necessary. If anyone has any troubleshooting recommendations, I would greatly appreciate it.

Server version: Apache/2.2.22 (Unix)
Server built: Aug 16 2012 09:46:33
Server's Module Magic Number: 20051115:30
Server loaded: APR 1.4.5, APR-Util 1.4.1
Compiled using: APR 1.4.5, APR-Util 1.4.1
Architecture: 64-bit
Server MPM: Worker
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/worker"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_FCNTL_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=128
 -D HTTPD_ROOT="/app/asf/WS/2.2"
 -D SUEXEC_BIN="/app/asf/WS/2.2/bin/suexec"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

19836: /app/asf/WS/2.2/bin/httpd -D SSL -f /app/asf/WS/2.2/conf/qaws04.nyenet
----------------- lwp# 1 / thread# 1 --------------------
 ffffffff7d8a9088 lwp_wait (6, ffffffff7ffff2ac)
 ffffffff7da0b690 _thrp_join (6, 0, ffffffff7ffff368, 1, 0, 0) + 48
 ffffffff7eb2bf30 apr_thread_join (ffffffff7ffff434, 100299d30, fffffffffffffff0, 100299d58, 7a120, 100299d58) + c
 0000000100054560 child_main (7a000, 1003e60c0, 2, 10029a178, 10017a05c, 10017d328) + 7b8
 00000001000550c8 perform_idle_server_maintenance (ffffffffffffffff, 0, 0, 0, 10017a030, 10017a060) + 750
 0000000100055500 server_main_loop (0, ffffffff, 0, 100308250, 100308250, 1) + 2e8
 00000001000558e0 ap_mpm_run (2, 10017e000, 100070000, 100070000, 100070000, 10017d374) + 3b0
 0000000100020470 main (100000, 100177, 1001971b8, 100177a38, 100000, 2400) + ba8
 000000010001f79c _start (0, 0, 0, 0, 0, 0) + 17c
----------------- lwp# 30 / thread# 30 --------------------
 ffffffff7d8a755c poll (ffffffff757fbb60, 0, 3e8)
 ffffffff7da1091c select (0, 0, 0, 0, ffffffff757fbcd0, 1) + 6c
 ffffffff7c3851a8 __1cIcm_sleep6FLl_v_ (1, 0, 6, 0, 0, 0) + 30
 ffffffff7c38673c ConnectionService (10046fe40, ffffffff7c62f950, ffffffff7c62fd41, 7, 1, 3) + 35c
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 6 / thread# 6 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff7cf01400, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, 1, a, ffffffff7945bf00, 120, c00) + 4c
 ffffffff7944c094 ssl_cert_dup (100252950, 1002529b0, 1006f5950, 0, ffffffff7945bf00, 0) + 2e4
 ffffffff79446088 SSL_new (1006fcfe0, 100252500, 105400, 11b66c, ffffffff79561640, ffffffff7945be48) + b8
 ffffffff79808f04 ssl_init_ssl_connection (48, 1006dd028, 1006dd7c8, 80, ffffffff7992b498, 1001b1ea8) + dc
 0000000100043aa0 ap_process_connection (1006dd028, 1006dced0, 1001b9858, 3, 10017d2b0, 2) + 40
 0000000100053470 worker_thread (100299d30, 1006dce58, 1006dd028, 10017d358, 10017a048, 18) + 2b8
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 8 / thread# 8 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff7cf01c00, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, 1, a, ffffffff7945bf00, 120, c00) + 4c
 ffffffff7944c094 ssl_cert_dup (100252950, 1002529b0, 1006256e0, 0, ffffffff7945bf00, 0) + 2e4
 ffffffff79446088 SSL_new (1006253a0, 100252500, 105400, 11b66c, ffffffff79561640, ffffffff7945be48) + b8
 ffffffff79808f04 ssl_init_ssl_connection (48, 100623588, 100623d28, 80, ffffffff7992b498, 1001b1ea8) + dc
 0000000100043aa0 ap_process_connection (100623588, 100623430, 1001b9858, 5, 10017d2b0, 2) + 40
 0000000100053470 worker_thread (100299d90, 1006233b8, 100623588, 10017d358, 10017a048, 28) + 2b8
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 10 / thread# 10 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff7cf02400, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, 1, a, ffffffff7945bf00, 120, c00) + 4c
 ffffffff7944c094 ssl_cert_dup (100252950, 1002529b0, 10046d9d0, 0, ffffffff7945bf00, 0) + 2e4
 ffffffff79446088 SSL_new (1004e8550, 100252500, 105400, 11b66c, ffffffff79561640, ffffffff7945be48) + b8
 ffffffff79808f04 ssl_init_ssl_connection (48, 1004e6738, 1004e6ed8, 80, ffffffff7992b498, 1001b1ea8) + dc
 0000000100043aa0 ap_process_connection (1004e6738, 1004e65e0, 1001b9858, 7, 10017d2b0, 2) + 40
 0000000100053470 worker_thread (100299df0, 1004e6568, 1004e6738, 10017d358, 10017a048, 38) + 2b8
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 12 / thread# 12 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff7cf02c00, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, 1, a, ffffffff7945bf00, 120, c00) + 4c
 ffffffff7944c094 ssl_cert_dup (100252950, 1002529b0, 1004e1120, 0, ffffffff7945bf00, 0) + 2e4
 ffffffff79446088 SSL_new (1004e5920, 100252500, 105400, 11b66c, ffffffff79561640, ffffffff7945be48) + b8
 ffffffff79808f04 ssl_init_ssl_connection (48, 1004e1af8, 1004e2298, 80, ffffffff7992b498, 1001b1ea8) + dc
 0000000100043aa0 ap_process_connection (1004e1af8, 1004e19a0, 1001b9858, 9, 10017d2b0, 2) + 40
 0000000100053470 worker_thread (100299e50, 1004e1928, 1004e1af8, 10017d358, 10017a048, 48) + 2b8
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 20 / thread# 20 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff73600c00, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, 1, a, ffffffff7945bf00, 120, c00) + 4c
 ffffffff7944c094 ssl_cert_dup (100252950, 1002529b0, 100323c80, 0, ffffffff7945bf00, 0) + 2e4
 ffffffff79446088 SSL_new (100323940, 100252500, 105400, 11b66c, ffffffff79561640, ffffffff7945be48) + b8
 ffffffff79808f04 ssl_init_ssl_connection (48, 1004ad7a8, 1004adf48, 80, ffffffff7992b498, 1001b1ea8) + dc
 0000000100043aa0 ap_process_connection (1004ad7a8, 1004ad650, 1001b9858, 11, 10017d2b0, 2) + 40
 0000000100053470 worker_thread (100299fd0, 1004ad5d8, 1004ad7a8, 10017d358, 10017a048, 88) + 2b8
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 25 / thread# 25 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff73602000, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, 1, a, ffffffff7945bf00, 120, c00) + 4c
 ffffffff7944c094 ssl_cert_dup (100252950, 1002529b0, 10045a680, 0, ffffffff7945bf00, 0) + 2e4
 ffffffff79446088 SSL_new (10045a340, 100252500, 105400, 11b66c, ffffffff79561640, ffffffff7945be48) + b8
 ffffffff79808f04 ssl_init_ssl_connection (48, 1004e9348, 1004e9ae8, 80, ffffffff7992b498, 1001b1ea8) + dc
 0000000100043aa0 ap_process_connection (1004e9348, 1004e91f0, 1001b9858, 16, 10017d2b0, 2) + 40
 0000000100053470 worker_thread (10029a0c0, 1004e9178, 1004e9348, 10017d358, 10017a048, b0) + 2b8
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 26 / thread# 26 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff73602400, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, ffffffffffffffff, a, ffffffff792b8310, 189, c00) + 4c
 ffffffff7920ff54 EVP_PKEY_free (100185530, 10dc00, ffffffff793c61a8, ffffffff793c61a8, 1b627c, 1a800) + 34
 ffffffff7944c1f0 ssl_cert_free (10040d9c0, ffffffffffefa8c0, 105400, 1, 1154ec, 10040da20) + a8
 ffffffff794467d0 SSL_free (100470480, 3e9, 6, 0, 11afdc, 100470570) + 178
 ffffffff79811634 ssl_io_filter_output (ffffffffffefb2c0, 48, 100492c10, 10048a418, ffffffff7992b498, 100470480) + 4fc
 000000010004394c ap_lingering_close (100489a70, 100489bc8, 100179000, 100489c70, 100179, 100000) + 3c
 0000000100053478 worker_thread (10029a0f0, 1004899f8, 100489bc8, 10017d358, 10017a048, b8) + 2c0
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 27 / thread# 27 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff73602800, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185d58, ffffffffffffffff, a, ffffffff792b8310, 189, c00) + 4c
 ffffffff7920ff54 EVP_PKEY_free (100185d50, 10dc00, ffffffff7992b498, ffffffff793c61a8, 1b627c, 7400) + 34
 ffffffff79222c3c X509_PUBKEY_get (100185d50, 100300bd0, ffffffffffef36e8, ffffffff793dc5c8, 10c800, 100185fd0) + 14c
 ffffffff7924b85c internal_verify (ffffffff723fb630, ffffffff792499c4, 0, 22, 100300a70, 4000) + d8
 ffffffff792498a8 X509_verify_cert (1, ffffffff723fb630, ffffffff723fb568, 0, 0, ffffffff792492d0) + 5b8
 ffffffff7942e2c4 ssl3_output_cert_chain (1003291f0, 100300a70, 0, 7, 133414, 1004135a0) + 9c
 ffffffff7941c3cc ssl3_accept (1003291f0, 100475ac9, 2150, 2, 21a0, 2112) + f14
 ffffffff79430158 ssl23_get_client_hello (1003291f0, b, 300, ffffffff79561640, 3, ffffffff723fb8d4) + 8b8
 ffffffff7942f818 ssl23_accept (1003291f0, ffffffff79815668, 4000, 100307d10, ffffffff79561640, 2210) + 298
 ffffffff798104a8 ssl_io_filter_connect (100308d30, 100308518, 0, 11b324, 100308cb8, ffffffff7992b498) + 338
 ffffffff79810c24 ssl_io_filter_input (100472ad8, 100475070, 1, 0, 0, 100472ad8) + 98
 000000010002a098 ap_rgetline_core (100473b40, 2000, ffffffff723fbcf0, 100473b10, 0, 100475070) + 70
 000000010002ae24 ap_read_request (100308518, 64, 3, 0, 100473b40, ffffffff79701968) + 17c
 00000001000488b8 ap_process_http_connection (100308518, 0, 80000000, 1002194a8, 100179, 10017e000) + 14
 0000000100043b30 ap_process_connection (100308518, 1003083c0, 1002194a8, 0, 10017d2b0, 3) + d0
 0000000100053470 worker_thread (10029a120, 100308348, 100308518, 10017d358, 10017a048, c0) + 2b8
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 31 / thread# 31 --------------------
 ffffffff7d8a755c poll (ffffffff6d7fbcf0, 0, 3e8)
 ffffffff7da1091c select (0, 0, 0, 0, ffffffff6d7fbe68, ffffffff7c62d2c8) + 6c
 ffffffff7c36b310 __1cPCSmWorkerThreadFSleep6MLl_v_ (10045b820, 1, 0, 0, 1, 0) + 30
 ffffffff7c33392c __1cPCSmAdminManagerRManageAgentThread6Fpv_1_ (10045b820, 0, 0, 0, 0, 100322418) + 54
 ffffffff7da17cd0 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 7 / thread# 7 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 9 / thread# 9 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 13 / thread# 13 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 14 / thread# 14 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 15 / thread# 15 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 11 / thread# 11 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 16 / thread# 16 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 17 / thread# 17 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 18 / thread# 18 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 21 / thread# 21 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 22 / thread# 22 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 23 / thread# 23 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 24 / thread# 24 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
----------------- lwp# 19 / thread# 19 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
This is a process trying to exit, e.g. because of MaxRequestsPerChild or MaxSpareThreads, while it still has threads wrapping up their work. These threads in SSL, however, are blocked on a lock rather than just waiting for some I/O to finish, which is why they won't wrap up unless some action is taken. In the meantime, for relief I'd suggest MaxRequestsPerChild 0 and MaxSpareThreads equal to MaxClients, so you don't end up with half-exited processes building up.
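For anyone triaging a dump like the one in the report, a rough way to tally thread states is sketched below. It is only an illustration in Python: the names classify and summarize and the state labels are made up for this sketch, and it assumes the Solaris pstack thread-header format shown in the report.

```python
import re
from collections import Counter

# Thread header as printed by Solaris pstack in the report, e.g.
# "----------------- lwp# 6 / thread# 6 --------------------"
HEADER = re.compile(r"-+\s*lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s*-+")


def classify(body):
    """Return a coarse, hypothetical state label for one thread's stack text."""
    if "** zombie" in body:
        return "zombie"
    if "lwp_park" in body and "CRYPTO_add_lock" in body:
        return "blocked in CRYPTO_add_lock"
    return "other"


def summarize(pstack_text):
    """Count threads per state; also works if the dump lost its line breaks."""
    # With two capture groups, split() yields:
    # [preamble, lwp, thread, body, lwp, thread, body, ...]
    parts = HEADER.split(pstack_text)
    counts = Counter()
    for i in range(1, len(parts), 3):
        counts[classify(parts[i + 2])] += 1
    return counts


# Two threads copied from the report, used as a small self-check.
SAMPLE = """
----------------- lwp# 6 / thread# 6 --------------------
 ffffffff7da17de0 lwp_park (0, 0, 0)
 ffffffff7da1437c slow_lock (1002c04b8, ffffffff7cf01400, ffffffff7992b498, 75a8, a, 7400) + 58
 ffffffff7916155c CRYPTO_add_lock (100185538, 1, a, ffffffff7945bf00, 120, c00) + 4c
----------------- lwp# 7 / thread# 7 --------------------
 ffffffff7eb2bd88 dummy_worker(), exit value = 0x0000000000000000
 ** zombie (exited, not detached, not yet joined) **
"""

if __name__ == "__main__":
    for state, n in summarize(SAMPLE).most_common():
        print(f"{n:3d}  {state}")
```

Feeding it the saved pstack output of each httpd child should make it easy to see how many threads are parked in CRYPTO_add_lock versus already exited but not yet joined.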
As this is an issue encountered in load testing and presumably not a production environment, you might try to reproduce with the latest OpenSSL. BTW, what level of OpenSSL is this?
(In reply to comment #2)
> As this is an issue encountered in load testing and presumably not a
> production environment, you might try to reproduce with the latest OpenSSL.
>
> BTW, what level of OpenSSL is this?

I would be happy to replace OpenSSL if there is a newer version. Currently we are using:

/usr/local/ssl/bin] ./openssl version
OpenSSL 1.0.1b 26 Apr 2012

Here are the configure options for my last Apache compile:

CC="/opt/SUNWspro/bin/cc"; export CC
CFLAGS="-xarch=generic64 -xO5"; export CFLAGS
CPPFLAGS="-I/usr/local/ssl/includ/openssl"; export CPPFLAGS
LDFLAGS="-L/usr/local/ssl/lib -R/usr/local/ssl/lib"; export LDFLAGS
"./configure" \
"--prefix=/app/asf/WS/2.2" \
"--with-included-apr" \
"--with-mpm=worker" \
"--with-ssl=/usr/local/ssl" \
"--enable-ssl=shared" \
"--disable-userdir" \
"--disable-asis" \
"--disable-autoindex" \
"--disable-mod_authn_file" \
"--disable-mod_authn_default" \
"--disable-mod_authz_host" \
"--disable-mod_authz_groupfile" \
"--disable-mod-authz_user" \
"--disable-mod-authz_default" \
"--disable-mod_auth_basic" \
"--enable-mods-shared=most" \
"CC=/opt/SUNWspro/bin/cc" \
"CFLAGS=-xarch=generic64 -xO5" \
"LDFLAGS=-L/usr/local/ssl/lib -R/usr/local/ssl/lib" \
"CPPFLAGS=-I/usr/local/ssl/includ/openssl" \
"$@"

Thanks,
Jeff
Nothing between 1.0.1b and 1.0.1c looks interesting w.r.t. this issue.
Could it be this one:

http://rt.openssl.org/Ticket/Display.html?id=2813

Patch for OpenSSL 1.0.1 would be:

http://cvs.openssl.org/chngview?cn=22570

Regards,

Rainer
(In reply to comment #5)
> Could it be this one:
>
> http://rt.openssl.org/Ticket/Display.html?id=2813
>
> Patch for OpenSSL 1.0.1 would be:
>
> http://cvs.openssl.org/chngview?cn=22570
>
> Regards,
>
> Rainer

Thank you for the recommendation. I will patch OpenSSL, recompile Apache, and test again.

-Jeff
I just observed the same (or at least a very similar) behavior on a Linux installation. I tried to fix it by patching OpenSSL, but this didn't change anything. What should I check next?
For those seeing the issue: are you using an engine (SSLCryptoDevice)? I know some register their own callbacks for locks; not sure if that's coming into play.

I see similar behavior on Solaris 10 with the worker MPM and openssl-0.9.8u, but with far fewer zombie processes. This is on a particularly busy production install rather than under a load test scenario. Just like the original bug, I see a few threads interacting with a proxied resource, with most threads sitting in CRYPTO_add_lock. There is at least one zombie thread in 5 of the 6 running child processes. The child process without any zombies still has threads sitting in CRYPTO_add_lock.

I can attach pstack output as well, but the short version is:

* Very few threads doing work
* Many threads waiting in CRYPTO_add_lock
* Did not seem to happen in 2.2.13 w/ openssl-0.9.8l with an otherwise identical config

Happy to provide more info if needed.
Please ignore my comment 7. The mentioned OpenSSL patch seems to fix the issue; I just had not applied it correctly. So thanks for the hint.
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.