The following error message appears in the error log file of a virtual host under load:

    [warn] (45)Deadlock situation detected/avoided: Failed to acquire global mutex lock

Then the following error message appears in the main server error log:

    [emerg] (45)Deadlock situation detected/avoided: apr_proc_mutex_lock failed. Attempting to shutdown process gracefully.

A new process is then spawned and operations resume.

I have this problem with Apache 2.0.52 on Sun Solaris 8 (UltraSPARC), but I am unable to reproduce it using Apache 2.0.49 with the exact same configuration. Both server versions were compiled with the following configure flags (in addition to enabling modules):

    --enable-rule=SHARED_CORE
    --enable-rule=SSL_EXPERIMENTAL
    --with-mpm=worker
    --enable-nonportable-atomics
    --with-ssl=/u01/opt
    --with-expat=$PWD/`ls -d srclib/apr-util/xml/expat`

    # apache2/bin/httpd -V
    Server version: Apache/2.0.52
    Server built:   Oct 19 2004 12:07:03
    Server's Module Magic Number: 20020903:9
    Architecture:   32-bit
    Server compiled with....
    -D APACHE_MPM_DIR="server/mpm/worker"
    -D APR_HAS_SENDFILE
    -D APR_HAS_MMAP
    -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
    -D APR_USE_FCNTL_SERIALIZE
    -D APR_USE_PTHREAD_SERIALIZE
    -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
    -D APR_HAS_OTHER_CHILD
    -D AP_HAVE_RELIABLE_PIPED_LOGS
    -D HTTPD_ROOT="/u01/opt/apache2"
    -D SUEXEC_BIN="/u01/opt/apache2/bin/suexec"
    -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
    -D DEFAULT_ERRORLOG="logs/error_log"
    -D AP_TYPES_CONFIG_FILE="conf/mime.types"
    -D SERVER_CONFIG_FILE="conf/httpd.conf"

    # apache2/bin/httpd -l
    Compiled in modules:
      core.c
      mod_access.c
      mod_log_config.c
      mod_env.c
      mod_headers.c
      mod_setenvif.c
      mod_proxy.c
      proxy_connect.c
      proxy_ftp.c
      proxy_http.c
      worker.c
      http_core.c
      mod_mime.c
      mod_status.c
      mod_autoindex.c
      mod_cgi.c
      mod_negotiation.c
      mod_dir.c
      mod_userdir.c
      mod_alias.c
      mod_rewrite.c
      mod_so.c

    DSO modules included:
      apache2.0.52/modules/mod_info.so
      apache2.0.52/modules/mod_expires.so
      apache2.0.52/modules/mod_deflate.so
      apache2.0.52/modules/mod_ssl.so
      siteminder/webagent/lib/libmod_sm20.so
      siteminder/webagent/lib/libbtunicode.so
      siteminder/webagent/lib/libsmlogging.so
      siteminder/webagent/lib/libsmgda.so
      siteminder/webagent/lib/libsmvariable.so

I use:

    AcceptMutex default
    SSLMutex    default

According to the logs, these default to fcntl and shmcb respectively. Another user on the Apache mailing list has reported the same error with 2.0.51; he also uses the worker MPM and mod_ssl, but not the SiteMinder modules.
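Which serialization mechanisms a build knows about can be read straight off the `httpd -V` output above, via the `APR_USE_*_SERIALIZE` defines. A quick grep shows it; this is a sketch against the output quoted in this report rather than a live binary, so the assigned variable below stands in for running `apache2/bin/httpd -V` on the real server:

```shell
# Check which APR serialization (mutex) mechanisms this build reports.
# On the live server one would run:
#   apache2/bin/httpd -V | grep SERIALIZE
# The heredoc below stands in for that output, copied from this report.
httpd_v_output=$(cat <<'EOF'
-D APR_USE_FCNTL_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
EOF
)
printf '%s\n' "$httpd_v_output" | grep 'APR_USE_.*_SERIALIZE'
```

Since both fcntl and pthread serialization show up here, switching AcceptMutex between them should be a pure configuration change, with no rebuild required.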
The default mutex type changed to fcntl (from pthread?) between 2.0.49 and 2.0.52, so a workaround other than falling back to prefork might be to substitute "pthread" for "default" in the locking directives. This shouldn't happen regardless, though. Is /u01/opt/apache2 on a local filesystem, not an NFS mount or the like? fcntl locking is notoriously unreliable over NFS.
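The substitution suggested above amounts to a two-line httpd.conf change. A sketch, assuming a 2.0.x build with pthread serialization compiled in (as the APR_USE_PTHREAD_SERIALIZE define in the -V output in this report indicates); `sem` for SSLMutex is the value used elsewhere in this thread:

```apache
# Global scope, outside any <VirtualHost>:
# use cross-process pthread mutexes for accept serialization instead of
# the fcntl default that 2.0.52 picks on Solaris.
AcceptMutex pthread

# mod_ssl's own lock for the SSL session cache; "sem" selects a
# semaphore-based mechanism rather than a file-based lock.
SSLMutex sem
```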
Google finds reports of this happening on Solaris when using fcntl locking with 1.3, so perhaps that rules out its being anything to do with worker and threading:

    http://article.gmane.org/gmane.comp.apache.user/36430

An interesting link mentioning Netegrity modules here:

    http://archive.apache.org/gnats/5499

"If you have more than one application running on the server that can cause time delays, in one case it was Netegrity Web Agent, Vignette V5 5.6.2, Apache can get confused."
1. I have defined "AcceptMutex default", so I actually have no idea which lock file would be used.
2. You are right: according to the logs, on Solaris, AcceptMutex defaults to pthread in 2.0.49 but to fcntl in 2.0.52.
3. In any event, I do not have any NFS file systems mounted on the server.
4. I have ruled out the Netegrity SiteMinder Web Agent, because somebody not using SiteMinder reported the same problem with 2.0.51.
5. I do have more than one Listen directive.

I would like to avoid using prefork, so one of these days I will try changing AcceptMutex to pthread to see whether it makes any difference.
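Point 2 above ("according to the logs") can be checked directly: with `LogLevel debug`, the 2.0 MPMs log the chosen accept mutex and the compiled-in default at startup. A sketch; the exact wording of the log line is an assumption from memory of the 2.0.x source, and the sample line below is a hypothetical stand-in for a real error_log:

```shell
# On a live server one would run:
#   grep AcceptMutex /u01/opt/apache2/logs/error_log
# The sample line below stands in for real error_log output.
sample_log='[debug] worker.c: AcceptMutex: pthread (default: fcntl)'
printf '%s\n' "$sample_log" | grep -o 'AcceptMutex: [a-z]*'
```

This is a cheap way to confirm, after restarting, that the "AcceptMutex pthread" workaround actually took effect instead of silently falling back to the default.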
Today I had the opportunity to run additional load tests after adding "AcceptMutex pthread" to the configuration. So far I have not observed any error messages about deadlock situations, nor any server restarts.
I am experiencing the same problem with Solaris 5.7 and Apache 2.0.52 using mod_ssl and mod_proxy. Since it is overloading the Jetty server running behind Apache, it seems to be a problem related to network sockets.
I am also noticing this problem on Solaris 8. I am using Sun's OpenSSL package that is installed as part of their Crypto Accelerator 1000 hardware, the worker MPM, and the following configure options:

    ./configure \
      --enable-so \
      --with-mpm=worker \
      --enable-mods-shared=most \
      --enable-ssl \
      --with-ssl=/opt/SUNWconn/crypto \
      --enable-dav \
      --enable-so \
      --enable-deflate \
      --enable-proxy=static \
      --enable-proxy-http=static \
      --enable-nonportable-atomics=yes

I did not have AcceptMutex or SSLMutex defined in my httpd.conf, so I defined them as:

    AcceptMutex pthread
    SSLMutex    sem

I should have defined SSLMutex anyway, since I'm using SSLSessionCache - my bad. I'm testing the change of AcceptMutex and will report back.
(In reply to comment #6) I should add that I experienced this problem with 2.0.53 as well; I have since upgraded to 2.0.54.
The current default is deemed to be the best trade-off, so it's not clear what is left to change here. Since:

1) this issue was mostly seen by people using a particular third-party module,
2) it was also seen by people using 1.3, and
3) it only happens on Solaris,

I'd guess this is either a third-party module doing something weird, or some system tuning issue. (1) and (3) can be eliminated from enquiries by taking up a support issue with the appropriate vendors.