The following error message appears in the error log file of a virtual host under load:

    [warn] (45)Deadlock situation detected/avoided: Failed to acquire global mutex lock

Then the following error message appears in the main server error log:

    [emerg] (45)Deadlock situation detected/avoided: apr_proc_mutex_lock failed. Attempting to shutdown process gracefully.

A new process is then spawned and operations resume.

I have this problem with Apache 2.0.52 on Sun Solaris 8 (UltraSPARC), but I am unable to reproduce it using Apache 2.0.49 with the exact same configuration. Both server versions were compiled with the following configure flags (in addition to enabling modules):

    --enable-rule=SHARED_CORE
    --enable-rule=SSL_EXPERIMENTAL
    --with-mpm=worker
    --enable-nonportable-atomics
    --with-ssl=/u01/opt
    --with-expat=$PWD/`ls -d srclib/apr-util/xml/expat`

    # apache2/bin/httpd -V
    Server version: Apache/2.0.52
    Server built:   Oct 19 2004 12:07:03
    Server's Module Magic Number: 20020903:9
    Architecture:   32-bit
    Server compiled with....
    -D APACHE_MPM_DIR="server/mpm/worker"
    -D APR_HAS_SENDFILE
    -D APR_HAS_MMAP
    -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
    -D APR_USE_FCNTL_SERIALIZE
    -D APR_USE_PTHREAD_SERIALIZE
    -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
    -D APR_HAS_OTHER_CHILD
    -D AP_HAVE_RELIABLE_PIPED_LOGS
    -D HTTPD_ROOT="/u01/opt/apache2"
    -D SUEXEC_BIN="/u01/opt/apache2/bin/suexec"
    -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
    -D DEFAULT_ERRORLOG="logs/error_log"
    -D AP_TYPES_CONFIG_FILE="conf/mime.types"
    -D SERVER_CONFIG_FILE="conf/httpd.conf"

    # apache2/bin/httpd -l
    Compiled in modules:
      core.c
      mod_access.c
      mod_log_config.c
      mod_env.c
      mod_headers.c
      mod_setenvif.c
      mod_proxy.c
      proxy_connect.c
      proxy_ftp.c
      proxy_http.c
      worker.c
      http_core.c
      mod_mime.c
      mod_status.c
      mod_autoindex.c
      mod_cgi.c
      mod_negotiation.c
      mod_dir.c
      mod_userdir.c
      mod_alias.c
      mod_rewrite.c
      mod_so.c

    DSO modules included:
      apache2.0.52/modules/mod_info.so
      apache2.0.52/modules/mod_expires.so
      apache2.0.52/modules/mod_deflate.so
      apache2.0.52/modules/mod_ssl.so
      siteminder/webagent/lib/libmod_sm20.so
      siteminder/webagent/lib/libbtunicode.so
      siteminder/webagent/lib/libsmlogging.so
      siteminder/webagent/lib/libsmgda.so
      siteminder/webagent/lib/libsmvariable.so

I use:

    AcceptMutex default
    SSLMutex    default

According to the logs, these default to fcntl and shmcb respectively. Another user on the Apache mailing list has reported the same error with 2.0.51; he also uses the worker MPM and mod_ssl, but not the SiteMinder modules.
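Which serialization mechanisms a build knows about can be read straight off the `httpd -V` output above, via the `APR_USE_*_SERIALIZE` defines. A quick grep shows it; this is a sketch against the output quoted in this report rather than a live binary, so the assigned variable below stands in for running `apache2/bin/httpd -V` on the real server:

```shell
# Check which APR serialization (mutex) mechanisms this build reports.
# On the live server one would run:
#   apache2/bin/httpd -V | grep SERIALIZE
# The heredoc below stands in for that output, copied from this report.
httpd_v_output=$(cat <<'EOF'
-D APR_USE_FCNTL_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
EOF
)
printf '%s\n' "$httpd_v_output" | grep 'APR_USE_.*_SERIALIZE'
```

Since both fcntl and pthread serialization show up here, switching AcceptMutex between them should be a pure configuration change, with no rebuild required.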
The default mutex type changed to fcntl (from pthread?) between 2.0.49 and 2.0.52, so a workaround other than falling back to prefork might be to substitute "pthread" for "default" in the locking directives. This shouldn't happen regardless, though. Is /u01/opt/apache2 on a local filesystem, not an NFS mount or the like? fcntl locking is notoriously unreliable over NFS.
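The substitution suggested above amounts to a two-line httpd.conf change. A sketch, assuming a 2.0.x build with pthread serialization compiled in (as the APR_USE_PTHREAD_SERIALIZE define in the -V output in this report indicates); `sem` for SSLMutex is the value used elsewhere in this thread:

```apache
# Global scope, outside any <VirtualHost>:
# use cross-process pthread mutexes for accept serialization instead of
# the fcntl default that 2.0.52 picks on Solaris.
AcceptMutex pthread

# mod_ssl's own lock for the SSL session cache; "sem" selects a
# semaphore-based mechanism rather than a file-based lock.
SSLMutex sem
```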
Google finds reports of this happening on Solaris when using fcntl locking with 1.3, so perhaps that rules out its being anything to do with worker and threading:

    http://article.gmane.org/gmane.comp.apache.user/36430

An interesting link mentioning Netegrity modules here:

    http://archive.apache.org/gnats/5499

"If you have more than one application running on the server that can cause time delays, in one case it was Netegrity Web Agent, Vignette V5 5.6.2, Apache can get confused."
1. I have defined "AcceptMutex default", so I actually have no idea which lock file would be used.
2. You are right: according to the logs, on Solaris, AcceptMutex defaults to pthread in 2.0.49 but to fcntl in 2.0.52.
3. In any event, I do not have any NFS file systems mounted on the server.
4. I have ruled out the Netegrity SiteMinder Web Agent, because somebody not using SiteMinder reported the same problem with 2.0.51.
5. I do have more than one Listen directive.

I would like to avoid using prefork, so one of these days I will try changing AcceptMutex to pthread to see whether it makes any difference.
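Point 2 above ("according to the logs") can be checked directly: with `LogLevel debug`, the 2.0 MPMs log the chosen accept mutex and the compiled-in default at startup. A sketch; the exact wording of the log line is an assumption from memory of the 2.0.x source, and the sample line below is a hypothetical stand-in for a real error_log:

```shell
# On a live server one would run:
#   grep AcceptMutex /u01/opt/apache2/logs/error_log
# The sample line below stands in for real error_log output.
sample_log='[debug] worker.c: AcceptMutex: pthread (default: fcntl)'
printf '%s\n' "$sample_log" | grep -o 'AcceptMutex: [a-z]*'
```

This is a cheap way to confirm, after restarting, that the "AcceptMutex pthread" workaround actually took effect instead of silently falling back to the default.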
Today I had the opportunity to run additional load tests after adding "AcceptMutex pthread" to the configuration. So far I have not observed any error messages about deadlock situations, nor any server restarts.
I am experiencing the same problem with Solaris 5.7 and Apache 2.0.52 using mod_ssl and mod_proxy. Since it is overloading the Jetty server running behind Apache, it seems to be a problem related to network sockets.
I am also noticing this problem on Solaris 8. I am using Sun's OpenSSL package that is installed as part of their Crypto Accelerator 1000 hardware, the worker MPM, and the following configure options:

    ./configure \
      --enable-so \
      --with-mpm=worker \
      --enable-mods-shared=most \
      --enable-ssl \
      --with-ssl=/opt/SUNWconn/crypto \
      --enable-dav \
      --enable-so \
      --enable-deflate \
      --enable-proxy=static \
      --enable-proxy-http=static \
      --enable-nonportable-atomics=yes

I did not have AcceptMutex or SSLMutex defined in my httpd.conf, so I defined them as:

    AcceptMutex pthread
    SSLMutex    sem

I should have defined SSLMutex anyway, since I'm using SSLSessionCache - my bad. I'm testing the change of AcceptMutex and will report back.
(In reply to comment #6) I should add that I experienced this problem with 2.0.53 as well; I have since upgraded to 2.0.54.
The current default is deemed to be the best trade-off, so it's not clear what is left to change here. Since:

1) this issue was mostly seen by people using a particular third-party module,
2) it was also seen by people using 1.3, and
3) it only happens on Solaris,

I'd guess this is either a third-party module doing something weird, or some system tuning issue. (1) and (3) can be eliminated from enquiries by taking up a support issue with the appropriate vendors.