Bug 29277

Summary: Apache gets locked
Product: Apache httpd-2
Reporter: Guilherme Assad <guilhermeassad>
Component: All
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P3    
Version: 2.0.49   
Target Milestone: ---   
Hardware: Sun   
OS: Solaris   

Description Guilherme Assad 2004-05-28 14:10:31 UTC
I've recently upgraded my apache+ssl server from version 2.0.48 to 2.0.49,
using the worker MPM.
I use Solaris on SPARC as my operating system. After the upgrade my Apache server
became very unstable: sometimes it stops responding and only comes back after a
restart.
I ran truss on a child process and on the parent process and found this:
Child Process... 
17512:  lwp_sema_wait(0xFD109E78)       (sleeping...)
17512:  lwp_sema_wait(0xFD007E78)       (sleeping...)
17512:  lwp_sema_wait(0xFDA03E78)       (sleeping...)
17512:  lwp_sema_wait(0xFBA0BE78)       (sleeping...)
17512:  lwp_sema_wait(0xFC407E78)       (sleeping...)
17512:  lwp_sema_wait(0xFC203E78)       (sleeping...)
17512:  lwp_sema_wait(0xFBC0FE78)       (sleeping...)
17512:  lwp_sema_wait(0xFCD01E78)       (sleeping...)
17512:  lwp_sema_wait(0xFC509E78)       (sleeping...)
17512:  lwp_sema_wait(0xFC101E78)       (sleeping...)
17512:  lwp_sema_wait(0xFD40FE78)       (sleeping...)

Parent Process... 
17104:  poll(0xFFBEF920, 0, 1000)                       = 0
17104:  write(9, " !", 1)                               = 1
17104:  waitid(P_ALL, 0, 0xFFBEF8D0, WEXITED|WTRAPPED|WSTOPPED|WNOHANG)= 0
17104:  poll(0xFFBEF920, 0, 1000)                       = 0
17104:  write(9, " !", 1)                               = 1
17104:  waitid(P_ALL, 0, 0xFFBEF8D0, WEXITED|WTRAPPED|WSTOPPED|WNOHANG)= 0

It seems that all the child processes are locked, and the parent stays in a loop
waiting for a change in its children's status.
My error log shows a segmentation fault in a child process at exactly the
moment Apache stopped responding:

[Fri May 21 14:41:03 2004] [notice] child pid 17483 exit signal Segmentation fault (11)

Does anyone have a clue about it?
Comment 1 Jeff Trawick 2004-05-28 15:08:18 UTC
The best way to document a web server hang is to run pstack against each web
server process and post the results.

There are two known issues which you may be suffering from:

1) ssl-related child process crash introduced in 2.0.49

See PR27945.  Here is the fix for that problem:

http://cvs.apache.org/viewcvs.cgi/httpd-2.0/modules/ssl/ssl_engine_io.c?r1=1.121&r2=1.122

2) hang after crash when using pthread accept mutex (the default on Solaris)

(this is not at all obvious from your truss, but pstack of all children when in
this hung state should give more definitive info)

Very possibly the child process crash (segmentation fault) resulted in screwing
up the accept mutex, which in turn hung the server.

Try switching to a non-default accept mutex type by adding this to httpd.conf:

AcceptMutex fcntl

This has nothing to do with the child process crash, but it should resolve web
server hangs that occur after a crash due to loss of accept mutex.
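A minimal sketch of why the lock type matters, assuming a POSIX system with Python 3 (this is not httpd code). A forked child takes a lock and is then killed with SIGKILL, standing in for the segfaulting child above. One lock is an fcntl() record lock on a temporary file (the file name is hypothetical); the other is multiprocessing.Lock, a POSIX semaphore used here only as a rough analogue of a lock kept in shared memory, such as the default pthread accept mutex. It is not the same primitive httpd uses.

```python
import fcntl
import multiprocessing
import os
import signal
import tempfile


def _die_holding(acquire) -> None:
    """Fork a child that calls `acquire` and is then killed without unlocking."""
    pid = os.fork()
    if pid == 0:
        try:
            acquire()
            # SIGKILL cannot be caught, so no cleanup code runs: the lock is
            # still held at the moment the process dies.
            os.kill(os.getpid(), signal.SIGKILL)
        finally:
            os._exit(1)  # reached only if something above failed
    _, status = os.waitpid(pid, 0)
    if not (os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL):
        raise RuntimeError("child did not die while holding the lock")


def fcntl_lock_free_after_crash() -> bool:
    # Hypothetical lock file standing in for the AcceptMutex fcntl lock file.
    fd, path = tempfile.mkstemp(prefix="accept-mutex-")
    try:
        _die_holding(lambda: fcntl.lockf(fd, fcntl.LOCK_EX))
        try:
            # The kernel drops a process's fcntl() record locks when it exits,
            # so a non-blocking attempt from this process should succeed.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)


def shared_memory_lock_free_after_crash() -> bool:
    # A semaphore has no owner, so nothing releases it when the child that
    # decremented it dies; the non-blocking acquire should therefore fail.
    lock = multiprocessing.get_context("fork").Lock()
    _die_holding(lock.acquire)
    return lock.acquire(block=False)


if __name__ == "__main__":
    print("fcntl lock free after crash:", fcntl_lock_free_after_crash())
    print("shared-memory lock free after crash:", shared_memory_lock_free_after_crash())
```

On a typical Unix system the fcntl() lock comes back as free while the semaphore stays taken. That matches the reasoning in the comment above: with an fcntl accept mutex, a crashed child cannot leave the mutex held forever, so the surviving children can keep accepting connections.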
Comment 2 Joe Orton 2005-03-10 14:44:27 UTC
Presuming this was a duplicate of bug 27945.  The "segfaults can hang a threaded
server" issue was fixed in 2.0.50ish as well.

*** This bug has been marked as a duplicate of 27945 ***