I recently upgraded my Apache+SSL server from version 2.0.48 to 2.0.49, using the worker MPM. The operating system is Solaris on SPARC. Since the upgrade the server has been really unstable: sometimes it stops responding and only comes back after a restart.

I ran truss against a child process and the parent process and found this:

Child process:

    17512: lwp_sema_wait(0xFD109E78) (sleeping...)
    17512: lwp_sema_wait(0xFD007E78) (sleeping...)
    17512: lwp_sema_wait(0xFDA03E78) (sleeping...)
    17512: lwp_sema_wait(0xFBA0BE78) (sleeping...)
    17512: lwp_sema_wait(0xFC407E78) (sleeping...)
    17512: lwp_sema_wait(0xFC203E78) (sleeping...)
    17512: lwp_sema_wait(0xFBC0FE78) (sleeping...)
    17512: lwp_sema_wait(0xFCD01E78) (sleeping...)
    17512: lwp_sema_wait(0xFC509E78) (sleeping...)
    17512: lwp_sema_wait(0xFC101E78) (sleeping...)
    17512: lwp_sema_wait(0xFD40FE78) (sleeping...)

Parent process:

    17104: poll(0xFFBEF920, 0, 1000) = 0
    17104: write(9, " !", 1) = 1
    17104: waitid(P_ALL, 0, 0xFFBEF8D0, WEXITED|WTRAPPED|WSTOPPED|WNOHANG)= 0
    17104: poll(0xFFBEF920, 0, 1000) = 0
    17104: write(9, " !", 1) = 1
    17104: waitid(P_ALL, 0, 0xFFBEF8D0, WEXITED|WTRAPPED|WSTOPPED|WNOHANG)= 0

It seems that all the child processes get locked, and the parent stays in a loop waiting for a change in its children's status.

My error log shows a segmentation fault in a child process at exactly the moment Apache stops responding:

    [Fri May 21 14:41:03 2004] [notice] child pid 17483 exit signal Segmentation fault (11)

Does anyone have a clue about this?
The best documentation for a web server hang is to run pstack against each web server process and post the results.

There are two known issues which you may be suffering from:

1) An SSL-related child process crash introduced in 2.0.49. See PR27945. Here is the fix for that problem:

   http://cvs.apache.org/viewcvs.cgi/httpd-2.0/modules/ssl/ssl_engine_io.c?r1=1.121&r2=1.122

2) A hang after a crash when using the pthread accept mutex (the default on Solaris). This is not at all obvious from your truss output, but pstack of all children while the server is hung should give more definitive information.

Very possibly the child process crash (the segmentation fault) left the accept mutex in a bad state, which in turn hung the server.

Try switching to a non-default accept mutex type by adding this to httpd.conf:

    AcceptMutex fcntl

This does nothing about the child process crash itself, but it should resolve the web server hangs that occur after a crash because the accept mutex was lost.
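The pstack collection suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the server binary is named httpd, that pgrep and pstack are on the PATH (both ship with Solaris), and that the script runs as a user allowed to inspect those processes. The function names are made up for illustration.

```python
import subprocess


def httpd_pids(run=subprocess.run, name="httpd"):
    """Return the PIDs of all processes whose name matches `name`.

    The process name "httpd" is an assumption; adjust it to the
    actual binary name of the installation.
    """
    proc = run(["pgrep", name], capture_output=True, text=True)
    # pgrep prints one PID per line; empty output means no match.
    return [int(tok) for tok in proc.stdout.split()]


def collect_pstacks(pids, run=subprocess.run):
    """Run `pstack <pid>` for each PID and return {pid: output}.

    If pstack fails for a PID (for example, the process has already
    exited), its stderr text is stored instead of a stack dump.
    """
    results = {}
    for pid in pids:
        proc = run(["pstack", str(pid)], capture_output=True, text=True)
        results[pid] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


def main():
    """Print one labelled stack dump per httpd process."""
    for pid, stack in collect_pstacks(httpd_pids()).items():
        print(f"==== pid {pid} ====")
        print(stack)
```

Calling main() while the server is hung prints one labelled dump per process, which can be pasted into the report. The `run` parameter exists only so the functions can be exercised without real processes.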
Presuming this was a duplicate of bug 27945. The "segfaults can hang a threaded server" issue was fixed in 2.0.50ish as well.

*** This bug has been marked as a duplicate of 27945 ***