Bug 29277 - Apache gets locked
Summary: Apache gets locked
Status: RESOLVED DUPLICATE of bug 27945
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: All
Version: 2.0.49
Hardware: Sun Solaris
Importance: P3 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-05-28 14:10 UTC by Guilherme Assad
Modified: 2005-03-10 05:44 UTC

Description Guilherme Assad 2004-05-28 14:10:31 UTC
I recently upgraded my apache+ssl server from version 2.0.48 to apache+ssl
2.0.49, using mod_worker.
I use Solaris on SPARC as my operating system. After the upgrade my Apache
server became really unstable: sometimes it stops responding and only recovers
after a restart.
I ran truss against the child and parent processes and found this:
Child Process... 
17512:  lwp_sema_wait(0xFD109E78)       (sleeping...)
17512:  lwp_sema_wait(0xFD007E78)       (sleeping...)
17512:  lwp_sema_wait(0xFDA03E78)       (sleeping...)
17512:  lwp_sema_wait(0xFBA0BE78)       (sleeping...)
17512:  lwp_sema_wait(0xFC407E78)       (sleeping...)
17512:  lwp_sema_wait(0xFC203E78)       (sleeping...)
17512:  lwp_sema_wait(0xFBC0FE78)       (sleeping...)
17512:  lwp_sema_wait(0xFCD01E78)       (sleeping...)
17512:  lwp_sema_wait(0xFC509E78)       (sleeping...)
17512:  lwp_sema_wait(0xFC101E78)       (sleeping...)
17512:  lwp_sema_wait(0xFD40FE78)       (sleeping...)

Parent Process... 
17104:  poll(0xFFBEF920, 0, 1000)                       = 0
17104:  write(9, " !", 1)                               = 1
17104:  waitid(P_ALL, 0, 0xFFBEF8D0, WEXITED|WTRAPPED|WSTOPPED|WNOHANG)= 0
17104:  poll(0xFFBEF920, 0, 1000)                       = 0
17104:  write(9, " !", 1)                               = 1
17104:  waitid(P_ALL, 0, 0xFFBEF8D0, WEXITED|WTRAPPED|WSTOPPED|WNOHANG)= 0

It seems that all child processes get locked, and the parent stays in a loop
waiting for a change in its children's status.
My error log shows a segmentation fault in a child process at exactly the
moment Apache stops responding.

[Fri May 21 14:41:03 2004] [notice] child pid 17483 exit signal Segmentation fault (11)

Does anyone have a clue about it?
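
To correlate hangs with child crashes, the error log can be scanned for notices like the one quoted above. The following is only a rough sketch: it assumes each notice sits on a single line in the real log file, and the function name `find_child_crashes` and the regular expression are illustrative, not part of httpd.

```python
import re

# Pattern modelled on the single notice quoted in this report; other
# httpd versions or log formats may differ (assumption).
CRASH_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[notice\] child pid (?P<pid>\d+) "
    r"exit signal (?P<signal>.+?) \((?P<signo>\d+)\)"
)


def find_child_crashes(lines):
    """Return (timestamp, pid, signal name, signal number) per crash notice."""
    crashes = []
    for line in lines:
        match = CRASH_RE.match(line.strip())
        if match:
            crashes.append(
                (
                    match["when"],
                    int(match["pid"]),
                    match["signal"],
                    int(match["signo"]),
                )
            )
    return crashes
```

Passing an open log file to `find_child_crashes` yields the crash timestamps, which can then be compared with the times at which the server stopped responding.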
Comment 1 Jeff Trawick 2004-05-28 15:08:18 UTC
The best documentation for a web server hang is pstack output: run pstack
against each web server process and post the results.
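
One way to gather that output in a single pass is sketched below. It is a minimal illustration, not a tested procedure: it assumes the server processes are named `httpd`, that `pgrep` and `pstack` are on the PATH, and that the caller has permission to inspect the processes. The helper names `httpd_pids`, `collect_stacks`, and `main` are hypothetical.

```python
import subprocess


def httpd_pids(run=subprocess.run, name="httpd"):
    """List PIDs of processes whose name is exactly `name` (assumed: httpd)."""
    result = run(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(token) for token in result.stdout.split()]


def collect_stacks(pids, run=subprocess.run):
    """Run pstack against each PID and return {pid: output text}."""
    stacks = {}
    for pid in pids:
        result = run(["pstack", str(pid)], capture_output=True, text=True)
        # Keep the error text when pstack fails, so nothing is silently lost.
        stacks[pid] = result.stdout if result.returncode == 0 else result.stderr
    return stacks


def main():
    """Print one labelled pstack dump per web server process."""
    for pid, text in collect_stacks(httpd_pids()).items():
        print(f"===== pstack {pid} =====")
        print(text)
```

Calling `main()` while the server is in the hung state prints one dump per process, ready to paste into the bug report. The `run` parameter exists only so the helpers can be exercised without the real tools.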

There are two known issues which you may be suffering from:

1) ssl-related child process crash introduced in 2.0.49

See PR27945.  Here is the fix for that problem:

http://cvs.apache.org/viewcvs.cgi/httpd-2.0/modules/ssl/ssl_engine_io.c?r1=1.121&r2=1.122

2) hang after crash when using pthread accept mutex (the default on Solaris)

(this is not at all obvious from your truss, but pstack of all children when in
this hung state should give more definitive info)

Very possibly the child process crash (segmentation fault) resulted in screwing
up the accept mutex, which in turn hung the server.

Try switching to non-default accept mutex type, by adding this to httpd.conf:

AcceptMutex fcntl

This has nothing to do with the child process crash, but it should resolve web
server hangs that occur after a crash due to loss of accept mutex.
Comment 2 Joe Orton 2005-03-10 14:44:27 UTC
Presuming this was a duplicate of bug 27945.  The "segfaults can hang a threaded
server" issue was fixed in 2.0.50ish as well.

*** This bug has been marked as a duplicate of 27945 ***