Bug 54993

Summary: Critical error: File descriptor in bad state: apr_pollset_poll failed.
Product: Apache httpd-2
Reporter: Yonah Russ <apache>
Component: mpm_event
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal    
Priority: P2    
Version: 2.4.4   
Target Milestone: ---   
Hardware: PC   
OS: other   

Description Yonah Russ 2013-05-20 14:13:52 UTC
We are getting lots of strange error messages like the following from Apache 2.4 with the event MPM under load:
 
[mpm_event:crit] [pid 9685:tid 28] (81)File descriptor in bad state: apr_pollset_poll failed. Attempting to shutdown process gracefully

It appears to come from these lines in /httpd/tags/2.4.4/server/mpm/event/event.c:

    rc = apr_pollset_poll(event_pollset, timeout_interval, &num, &out_pfd);
    if (rc != APR_SUCCESS) {
        if (APR_STATUS_IS_EINTR(rc)) {
            continue;
        }
        if (!APR_STATUS_IS_TIMEUP(rc)) {
            ap_log_error(APLOG_MARK, APLOG_CRIT, rc, ap_server_conf,
                         "apr_pollset_poll failed. Attempting to "
                         "shutdown process gracefully");
            signal_threads(ST_GRACEFUL);
        }
    }

If I understand correctly, apr_pollset_poll() is returning EBADFD from this code, which is specific to pollsets using Solaris event ports (/apr/tags/1.4.6/poll/unix/port.c):

    ret = port_associate(pollset->p->port_fd, PORT_SOURCE_FD,
                         fd, get_event(ep->pfd.reqevents), ep);
    if (ret < 0) {
        rv = apr_get_netos_error();
        APR_RING_INSERT_TAIL(&(pollset->p->free_ring), ep, pfd_elem_t, link);
        break;
    }

According to the man page for port_associate:

EBADFD The source argument is of type PORT_SOURCE_FD and the object argument is not a valid file descriptor.
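
If the man page description applies here, the descriptor handed to port_associate() was already closed or otherwise invalid at that point. A minimal sketch of how one might check that while debugging, using only POSIX fcntl(): the helper name fd_is_open is hypothetical and is not part of APR or httpd. Note that fcntl() reports an invalid descriptor as EBADF, not as the EBADFD value that port_associate() returns.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical debugging helper (not an APR or httpd function).
 * Returns 1 if fd currently refers to an open descriptor, 0 otherwise.
 */
static int fd_is_open(int fd)
{
    /* F_GETFD only reads the descriptor flags; it fails with EBADF
     * when fd is not an open file descriptor. */
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

Logging the result of such a check just before the port_associate() call would show whether the descriptor was already gone, which would point at a race between closing a connection and re-adding it to the pollset. That race is only a guess, not something confirmed in this report.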

We are running SmartOS, a variant of OpenSolaris/Illumos.

Comment 1 Mat Mannion 2015-01-16 11:28:46 UTC
We're also seeing this problem with Apache 2.4.9 on Solaris 10_u11. Under load we see "apr_pollset_poll failed" messages and the server becomes very slow to respond. It recovers after 30 seconds or so.

We use log_server_status to log the server status to a file, and when this happens we can actually see requests to /server-status failing:

110500:90:185:136912038:.722062
110600:93:182:136920497:.722775
110700:71:204:136926275:.723272
110800:105:170:136933289:.723871
110900:101:174:136941124:.724506
111000:-1:-1:-1:-1:IO::Socket::INET: connect: Connection timed out
111100:-1:-1:-1:-1:IO::Socket::INET: connect: Connection timed out
111200:43:207:136954348:.726364
111300:293:207:136960240:.725304
111400:87:213:136967664:.497808
111500:99:201:136974792:.49857
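
To pick out the failed samples from a log like the one above automatically, something along these lines could work. It is a sketch that assumes the second colon-separated field is the busy-worker count and that log_server_status writes -1 there when the request to /server-status fails, as the excerpt suggests. The function name sample_failed is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper for scanning log_server_status output.
 * Returns 1 if the sample records a failed /server-status request
 * (second field is -1), 0 if it looks like a normal sample, and
 * -1 if the line does not have the expected "time:busy:..." shape.
 */
static int sample_failed(const char *line)
{
    long busy;

    assert(line != NULL);

    /* Skip the timestamp field, then read the second field as a number. */
    if (sscanf(line, "%*[^:]:%ld", &busy) != 1) {
        return -1;
    }
    return busy == -1;
}
```

Applied to the excerpt, the 111000 and 111100 lines would be flagged as failed and the rest treated as normal samples.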