_Every_ time stopping or restarting apache following error is in the log: (70007)The timeout specified has expired: apr_pollset_poll: (listen) The context is the following: [Sat Jun 02 14:59:40 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Sat Jun 02 14:59:42 2007] [notice] SIGHUP received. Attempting to restart [Sat Jun 02 14:59:42 2007] [notice] Digest: generating secret for digest authentication ... [Sat Jun 02 14:59:42 2007] [notice] Digest: done [Sat Jun 02 14:59:44 2007] [notice] Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8d configured -- resuming normal operations The bug also in 2.2.3 occurs. Other versions are not tested.
What MPM are you using?
Worker That might be the problem. Ive got running a 2.2.3 with prefork. There are no problems. And Ive got a 2.2.4 and a 2.2.3 with worker where these apr problems occur.
Sorry, I am new to Bugzilla. Do I have to check the "reassign" checkbox like I do now? My answer to your question is 11 days old. Or isnt there a solution for the problem?
(In reply to comment #3) > Sorry, I am new to Bugzilla. Do I have to check the "reassign" checkbox like I > do now? No, you don't need to. You are doing it perfectly. > My answer to your question is 11 days old. Thanks for that. > Or isnt there a solution for the problem? It is lack of time :-). Sofar nobody had some time to dig into it. Feel free to bug us here from time to time if you see no progress.
Ok, thanks. No problem. Perhaps I can help i bit. I have examined the situation. Here the results: The apr problem only occurs, when the Apache runs in a Solaris Zone. When it runs in the global context (means: not in a zone), the apache restart does NOT cause an APR_TIMEUP. Is it right, that the error is from srclib/apr/poll/unix/poll.c from line 253? And the causing function is the poll(..) system call? "man poll" under Solaris says the same as the source code: RETURN VALUES Upon successful completion, a non-negative value is returned. A positive value indicates the total number of file descriptors that has been selected (that is, file descriptors for which the revents member is non-zero). A value of 0 indicates that the call timed out and no file descriptors have been selected. Upon failure, -1 is returned and errno is set to indicate the error. Perhaps its not possible to execute poll(..) in a Solaris zone? Or there is a missing device in /dev/?
(In reply to comment #5) > The apr problem only occurs, when the Apache runs in a Solaris Zone. When it > runs in the global context (means: not in a zone), the apache restart does NOT > cause an APR_TIMEUP. This is an interesting information. > > Is it right, that the error is from srclib/apr/poll/unix/poll.c from line 253? As you are using zones I suppose that you are running on Solaris 10. So I guess that you have port support there and you should be rather in line 291 of port.c in the same directory. You can check if you have port support by checking if HAVE_PORT_CREATE is defined in srclib/apr/include/arch/unix/apr_private.h > And the causing function is the poll(..) system call? I think it is rather port_getn. > > "man poll" under Solaris says the same as the source code: > > RETURN VALUES > Upon successful completion, a non-negative value is > returned. A positive value indicates the total number of > file descriptors that has been selected (that is, file > descriptors for which the revents member is non-zero). A > value of 0 indicates that the call timed out and no file > descriptors have been selected. Upon failure, -1 is returned > and errno is set to indicate the error. > > > Perhaps its not possible to execute poll(..) in a Solaris zone? > Or there is a missing device in /dev/? I don't think that this is the case. If you wouldn't be able to use this system call httpd would not work at all. I suppose that it has something to do with the restart logic itself. What happens (error message wise) if you 1. Stop httpd 2. do a gracefull restart 3. do a graceful shutdown in your zone. >
> As you are using zones I suppose that you are running on Solaris 10. Yes, thats right. Solaris 10 for sparc architecture - and not for x86. > So I guess > that you have port support there and you should be rather in line 291 of port.c > in the same directory. You can check if you have port support by checking if > HAVE_PORT_CREATE is defined in srclib/apr/include/arch/unix/apr_private.h Yes, it is defined there. Since September 2006 there is a patch for two port_getn problems: 6374407 port_getn does not dissociate fd when it reports an event on it 6402576 Sometimes port_getn() never returns indicating that a socket has become writable Dont know either if the patch was applied on this server nor if it is possible that is has something to do with the APR_TIMEUP problem. > I don't think that this is the case. If you wouldn't be able to use this system > call httpd would not work at all. I suppose that it has something to do with the > restart logic itself. What happens (error message wise) if you > > 1. Stop httpd > 2. do a gracefull restart > 3. do a graceful shutdown > > in your zone. The error occurs in each case (of the three).
Sorry, I was wrong. Today I have seen that the APR_TIMEUP also occurs in the global context / global zone :-( And not only one timeup per restart... how you can see... [Sat Jun 23 13:01:30 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Sat Jun 23 13:01:31 2007] [notice] SIGHUP received. Attempting to restart [Sat Jun 23 13:01:31 2007] [notice] Apache/2.2.4 (Unix) configured -- resuming normal operations [Sat Jun 23 13:01:32 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Sat Jun 23 13:01:39 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Sat Jun 23 13:01:40 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Sat Jun 23 13:01:41 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Sat Jun 23 13:06:31 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) Any ideas how I can help? Can I debug this issue?
Ok, the second time I was wrong. But now i have got the problem! :) I accuse a second Listen directive! (Doesnt matter if in the global context or in a zone.) For example: Listen 192.168.1.7:80 Listen 192.168.1.7:81 -> The APR_TIMEUP happens when restarting, stopping, ... Listen 192.168.1.7:80 -> No APR_TIMEUP Omitting the IP and using Listen *:80 and Listen *:81 didnt solve the problem. The APR_TIMEUP occurs. Please help :)
Hello, I can also reproduce this error on my Solaris 10 servers with a recent build of Apache 2.2.6. Whenever I run a graceful restart, I see the same apr_timeout message in the log file. Perhaps more importantly, this SOMETIMES seems to be leaving behind children processes that shouldn't still be running after a graceful restart. Sometimes it leaves one child process, sometimes it leaves 5. For example, on a completely idle server, here is a ps list after numerous graceful restarts in a row: $ ps -ef | grep http webservd 19097 18781 0 16:10:01 ? 0:00 /usr/local/apache-2.2.6/bin/httpd -k start root 18781 12143 1 16:02:28 ? 0:04 /usr/local/apache-2.2.6/bin/httpd -k start webservd 18868 18781 0 16:03:53 ? 0:00 /usr/local/apache-2.2.6/bin/httpd -k start webservd 19095 18781 0 16:10:01 ? 0:00 /usr/local/apache-2.2.6/bin/httpd -k start webservd 19099 18781 0 16:10:01 ? 0:00 /usr/local/apache-2.2.6/bin/httpd -k start webservd 19096 18781 0 16:10:01 ? 0:00 /usr/local/apache-2.2.6/bin/httpd -k start webservd 19098 18781 0 16:10:01 ? 0:00 /usr/local/apache-2.2.6/bin/httpd -k start webservd 19072 18781 0 16:08:59 ? 0:00 /usr/local/apache-2.2.6/bin/httpd -k start There were no client requests made to this daemon during this period as it is a completely idle test box and yet there are consistently processes (19072, 18868 in this example) that are not exiting as I would expect them to. A truss of one of the hung up processes is sticking in port_getn: $ truss -p 18868 port_getn(19, 0x00224298, 2, 1, 0x00000000) (sleeping...) The log file showing the restarts and errors in the log during this period is below: [Fri Nov 30 16:02:27 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:02:27 2007] [notice] Digest: done [Fri Nov 30 16:02:27 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:02:33 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:02:33 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:02:33 2007] [notice] Digest: done [Fri Nov 30 16:02:33 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:02:58 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:02:58 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:02:59 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:02:59 2007] [notice] Digest: done [Fri Nov 30 16:02:59 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:03:31 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:03:31 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:03:31 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:03:31 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:03:31 2007] [notice] Digest: done [Fri Nov 30 16:03:31 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:03:34 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:03:34 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:03:34 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:03:34 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:03:34 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:03:34 2007] [notice] Digest: done [Fri Nov 30 16:03:34 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:03:52 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:03:52 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:03:52 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:03:52 2007] [notice] Digest: done [Fri Nov 30 16:03:52 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:04:02 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:04:02 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:04:02 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:04:02 2007] [notice] Digest: done [Fri Nov 30 16:04:03 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:08:58 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:08:58 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:08:58 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:08:58 2007] [notice] Digest: done [Fri Nov 30 16:08:58 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations [Fri Nov 30 16:10:00 2007] [notice] Graceful restart requested, doing restart [Fri Nov 30 16:10:00 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:10:00 2007] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen) [Fri Nov 30 16:10:00 2007] [notice] Digest: generating secret for digest authentication ... [Fri Nov 30 16:10:00 2007] [notice] Digest: done [Fri Nov 30 16:10:00 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g configured -- resuming normal operations Some info about my config.. Server is on Solaris 10/sparc recently patched with the 10/31/2007 Recommended patch cluster. Following modules are loaded: core_module (static), mpm_prefork_module (static), http_module (static), so_module (static), alias_module (shared), authn_default_module (shared), authn_file_module (shared), authz_default_module (shared), authz_host_module (shared), auth_basic_module (shared), auth_digest_module (shared), deflate_module (shared), dir_module (shared), dumpio_module (shared), headers_module (shared), include_module (shared), log_config_module (shared), logio_module (shared), mime_module (shared), proxy_module (shared), proxy_ajp_module (shared), proxy_balancer_module (shared), proxy_http_module (shared), rewrite_module (shared), setenvif_module (shared), ssl_module (shared), vhost_alias_module (shared) /usr/local/apache-2.2.6/bin/httpd -V Server version: Apache/2.2.6 (Unix) Server built: Nov 27 2007 18:40:30 Server's Module Magic Number: 20051115:5 Server loaded: APR 1.2.11, APR-Util 1.2.10 Compiled using: APR 1.2.11, APR-Util 1.2.10 Architecture: 32-bit Server MPM: Prefork threaded: no forked: yes (variable process count) Server compiled with.... -D APACHE_MPM_DIR="server/mpm/prefork" -D APR_HAS_SENDFILE -D APR_HAS_MMAP -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled) -D APR_USE_FCNTL_SERIALIZE -D APR_USE_PTHREAD_SERIALIZE -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT -D APR_HAS_OTHER_CHILD -D AP_HAVE_RELIABLE_PIPED_LOGS -D DYNAMIC_MODULE_LIMIT=128 -D HTTPD_ROOT="/usr/local/apache-2.2.6" -D SUEXEC_BIN="/usr/local/apache-2.2.6/bin/suexec" -D DEFAULT_PIDLOG="logs/httpd.pid" -D DEFAULT_SCOREBOARD="logs/apache_runtime_status" -D DEFAULT_LOCKFILE="logs/accept.lock" -D DEFAULT_ERRORLOG="logs/error_log" -D AP_TYPES_CONFIG_FILE="conf/mime.types" -D SERVER_CONFIG_FILE="conf/httpd.conf" Happy to assist with debugging this but I'm not sure what needs to be looked at and don't really have the time to go spelunking into it all by my lonesome. Anyone care to assist?
Tested 2.0.61 on the same machine with similar options and do not see this problem. Updated version to 2.2.6 in the bug report.
I also saw the similar behavior in open solaris (snv_79). httpd.conf contains : Listen 192.168.21.1:80 Listen 192.168.22.1:80 In error log, I observed : [Mon Jan 28 18:10:48 2008] [notice] Apache/2.2.6 (Unix) configured -- resuming normal operations [Mon Jan 28 18:10:49 2008] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen)
Created attachment 21511 [details] Patch When I had the following settings : Listen 192.168.21.1:81 Listen 192.168.22.1:81 <IfModule worker.c> ListenBackLog 50000 StartServers 2 ThreadLimit 5 ThreadsPerChild 5 MinSpareThreads 1 MaxSpareThreads 1 ThreadsPerChild 5 MaxClients 10 MaxRequestsPerChild 0 </IfModule> I was getting the following error in my error log [Mon Feb 11 13:33:52 2008] [error] (70007)The timeout specified has expired: apr_pollset_poll : (listen) On debugging, I found out that the listener thread calls port_getn to listen for new connections, it was getting SIGHUP signal. This was confirmed by truss output. 5541/1: read(6, 0xFFFFFD7FFFDFF38B, 1) (sleeping...) 5541/7: lwp_park(0x00000000, 0) (sleeping...) 5541/6: lwp_park(0x00000000, 0) (sleeping...) 5541/5: lwp_park(0x00000000, 0) (sleeping...) 5541/4: lwp_park(0x00000000, 0) (sleeping...) 5541/3: lwp_park(0x00000000, 0) (sleeping...) 5541/8: port_getn(12, 0x00539D30, 2, 1, 0x00000000) (sleeping...) 5543/1: read(6, 0xFFFFFD7FFFDFF38B, 1) (sleeping...) 5543/3: lwp_park(0x00000000, 0) (sleeping...) 5543/4: lwp_park(0x00000000, 0) (sleeping...) 5543/5: lwp_park(0x00000000, 0) (sleeping...) 5543/6: lwp_park(0x00000000, 0) (sleeping...) 5543/7: lwp_park(0x00000000, 0) (sleeping...) 5543/8: fcntl(11, F_SETLKW, 0xFFFFFD7FFF241E10) (sleeping...) 5539/1: pollsys(0xFFFFFD7FFFDFF2E0, 0, 0xFFFFFD7FFFDFF3A0, 0x00000000) = 0 5539/1: write(7, " !", 1) = 1 5541/1: read(6, " !", 1) = 1 5541/1: lwp_wait(2, 0xFFFFFD7FFFDFF31C) = 0 5541/1: lwp_kill(8, SIGHUP) = 0 5541/1: lwp_kill(8, SIG#0) = 0 5541/8: Received signal #1, SIGHUP, in port_getn() [caught] 5541/8: siginfo: SIGHUP pid=5541 uid=101 code=-1 5541/8: port_getn(12, 0x00539D30, 2, 1, 0x00000000) Err#4 EINTR 5541/8: lwp_sigmask(SIG_SETMASK, 0xFFBEE007, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF] Note that 5541/8 is the listener thread and 5541/1 was sending the SIGHUP signal. I further debugged why one of my process is dieing. Then I found out that in worker.c, in perform_idle_server_maintenance function, if (idle_thread_count > max_spare_threads) { /* Kill off one child */ ap_mpm_pod_signal(pod, TRUE); idle_spawn_rate = 1; .. From debugging, I found out that idle_thread_count was set to 10 and max_spare_threads was set to 6. Hence the first thread was sending SIGHUP signal to kill the process. If I increase the MaxSpareThreads to 10 then this error disappeared because in that case, worker process didn't try to kill himself. <IfModule worker.c> ListenBackLog 50000 StartServers 2 ThreadLimit 5 ThreadsPerChild 5 MinSpareThreads 1 MaxSpareThreads 10 ThreadsPerChild 5 MaxClients 10 MaxRequestsPerChild 0 </IfModule> The error message is misleading and it may or may not appear. It depends on the timing. If listener thread has been reached to the point when it is blocked with port_getn then we get this error. It seems to me that server wanted to deliberately kill one of the child process. Casue of the problem is that apr_pollset_poll function doesn't differentiate the EINTR and ETIME. apr_pollset_poll contains ... if (errno == ETIME || errno == EINTR) { rv = APR_TIMEUP; } In worker.c, we do expect EINTR to be normal as shown below : -------------- worker.c -------------------------- rv = apr_pollset_poll(pollset, -1, &numdesc, &pdesc); if (rv != APR_SUCCESS) { if (APR_STATUS_IS_EINTR(rv)) { continue; } -------------------------------------------------- If we allow EINTR as a known error and return APR_EINTR as given in suggested patch then this error disappeared.
So this is an APR bug. I've no idea why the Solaris port pollset code would map EINTR to APR_ETIMEUP; can you post that patch to dev@apr to solicit feedback?
I posted the patch to apr dev list.
Here is the link to the apr question which I posted : http://mail-archives.apache.org/mod_mbox/apr-dev/200802.mbox/browser
Created attachment 21513 [details] New patch Based on the suggestion from Lucian Adrian Grijincu, I have updated the patch.
Now fixed in 1.2.x (for httpd) as well as trunk+1.3