Bug 54519

Summary: httpd already running
Product: Apache httpd-2 Reporter: Edward Quick <edwardquick>
Component: Core Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal CC: dclarke, edwardquick
Priority: P2    
Version: 2.5-HEAD   
Target Milestone: ---   
Hardware: Sun   
OS: Solaris   

Description Edward Quick 2013-02-02 09:09:20 UTC
If httpd dies and leaves behind a pidfile containing a pid which later gets reused, httpd refuses to come back up. I've already seen this in production, where a host crashed and, on coming back up, the web server failed to start because something else had grabbed the pid.

To reproduce the problem, do this:

[root@laptop httpd]# pkill -9 httpd
[root@laptop httpd]# pgrep httpd
[root@laptop httpd]# echo 1 > /var/run/httpd/httpd.pid

[root@laptop httpd]# /usr/sbin/httpd -k start
httpd: Could not reliably determine the server's fully qualified domain name, using fe80::201:4aff:fe5e:5331 for ServerName
httpd (pid 1) already running

This is the version I'm using:

[quick@laptop ~]$ httpd -v
Server version: Apache/2.2.22 (Unix)
Server built:   Apr 30 2012 09:55:05
[quick@laptop ~]$ cat /etc/redhat-release 
Fedora release 17 (Beefy Miracle)

I tested this out on RHEL6, which ships with httpd 2.2.15, and noted that it doesn't suffer from the same problem. However, I can't see anything in the changelog between versions 2.2.15 and 2.2.22 which would have caused this problem to occur.
Comment 1 Edward Quick 2013-02-17 21:06:40 UTC
I reproduced the problem on Fedora 18 with httpd 2.4.3 as well:

[root@laptop httpd]# ps -ef | grep [h]ttp
root      2326     1  0 20:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    2327  2326  0 20:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    2328  2326  0 20:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    2329  2326  0 20:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    2330  2326  0 20:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    2331  2326  0 20:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    2332  2326  0 20:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND

[root@laptop httpd]# kill -9 2326
[root@laptop httpd]# ps -ef | grep [h]ttp
[root@laptop httpd]# echo 1 > /var/run/httpd/httpd.pid

[root@laptop httpd]# /usr/sbin/httpd -k start
httpd (pid 1) already running

[root@laptop httpd]# ps -ef | grep [h]ttp

[root@laptop httpd]# httpd -v
Server version: Apache/2.4.3 (Fedora)
Server built:   Jan  8 2013 13:46:23


[root@laptop httpd]# uname -a
Linux laptop 3.7.7-201.fc18.i686 #1 SMP Tue Feb 12 22:59:10 UTC 2013 i686 i686 i386 GNU/Linux
Comment 2 Edward Quick 2013-02-17 21:58:54 UTC
Just for comparison, I carried out the same test on nginx and that was fine. 


[root@laptop run]# ps -ef | grep [n]ginx
root      3055     1  0 21:50 ?        00:00:00 nginx: master process /usr/sbin/nginx
nginx     3056  3055  0 21:50 ?        00:00:00 nginx: worker process

[root@laptop run]# cat /run/nginx.pid 
3055
[root@laptop run]# kill -9 3055
[root@laptop run]# ps -ef | grep [n]ginx
[root@laptop run]# echo 1 > /run/nginx.pid

[root@laptop run]# /usr/sbin/nginx
[root@laptop run]# ps -ef | grep [n]ginx
root      3144     1  0 21:53 ?        00:00:00 nginx: master process /usr/sbin/nginx
nginx     3145  3144  0 21:53 ?        00:00:00 nginx: worker process
[root@laptop run]# cat /run/nginx.pid 
3144

[root@laptop run]# nginx -v
nginx version: nginx/1.2.6
Comment 3 bucky 2016-11-10 17:55:44 UTC
This issue continues to be present in 2.4.18 as shipped by RHEL 7 as package:

httpd24-httpd-2.4.18-11.el7.x86_64

Until I found this bug report, I was puzzled that an nfs process was being identified as httpd.

So if Edward Quick would like me to send him a beer, I will be delighted to do so.
Comment 4 Luca Toscano 2016-11-26 08:35:55 UTC
Thanks a lot for the tests. bz.apache.org/bugzilla/show_bug.cgi?id=60261 was a recent, similar use case in which the same PID is re-used in Docker containers (since it is the same PID, it is safe to proceed).

In the upcoming release (2.4.24) the code looks more or less like this:

/* Read the pid file and store the result in 'otherpid' */
rv = ap_read_pid(pconf, ap_pid_fname, &otherpid);
if (otherpid != getpid() && kill(otherpid, 0) == 0) {
    /* httpd already running */
}

In this case, the PID of the new process is different from the one left in the pidfile by the old httpd process (so otherpid != getpid()), but that old PID is now used by a completely different running process (so kill(otherpid, 0) == 0 is also true). This is indistinguishable from the regular case in which httpd is already started, where it is correct to end up in the "httpd already running" error case.

Waiting for other feedback, since I am not sure how to solve this issue by simply looking at PIDs (something more might be needed).
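
To illustrate what "something more" might look like, here is a minimal sketch, not the httpd implementation and not a proposed patch. It assumes a Linux host where /proc/<pid>/comm is readable, and both function names (read_proc_comm, other_instance_running) are hypothetical. The idea is to keep the existing kill(otherpid, 0) test and additionally compare the command name of the process that owns the PID against the expected server name:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an httpd or APR function): copy the command
 * name of 'pid' from /proc/<pid>/comm into 'buf'. Linux-specific.
 * Returns 0 on success, -1 if the name could not be read. */
int read_proc_comm(pid_t pid, char *buf, size_t buflen)
{
    char path[64];
    FILE *fp;

    assert(buf != NULL && buflen > 0);
    snprintf(path, sizeof(path), "/proc/%ld/comm", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    if (fgets(buf, (int)buflen, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
    return 0;
}

/* Hypothetical stricter variant of the check quoted above.
 * Returns 1 if 'otherpid' looks like another live instance named
 * 'expected', 0 if the pid file can be treated as stale. */
int other_instance_running(pid_t otherpid, const char *expected)
{
    char comm[64];

    if (otherpid <= 0 || otherpid == getpid()) {
        return 0; /* invalid pid, or our own pid being reused */
    }
    if (kill(otherpid, 0) != 0) {
        return 0; /* same existence test as the original code */
    }
    if (read_proc_comm(otherpid, comm, sizeof(comm)) != 0) {
        return 1; /* cannot tell: keep the current conservative answer */
    }
    return strcmp(comm, expected) == 0;
}
```

With the pid file from comment 1 (containing 1), other_instance_running(1, "httpd") would return 0 on a Linux host where PID 1 is init or systemd, so startup could proceed. The sketch has clear limits: the kernel truncates comm to 15 characters, an unrelated process that happens to be named httpd would still match, and Solaris would need a different lookup.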
Comment 5 Dennis Clarke 2018-08-13 10:58:45 UTC
Same problem seen with a build from trunk rev 1833619:

tls13# /usr/local/bin/apachectl start 
httpd (pid 2548) already running

tls13# ps -ef | grep "2548"
    root  2548  2489   0 10:26:09 pts/10      0:00 -sh
    root  4423  2548   0 10:54:29 pts/10      0:00 grep 2548
tls13# 

Deleting the left-behind sock file does nothing:

tls13# ls -lap /usr/local/www/var/run 
total 9
drwxr-xr-x   2 webservd webservd       3 Aug 13 10:35 ./
drwxr-xr-x   4 root     root           4 Jun 15 19:28 ../
srwx------   1 webservd webservd       0 Aug 13 10:35 cgid.sock.2548
tls13# 

tls13# /usr/local/bin/httpd -V
Server version: Apache/2.5.1-dev (Unix)
Server built:   Jun 15 2018 19:01:31
Server's Module Magic Number: 20180422:1
Server loaded:  APR 1.6.3, APR-UTIL 1.5.3, PCRE 8.40 2017-01-11
Compiled using: APR 1.6.3, APR-UTIL 1.5.3, PCRE 8.40 2017-01-11
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_PROC_PTHREAD_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT="/usr/local"
 -D SUEXEC_BIN="/usr/local/bin/suexec"
 -D DEFAULT_PIDLOG="httpd.pid"
 -D DEFAULT_SCOREBOARD="apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="www/conf/mime.types"
 -D SERVER_CONFIG_FILE="www/conf/httpd.conf"
tls13# 

The temporary brute-force method I used was to simply reboot the server and
get a new set of pids in use.

tls13 # uptime 
 10:57am  up 1 min(s),  1 user,  load average: 0.12, 0.08, 0.04
tls13 # /usr/local/bin/apachectl start 
tls13 # 

Not pretty but works for the moment.