I am using Apache/2.2.15 (Unix) with the worker MPM and mod_fcgid 2.3.5. The server keeps spawning new PHP processes even though FcgidMaxProcesses is set to 2.

Relevant config entries:

# MPM:
MaxClients 400
ServerLimit 16
StartServers 2
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 10000

# FCGID:
LoadModule fcgid_module modules/mod_fcgid.so
FcgidBusyScanInterval 60
FcgidBusyTimeout 300
FcgidMaxRequestsPerProcess 10000
FcgidFixPathinfo 1
AddHandler fcgid-script .php
FcgidIdleTimeout 300
FcgidProcessLifetime 3600
FcgidInitialEnv PHP_FCGI_MAX_REQUESTS 10000
FcgidInitialEnv PHP_FCGI_CHILDREN 0
FcgidWrapper "/usr/local/phpw3/bin/php-cgi -c /usr/local/apache/conf/php.ini" .php
FcgidMaxProcesses 2
FcgidMaxProcessesPerClass 2
FcgidPassHeader Authorization

# ps aux|grep php|grep w3|wc -l
17

The count increases every time a client requests a .php page. I have only 4 running httpd processes, so even if FcgidMaxProcesses is a per-Apache-process setting (I assume it is), it should limit the number of spawned PHP processes to 8.
As I suspected, it has something to do with the worker MPM. Using the config above with these tweaks:

#MinSpareThreads 25
#MaxSpareThreads 75
#ThreadsPerChild 25
MinSpareThreads 1
MaxSpareThreads 1
ThreadsPerChild 1

I have 3 running httpd processes and it won't spawn more than 6 PHP processes, just as expected.

This behaviour is wrong anyway:
1. It means static int g_total_process is a per-thread variable.
2. Even if it were a per-process variable, FcgidMaxProcesses would be unusable, as I can't control the maximum number of FastCGI applications. Just imagine a busy server with lots of preforked processes: mod_fcgid would keep spawning new processes, which makes it no better than plain CGI. Some shared-memory solution would be needed.
g_total_process is only referenced from the process manager, which is a single-threaded process, so it doesn't need to be in shared memory.

Something you could check is how many of the FastCGI processes you find are in error state (visible in the mod_fcgid section of the mod_status report) and how many are zombies (visible in ps output). (Or decrease FcgidErrorScanInterval and FcgidZombieScanInterval to 1 so that processes don't stay in that state for more than 1 second.)
> Something you could check is how many of the FastCGI processes you find are in
> error state (visible in the mod_fcgid section within the mod_status report) and
> how many are zombies (visible in ps output).

I can see only php-cgi processes in state S (sleeping, I guess) in the ps aux output. The mod_fcgid section of server-status shows:

mod_fcgid status:
Total FastCGI processes: 35

All of them are in the ready state; accesses is mostly 1 (a few are 2).
I just discovered the same problem on Fedora 11 (mod_fcgid 2.2). The following patch solves the concrete problem, but it should not be considered a proposed fix. It works for me, so it may be used as a temporary solution.

In the patch I just move the check of the number of running processes to the beginning of the is_spawn_allowed function.

My guess is that there is some race condition (the situation I see in the logs leads me to that). I've seen log lines like "mod_fcgid: XXXXX total process count 195 >= 20, skip the spawn request", where "195" can be "25", "201", "345", etc. (I have 600 VirtualHosts and MaxProcessCount set to 20); i.e. sometimes Apache permits spawning a new process, sometimes not. The probability of permitting it is low when the number of HTTP requests (on different VirtualHosts) is high.

I've looked at fcgid_proctbl_unix.c and found the line "g_proc_array = _global_memory->procnode_array;" (g_proc_array is used in is_spawn_allowed via g_stat_list_header). If _global_memory is not set before this line, it may cause the issue I see.

Sorry if my guesses are completely wrong; I just want to share my experience with this problem. I have no experience with system programming at all (and I have no C knowledge), so please don't trust me ;)

PS: the patch:

$ diff -u fcgid_spawn_ctl.c.orig fcgid_spawn_ctl.c
--- fcgid_spawn_ctl.c.orig	2010-03-26 13:33:36.000000000 +0600
+++ fcgid_spawn_ctl.c	2010-03-26 13:34:13.000000000 +0600
@@ -146,6 +146,14 @@
     if (!command || !g_stat_pool)
         return 1;
 
+    /* Total process count higher than up limit? */
+    if (g_total_process >= g_max_process) {
+        ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, main_server,
+                     "mod_fcgid: %s total process count %d >= %d, skip the spawn request",
+                     command->cgipath, g_total_process, g_max_process);
+        return 0;
+    }
+
     /* Can I find the node base on inode, device id and share group id? */
     for (current_node = g_stat_list_header;
          current_node != NULL; current_node = current_node->next) {
@@ -180,14 +188,6 @@
         return 0;
     }
 
-    /* Total process count higher than up limit? */
-    if (g_total_process >= g_max_process) {
-        ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, main_server,
-                     "mod_fcgid: %s total process count %d >= %d, skip the spawn request",
-                     command->cgipath, g_total_process, g_max_process);
-        return 0;
-    }
-
     /* Process count of this class higher than up limit? */
Oh, just a comment about the php-cgi process states: in my case there are no php-cgi processes in Error or Zombie state.
The same problem exists with the prefork MPM, so it's NOT a worker-related problem.
I added some extra debug lines to the is_spawn_allowed function in fcgid_spawn_ctl.c; the function returns 1 at:

    if (!current_node)
        return 1;
I think I found the bug. I am using mass virtual hosting (mod_vhost_cd, but it probably doesn't really matter), with no <VirtualHost> sections. I added some more debug lines to is_spawn_allowed:

    ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server,
                 "COMMAND: inode %d, deviceid %d share_grp_id %d virtualhost %s uid %d gid %d",
                 (int) command->inode, (int) command->deviceid,
                 (int) command->share_grp_id, command->virtualhost,
                 (int) command->uid, (int) command->gid);

    /* Can I find the node base on inode, device id and share group id? */
    for (current_node = g_stat_list_header;
         current_node != NULL; current_node = current_node->next) {
        ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server,
                     "NODE: inode %d, deviceid %d share_grp_id %d virtualhost %s uid %d gid %d",
                     (int) current_node->inode, (int) current_node->deviceid,
                     (int) current_node->share_grp_id, current_node->virtualhost,
                     (int) current_node->uid, (int) current_node->gid);
        if (current_node->inode == command->inode
            && current_node->deviceid == command->deviceid
            && current_node->share_grp_id == command->share_grp_id
            && current_node->virtualhost == command->virtualhost
            && current_node->uid == command->uid
            && current_node->gid == command->gid) {
            ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server, "NODE FOUND!");
            break;
        }
    }

    if (!current_node) {
        ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server,
                     "SPAWNING ALLOWED: !current_node");
        return 1;
    }
    else {
        ...
At the first PHP request:

[Sun Mar 28 23:44:26 2010] [warn] COMMAND: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost uid 65534 gid 65534
[Sun Mar 28 23:44:26 2010] [warn] SPAWNING ALLOWED: !current_node

At the second request:

[Sun Mar 28 23:41:59 2010] [warn] COMMAND: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost uid 65534 gid 65534
[Sun Mar 28 23:41:59 2010] [warn] NODE: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost uid 65534 gid 65534
[Sun Mar 28 23:41:59 2010] [warn] SPAWNING ALLOWED: !current_node

As you can see, there was no NODE FOUND even though all the integer fields are the same. This means virtualhost holds the same string but not at the same memory address: the check compares the pointers, not the string contents. After commenting out the line

    && current_node->virtualhost == command->virtualhost

the second request gives:

[Sun Mar 28 23:52:50 2010] [warn] COMMAND: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost uid 65534 gid 65534
[Sun Mar 28 23:52:50 2010] [warn] NODE: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost uid 65534 gid 65534
[Sun Mar 28 23:52:50 2010] [warn] NODE FOUND!
OK, I propose a FcgidNoVhostCheckInProcMgr option with a default value of Off. Opinions?
Don't you want to fix this issue?
>Dont want to fix this issue?

I haven't had time yet to sort through the related issues; I guess no one else has either. Here's another perspective that ends up in the same code:
http://mail-archives.apache.org/mod_mbox/httpd-dev/201004.mbox/%3Cq2l81403a941004131831lce28460bqfc9fa53c2058e79b@mail.gmail.com%3E
(In reply to comment #4)
> I just discovered the same problem on Fedora 11 (mod_fcgid 2.2). The following
> patch solves the concrete problem, but it should not be considered as proposed
> one. It works for me, so it may be used as temporary solution.
>
> In the patch I just move check of amount of run processes into beginning of
> is_spawn_allowed function.

Your patch looks correct to me. I expect to commit it shortly.

>
> My guess is there is some race condition (the situation that I see in logs
> leads me to that).

The oddity (not a race condition in the normal sense) is that the limit is ignored when no instance of the process we're trying to start is currently active.
The patch in this PR was updated to work with the current source and is now committed with revision 939478.