Bug 48981 - FcgidMaxProcesses is not honoured
Summary: FcgidMaxProcesses is not honoured
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_fcgid
Version: 2.2.15
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Reported: 2010-03-24 21:57 UTC by erno.kovacs
Modified: 2010-04-29 16:44 UTC
Description erno.kovacs 2010-03-24 21:57:19 UTC
I am using Apache/2.2.15 (Unix) with the worker MPM and mod_fcgid 2.3.5. The server keeps spawning new PHP processes even though FcgidMaxProcesses is set to 2.

Relevant config entries:

# MPM:
MaxClients 400
ServerLimit 16
StartServers 2
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild  10000

# FCGID:
LoadModule fcgid_module modules/mod_fcgid.so

FcgidBusyScanInterval 60
FcgidBusyTimeout 300
FcgidMaxRequestsPerProcess 10000
FcgidFixPathinfo 1
AddHandler fcgid-script .php
FcgidIdleTimeout 300
FcgidProcessLifetime 3600
FcgidInitialEnv PHP_FCGI_MAX_REQUESTS 10000
FcgidInitialEnv PHP_FCGI_CHILDREN 0
FcgidWrapper "/usr/local/phpw3/bin/php-cgi -c /usr/local/apache/conf/php.ini" .php
FcgidMaxProcesses 2
FcgidMaxProcessesPerClass 2
FcgidPassHeader Authorization


# ps aux|grep php|grep w3|wc -l
17

And the count increases every time a client requests a .php page.

I have only 4 running httpd processes, which means even if FcgidMaxProcesses is a per Apache process setting (I assume it is), it should limit the number of spawned php processes to 8.
Comment 1 erno.kovacs 2010-03-25 09:02:02 UTC
As I suspected, it has something to do with MPM worker.
Using the config above with these tweaks:
#MinSpareThreads 25
#MaxSpareThreads 75
#ThreadsPerChild 25
MinSpareThreads 1
MaxSpareThreads 1
ThreadsPerChild 1

I have 3 running httpd processes and it won't spawn more than 6 PHP processes, just as expected.

This behaviour is wrong anyway.
1. It means static int g_total_process is a per thread variable.
2. Even if it were a per-process variable, FcgidMaxProcesses would be unusable, as I can't control the maximum number of fcgi applications. Just imagine a busy server with lots of preforked processes: mod_fcgid would just keep spawning new processes, which makes it no better than plain CGI.

Some shared memory solution would be needed.
Comment 2 Jeff Trawick 2010-03-25 10:40:55 UTC
g_total_process is only referenced from the process manager.  (The process manager is a single-threaded process.)  Thus, it doesn't need to be in shared memory.

Something you could check is how many of the FastCGI processes you find are in error state (visible in the mod_fcgid section within the mod_status report) and how many are zombies (visible in ps output).

(Or decrease FcgidErrorScanInterval and FcgidZombieScanInterval to 1 so that processes don't stay in that state more than 1 second.)
Comment 3 erno.kovacs 2010-03-25 10:51:43 UTC
> Something you could check is how many of the FastCGI processes you find are in
> error state (visible in the mod_fcgid section within the mod_status report) and
> how many are zombies (visible in ps output).

I can see only php-cgi processes with state S (sleeping I guess) in ps aux output.
In mod_fcgid of server-status there is 

mod_fcgid status:
Total FastCGI processes: 35 
All of them are in the Ready state; the access count is mostly 1 (a few are 2).
Comment 4 rkosolapov 2010-03-26 08:11:19 UTC
I just discovered the same problem on Fedora 11 (mod_fcgid 2.2).  The following patch solves the concrete problem, but it should not be considered as proposed one.  It works for me, so it may be used as temporary solution.

In the patch I just move check of amount of run processes into beginning of is_spawn_allowed function.

My guess is there is some race condition (the situation that I see in logs leads me to that).  I've seen log lines like "mod_fcgid: XXXXX total process count 195 >= 20, skip the spawn request", where "195" can be "25", "201", "345", etc. (I have 600 VirtualHosts and MaxProcessCount set to 20), i.e. sometimes Apache permits the spawn of a new process and sometimes it does not.  The probability of a spawn being permitted is low when the number of HTTP requests (to different VirtualHosts) is high.

I've looked at fcgid_proctbl_unix.c and found the line "g_proc_array = _global_memory->procnode_array;" (g_proc_array is used in is_spawn_allowed via g_stat_list_header). If _global_memory is not set before this line, it may cause the issue I see.

Sorry if I am completely wrong with my guesses; I just want to share my experience with this problem.  I have no experience with system programming at all (and I have no C knowledge), so please don't trust me ;)



PS: the patch:
$ diff -u fcgid_spawn_ctl.c.orig fcgid_spawn_ctl.c
--- fcgid_spawn_ctl.c.orig	2010-03-26 13:33:36.000000000 +0600
+++ fcgid_spawn_ctl.c	2010-03-26 13:34:13.000000000 +0600
@@ -146,6 +146,14 @@
 	if (!command || !g_stat_pool)
 		return 1;
 
+        /* Total process count higher than up limit? */
+        if (g_total_process >= g_max_process) {
+          ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, main_server,
+                       "mod_fcgid: %s total process count %d >= %d, skip the spawn request",
+                       command->cgipath, g_total_process, g_max_process);
+          return 0;
+        }
+
 	/* Can I find the node base on inode, device id and share group id? */
 	for (current_node = g_stat_list_header;
 		 current_node != NULL; current_node = current_node->next) {
@@ -180,14 +188,6 @@
 			return 0;
 		}
 
-		/* Total process count higher than up limit? */
-		if (g_total_process >= g_max_process) {
-			ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, main_server,
-						 "mod_fcgid: %s total process count %d >= %d, skip the spawn request",
-						 command->cgipath, g_total_process, g_max_process);
-			return 0;
-		}
-
 		/*
 		   Process count of this class higher than up limit?
 		 */
Comment 5 rkosolapov 2010-03-26 08:12:48 UTC
Oh, just a comment about the php-cgi process state: in my case there is not a single php-cgi process in the Error or Zombie state.
Comment 6 erno.kovacs 2010-03-28 21:10:04 UTC
The same problem exists with the prefork MPM, so it's NOT a worker-related problem.
Comment 7 erno.kovacs 2010-03-28 21:15:07 UTC
I added some extra debug lines to the is_spawn_allowed function in fcgid_spawn_ctl.c. The function returns 1 at:

    if (!current_node)
        return 1;
Comment 8 erno.kovacs 2010-03-28 21:54:26 UTC
I think I found the bug. I am using mass virtual hosting (mod_vhost_cd, but it probably doesn't really matter), with no <VirtualHost> sections.

I added some more debug lines to is_spawn_allowed:


    ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server,
                 "COMMAND: inode %d, deviceid %d share_grp_id %d virtualhost %s uid %d gid %d",
                 (int) command->inode,
                 (int) command->deviceid,
                 (int) command->share_grp_id,
                 command->virtualhost,
                 (int) command->uid,
                 (int) command->gid);

    /* Can I find the node base on inode, device id and share group id? */
    for (current_node = g_stat_list_header;
         current_node != NULL; current_node = current_node->next) {

        ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server,
                     "NODE: inode %d, deviceid %d share_grp_id %d virtualhost %s uid %d gid %d",
                     (int) current_node->inode,
                     (int) current_node->deviceid,
                     (int) current_node->share_grp_id,
                     current_node->virtualhost,
                     (int) current_node->uid,
                     (int) current_node->gid);

        if (current_node->inode == command->inode
            && current_node->deviceid == command->deviceid
            && current_node->share_grp_id == command->share_grp_id
            && current_node->virtualhost == command->virtualhost
            && current_node->uid == command->uid
            && current_node->gid == command->gid) {
            ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server,
                         "NODE FOUND!");
            break;
        }
    }

    if (!current_node) {
        ap_log_error(APLOG_MARK, APLOG_WARNING, 0, main_server,
                     "SPAWNING ALLOWED: !current_node");
        return 1;
    }
    else {
...




At the first PHP request:

[Sun Mar 28 23:44:26 2010] [warn] COMMAND: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost  uid 65534 gid 65534
[Sun Mar 28 23:44:26 2010] [warn] SPAWNING ALLOWED: !current_node

At the second request:

[Sun Mar 28 23:41:59 2010] [warn] COMMAND: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost  uid 65534 gid 65534
[Sun Mar 28 23:41:59 2010] [warn] NODE: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost  uid 65534 gid 65534
[Sun Mar 28 23:41:59 2010] [warn] SPAWNING ALLOWED: !current_node


As you can see, there was no NODE FOUND even though all integer-based entries are the same. This means virtualhost is the same string but not at the same memory address.

After commenting out the line
 && current_node->virtualhost == command->virtualhost

at second request we got:
[Sun Mar 28 23:52:50 2010] [warn] COMMAND: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost  uid 65534 gid 65534
[Sun Mar 28 23:52:50 2010] [warn] NODE: inode 26113935, deviceid 2309 share_grp_id 1 virtualhost  uid 65534 gid 65534
[Sun Mar 28 23:52:50 2010] [warn] NODE FOUND!
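
The pointer-versus-content distinction suspected in this comment can be illustrated with a minimal C sketch. It assumes, as the comment does, that virtualhost is a char pointer; the struct spawn_key type and both helper functions are hypothetical stand-ins and are not taken from the mod_fcgid source. Comparing by content is shown only for contrast and is not the fix that was eventually committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the fields compared in the
 * lookup loop of is_spawn_allowed(); not the real mod_fcgid structs. */
struct spawn_key {
    long inode;
    long deviceid;
    int share_grp_id;
    const char *virtualhost; /* assumed non-NULL in this sketch */
};

/* Compares virtualhost by address, like the == in the quoted loop.
 * Two equal strings stored at different addresses do not match. */
static bool same_key_by_pointer(const struct spawn_key *a,
                                const struct spawn_key *b)
{
    assert(a != NULL && b != NULL);
    return a->inode == b->inode
        && a->deviceid == b->deviceid
        && a->share_grp_id == b->share_grp_id
        && a->virtualhost == b->virtualhost;
}

/* Compares virtualhost by content, so equal strings match
 * regardless of where they are stored. */
static bool same_key_by_content(const struct spawn_key *a,
                                const struct spawn_key *b)
{
    assert(a != NULL && b != NULL);
    return a->inode == b->inode
        && a->deviceid == b->deviceid
        && a->share_grp_id == b->share_grp_id
        && strcmp(a->virtualhost, b->virtualhost) == 0;
}
```

With two keys that carry the same integers and two separately stored empty strings, as in the log lines above, the pointer-based comparison reports a mismatch while the content-based comparison reports a match.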
Comment 9 erno.kovacs 2010-03-30 11:00:36 UTC
Ok, I recommend some FcgidNoVhostCheckInProcMgr option with default value off.
Opinions?
Comment 10 erno.kovacs 2010-04-19 08:08:52 UTC
Dont want to fix this issue?
Comment 11 Jeff Trawick 2010-04-19 08:29:22 UTC
>Dont want to fix this issue?

I haven't had time yet to sort through the related issues; I guess no one else has either.  Here's another perspective that ends up in the same code:

http://mail-archives.apache.org/mod_mbox/httpd-dev/201004.mbox/%3Cq2l81403a941004131831lce28460bqfc9fa53c2058e79b@mail.gmail.com%3E
Comment 12 Jeff Trawick 2010-04-29 16:36:06 UTC
(In reply to comment #4)
> I just discovered the same problem on Fedora 11 (mod_fcgid 2.2).  The following
> patch solves the concrete problem, but it should not be considered as proposed
> one.  It works for me, so it may be used as temporary solution.
> 
> In the patch I just move check of amount of run processes into beginning of
> is_spawn_allowed function.

Your patch looks correct to me.  I expect to commit shortly.
> 
> My guess is there is some race condition (the situation that I see in logs
> leads me to that).

The oddity (not a race condition in the normal sense) is that the limit will be ignored when no instance of the process we're trying to start is currently active.
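
The oddity can be sketched with a small model in C. This is a simplified illustration under stated assumptions, not the mod_fcgid code: struct class_node, struct spawn_state, find_class, and both spawn_allowed_* functions are hypothetical names, and the real is_spawn_allowed also applies score-based checks that are omitted here. The only point is the ordering of the global limit check relative to the early return for an unknown process class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one node per process class seen so far. */
struct class_node {
    int class_id;
    int process_counter;      /* running processes of this class */
    struct class_node *next;
};

struct spawn_state {
    int total_process;        /* running processes of all classes */
    int max_process;          /* global limit */
    int max_per_class;        /* per-class limit */
    struct class_node *classes;
};

static struct class_node *find_class(const struct spawn_state *s, int class_id)
{
    struct class_node *node;
    for (node = s->classes; node != NULL; node = node->next) {
        if (node->class_id == class_id)
            return node;
    }
    return NULL;
}

/* Ordering before the patch: an unknown class returns early,
 * so the global limit is never consulted for it. */
static bool spawn_allowed_before(const struct spawn_state *s, int class_id)
{
    assert(s != NULL);
    const struct class_node *node = find_class(s, class_id);
    if (node == NULL)
        return true;
    if (s->total_process >= s->max_process)
        return false;
    return node->process_counter < s->max_per_class;
}

/* Ordering after the patch: the global limit is checked first. */
static bool spawn_allowed_after(const struct spawn_state *s, int class_id)
{
    assert(s != NULL);
    if (s->total_process >= s->max_process)
        return false;
    const struct class_node *node = find_class(s, class_id);
    if (node == NULL)
        return true;
    return node->process_counter < s->max_per_class;
}
```

In this model, when the global limit is already reached and a request arrives for a class with no node, the first ordering allows the spawn and the second refuses it. For a class that already has a node, both orderings refuse.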
Comment 13 Jeff Trawick 2010-04-29 16:44:38 UTC
The patch in this PR was updated to work with the current source and is now committed with revision 939478.