Bug 53693 - mod_fcgid performance suffers under increase in request concurrency
Summary: mod_fcgid performance suffers under increase in request concurrency
Status: REOPENED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_fcgid (show other bugs)
Version: 2.2-HEAD
Hardware: PC All
Importance: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: PatchAvailable
Duplicates: 56308 56719 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-08-10 23:47 UTC by Mike
Modified: 2021-02-17 18:05 UTC (History)
4 users (show)



Attachments
Spawn faster under low process count (2.76 KB, patch)
2012-08-14 21:44 UTC, Dominic Benson
Details | Diff
Rewrite handle_request in fcgid_bridge.c to fix 1sec delay (5.63 KB, patch)
2017-12-14 11:29 UTC, Merijn van den Kroonenberg
Details | Diff

Description Mike 2012-08-10 23:47:40 UTC
https://gist.github.com/3318709


I wrote a simple PHP script to simulate activity (sleep for 500ms). I set up a vhost to handle PHP requests via mod_fcgid. I
- used PHP 5.2, PHP 5.3, and PHP 5.4 -- the version doesn't seem to matter
- used the latest trunk snapshot of mod_fcgid (2.3.8), which didn't change the behavior
- used Apache 2.2.22 (and 2.2.17) with both the worker and prefork MPMs
- configured httpd with 512 StartServers and at least 1024 MaxClients

Trying to figure out how to prevent serialized blocking of requests in mod_fcgid.

What seems to happen is that on initial process spin-up, mod_fcgid won't scale the number of available child processes to match the concurrent request load; there is a serialization that occurs. I have tried tuning a number of the spawn parameters to try to spin up more processes at once, and I can't seem to make it happen.

So here's an extreme case of 100 concurrent clients with 1 request each:

Concurrency Level:      100
Time taken for tests:   5.672 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      15600 bytes
HTML transferred:       0 bytes
Requests per second:    17.63 [#/sec] (mean)
Time per request:       5671.918 [ms] (mean)
Time per request:       56.719 [ms] (mean, across all concurrent requests)
Transfer rate:          2.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        4    8   2.1      8      10
Processing:   614 2956 1604.2   2665    5661
Waiting:      614 2956 1604.2   2664    5661
Total:        618 2963 1604.1   2674    5671

Percentage of the requests served within a certain time (ms)
  50%   2674
  66%   3612
  75%   4540
  80%   4627
  90%   5633
  95%   5634
  98%   5668
  99%   5671
 100%   5671 (longest request)


The process spin up for this over time looks like:
Fri Aug 10 16:37:19 MST 2012
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache    3571  0.0  0.3  57004  8040 ?        R    16:37   0:00 /usr/bin/php-cgi
apache    3572  0.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3573  0.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3574  0.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3575  0.0  0.3  56872  7928 ?        R    16:37   0:00 /usr/bin/php-cgi
apache    3576  0.0  0.3  56872  7832 ?        R    16:37   0:00 /usr/bin/php-cgi
apache    3577  0.0  0.3  56872  7868 ?        R    16:37   0:00 /usr/bin/php-cgi
apache    3578  0.0  0.3  56744  7808 ?        R    16:37   0:00 /usr/bin/php-cgi
apache    3580  0.0  0.0   8112   384 ?        R    16:37   0:00 /usr/bin/php-cgi
10
Fri Aug 10 16:37:19 MST 2012
Fri Aug 10 16:37:19 MST 2012
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache    3571  6.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3572  5.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3573  6.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3574  5.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3575  6.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3576  6.0  0.4  57132  8416 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3577  6.0  0.4  57132  8412 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3578  6.0  0.4  57132  8416 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3580  6.0  0.4  57132  8416 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3582  5.0  0.4  57132  8408 ?        S    16:37   0:00 /usr/bin/php-cgi
10
Fri Aug 10 16:37:19 MST 2012
...
Fri Aug 10 16:37:23 MST 2012
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache    3571  1.2  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3572  1.0  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3573  1.2  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3574  1.2  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3575  1.2  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3576  1.2  0.4  57132  8484 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3577  1.2  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3578  1.2  0.4  57132  8484 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3580  1.2  0.4  57132  8484 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3582  1.0  0.4  57132  8476 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3657  1.5  0.4  57132  8484 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3706  2.0  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3760  3.5  0.4  57132  8480 ?        S    16:37   0:00 /usr/bin/php-cgi
apache    3816  6.0  0.4  57132  8408 ?        S    16:37   0:00 /usr/bin/php-cgi
14

It takes at least 4 seconds to spin up to 14 processes (well under our concurrent connection limit). 

On a very small scale, I ran some tests to simulate this:

test.php script:

<?php
usleep(500000);
echo $_SERVER['SCRIPT_FILENAME']." :: ".$_SERVER['SERVER_NAME'];

With this script, each run should take ~500ms plus some very minor overhead.

o Loaded Apache with the worker MPM. Using an ab concurrency level of 2, feeding 10 requests, we see a bottleneck due to this serialization:

Concurrency Level:      2
Time taken for tests:   3.116 seconds
Complete requests:      10
Failed requests:        0
Write errors:           0
Total transferred:      1560 bytes
HTML transferred:       0 bytes
Requests per second:    3.21 [#/sec] (mean)
Time per request:       623.257 [ms] (mean)
Time per request:       311.628 [ms] (mean, across all concurrent requests)
Transfer rate:          0.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    3   0.2      3       3
Processing:   504  619 336.6    505    1574
Waiting:      504  619 336.6    505    1574
Total:        507  622 336.6    508    1577

Percentage of the requests served within a certain time (ms)
  50%    508
  66%    508
  75%    508
  80%    577
  90%   1577
  95%   1577
  98%   1577
  99%   1577
 100%   1577 (longest request)

o Once the processes are spun up, however, we see 500ms (+~10ms overhead) for all requests, as long as our concurrency does not exceed the number of child processes available:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache   30209  0.0  0.4  57132  8488 ?        S    15:30   0:00 /usr/bin/php-cgi
apache   30214  0.0  0.4  57132  8480 ?        S    15:30   0:00 /usr/bin/php-cgi


Concurrency Level:      2
Time taken for tests:   2.546 seconds
Complete requests:      10
Failed requests:        0
Write errors:           0
Total transferred:      1560 bytes
HTML transferred:       0 bytes
Requests per second:    3.93 [#/sec] (mean)
Time per request:       509.200 [ms] (mean)
Time per request:       254.600 [ms] (mean, across all concurrent requests)
Transfer rate:          0.60 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    3   0.1      3       3
Processing:   504  506   1.7    505     509
Waiting:      504  506   1.7    505     509
Total:        507  508   1.7    508     512

Percentage of the requests served within a certain time (ms)
  50%    508
  66%    509
  75%    509
  80%    511
  90%    512
  95%    512
  98%    512
  99%    512
 100%    512 (longest request)

o Adding one more level of concurrency (taking it to 3), we produce a serializing bottleneck again:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache   30209  0.0  0.4  57132  8488 ?        S    15:30   0:00 /usr/bin/php-cgi
apache   30214  0.0  0.4  57132  8484 ?        S    15:30   0:00 /usr/bin/php-cgi
apache   31866  1.2  0.4  57132  8488 ?        S    15:34   0:00 /usr/bin/php-cgi

Concurrency Level:      3
Time taken for tests:   2.083 seconds
Complete requests:      10
Failed requests:        0
Write errors:           0
Total transferred:      1560 bytes
HTML transferred:       0 bytes
Requests per second:    4.80 [#/sec] (mean)
Time per request:       625.025 [ms] (mean)
Time per request:       208.342 [ms] (mean, across all concurrent requests)
Transfer rate:          0.73 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    3   0.2      3       3
Processing:   504  612 337.9    505    1573
Waiting:      504  612 337.9    505    1573
Total:        507  615 337.9    508    1576

Percentage of the requests served within a certain time (ms)
  50%    508
  66%    508
  75%    508
  80%    509
  90%   1576
  95%   1576
  98%   1576
  99%   1576
 100%   1576 (longest request)

o And again at concurrency 3, now that 3 processes are already spun up, there is no bottleneck:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache   30209  0.0  0.4  57132  8488 ?        S    15:30   0:00 /usr/bin/php-cgi
apache   30214  0.0  0.4  57132  8484 ?        S    15:30   0:00 /usr/bin/php-cgi
apache   31866  1.2  0.4  57132  8488 ?        S    15:34   0:00 /usr/bin/php-cgi


Concurrency Level:      3
Time taken for tests:   2.032 seconds
Complete requests:      10
Failed requests:        0
Write errors:           0
Total transferred:      1716 bytes
HTML transferred:       0 bytes
Requests per second:    4.92 [#/sec] (mean)
Time per request:       609.480 [ms] (mean)
Time per request:       203.160 [ms] (mean, across all concurrent requests)
Transfer rate:          0.82 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    3   0.2      3       3
Processing:   504  505   0.5    505     506
Waiting:      504  505   0.5    505     506
Total:        507  508   0.5    508     508

Percentage of the requests served within a certain time (ms)
  50%    508
  66%    508
  75%    508
  80%    508
  90%    508
  95%    508
  98%    508
  99%    508
 100%    508 (longest request)
Comment 1 Mike 2012-08-14 17:44:48 UTC
This might be related to enhancement submission:

https://issues.apache.org/bugzilla/show_bug.cgi?id=52174
Comment 2 Mike 2012-08-14 19:51:07 UTC
I tried changing in modules/fcgid/fcgid_bridge.c,


            /* Avoid sleeping the very first time through if there are no
               busy processes; the problem is just that we haven't spawned
               anything yet, so waiting is pointless */
            if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) {
                apr_sleep(apr_time_from_sec(1));


to


            if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) {
                apr_sleep(apr_time_from_sec(0));


and the serialization block seems to have stopped.

Is this meant to just be an artificial anti-thrashing mechanism? 

If so, is there a better way to prevent too many processes from trying to spin up concurrently than this 1s delay? The 1s delay totally kills concurrent requests.
Comment 3 Dominic Benson 2012-08-14 21:44:46 UTC
Created attachment 29233 [details]
Spawn faster under low process count

The principle of not rushing into spawning new processes is good: indeed, when we're running at max processes per class, it is the only thing that gives time to handle the actual requests (rather than spinning on the server). However, 1s is a very long time, and we're sleeping if *any* process is currently busy. As a minimum, FcgidMinProcessesPerClass should be considered: below that level, we should be perfectly happy to launch a new process to handle a request.
Also, if launching another process is a plausible thing to do (i.e. we're below the limit), we shouldn't wait 1s before re-checking: requests should be handled a lot more quickly than that. So instead, wait 250ms by default.
This patch is based in principle around high/low watermarks for process count; at this point, these are just the min and max processes per class. Picking figures between those bounds would be better, but they should probably be configurable.
Comment 4 Dominic Benson 2012-08-15 08:55:31 UTC
Adjusted the title slightly to reflect what the OP correctly noted: that the problem is that requests which can't immediately be handled by an existing process are delayed by 1s (except when there are no busy processes at all).

See also the comments in my patch: ultimately, additional tunables are called for, but adding options isn't something to be done lightly (given the doc impact), and it certainly warrants some thought. Possibly explicit high/low watermarks, or a target process count, and possibly a configurable delay could be used; or some or all of these values could be derived at startup from the bounds specified by existing options.
Comment 5 Ryan Pan 2012-08-26 09:36:17 UTC
A fix is committed to trunk in r1377398:
New protocol to avoid too much sleeping; improves performance under stress testing.
1. procmgr_send_spawn_cmd() now returns a status code from the PM, so the process handler knows whether the spawn request was denied.
2. If a new process is created, no sleep is needed.
3. If no process is created, sleep a while.
Comment 6 Jeff Trawick 2013-10-08 11:23:18 UTC
Ryan Pan's fix for this bug (and a couple of follow-on fixes) was reverted with http://svn.apache.org/r1529061 due to issues encountered with testing the proposal for mod_fcgid 2.3.8, which could not be released.

Related discussion is in this mailing list thread:

http://mail-archives.apache.org/mod_mbox/httpd-dev/201309.mbox/browser

Note that at the same time a separate Windows-specific bug could result in more processes than necessary.  That didn't affect other platforms and didn't explain all the bad symptoms encountered in Steffen's Windows setup.
Comment 7 Stefan Priebe 2014-12-26 20:53:21 UTC
Any news on this bug?

Why is lowering the sleep from 1s to, e.g., 50ms not a solution?
Comment 8 Merijn van den Kroonenberg 2017-12-14 11:29:34 UTC
Created attachment 35611 [details]
Rewrite handle_request in fcgid_bridge.c to fix 1sec delay

I rewrote handle_request in fcgid_bridge.c to fix the 1 sec delay issue.
The original code would wait 1 second before trying to acquire a process and then spawn one.

This has the drawback of creating a 'sluggish' feel on low-traffic sites which make use of ajax calls (parallel requests). If there is only one process available, the parallel ajax request will be delayed by a second. After that second, the one process will probably be free, so the request will be handled and no new process will be spawned. As a result the next request will behave exactly the same, with the same 1 sec delay, because it takes more requests to actually spawn a new process.

This rewrite throws the one-second delay out the window. It checks more often whether a process is available, and it tries less often to spawn a new process.

The original code would take 64 seconds of trying before it gave up and returned HTTP_SERVICE_UNAVAILABLE. My new code takes 60.8 seconds for this to happen, but what happens during this time is much different.

Original:
64000ms (64x spawn attempts, 128 process apply attempts)

New:
60800ms (8x spawn attempts, 148 process apply attempts)

But where the old code was linear (it would just check every second), the new code is not.
This table shows the spawn attempts and the process apply attempts. There are 8 spawn attempts, and for each spawn attempt a number of process apply attempts is made. The time between these attempts also differs: short waits at the beginning (and end) and long waits in the middle.

           0) 2  x  50ms =   100ms
           1) 8  x 200ms =  1600ms
           2) 14 x 350ms =  4900ms
           3) 20 x 500ms = 10000ms
           4) 26 x 650ms = 16900ms
           5) 26 x 500ms = 13000ms
           6) 26 x 350ms =  9100ms
           7) 26 x 200ms =  5200ms

Shortening the waits at the end prevents long-waiting requests from starving, and hopefully allows fewer HTTP_SERVICE_UNAVAILABLE responses when there is a short peak/overload on the server.

We have been using this patch in production for two months now, after a three-month test period, both on servers with single high-load sites and on servers with small low-load sites.
Comment 9 Merijn van den Kroonenberg 2017-12-14 11:40:48 UTC
*** Bug 56308 has been marked as a duplicate of this bug. ***
Comment 10 Merijn van den Kroonenberg 2017-12-14 11:42:51 UTC
*** Bug 56719 has been marked as a duplicate of this bug. ***
Comment 11 Thomas 2019-02-10 16:57:39 UTC
Any progress on this issue?
We are using mod_fcgid-2.3.9 in a production environment and running into the same performance issues.
Comment 12 Fabian Schneider 2021-02-17 18:04:27 UTC
Another solution would be to reduce the polling interval, as your suggestion would cause HTTP 503 errors to appear almost immediately when no slot is available, instead of after 60 seconds.

To achieve this, the first step is to change the retry count, e.g. to reduce the interval to 0.1 seconds:

From:
 FCGID_APPLY_TRY_COUNT 2

To:
 FCGID_APPLY_TRY_COUNT 11

Besides this, the sleep interval has to be reduced:

            if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) {
                apr_sleep(apr_time_from_sec(1));

To

            if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) {
                apr_sleep(apr_time_from_msec(1000 / (FCGID_APPLY_TRY_COUNT - 1)));

To improve performance it can make sense to hard-code the resulting 100ms delay (`apr_time_from_msec(100)`) or to compute it in a preprocessor directive, so it does not need to be recalculated several hundred times per request in high-traffic environments.