https://gist.github.com/3318709 I wrote a simple PHP script to simulate activity (sleep for 500ms). I set up a vhost to handle PHP requests via mod_fcgid. I - used PHP5.2, PHP5.3, PHP5.4 -- doesn't seem to matter. - used the latest trunk snapshot of mod_fcgid (2.3.8), and that didn't change behavior - used Apache 2.2.22 (and 2.2.17) with both worker and prefork mpms - httpd is configured with 512 StartServers and at least 1024 MaxClients Trying to figure out how to prevent serialized blocking of requests in mod_fcgid. What seems to happen is mod_fcgid on initial process spin-up won't tune the number of available of child processes to handle the concurrent request load.. there is a serialization that occurs. I have tried turning a number of the spawn parameters to try to spin up more at once.. and I can't seem to make it happen. So here's an extreme case of 100 concurrent clients with 1 requests to be made, each: Concurrency Level: 100 Time taken for tests: 5.672 seconds Complete requests: 100 Failed requests: 0 Write errors: 0 Total transferred: 15600 bytes HTML transferred: 0 bytes Requests per second: 17.63 [#/sec] (mean) Time per request: 5671.918 [ms] (mean) Time per request: 56.719 [ms] (mean, across all concurrent requests) Transfer rate: 2.69 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 4 8 2.1 8 10 Processing: 614 2956 1604.2 2665 5661 Waiting: 614 2956 1604.2 2664 5661 Total: 618 2963 1604.1 2674 5671 Percentage of the requests served within a certain time (ms) 50% 2674 66% 3612 75% 4540 80% 4627 90% 5633 95% 5634 98% 5668 99% 5671 100% 5671 (longest request) The process spin up for this over time looks like: Fri Aug 10 16:37:19 MST 2012 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 3571 0.0 0.3 57004 8040 ? R 16:37 0:00 /usr/bin/php-cgi apache 3572 0.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3573 0.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3574 0.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3575 0.0 0.3 56872 7928 ? R 16:37 0:00 /usr/bin/php-cgi apache 3576 0.0 0.3 56872 7832 ? R 16:37 0:00 /usr/bin/php-cgi apache 3577 0.0 0.3 56872 7868 ? R 16:37 0:00 /usr/bin/php-cgi apache 3578 0.0 0.3 56744 7808 ? R 16:37 0:00 /usr/bin/php-cgi apache 3580 0.0 0.0 8112 384 ? R 16:37 0:00 /usr/bin/php-cgi 10 Fri Aug 10 16:37:19 MST 2012 Fri Aug 10 16:37:19 MST 2012 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 3571 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3572 5.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3573 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3574 5.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3575 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3576 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3577 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3578 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3580 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3582 5.0 0.4 57132 8408 ? S 16:37 0:00 /usr/bin/php-cgi 10 Fri Aug 10 16:37:19 MST 2012 Fri Aug 10 16:37:19 MST 2012 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 3571 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3572 5.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3573 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3574 5.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3575 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3576 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3577 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3578 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3580 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3582 5.0 0.4 57132 8408 ? S 16:37 0:00 /usr/bin/php-cgi 10 Fri Aug 10 16:37:19 MST 2012 Fri Aug 10 16:37:19 MST 2012 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 3571 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3572 5.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3573 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3574 5.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3575 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3576 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3577 6.0 0.4 57132 8412 ? S 16:37 0:00 /usr/bin/php-cgi apache 3578 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3580 6.0 0.4 57132 8416 ? S 16:37 0:00 /usr/bin/php-cgi apache 3582 5.0 0.4 57132 8408 ? S 16:37 0:00 /usr/bin/php-cgi 10 Fri Aug 10 16:37:19 MST 2012 ... Fri Aug 10 16:37:23 MST 2012 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 3571 1.2 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3572 1.0 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3573 1.2 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3574 1.2 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3575 1.2 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3576 1.2 0.4 57132 8484 ? S 16:37 0:00 /usr/bin/php-cgi apache 3577 1.2 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3578 1.2 0.4 57132 8484 ? S 16:37 0:00 /usr/bin/php-cgi apache 3580 1.2 0.4 57132 8484 ? S 16:37 0:00 /usr/bin/php-cgi apache 3582 1.0 0.4 57132 8476 ? S 16:37 0:00 /usr/bin/php-cgi apache 3657 1.5 0.4 57132 8484 ? S 16:37 0:00 /usr/bin/php-cgi apache 3706 2.0 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3760 3.5 0.4 57132 8480 ? S 16:37 0:00 /usr/bin/php-cgi apache 3816 6.0 0.4 57132 8408 ? S 16:37 0:00 /usr/bin/php-cgi 14 It takes at least 4 seconds to spin up to 14 processes (well under our concurrent connection limit). On a very small scale, I ran some tests to simulate this: test.php script: usleep(500000); echo $_SERVER['SCRIPT_FILENAME']." :: ".$_SERVER['SERVER_NAME']; With this script, each run should take ~500ms plus some very minor overhead. o Loaded Apache with the worker MPM.. Used ab concurrency level of 2, feeding 10 requests, we see a bottleneck due to this serialization: Concurrency Level: 2 Time taken for tests: 3.116 seconds Complete requests: 10 Failed requests: 0 Write errors: 0 Total transferred: 1560 bytes HTML transferred: 0 bytes Requests per second: 3.21 [#/sec] (mean) Time per request: 623.257 [ms] (mean) Time per request: 311.628 [ms] (mean, across all concurrent requests) Transfer rate: 0.49 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 3 3 0.2 3 3 Processing: 504 619 336.6 505 1574 Waiting: 504 619 336.6 505 1574 Total: 507 622 336.6 508 1577 Percentage of the requests served within a certain time (ms) 50% 508 66% 508 75% 508 80% 577 90% 1577 95% 1577 98% 1577 99% 1577 100% 1577 (longest request) o Once the processes are spun up however, we see 500ms (+~10ms overhead) for all requests as long as our concurrency does not exceed the number of child processes available USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 30209 0.0 0.4 57132 8488 ? S 15:30 0:00 /usr/bin/php-cgi apache 30214 0.0 0.4 57132 8480 ? S 15:30 0:00 /usr/bin/php-cgi Concurrency Level: 2 Time taken for tests: 2.546 seconds Complete requests: 10 Failed requests: 0 Write errors: 0 Total transferred: 1560 bytes HTML transferred: 0 bytes Requests per second: 3.93 [#/sec] (mean) Time per request: 509.200 [ms] (mean) Time per request: 254.600 [ms] (mean, across all concurrent requests) Transfer rate: 0.60 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 3 3 0.1 3 3 Processing: 504 506 1.7 505 509 Waiting: 504 506 1.7 505 509 Total: 507 508 1.7 508 512 Percentage of the requests served within a certain time (ms) 50% 508 66% 509 75% 509 80% 511 90% 512 95% 512 98% 512 99% 512 100% 512 (longest request) o Adding +1 concurrency level (taking concurrency to 3), we produce a serializing bottleneck again: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 30209 0.0 0.4 57132 8488 ? S 15:30 0:00 /usr/bin/php-cgi apache 30214 0.0 0.4 57132 8484 ? S 15:30 0:00 /usr/bin/php-cgi apache 31866 1.2 0.4 57132 8488 ? S 15:34 0:00 /usr/bin/php-cgi Concurrency Level: 3 Time taken for tests: 2.083 seconds Complete requests: 10 Failed requests: 0 Write errors: 0 Total transferred: 1560 bytes HTML transferred: 0 bytes Requests per second: 4.80 [#/sec] (mean) Time per request: 625.025 [ms] (mean) Time per request: 208.342 [ms] (mean, across all concurrent requests) Transfer rate: 0.73 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 3 3 0.2 3 3 Processing: 504 612 337.9 505 1573 Waiting: 504 612 337.9 505 1573 Total: 507 615 337.9 508 1576 Percentage of the requests served within a certain time (ms) 50% 508 66% 508 75% 508 80% 509 90% 1576 95% 1576 98% 1576 99% 1576 100% 1576 (longest request) o And again, at 3 concurrency since 3 already spun up processes, no bottleneck: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 30209 0.0 0.4 57132 8488 ? S 15:30 0:00 /usr/bin/php-cgi apache 30214 0.0 0.4 57132 8484 ? S 15:30 0:00 /usr/bin/php-cgi apache 31866 1.2 0.4 57132 8488 ? S 15:34 0:00 /usr/bin/php-cgi Concurrency Level: 3 Time taken for tests: 2.032 seconds Complete requests: 10 Failed requests: 0 Write errors: 0 Total transferred: 1716 bytes HTML transferred: 0 bytes Requests per second: 4.92 [#/sec] (mean) Time per request: 609.480 [ms] (mean) Time per request: 203.160 [ms] (mean, across all concurrent requests) Transfer rate: 0.82 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 3 3 0.2 3 3 Processing: 504 505 0.5 505 506 Waiting: 504 505 0.5 505 506 Total: 507 508 0.5 508 508 Percentage of the requests served within a certain time (ms) 50% 508 66% 508 75% 508 80% 508 90% 508 95% 508 98% 508 99% 508 100% 508 (longest request)
This might be related to enhancement submission: https://issues.apache.org/bugzilla/show_bug.cgi?id=52174
I tried changing in modules/fcgid/fcgid_bridge.c, /* Avoid sleeping the very first time through if there are no busy processes; the problem is just that we haven't spawned anything yet, so waiting is pointless */ if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) { apr_sleep(apr_time_from_sec(1)); to if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) { apr_sleep(apr_time_from_sec(0)); and the serialization block seems to have stopped. Is this meant to just be an artificial anti-thrashing mechanism? If so, is there a better way I can prevent too many processes for trying to spin up concurrently, than adding in this 1s time delay? The 1s delay totally kills concurrent requests.
Created attachment 29233 [details] Spawn faster under low process count The principle of not rushing into trying to spawn new processes is good: indeed in the case that we're running at max processes per class, it is the only thing that gives time to try to handle the actual requests (rather than spinning on the server). However, 1s is a very long time, and we're doing it if *any* process is currently busy. As a minimum, FcgidMinProcessesPerClass should be considered - under that level, we should be perfectly happy to launch a new process to handle a request. Also, if launching another process is a plausible thing to do (i.e. we're below the limit), we shouldn't wait 1s before re-checking: requests should be being handled a lot more quickly than that. So instead, wait 250ms by default. This patch is based in principle around high/low water marks for process count; at this point, these are just min and max processes per class. Picking figures between these would be better, but these should probably be configurable.
Adjusted title slightly to reflect what the OP correctly noted: that the problem is that requests that can't immediately be handled be an existing process are delayed by 1s (excepting when there are no busy processes at all). See also the comments in my patch: ultimately, additional tunables are called for, but adding options isn't something to be done lightly (given the doc impact), and certainly warrants some thought. Possibly explicit high/low, or a target process count, and possibly a configurable delay could be used - or some/all of these values could be generated at startup based on the bounds specified by existing options.
a fix is commited to trunk r1377398 New protocol to avoid too much sleep, improve performance while stress testing 1. procmgr_send_spawn_cmd() now return a status code from PM, so process handler now know the spawn request is denyed or not. 2. if a new process is created, no sleep is needed. 3. if no process is created, sleep a while
Ryan Pan's fix for this bug (and a couple of follow-on fixes) was reverted with http://svn.apache.org/r1529061 due to issues encountered with testing the proposal for mod_fcgid 2.3.8, which could not be released. Related discussion is in this mailing list thread: http://mail-archives.apache.org/mod_mbox/httpd-dev/201309.mbox/browser Note that at the same time a separate Windows-specific bug could result in more processes than necessary. That didn't affect other platforms and didn't explain all the bad symptoms encountered in Steffen's Windows setup.
Any news on this bug? Why is lowering the sleep from 1s to f.e. 50ms not a solution?
Created attachment 35611 [details] Rewrite handle_request in fcgid_bridge.c to fix 1sec delay I rewrote the handle_request in fcgid_bridge.c to fix 1 sec delay issue. The original situation would wait 1 second before trying to aquire a process and then spawn one. This has the drawback of creating 'sluggisch' feel on low traffic sites which make use of ajax calls (parrallel requests). If there is only one process available the parallel ajax request will be delayed by a second. After the second that one process will be probably free, so the request will be handled and no new process will be spawned. As a result the next request will behave exactly the same, with the same 1 sec delay because it takes more requests to actually spawn a new process. This rewrite will throw the one second delay out of the window. It will check more often if a process is available and it will try less often to spawn a new process. The original code would take 64 seconds of trying before it gave up and a HTTP_SERVICE_UNAVAILABLE was returned. My new code takes 60.8 seconds for this to happen, but what happens during this time is much different. Original: 64000ms (64x spawn attempts, 128 process apply attempts) New: 60800ms (8x spawn attempts, 148 process apply attempts) But where the old code was linear, it would just check every second, the new code is not. This table will show the spawn attempts and the process apply attempts. There are 8 spawn attempts. and for each spawn attempt a number of process apply attempts is done. The time between these attempts also differs, small waits at the beginning (and end) and long waits in the middle. 0) 2 x 50ms = 100ms 1) 8 x 200ms = 1600ms 2) 14 x 350ms = 4900ms 3) 20 x 500ms = 10000ms 4) 26 x 650ms = 16900ms 5) 26 x 500ms = 13000ms 6) 26 x 350ms = 9100ms 7) 26 x 200ms = 5200ms Shortening the waits at the end will prevent long waiting requests to starve and hopefully allow less HTTP_SERVICE_UNAVAILABLE when there is a short peak/overload on the server. We are using this patch in production for two months now after a three month test period. Both on servers with single high load sites and with low load small sites.
*** Bug 56308 has been marked as a duplicate of this bug. ***
*** Bug 56719 has been marked as a duplicate of this bug. ***
Any progress on this issue? Using mod_fcgid-2.3.9 is production environment and running into the same performance issues.
another solution would be to reduce the polling interval, as your suggestion would cause HTTP 503 errors appear almost immediately if no slot is available instead of 60 seconds. To archive this the first step is to change the retry amount, e.g. to reduce the interval to 0.1 seconds: From: FCGID_APPLY_TRY_COUNT 2 To: FCGID_APPLY_TRY_COUNT 11 Beside of this the interval has to be reduced: if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) { apr_sleep(apr_time_from_sec(1)); To if (i > 0 || j > 0 || count_busy_processes(r, &fcgi_request)) { apr_sleep(apr_time_from_msec(1000 / (FCGID_APPLY_TRY_COUNT - 1)); To improve the performance it can make sense to write the requested 100ms delay in the example hardcoded `apr_time_from_msec(100)` or as precompiler directive, as it needs to be calculated several hundred times per request for high traffic environments.