Bug 58488

Summary: Stopped PHP-FPM pool in multi-pool system with communication over UDS exhausts connections and causes mod_proxy_fcgi crash
Product: Apache httpd-2
Reporter: carlos.nieto
Component: mod_proxy_fcgi
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---
Severity: normal
CC: carlos.nieto, szg0000, toscano.luca
Priority: P2
Version: 2.4.9
Target Milestone: ---
Hardware: PC
OS: Linux

Description carlos.nieto 2015-10-08 12:10:06 UTC
In a multi-pool PHP-FPM setup, when one of the PHP-FPM pools is stopped but Apache continues to send requests to it, the pools of the other applications begin to fail.

All application pools communicate over UDS (Unix domain sockets).

Segmentation faults are recorded in the Apache logs:

[Tue Oct 06 12:52:14.272956 2015] [core:notice] [pid 8563:tid 140096632240128] AH00052: child pid 13967 exit signal Segmentation fault (11)

/var/log/messages also shows segfaults:

Oct  6 12:21:30 phcaeproma01 kernel: httpd[13647]: segfault at 55 ip 00007f6ac91aaab2 sp 00007f6ac17fa860 error 6 in mod_proxy_fcgi.so[7f6ac91a8000+4000]

This address corresponds to:

addr2line -e /opt/apache-2.4/modules/mod_proxy.so -fCi 0x6AB2
ap_proxy_connection_create
/home/software/php/httpd-2.4.9/modules/proxy/proxy_util.c:2790
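
For reference, the offset passed to addr2line is normally the faulting instruction pointer minus the load address of the mapping named in the kernel line. The sketch below does that subtraction for the /var/log/messages entry quoted above. The helper name module_offset is hypothetical, and the result is only meaningful under the assumption that the reported mapping starts at file offset 0 of the module.

```python
import re

# Kernel segfault line quoted above (from /var/log/messages).
LINE = ("httpd[13647]: segfault at 55 ip 00007f6ac91aaab2 sp 00007f6ac17fa860 "
        "error 6 in mod_proxy_fcgi.so[7f6ac91a8000+4000]")


def module_offset(line):
    """Return (module, offset), where offset = ip - start of the reported mapping.

    Hypothetical helper for illustration; it is not part of any httpd tooling.
    """
    m = re.search(r"ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]", line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group(1), 16)
    module = m.group(2)
    base = int(m.group(3), 16)
    size = int(m.group(4), 16)
    offset = ip - base
    # Sanity check: the instruction pointer must fall inside the mapping.
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return module, offset


module, offset = module_offset(LINE)
print(module, hex(offset))
```

For the line quoted above this yields 0x2ab2 inside mod_proxy_fcgi.so. The addr2line call in this report passes 0x6AB2 and mod_proxy.so, so that offset presumably comes from a different log entry that is not shown here.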

Line 2790 in proxy_util.c corresponds to:

    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, s, APLOGNO(00962)
                 "%s: connection complete to %pI (%s)",
                 proxy_function, backend_addr, conn->hostname);

This line may need to be sanitized in a similar way to bug https://bz.apache.org/bugzilla/show_bug.cgi?id=56858

But I think the source of the error is the establishment of the connection. When connections use UDS, it seems to me that mod_proxy_fcgi does not detect that the PHP-FPM pool is closed and keeps making connections instead of returning a 503 or a similar error condition. The UDS file is present, but nobody is listening for requests, and connections are continuously created until some condition prevents any more from being created, at which point the above ap_log_error receives a null connection.
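
The "socket file present, nobody listening" condition described above can be reproduced outside httpd. The following is a minimal sketch, assuming a Linux host: it binds a Unix domain socket, closes the listener without unlinking the file, and then probes the path. The helper probe_uds and the temporary poolB.sock path are hypothetical and only stand in for a stopped pool. The sketch says nothing about how mod_proxy_fcgi itself reacts.

```python
import errno
import os
import socket
import tempfile


def probe_uds(path, timeout=0.1):
    """Try to connect to a Unix domain socket.

    Returns None on success, otherwise the symbolic errno name.
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        client.close()


# Simulate a stopped pool: create the socket file, then close the listener
# without unlinking the file (hypothetical path, not the one from the report).
workdir = tempfile.mkdtemp()
sock_path = os.path.join(workdir, "poolB.sock")
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(sock_path)
listener.listen(1)

alive = probe_uds(sock_path)   # listener still open: connect is queued
listener.close()
stale = probe_uds(sock_path)   # file still exists, but nobody is listening

print(os.path.exists(sock_path), alive, stale)
```

On Linux the second probe fails immediately with ECONNREFUSED even though the file is still on disk, so the existence of the socket file alone is not a usable health signal.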
Comment 1 Luca Toscano 2016-07-26 08:14:27 UTC
Hi Carlos,

Really sorry for the delay. This bug sounds really bad. Would you mind adding some basic data about your httpd version and maybe some steps to reproduce it?
Comment 2 carlos.nieto 2016-07-26 09:02:45 UTC
Of course, Luca. The httpd version was 2.4.9. You can reproduce the bug in a configuration with at least two PHP-FPM pool backends and an Apache frontend communicating over UDS. The configuration is something like this:

<VirtualHost *:80>
    ServerName poolA.test.com
    DocumentRoot "/opt/poolA/htdocs/"

    <Proxy "unix:/dev/shm/poolA.sock|fcgi://php-fpm-poolA.local">
      ProxySet min=0
      ProxySet acquire=20
      ProxySet connectiontimeout=100ms
      ProxySet retry=0
      ProxySet timeout=300
      Require all granted
    </Proxy>

    # 2.4.10 syntax
    <FilesMatch \.php$>
      SetHandler "proxy:fcgi://php-fpm-poolA.local"
      Require all granted
    </FilesMatch>

    RedirectMatch ^/$ /index.php
</VirtualHost>

<VirtualHost *:80>
    ServerName poolB.test.com
    DocumentRoot "/opt/poolB/htdocs/"

    <Proxy "unix:/dev/shm/poolB.sock|fcgi://php-fpm-poolB.local">
      ProxySet min=0
      ProxySet acquire=20
      ProxySet connectiontimeout=100ms
      ProxySet retry=0
      ProxySet timeout=300
      Require all granted
    </Proxy>

    # 2.4.10 syntax
    <FilesMatch \.php$>
      SetHandler "proxy:fcgi://php-fpm-poolB.local"
      Require all granted
    </FilesMatch>

    RedirectMatch ^/$ /index.php
</VirtualHost>

You can then shut down one of the PHP-FPM pools, for example poolB, by killing its FPM processes, and send requests to both pools. The active pool, poolA, responds, and the inactive pool, poolB, returns 503 errors.

But if you keep sending requests to both pools, at some point the crash happens and both pools fail.

I observed this behavior in a production system under high load, with a pool that we had shut down for maintenance. After some time the other pools began to fail as well, so instead of a simple shutdown we had to set up a maintenance page.
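
The request injection step above could be scripted roughly as follows. This is only a sketch: the host names come from the configuration in this comment, while the front-end address 127.0.0.1:80, the /index.php path, and the helper names fetch_status and hammer are assumptions for a local test setup.

```python
import http.client
from collections import Counter

# Host names taken from the configuration above; the address and port of the
# Apache front end are assumptions for a local test setup.
FRONTEND = ("127.0.0.1", 80)
HOSTS = ["poolA.test.com", "poolB.test.com"]


def fetch_status(host, path="/index.php", frontend=FRONTEND, timeout=5):
    """Send one GET with the given Host header.

    Returns the HTTP status code, or the exception class name on failure.
    """
    conn = http.client.HTTPConnection(frontend[0], frontend[1], timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host})
        return conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return type(exc).__name__
    finally:
        conn.close()


def hammer(hosts, rounds, fetch=fetch_status):
    """Alternate requests across all hosts and count the outcomes per host."""
    results = {host: Counter() for host in hosts}
    for _ in range(rounds):
        for host in hosts:
            results[host][fetch(host)] += 1
    return results
```

Calling hammer(HOSTS, rounds=1000) with poolB stopped and printing the counters per host should initially show successful responses for poolA and 503 for poolB. According to the behavior described here, continued load eventually makes poolA fail as well.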
Comment 3 Luca Toscano 2016-08-11 12:08:05 UTC
Extra question: since 2.4.9 is a bit old, have you tried to reproduce with a more recent version of httpd? 

I am asking this because I noticed this entry in the changelog (https://www.apache.org/dist/httpd/CHANGES_2.4) for 2.4.10:

  *) mod_proxy_fcgi: Don't segfault when failing to connect to the backend.
     (regression in 2.4.9 release) [Jeff Trawick]

https://svn.apache.org/r1592998

If you still have patience and time it would be really great if you could test at least 2.4.10 and see if it solves the problem (or maybe only the patch mentioned above).

Also, 2.4.9 -> 2.4.12 brought a lot of mod_proxy_fcgi fixes. The more recent the version of httpd, the better :)

Thanks!

Luca
Comment 4 Luca Toscano 2017-02-04 15:28:09 UTC
Hi Carlos,

any news?

Luca