Bug 56984

Summary: Random and frequent Segmentation fault (11)
Product: Apache httpd-2
Component: Platform
Version: 2.4.7
Hardware: PC
OS: Linux
Status: RESOLVED INVALID
Severity: critical
Priority: P2
Target Milestone: ---
Reporter: victor_volpe
Assignee: Apache HTTPD Bugs Mailing List <bugs>
CC: jkaluza, katilthe, ylavic.dev

Description victor_volpe 2014-09-15 13:47:36 UTC
After I moved from 2.2 to 2.4, the headaches started. The crashes are random and occur very frequently. There is no hardware problem involved; the behavior is the same on a cloud server, and the same with PHP 5.4.32 and 5.6.0. The crash occurs in any PHP script, with no warning or error in the logs.

OS: FreeBSD 10.0-RELEASE-p7 amd64

error_log:
[Sun Sep 14 22:15:20.114756 2014] [core:notice] [pid 80717:tid 34410095616] AH00052: child pid 81760 exit signal Segmentation fault (11)
[Sun Sep 14 22:15:22.180905 2014] [core:notice] [pid 80717:tid 34410095616] AH00052: child pid 81727 exit signal Segmentation fault (11)
[Sun Sep 14 22:27:14.973207 2014] [core:notice] [pid 80717:tid 34410095616] AH00052: child pid 81807 exit signal Segmentation fault (11)

Apache 2.4.10 compile options:
./configure --prefix=/usr/apache --exec-prefix=/usr --sysconfdir=/etc/apache --enable-modules=none --enable-mods-static="authn_core authn_file authz_host authz_groupfile authz_user authz_core auth_basic socache_shmcb so include filter mime log_config env expires headers setenvif version ssl unixd status autoindex suexec cgi negotiation dir actions userdir alias rewrite"

PHP 5.6.0 compile options:
./configure --prefix=/usr/php5 --exec-prefix=/usr --with-apxs2=/usr/bin/apxs --with-config-file-path=/etc --with-openssl --with-zlib --with-bz2 --with-curl --enable-ftp --with-gd --with-jpeg-dir --with-png-dir --with-zlib-dir --enable-mbstring --with-mcrypt --with-mssql --with-mysql --with-mysql-sock=/tmp/mysql.sock --with-mysqli=/usr/local/bin/mysql_config --enable-sockets --enable-zip

php.ini custom configurations:
expose_php = Off
error_reporting = E_ALL & ~E_NOTICE
date.timezone = "America/Sao_Paulo"
default_charset = ""

gdb output:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 807418000 (LWP 100188/httpd)]
0x00000008038b6157 in zend_stack_push (stack=0x808a9e4e8, element=0x808a9e4c8, size=<value optimized out>) at zend_stack.c:42
42              stack->elements[stack->top] = (void *) emalloc(size);
Current language:  auto; currently minimal
(gdb) bt
#0  0x00000008038b6157 in zend_stack_push (stack=0x808a9e4e8, element=0x808a9e4c8, size=<value optimized out>) at zend_stack.c:42
#1  0x000000080387ee63 in compile_file (file_handle=<value optimized out>, type=<value optimized out>, tsrm_ls=0x8089a4900) at zend_language_scanner.l:583
#2  0x0000000803752abf in phar_compile_file (file_handle=0x7ffffdbed920, type=2, tsrm_ls=0x8089a4900) at phar.c:3411
#3  0x00000008038b9735 in zend_execute_scripts (type=2, tsrm_ls=0x8089a4900, retval=0x0, file_count=1) at zend.c:1307
#4  0x000000080394cb8a in php_handler (r=0x80ba1c0a0) at sapi_apache2.c:669
#5  0x0000000000471fdb in ap_run_handler ()
#6  0x00000000004728bd in ap_invoke_handler ()
#7  0x00000000004b8489 in ap_process_async_request ()
#8  0x00000000004b301e in ap_process_http_async_connection ()
#9  0x00000000004b2d12 in ap_process_http_connection ()
#10 0x000000000048326b in ap_run_process_connection ()
#11 0x000000000053b2df in process_socket ()
#12 0x0000000000537b29 in worker_thread ()
#13 0x000000080208c4a4 in pthread_create () from /lib/libthr.so.3
#14 0x0000000000000000 in ?? ()
(gdb) bt full
#0  0x00000008038b6157 in zend_stack_push (stack=0x808a9e4e8, element=0x808a9e4c8, size=<value optimized out>) at zend_stack.c:42
No locals.
#1  0x000000080387ee63 in compile_file (file_handle=<value optimized out>, type=<value optimized out>, tsrm_ls=0x8089a4900) at zend_language_scanner.l:583
        original_lex_state = {yy_leng = 0, yy_start = 0x0, yy_text = 0x0, yy_cursor = 0x0, yy_marker = 0x0, yy_limit = 0x0, yy_state = 0, state_stack = {top = 0, max = 0, elements = 0x0}, in = 0x0, lineno = 0, filename = 0x0,
  script_org = 0x0, script_org_size = 0, script_filtered = 0x0, script_filtered_size = 0, input_filter = 0, output_filter = 0, script_encoding = 0x0}
        retval_znode = {op_type = 1, u = {op = {constant = 1, var = 1, num = 1, hash = 1, opline_num = 1, jmp_addr = 0x1, zv = 0x1, literal = 0x1, ptr = 0x1}, constant = {value = {lval = 1, dval = 4.9406564584124654e-324, str = {
          val = 0x1 <Error reading address 0x1: Bad address>, len = 22}, ht = 0x1, obj = {handle = 1, handlers = 0x16}}, refcount__gc = 1, type = 1 '\001', is_ref__gc = 0 '\0'}, op_array = 0x1}, EA = 51444928}
        original_active_op_array = (zend_op_array *) 0x0
        original_in_compilation = 0 '\0'
        compiler_result = <value optimized out>
        retval = <value optimized out>
#2  0x0000000803752abf in phar_compile_file (file_handle=0x7ffffdbed920, type=2, tsrm_ls=0x8089a4900) at phar.c:3411
        __bailout = {{_sjb = {34417748607, 140737450531104, 140737450530424, 140737450530720, 140737450531232, 0, 0, 34504067328, 140737450476415, 34389426431, 1280, 34359738368}}}
        __bailout = {{_sjb = {34417748607, 140737450531104, 140737450530424, 140737450530720, 140737450531232, 0, 0, 34504067328, 140737450476415, 34389426431, 1280, 34359738368}}}
        __orig_bailout = (sigjmp_buf *) 0x7ffffdbed9a0
        name = 0x0
        phar = (phar_archive_data *) 0x50
        name = 0x0
        res = <value optimized out>
        failed = Error accessing memory address 0x0: Bad address.
Comment 1 Mike Rumph 2014-09-16 00:25:22 UTC
Hello Victor,

My guess is that when you went from 2.2 to 2.4 you switched from MPM prefork to MPM worker.  This would happen by default.
You can verify this by running httpd -M.
If I remember correctly, PHP routines are known to not be thread-safe.
You can try switching back to MPM prefork and see if your problems go away.

You can do this by adding --with-mpm=prefork to your httpd configure.

Thanks,

Mike Rumph
Comment 2 victor_volpe 2014-09-16 03:11:35 UTC
(In reply to Mike Rumph from comment #1)
> Hello Victor,
> 
> My guess is that when you went from 2.2 to 2.4 you switched from MPM prefork
> to MPM worker.  This would happen by default.
> You can verify this by running httpd -M.
> If I remember correctly, PHP routines are known to not be thread-safe.
> You can try switching back to MPM prefork and see if your problems go away.
> 
> You can do this by adding --with-mpm=prefork to your httpd configure.
> 
> Thanks,
> 
> Mike Rumph

OMG! Now my Apache is working like a charm. Thank you very much pal.
Comment 3 gmoniker 2015-03-12 17:41:18 UTC
Even with the prefork MPM, the PHP5 module segfaults on pipelined client requests when the script is not cached by the PHP5 opcache.
Comment 4 gmoniker 2015-03-12 17:42:15 UTC
The bug is instantly reproducible on a fresh Ubuntu 14.04.

Install the server:
apt-get install apache2 php5 netcat

Set up a small PHP script that produces output in /var/www/html/, for example as test.php:
<?php
 echo date('U').PHP_EOL;
?>

Shut down the Apache service:
apache2ctl stop

Disable the opcache by putting a semicolon in front of the line in /etc/php5/apache2/conf.d/05-opcache.ini.

Start Apache in debug (single-process) mode in a separate shell:
apache2ctl -X

Execute this script (thanks to biggi at stefna dot is):
echo -e "GET /test.php HTTP/1.1\nHost: localhost\n\nGET /test.php HTTP/1.1\nHost: localhost\n\n"|nc localhost 80

Result: immediate segfault

If you do only one request per connection, there is no problem.
If you activate the opcache, you will only get one response; the second one appears on the stdout of the debug Apache.
If you fill the opcache, it will segfault once it can no longer fit the script.

I confirmed the same behaviour on CentOS 7 with its default packaged Apache.
Comment 6 jkaluza 2015-03-19 13:09:52 UTC
I think this is the result of an expected httpd-2.4.x behaviour change.

In httpd-2.2.x, after each request, httpd destroyed r->pool. Apparently, PHP used that to clear its internal data before handling the next request.

In httpd-2.4.x, we pass an EOR bucket down the output filter chain, and r->pool is destroyed once that bucket is freed. Thanks to the scanning done in core_output_filter, the response with the EOR bucket does not have to be sent right after the ap_pass_brigade call.

So the result is that the request is handled and the EOR bucket is created, but not sent yet. r->pool is therefore not cleared, yet handling of the new request is started. This leads to a mod_php crash, because its internal data structures are not cleared by the r->pool clean-up *before* the next request comes in.

In 2.2.x, r->pool has been cleaned-up once its handling has been finished.
In 2.4.x, r->pool is cleaned-up once it is fully sent.

I think PHP should be fixed to not rely on r->pool clean-up to reset things before the next request.
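
To make the ordering concrete for module authors, here is a minimal sketch (the module, handler, and cleanup names are hypothetical, not mod_php's actual code) of a module that keeps per-request state and resets it with an r->pool cleanup, which is essentially the pattern that breaks:

#include "httpd.h"
#include "http_config.h"
#include "apr_pools.h"

/* Hypothetical per-request state that must not leak into the next request. */
static void *per_request_state = NULL;

static apr_status_t reset_per_request_state(void *data)
{
    /* In 2.2.x this cleanup ran as soon as request handling finished.
     * In 2.4.x it runs when the request's EOR bucket is destroyed, which
     * with pipelining can be after the next request on the same
     * connection has already entered the handler. */
    per_request_state = NULL;
    return APR_SUCCESS;
}

static int example_handler(request_rec *r)
{
    if (per_request_state != NULL) {
        /* Stale state from the previous request: reachable on 2.4.x with
         * pipelined requests, which is the situation crashing mod_php. */
    }
    per_request_state = apr_pcalloc(r->pool, 64);
    apr_pool_cleanup_register(r->pool, NULL, reset_per_request_state,
                              apr_pool_cleanup_null);
    return DECLINED;
}

static void example_register_hooks(apr_pool_t *p)
{
    ap_hook_handler(example_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA example_state_module = {
    STANDARD20_MODULE_STUFF,
    NULL, NULL, NULL, NULL, NULL,
    example_register_hooks
};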
Comment 7 jkaluza 2015-03-19 14:07:28 UTC
> In 2.2.x, r->pool has been cleaned-up once its handling has been finished.
> In 2.4.x, r->pool is cleaned-up once it is fully sent.

This should have been:

In 2.2.x, r->pool has been cleaned-up once the request handling has been finished.
In 2.4.x, r->pool is cleaned-up once the response is fully sent.
Comment 8 gmoniker 2015-03-19 18:59:59 UTC
There is now activity in the PHP5 Apache2 handler to fix the segfaults with pipelined client requests by no longer relying on destruction of the request pool between two client requests on the same connection.

However, there is some confusion about when the pool destructions will eventually run. It seems that disconnection is the only signal for a prefork worker to process the EOR buckets: the responses are fully sent earlier, but not the EOR bucket. What if the client indicated keep-alive?

In this situation, module programmers will have to think hard about what they allocate in the pool. There will be lots of modules that rely on cleanup callbacks running after each client request is processed, and a module programmer cannot influence the number of requests the client sends on one connection or how long it stays connected. It seems that doing your own memory management is the only option going forward for anything that exceeds a couple of kilobytes.
Comment 9 gmoniker 2015-03-23 11:49:16 UTC
The problem of Apache 2.4 hanging on to pool data, in the case of a persistent client connection, until some point during follow-on requests is not quite as bad as I feared. Output exceeding the buffer space in a follow-on request is enough to flush the EOR bucket and destroy the pool.

But is there any reason not to flush the response data and log the transaction immediately after processing a client request? Instead of pool destroy, the worker could call a flush after processing one client request.

Even if this is expected behaviour, and given that the PHP handler is doing it wrong by depending on the previous request's cleanup having completed when entering the processing of a new client request, it seems kind of strange that the output and memory of one script are held back and only flushed depending on circumstances that are completely extraneous to it.

If for example follow-on processing causes a segfault and no response was sent yet by the first script to the client, the client will believe that the script never finished. So this behaviour seems to run counter to the expectation of stateless requests. Given that the user of a browser does not usually have much control of whether his browser uses pipelined connections or not, he cannot be said to have made a conscious choice for stateful behaviour.
Comment 10 Yann Ylavic 2015-03-23 17:00:24 UTC
(In reply to gmoniker from comment #9)
> But is there any reason not to flush the response data and log the
> transaction immediately after processing a client request? Instead of pool
> destroy, the worker could call a flush after processing one client request.

That would, I think, defeat the main goal of pipelining, which is to fill the pipe.

> 
> Even if this is expected behaviour, and given that the PHP handler is doing
> it wrong by depending on the previous request's cleanup having completed when
> entering the processing of a new client request, it seems kind of strange
> that the output and memory of one script are held back and only flushed
> depending on circumstances that are completely extraneous to it.

The circumstance is precisely that the next (pipelined) request is already here when the current request is done; otherwise the flush is done (see the test on !c->data_in_input_filters in ap_process_request(), or in MPM event the test on c->data_in_output_filters, so that all data get written before entering keepalive state).

IOW, the current implementation considers pipelined requests should be answered with pipelined responses, as much as possible.

> 
> If for example follow-on processing causes a segfault and no response was
> sent yet by the first script to the client, the client will believe that the
> script never finished. So this behaviour seems to run counter to the
> expectation of stateless requests. Given that the user of a browser does not
> usually have much control of whether his browser uses pipelined connections
> or not, he cannot be said to have made a conscious choice for stateful
> behaviour.

If some process crashes, I'm not sure the client should rely on the number of responses it actually received to deduce any state on the server side; that solely depends on whether the requests sent were idempotent or not (IMHO).
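
The decision described above can be summarised in a simplified sketch (this is not the actual ap_process_request() source, just an illustration of the logic using the standard bucket brigade APIs):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

/* Simplified sketch of the flush-or-defer decision described above:
 * after a request is handled, flush only if no pipelined request is
 * already buffered in the input filters. */
static void after_request_sketch(request_rec *r)
{
    conn_rec *c = r->connection;

    /* ...the handler has run; the EOR bucket for r is queued in the
     * core output filter but may not have been written yet... */

    if (!c->data_in_input_filters) {
        apr_bucket_brigade *bb = apr_brigade_create(c->pool, c->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(bb,
                                apr_bucket_flush_create(c->bucket_alloc));
        /* A FLUSH bucket makes the core output filter write out what it
         * has buffered, including pending EOR buckets, so r->pool (and
         * any earlier request pools) gets destroyed here. */
        ap_pass_brigade(c->output_filters, bb);
    }
    /* else: a pipelined request is already waiting; it is handled first
     * and the buffered responses are flushed together later. */
}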
Comment 11 gmoniker 2015-03-24 00:04:21 UTC
Hello Yann,

Thanks for your clear explanation of the design principles. I can see that coalescing writes to the client means more performance. I was also missing the point about idempotent requests: the browser/client should not resend possibly non-idempotent requests once they have left the client, even if it doesn't get a reply from the server.

I still think though that there are some concerns about this when you have requests with a long runtime, high pool usage and very small ack-like response. It could take several requests before any output gets sent back and memory gets cleared.

Also, Apache modules that didn't get updated to take this into account can misbehave, as is now painfully clear. Not only can they have issues with pool cleanups not having run, but also with re-entry of handlers that are also registered as output filters and respond to an EOS bucket of a previous request while processing a follow-on request.

Most of the time I am just the custodian of a webserver and am most concerned with it functioning reliably rather than having the highest possible performance. So I guess what I should be asking is: can we make it a feature request to have the option of configuring Apache 2.4 to do a flush after each client request?

Thanks.
Comment 12 Yann Ylavic 2015-03-24 08:19:00 UTC
(In reply to gmoniker from comment #11)
> 
> I still think though that there are some concerns about this when you have
> requests with a long runtime, high pool usage and very small ack-like
> response. It could take several requests before any output gets sent back
> and memory gets cleared.

The maximum number of pipelined responses is 5 (see ap_core_output_filter()'s MAX_REQUESTS_IN_PIPELINE), and when this limit is reached, all 5 requests are destroyed at once.

> 
> Also, Apache modules that didn't get updated to take this into account can
> misbehave, as is now painfully clear. Not only can they have issues with pool
> cleanups not having run, but also with re-entry of handlers that are also
> registered as output filters and respond to an EOS bucket of a previous
> request while processing a follow-on request.

Modules switching from httpd < 2.4 to >= 2.4 should normally take the new features into account (e.g. pipelining); otherwise no major advancement would be possible on the httpd side.

> 
> Most of the time I am just the custodian of a webserver and am most
> concerned with it functioning reliably rather than having the highest
> possible performance. So I guess what I should be asking is: can we make it
> a feature request to have the option of configuring Apache 2.4 to do a flush
> after each client request?

Maybe MAX_REQUESTS_IN_PIPELINE could be made configurable, and/or the proxy/cgi(/php?) modules improved to flush when further responses take time to come (mod_proxy_http does that already IIRC)...

That's another story though; we should discuss it on the user@ (or dev@) mailing list, or create another Bugzilla ticket (enhancement) for it.
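
As an illustration of the kind of per-module flush suggested above, a minimal sketch of a handler that pushes its own response out immediately (the module and handler names are hypothetical; this is not a change to any existing module):

#include <string.h>
#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"

/* Hypothetical handler that writes a small response and flushes it
 * right away instead of leaving it buffered behind pipelined requests. */
static int flushing_handler(request_rec *r)
{
    if (!r->handler || strcmp(r->handler, "flush-example")) {
        return DECLINED;
    }
    ap_set_content_type(r, "text/plain");
    ap_rputs("done\n", r);
    /* ap_rflush() sends a FLUSH bucket down the output filters, so the
     * buffered output (and anything queued before it) is written now. */
    ap_rflush(r);
    return OK;
}

static void flush_example_register_hooks(apr_pool_t *p)
{
    ap_hook_handler(flushing_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA flush_example_module = {
    STANDARD20_MODULE_STUFF,
    NULL, NULL, NULL, NULL, NULL,
    flush_example_register_hooks
};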
Comment 13 Yann Ylavic 2015-04-22 18:59:48 UTC
*** Bug 57846 has been marked as a duplicate of this bug. ***