Bug 59840

Summary: Segmentation fault during the request processing
Product: Apache httpd-2 Reporter: Konstantin J. Chernov <k.j.chernov>
Component: mod_http2Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED CLOSED    
Severity: major CC: apache-bugzilla, cbarbara
Priority: P2    
Version: 2.4.23   
Target Milestone: ---   
Hardware: Sun   
OS: Solaris   
Attachments: pstack output of coredumps
core.apache.httpd.12673.1473072482.stack
three more full pstack outputs

Description Konstantin J. Chernov 2016-07-11 02:18:36 UTC
SunOS 5.10 Generic_150401-20 i86pc

Hi.

After turning on mod_http2 and enabling h2 protocol, I've noticed child crashes approx. once per day:

[2016-07-08 21:07:10.035281] [core:notice] [pid 15804:tid 1] AH00051: child pid 1617 exit signal Segmentation fault (11), possible coredump in /pub/apache/cores
[2016-07-09 18:13:12.624654] [core:notice] [pid 15804:tid 1] AH00051: child pid 675 exit signal Segmentation fault (11), possible coredump in /pub/apache/cores

Got two core dumps for now:
$ mdb core.apache.httpd.1617.1467990419 
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
libcrypto.so.1.0.0`X509_get_ext_d2i()
mod_ssl.so`modssl_var_extract_san_entries+0x56()
mod_ssl.so`ssl_hook_Fixup+0x1b9()
ap_run_fixups+0x61()
ap_process_request_internal+0xd08()
ap_process_async_request+0x6f7()
ap_process_request+0x24()
mod_http2.so`h2_task_process_request+0x3fd()
mod_http2.so`h2_task_process_conn+0x228()
ap_run_process_connection+0x61()
mod_http2.so`h2_task_do+0x5e2()
mod_http2.so`execute+0x50()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()

$ mdb core.apache.httpd.675.1468066381 
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
libcrypto.so.1.0.0`X509_get_subject_name()
mod_ssl.so`ssl_hook_Fixup+0x19f()
ap_run_fixups+0x61()
ap_process_request_internal+0xd08()
ap_process_async_request+0x6f7()
ap_process_request+0x24()
mod_http2.so`h2_task_process_request+0x3fd()
mod_http2.so`h2_task_process_conn+0x228()
ap_run_process_connection+0x61()
mod_http2.so`h2_task_do+0x5e2()
mod_http2.so`execute+0x50()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()
> 

Tested with both 2.4.18/2.4.23, openssl 1.0.2g/1.0.2h.

If there's anything more I can do -- will be glad to help.
Comment 1 Stefan Eissing 2016-07-11 10:19:45 UTC
Thanks for the report. There seem to be issues with concurrent access to SSL variables when SSLOptions StdEnvVars is active.

I have to check back with the mod_ssl devs on how to best mitigate this issue. For now, you may try removing the 'StdEnvVars' option - if your application can live with it - and test if that fixes the issue.
Comment 2 Konstantin J. Chernov 2016-07-12 03:39:44 UTC
Thanks for the advice!

I removed StdEnvVars from the SSLOptions, will see if there are any changes.
Comment 3 Konstantin J. Chernov 2016-07-13 01:25:09 UTC
Got another segfault yesterday, though I'm not sure whether it is related to the original problem:

[2016-07-12 23:58:06.817849] [core:notice] [pid 15804:tid 1] AH00051: child pid 29484 exit signal Segmentation fault (11), possible coredump in /pub/apache/cores

$ mdb core.apache.httpd.29484.1468346277 
> ::stack
mod_http2.so`h2_brigade_mem_size+0x2e()
mod_http2.so`h2_conn_io_pass+0x33b()
mod_http2.so`on_send_data_cb+0xa45()
libnghttp2.so.14.8.0`nghttp2_session_mem_send_internal+0x165()
libnghttp2.so.14.8.0`nghttp2_session_send+0x59()
mod_http2.so`h2_session_send+0x68()
mod_http2.so`h2_session_process+0x1863()
mod_http2.so`h2_conn_run+0x56()
mod_http2.so`h2_h2_process_conn+0x1122()
ap_run_process_connection+0x61()
process_socket+0x4ab()
worker_thread+0x351()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()
Comment 4 Stefan Eissing 2016-07-26 14:05:02 UTC
Unless we are talking about memory corruption, the second one is unrelated. Since this is the first one of that kind, I am unsure of the reason. 

Has this ever happened since?
Comment 5 Konstantin J. Chernov 2016-07-26 14:18:31 UTC
Stefan,

after upgrading to 2.4.23 and turning on http2 module I've got several segfaults, even though the number of them reduced after I turned off StdEnvVars. Before upgrade, I was using 2.4.18 without http2, and never had a segfault.

Here are the backtraces (not mentioning one from my last comment):
$ mdb core.apache.httpd.1407.1468993750 
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
libc.so.1`_read+0xa()
ap_mpm_podx_check+0x38()
child_main+0x418()
make_child+0x143()
perform_idle_server_maintenance+0x67f()
server_main_loop+0x346()
event_run+0x320()
ap_run_mpm+0x71()
main+0xe89()
_start+0x6c()

$ mdb core.apache.httpd.14988.1469184078 
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
libc.so.1`_read+0xa()
ap_mpm_podx_check+0x38()
child_main+0x418()
make_child+0x143()
perform_idle_server_maintenance+0x67f()
server_main_loop+0x346()
event_run+0x320()
ap_run_mpm+0x71()
main+0xe89()
_start+0x6c()

$ mdb core.apache.httpd.23206.1469009815 
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
mod_deflate.so`deflate_out_filter+0x2301()
mod_filter.so`filter_harness+0x1dc()
ap_pass_brigade+0xbc()
end_output_stream+0xb8()
ap_finalize_request_protocol+0x32()
ap_die_r+0x32()
ap_process_async_request+0x793()
ap_process_request+0x24()
mod_http2.so`h2_task_process_request+0x3fd()
mod_http2.so`h2_task_process_conn+0x228()
ap_run_process_connection+0x61()
mod_http2.so`h2_task_do+0x5e2()
mod_http2.so`execute+0x50()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()

I've also included pstack output.

If there's any info I can provide to make things more clear, I'll be glad to do so.
Comment 6 Konstantin J. Chernov 2016-07-26 14:19:39 UTC
Created attachment 34074 [details]
pstack output of coredumps
Comment 7 Stefan Eissing 2016-07-28 12:45:45 UTC
These are all very weird places. I would ask you to disable h2 for the time being and check if 2.4.23 runs as stable on your platform as 2.4.18 did. Let's just verify that there not something else here going on.
Comment 8 Konstantin J. Chernov 2016-07-28 14:59:07 UTC
Stefan,

thanks for the reply, I'll disable h2 tomorrow and leave it disabled for about a week.
Comment 9 Konstantin J. Chernov 2016-08-08 01:02:39 UTC
There were no segfaults since 29/07/2016 (the date when I disabled http2 module).
I assume the problem is not caused by 2.4.23 itself :(
Comment 10 Konstantin J. Chernov 2016-08-15 02:49:14 UTC
Right after enabling http2 again, got a few more segfaults :(

Here's the latest one (+2 more with the same backtraces as in my previous comments):
> ::stack
libaprutil-1.so.0.5.4`apr_brigade_cleanup+0x43()
libaprutil-1.so.0.5.4`brigade_cleanup+0x18()
libapr-1.so.0.5.2`run_cleanups+0x3c()
libapr-1.so.0.5.2`apr_pool_destroy+0x57()
eor_bucket_destroy+0x2a()
libaprutil-1.so.0.5.4`apr_brigade_cleanup+0x60()
ap_process_request_after_handler+0xc6()
ap_process_async_request+0x79f()
ap_process_request+0x24()
mod_http2.so`h2_task_process_request+0x3fd()
mod_http2.so`h2_task_process_conn+0x228()
ap_run_process_connection+0x61()
mod_http2.so`h2_task_do+0x5e2()
mod_http2.so`execute+0x50()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()
Comment 11 Stefan Eissing 2016-08-25 12:45:56 UTC
Thanks Konstantin. At least it's clear that http2 is the culprit and the backtrace points to connection shutdown. Have do analyze that a bit more...
Comment 12 Stefan Eissing 2016-08-29 07:53:11 UTC
Konstantin,

are you able to build a new mod_http2 for testing? The current state of branch 2.4.x contains a cleanup fix that might address the issues you observe.

As an alternative, you can also you the github release at https://github.com/icing/mod_h2/releases/tag/v1.6.1 if you have apxs and autoconf on your system.
Comment 13 Konstantin J. Chernov 2016-08-29 10:04:28 UTC
Hi.

Will it be enough to replace mod_http2 folder contents from https://github.com/apache/httpd/tree/trunk/modules/http2 and recompile the Apache?
Comment 14 Konstantin J. Chernov 2016-08-30 07:13:16 UTC
I've replaced modules/http2 from https://github.com/apache/httpd/tree/2.4.x (hopefully, this link is correct) and rebuilt the apache with the rest of the sources left unchanged, then replaced single mod_http2.so on prod environment.

Will see if there are any changes.
Comment 15 Konstantin J. Chernov 2016-09-06 01:20:05 UTC
After more than a week of stable work, there were 3 segfaults yesterday, all with the same stack:
> ::stack
libc.so.1`_read+0xa()
ap_mpm_podx_check+0x38()
child_main+0x418()
make_child+0x143()
perform_idle_server_maintenance+0x67f()
server_main_loop+0x346()
event_run+0x320()
ap_run_mpm+0x71()
main+0xe89()
_start+0x6c()

I'm not really sure they are somehow connected to mod_http2, though without mod_http2 there are no segfaults at all.
Comment 16 Eric Covener 2016-09-06 01:38:59 UTC
(In reply to Konstantin J. Chernov from comment #15)
> After more than a week of stable work, there were 3 segfaults yesterday, all
> with the same stack:
> > ::stack
> libc.so.1`_read+0xa()
> ap_mpm_podx_check+0x38()
> child_main+0x418()
> make_child+0x143()
> perform_idle_server_maintenance+0x67f()
> server_main_loop+0x346()
> event_run+0x320()
> ap_run_mpm+0x71()
> main+0xe89()
> _start+0x6c()
> 
> I'm not really sure they are somehow connected to mod_http2, though without
> mod_http2 there are no segfaults at all.

That is not the crashing thread, there must be some other that looks a little more suspicious.
Comment 17 Konstantin J. Chernov 2016-09-06 03:02:39 UTC
Created attachment 34203 [details]
core.apache.httpd.12673.1473072482.stack
Comment 18 Konstantin J. Chernov 2016-09-06 03:03:21 UTC
I've attached full pstack of one of three coredumps: core.apache.httpd.12673.1473072482.stack
Hopefully it will give more valuable info.
Comment 19 Stefan Eissing 2016-09-07 03:02:53 UTC
Hmm, I do not see it. Someone with better eyes than me?
Comment 20 Eric Covener 2016-09-07 03:08:43 UTC
(In reply to Stefan Eissing from comment #19)
> Hmm, I do not see it. Someone with better eyes than me?

t94 is suspicious
Comment 21 Konstantin J. Chernov 2016-09-08 00:47:13 UTC
Created attachment 34222 [details]
three more full pstack outputs
Comment 22 Stefan Eissing 2016-09-12 08:45:22 UTC
Just for completeness since I did not ask yet: do you see anything in the error logs from mod_http2. Any warning?
Comment 23 Konstantin J. Chernov 2016-09-13 07:17:41 UTC
I've looked through the logs: out of 5 crashes we had during the last week, there were only one crash that corresponds with the warning from mod_http2:
[2016-09-10 14:34:41.832137] [http2:warn] [pid 10635:tid 91] [C:AB0Zx5UDdAA] AH03198: h2_mplx(172): release, waiting for 87865 seconds now for 1 h2_workers to return, have still 21 tasks outstanding

But that was not written by the process(PID/TID) that crashed, as the messages from this pid/tid appears further in the logs.
Comment 24 Stefan Eissing 2018-04-28 11:15:55 UTC
Since the mentioned releases, various bugs have been identified and fixed that could explain the observed behaviour. Also fuzzing is being done for the HTTP/2 implementation which improved robustness.

If there is no new data to report on a current version, I'd like to close this ticket.