Summary: Segmentation fault during request processing
Product: Apache httpd-2
Component: mod_http2
Version: 2.4.23
Hardware: Sun
OS: Solaris
Status: RESOLVED CLOSED
Severity: major
Priority: P2
Target Milestone: ---
Reporter: Konstantin J. Chernov <k.j.chernov>
Assignee: Apache HTTPD Bugs Mailing List <bugs>
CC: apache-bugzilla, cbarbara
Attachments: pstack output of coredumps; core.apache.httpd.12673.1473072482.stack; three more full pstack outputs
Description
Konstantin J. Chernov
2016-07-11 02:18:36 UTC

Thanks for the report. There seem to be issues with concurrent access to SSL variables when SSLOptions StdEnvVars is active. I have to check back with the mod_ssl devs on how to best mitigate this issue. For now, you may try removing the 'StdEnvVars' option - if your application can live with it - and test if that fixes the issue.

Thanks for the advice! I removed StdEnvVars from the SSLOptions and will see if there are any changes.

Got another segfault yesterday, though I'm not sure whether it is related to the original problem:
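The mitigation suggested above would look roughly like this in the SSL virtual host configuration (a sketch, not the reporter's actual config; the surrounding directives are assumed):

```
# Before: export per-request SSL_* variables into the environment
# (mod_ssl computes these on every request, which is where the
# suspected concurrent access happens)
#SSLOptions +StdEnvVars

# Mitigation: leave StdEnvVars disabled (it is off by default), at
# the cost of SSL_* variables no longer being visible to CGI/SSI
SSLOptions -StdEnvVars
```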
[2016-07-12 23:58:06.817849] [core:notice] [pid 15804:tid 1] AH00051: child pid 29484 exit signal Segmentation fault (11), possible coredump in /pub/apache/cores
$ mdb core.apache.httpd.29484.1468346277
> ::stack
mod_http2.so`h2_brigade_mem_size+0x2e()
mod_http2.so`h2_conn_io_pass+0x33b()
mod_http2.so`on_send_data_cb+0xa45()
libnghttp2.so.14.8.0`nghttp2_session_mem_send_internal+0x165()
libnghttp2.so.14.8.0`nghttp2_session_send+0x59()
mod_http2.so`h2_session_send+0x68()
mod_http2.so`h2_session_process+0x1863()
mod_http2.so`h2_conn_run+0x56()
mod_http2.so`h2_h2_process_conn+0x1122()
ap_run_process_connection+0x61()
process_socket+0x4ab()
worker_thread+0x351()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()
Unless we are talking about memory corruption, the second one is unrelated. Since this is the first one of that kind, I am unsure of the reason. Has this ever happened since?

Stefan, after upgrading to 2.4.23 and turning on the http2 module I've got several segfaults, even though the number of them was reduced after I turned off StdEnvVars. Before the upgrade, I was using 2.4.18 without http2 and never had a segfault. Here are the backtraces (not counting the one from my last comment):

$ mdb core.apache.httpd.1407.1468993750
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
libc.so.1`_read+0xa()
ap_mpm_podx_check+0x38()
child_main+0x418()
make_child+0x143()
perform_idle_server_maintenance+0x67f()
server_main_loop+0x346()
event_run+0x320()
ap_run_mpm+0x71()
main+0xe89()
_start+0x6c()

$ mdb core.apache.httpd.14988.1469184078
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
libc.so.1`_read+0xa()
ap_mpm_podx_check+0x38()
child_main+0x418()
make_child+0x143()
perform_idle_server_maintenance+0x67f()
server_main_loop+0x346()
event_run+0x320()
ap_run_mpm+0x71()
main+0xe89()
_start+0x6c()

$ mdb core.apache.httpd.23206.1469009815
Loading modules: [ ld.so.1 libc.so.1 libuutil.so.1 ]
> ::stack
mod_deflate.so`deflate_out_filter+0x2301()
mod_filter.so`filter_harness+0x1dc()
ap_pass_brigade+0xbc()
end_output_stream+0xb8()
ap_finalize_request_protocol+0x32()
ap_die_r+0x32()
ap_process_async_request+0x793()
ap_process_request+0x24()
mod_http2.so`h2_task_process_request+0x3fd()
mod_http2.so`h2_task_process_conn+0x228()
ap_run_process_connection+0x61()
mod_http2.so`h2_task_do+0x5e2()
mod_http2.so`execute+0x50()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()

I've also included pstack output. If there's any info I can provide to make things more clear, I'll be glad to do so.

Created attachment 34074 [details]
pstack output of coredumps
These are all very weird places. I would ask you to disable h2 for the time being and check whether 2.4.23 runs as stably on your platform as 2.4.18 did. Let's just verify that there is not something else going on here.

Stefan, thanks for the reply. I'll disable h2 tomorrow and leave it disabled for about a week.

There have been no segfaults since 29/07/2016 (the date when I disabled the http2 module). I assume the problem is not caused by 2.4.23 itself :(

Right after enabling http2 again, I got a few more segfaults :(
Here's the latest one (+2 more with the same backtraces as in my previous comments):
> ::stack
libaprutil-1.so.0.5.4`apr_brigade_cleanup+0x43()
libaprutil-1.so.0.5.4`brigade_cleanup+0x18()
libapr-1.so.0.5.2`run_cleanups+0x3c()
libapr-1.so.0.5.2`apr_pool_destroy+0x57()
eor_bucket_destroy+0x2a()
libaprutil-1.so.0.5.4`apr_brigade_cleanup+0x60()
ap_process_request_after_handler+0xc6()
ap_process_async_request+0x79f()
ap_process_request+0x24()
mod_http2.so`h2_task_process_request+0x3fd()
mod_http2.so`h2_task_process_conn+0x228()
ap_run_process_connection+0x61()
mod_http2.so`h2_task_do+0x5e2()
mod_http2.so`execute+0x50()
libapr-1.so.0.5.2`dummy_worker+0x30()
libc.so.1`_thr_setup+0x5b()
libc.so.1`_lwp_start()
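The frames above show apr_brigade_cleanup being re-entered through eor_bucket_destroy and apr_pool_destroy: destroying the end-of-request bucket tears down the request pool, whose registered cleanups can in turn clean up other brigades. A minimal Python model of that nesting (illustrative only; none of these classes are real APR identifiers), showing the unlink-before-destroy pattern that keeps the nested cleanup safe:

```python
# Conceptual model (not APR code) of the nested cleanup in the backtrace.
class Bucket:
    def __init__(self, name):
        self.name = name
        self.destroyed = False
    def destroy(self):
        # In C, destroying a bucket twice is exactly the kind of bug
        # that produces a segfault inside apr_brigade_cleanup.
        assert not self.destroyed, "double destroy = crash in C"
        self.destroyed = True

class Pool:
    def __init__(self):
        self.cleanups = []   # run in reverse registration order, like APR
    def register(self, fn):
        self.cleanups.append(fn)
    def destroy(self):
        for fn in reversed(self.cleanups):
            fn()
        self.cleanups.clear()

class EorBucket(Bucket):
    """Destroying the EOR bucket tears down the whole request pool."""
    def __init__(self, pool):
        super().__init__("EOR")
        self.pool = pool
    def destroy(self):
        super().destroy()
        self.pool.destroy()  # may re-enter brigade_cleanup for other brigades

def brigade_cleanup(brigade):
    # Safe pattern: unlink the bucket *before* destroying it, so a
    # nested cleanup never sees a half-dead bucket still in the list.
    while brigade:
        bucket = brigade.pop(0)
        bucket.destroy()

# One request pool with a second brigade registered as a pool cleanup.
pool = Pool()
other_brigade = [Bucket("heap"), Bucket("file")]
pool.register(lambda: brigade_cleanup(other_brigade))

main_brigade = [Bucket("data"), EorBucket(pool)]
brigade_cleanup(main_brigade)
```

If the C code instead destroyed a bucket while it was still linked, the inner cleanup triggered by the pool destruction could walk a list the outer loop had already half-destroyed, which is consistent with a crash inside apr_brigade_cleanup.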
Thanks Konstantin. At least it's clear that http2 is the culprit and the backtrace points to connection shutdown. I have to analyze that a bit more...

Konstantin, are you able to build a new mod_http2 for testing? The current state of branch 2.4.x contains a cleanup fix that might address the issues you observe. As an alternative, you can also use the GitHub release at https://github.com/icing/mod_h2/releases/tag/v1.6.1 if you have apxs and autoconf on your system.

Hi. Will it be enough to replace the mod_http2 folder contents from https://github.com/apache/httpd/tree/trunk/modules/http2 and recompile Apache?

I've replaced modules/http2 from https://github.com/apache/httpd/tree/2.4.x (hopefully this link is correct) and rebuilt Apache with the rest of the sources left unchanged, then replaced the single mod_http2.so on the prod environment. Will see if there are any changes.

After more than a week of stable work, there were 3 segfaults yesterday, all with the same stack:
> ::stack
libc.so.1`_read+0xa()
ap_mpm_podx_check+0x38()
child_main+0x418()
make_child+0x143()
perform_idle_server_maintenance+0x67f()
server_main_loop+0x346()
event_run+0x320()
ap_run_mpm+0x71()
main+0xe89()
_start+0x6c()
I'm not really sure they are somehow connected to mod_http2, though without mod_http2 there are no segfaults at all.
(In reply to Konstantin J. Chernov from comment #15)
> After more than a week of stable work, there were 3 segfaults yesterday,
> all with the same stack:
> > ::stack
> libc.so.1`_read+0xa()
> ap_mpm_podx_check+0x38()
> child_main+0x418()
> make_child+0x143()
> perform_idle_server_maintenance+0x67f()
> server_main_loop+0x346()
> event_run+0x320()
> ap_run_mpm+0x71()
> main+0xe89()
> _start+0x6c()
> I'm not really sure they are somehow connected to mod_http2, though
> without mod_http2 there are no segfaults at all.

That is not the crashing thread; there must be some other thread that looks a little more suspicious.

Created attachment 34203 [details]
core.apache.httpd.12673.1473072482.stack

I've attached the full pstack of one of the three coredumps: core.apache.httpd.12673.1473072482.stack Hopefully it will give more valuable info.

Hmm, I do not see it. Someone with better eyes than me?

(In reply to Stefan Eissing from comment #19)
> Hmm, I do not see it. Someone with better eyes than me?

t94 is suspicious

Created attachment 34222 [details]
three more full pstack outputs
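With dozens of threads in a full pstack dump, spotting the interesting one by eye is error-prone. A small helper along these lines (hypothetical, not part of the bug report; it assumes Solaris pstack's "lwp#" thread separators) can narrow a dump down to threads that have a frame in a given module:

```python
# Hypothetical helper for scanning a multi-thread pstack dump: keep only
# the threads touching a given module, since the crashing thread is
# rarely the MPM main loop.
import re

def suspicious_threads(pstack_text, module="mod_http2.so"):
    """Return {lwp_id: [frames]} for threads with a frame in `module`."""
    threads, current, frames = {}, None, []
    for line in pstack_text.splitlines():
        m = re.match(r"-+\s*lwp#\s*(\d+)", line)
        if m:  # thread separator line, e.g. "----- lwp# 94 / thread# 94 -----"
            if current is not None:
                threads[current] = frames
            current, frames = m.group(1), []
        elif current is not None and line.strip():
            frames.append(line.strip())
    if current is not None:
        threads[current] = frames
    return {tid: fr for tid, fr in threads.items()
            if any(module in f for f in fr)}

# Tiny sample in the same shape as the attached dumps.
sample = """\
----------------- lwp# 1 --------------------
 libc.so.1`_read+0xa()
 ap_mpm_podx_check+0x38()
----------------- lwp# 94 --------------------
 mod_http2.so`h2_mplx_release+0x42()
 worker_thread+0x351()
"""
print(sorted(suspicious_threads(sample)))  # only lwp# 94 matches
```

Applied to the attached core.apache.httpd.12673.1473072482.stack, such a filter would surface threads like the t94 mentioned below, instead of the many idle workers.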
Just for completeness, since I did not ask yet: do you see anything in the error logs from mod_http2? Any warning?

I've looked through the logs: out of the 5 crashes we had during the last week, only one crash corresponds with a warning from mod_http2:

[2016-09-10 14:34:41.832137] [http2:warn] [pid 10635:tid 91] [C:AB0Zx5UDdAA] AH03198: h2_mplx(172): release, waiting for 87865 seconds now for 1 h2_workers to return, have still 21 tasks outstanding

But that warning was not written by the process (PID/TID) that crashed, as messages from this PID/TID appear later in the logs.

Since the mentioned releases, various bugs have been identified and fixed that could explain the observed behaviour. Fuzzing is also being done on the HTTP/2 implementation, which has improved its robustness. If there is no new data to report on a current version, I'd like to close this ticket.