Bug 65627 - apache httpd segfault on child exit
Summary: apache httpd segfault on child exit
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core (show other bugs)
Version: 2.4.49
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: FixedInTrunk
Depends on:
Blocks:
 
Reported: 2021-10-11 20:58 UTC by acmondor
Modified: 2021-10-14 14:36 UTC (History)
3 users (show)



Attachments
Patch to prevent the segfault. (463 bytes, patch)
2021-10-11 20:58 UTC, acmondor
Details | Diff
server info output (114.14 KB, text/html)
2021-10-12 17:34 UTC, acmondor
Details
server info output (62.75 KB, text/plain)
2021-10-12 17:43 UTC, acmondor
Details
error log with additional info from mpm_itk (5.43 KB, text/plain)
2021-10-12 20:44 UTC, acmondor
Details

Note You need to log in before you can comment on or make changes to this bug.
Description acmondor 2021-10-11 20:58:50 UTC
Created attachment 38061 [details]
Patch to prevent the segfault.

This is to report a bug found on Gentoo and the fix for it.

The original bug reported on Gentoo was "www-servers/apache-2.4.49: segfault on (almost) every request" (https://bugs.gentoo.org/816258).

Testing revealed it seems to be specific to the use of mpm-itk, however there may be other senarios as well. And a review of the segfault traceback (see the above mentioned gentoo bug report) and the changes between 2.4.48 and 2.4.49 identified code changes in server/connection.c as the reason for the segfault. With those changes ap_lingering_close() attempts to pass a NULL pointer to apr_socket_close() when mpm-itk is in use, but there may be other senarios as well.

The changes between 2.4.48 and 2.4.49 in server/connection.c where introduced by this revision: https://svn.apache.org/viewvc?view=revision&revision=1891721

The fix for this bug (see attached patch file) is to avoid calling apr_socket_close() with a NULL like the previous code did.

This bug was found in 2.4.49, but is still present in 2.4.51 and that is were the fix was tested.
Comment 1 Ruediger Pluem 2021-10-12 07:51:45 UTC
While the patch is useful, csd should not really be NULL here. There was an issue in 2.4.49 that caused these segfaults under certain conditions, but this was fixed in 2.4.50 via https://svn.apache.org/viewvc?view=revision&revision=1893654. Can you please provide a stacktrace with 2.4.50 or 2.4.51 that shows the crash as something seems to wrong somewhere else and this might need fixing as well?
Comment 2 Jean Weisbuch 2021-10-12 10:04:36 UTC
The exact same bug happens on 2.4.49 to 2.4.51, here is a full backtrace with 2.4.51 :

Program received signal SIGSEGV, Segmentation fault.
0x00007fcac0ea5113 in apr_socket_close (thesocket=0x0) at ./network_io/unix/sockets.c:183
183	./network_io/unix/sockets.c: No such file or directory.
(gdb) bt f
#0  0x00007fcac0ea5113 in apr_socket_close (thesocket=0x0) at ./network_io/unix/sockets.c:183
No locals.
#1  0x0000558958ff7735 in ap_lingering_close (c=0x7fcac1699290) at connection.c:159
        dummybuf = "\002", '\000' <repeats 151 times>, "(\260i\301\312\177\000\000\360\241\",\374\177\000\000\020\027\373X\211U\000\000p\250\",\374\177", '\000' <repeats 18 times>...
        nbytes = 4294967295
        now = 94048392017996
        timeup = 0
        csd = 0x0
#2  0x000055895901392f in child_main (child_num_arg=9, child_bucket=0) at prefork.c:655
        current_conn = 0x7fcac1699290
        csd = 0x7fcac16990a0
        thd = 0x7fcac169b0a0
        osthd = 140508805886016
        sig_mask = {__val = {0, 0, 0, 0, 0, 0, 0, 0, 0, 536870912, 0, 0, 94048392146393, 0, 0, 0}}
        ptrans = 0x7fcac1699028
        allocator = 0x55895ab6ad20
        status = 0
        i = -1
        lr = 0x7fcac1743380
        pollset = 0x7fcac169b138
        sbh = 0x7fcac169b130
        bucket_alloc = 0x7fcac1695028
        last_poll_idx = 1
        lockfile = 0x0
#3  0x0000558959013b92 in make_child (s=0x7fcac173e328, slot=9) at prefork.c:756
        bucket = 0
        pid = 0
#4  0x0000558959013f6a in perform_idle_server_maintenance (p=0x7fcac176d028) at prefork.c:860
        i = 1
        idle_count = 8
        ws = 0x7fcac16a0ad0
        free_length = 4
        free_slots = {8, 9, 10, 11, 0, 0, 0, 0, 0, 0, -1058364471, 32714, 0, 0, -1049187848, 32714, 199, 0, -1058360763, 32714, 0, 0, 0, 0, 1492850448, 21897, 1410853888, 1240662886, 740468064, 32764, 1492901478, 21897}
        last_non_dead = 7
        total_non_dead = 8
#5  0x0000558959014718 in prefork_run (_pconf=0x7fcac176d028, plog=0x7fcac173a028, s=0x7fcac173e328) at prefork.c:1053
        status = 1493285039
        pid = {pid = -1, in = 0x49f303665417ec00, out = 0x55895901b8af, err = 0x7fcac16bc478}
        child_slot = -1058447294
        exitwhy = (APR_PROC_EXIT | unknown: 21896)
        processed_status = 32714
        index = 21897
        remaining_children_to_start = 0
        i = 32764
#6  0x0000558958fbd272 in ap_run_mpm (pconf=0x7fcac176d028, plog=0x7fcac173a028, s=0x7fcac173e328) at mpm_common.c:95
        pHook = 0x7fcac16d71d8
        n = 0
        rv = -1
#7  0x0000558958fb3653 in main (argc=3, argv=0x7ffc2c22a878) at main.c:819
        c = 0 '\000'
        showcompile = 0
        showdirectives = 0
        confname = 0x558959017f63 "conf/httpd.conf"
        def_server_root = 0x558959017f73 "/opt/apache"
        temp_error_log = 0x0
        error = 0x0
        process = 0x7fcac176f118
        pconf = 0x7fcac176d028
        plog = 0x7fcac173a028
        ptemp = 0x7fcac173c028
        pcommands = 0x7fcac1744028
        opt = 0x7fcac1744118
        rv = 0
        mod = 0x558959237b40 <ap_prelinked_modules+32>
        opt_arg = 0x6562b026 <error: Cannot access memory at address 0x6562b026>
        signal_server = 0x558958ffb2ee <ap_signal_server>
        rc = 0
Comment 3 Yann Ylavic 2021-10-12 12:13:22 UTC
Does this always imply mpm_itk?

I see that mpm_itk does this:

    ap_hook_process_connection(itk_fork_process, NULL, NULL,
                               APR_HOOK_REALLY_FIRST);

then:

int itk_fork_process(conn_rec *c)
{
    if (have_forked) {
         return DECLINED;
    }

    pid_t pid = fork(), child_pid;
    int status;
    switch (pid) {
    case -1:
        [...]
	return HTTP_INTERNAL_SERVER_ERROR;
    case 0:
        /* Child; runs processing as usual, then dies.
         * This is a bit tricky in that we need to run ap_run_process_connection()
         * even though we are a process_connection hook ourselves!
         * That is the only way we can exit cleanly after the hook
         * is done. Thus, we set have_forked to signal that we don't
         * want to end up in infinite recursion.
         */
        have_forked = 1;
        ap_close_listeners();
        ap_run_process_connection(c);
        ap_lingering_close(c);
	exit(0);
    default: /* parent; just wait for child to be done */
	do {
	    child_pid = waitpid(pid, &status, 0);
	} while (child_pid == -1 && errno == EINTR);
        [...]
        /*
	 * It is important that ap_lingering_close() is called in the child
	 * and not here, ...
	 * However, we close the socket itself here so that we don't keep a
	 * reference to it around, and then set the socket pointer to NULL so
	 * that when prefork tries to close it, it goes into early exit.
	 */
	apr_socket_close(ap_get_conn_socket(c));
	ap_set_core_module_config(c->conn_config, NULL);

        /* make sure the MPM does not process this connection */
	return OK;
    }
}

So:
    ap_set_core_module_config(c->conn_config, NULL);
is what sets csd to NULL in ap_lingering_close.
Comment 4 Ruediger Pluem 2021-10-12 12:20:55 UTC
Can you please issue the following gdb commands with this core dump:

frame 1
print *(c->conn_config)
print c->aborted

Furthermore an error log which is set to trace8 around the point of time the crash happens could be quite helpful and if possible the request that triggered it.

Can you also temporarily configure mod_info (http://httpd.apache.org/docs/2.4/mod/mod_info.html) for such a server and provide the output of the server info page?
Comment 5 Yann Ylavic 2021-10-12 12:23:59 UTC
(In reply to Yann Ylavic from comment #3)
> 
> So:
>     ap_set_core_module_config(c->conn_config, NULL);
> is what sets csd to NULL in ap_lingering_close.

But before r1891721, the apr_socket_close(csd) was protected by:

AP_DECLARE(int) ap_start_lingering_close(conn_rec *c)
{
    apr_socket_t *csd = ap_get_conn_socket(c);

    if (!csd) {
        return 1;
    }
    [...]
    return 0;
}

So I think that attachment 38061 [details] is the right thing to do, for compatibility.
Comment 6 Yann Ylavic 2021-10-12 12:29:39 UTC
(In reply to Yann Ylavic from comment #5)
> 
> So I think that attachment 38061 [details] is the right thing to do, for
> compatibility.

The other option is to change mpm_itk to:
-	apr_socket_close(ap_get_conn_socket(c));
-	ap_set_core_module_config(c->conn_config, NULL);
+       c->aborted = 1;
Comment 7 Ruediger Pluem 2021-10-12 12:40:26 UTC
(In reply to Yann Ylavic from comment #6)
> (In reply to Yann Ylavic from comment #5)
> > 
> > So I think that attachment 38061 [details] is the right thing to do, for
> > compatibility.
> 
> The other option is to change mpm_itk to:
> -	apr_socket_close(ap_get_conn_socket(c));
> -	ap_set_core_module_config(c->conn_config, NULL);
> +       c->aborted = 1;

The question is, how we want to allow if at all another module to say that we should get out of the way with regards to lingering closes. Do we allow to set the socket to NULL via  ap_set_core_module_config or do we demand that is has to set c->aborted to 1 as you suggest. Depending on this we could modify the proposed patch to either make it an assert that csd != NULL or log an error message in case csd == NULL as this is not expected. If setting the socket to NULL is the accepted way though, the patch proposed here is correct as it is.
Comment 8 Yann Ylavic 2021-10-12 13:01:44 UTC
(In reply to Ruediger Pluem from comment #7)
> 
> The question is, how we want to allow if at all another module to say that
> we should get out of the way with regards to lingering closes. Do we allow
> to set the socket to NULL via  ap_set_core_module_config or do we demand
> that is has to set c->aborted to 1 as you suggest.

Yeah indeed that's the question. Thinking more about it, c->aborted = 1 will still call the output filter chain so in the case of mod_itk it may cause issues (no request_rec in the forking/parent process).
We have supported the NULL socket so far so we probably still need to in 2.4.x, mpm_prefork (which mpm_itk is still requiring AFAICT) will call ap_lingering_close() after ap_process_connection() in any case, so it seems that NULL socket is the only safe option for third-party modules as of now.
Comment 9 Ruediger Pluem 2021-10-12 13:25:00 UTC
(In reply to Yann Ylavic from comment #8)
> (In reply to Ruediger Pluem from comment #7)
> > 
> > The question is, how we want to allow if at all another module to say that
> > we should get out of the way with regards to lingering closes. Do we allow
> > to set the socket to NULL via  ap_set_core_module_config or do we demand
> > that is has to set c->aborted to 1 as you suggest.
> 
> Yeah indeed that's the question. Thinking more about it, c->aborted = 1 will
> still call the output filter chain so in the case of mod_itk it may cause
> issues (no request_rec in the forking/parent process).
> We have supported the NULL socket so far so we probably still need to in
> 2.4.x, mpm_prefork (which mpm_itk is still requiring AFAICT) will call
> ap_lingering_close() after ap_process_connection() in any case, so it seems
> that NULL socket is the only safe option for third-party modules as of now.

Fair enough. Then I am fine with the patch.
Comment 10 Yann Ylavic 2021-10-12 16:57:36 UTC
I checked r1894171 in trunk but it would be useful to hear from Jean in comment 2.

Jean, your stack trace does not seem to involve mod_itk but simply mod_prefork, am I correct? Do you know which other module sets the socket to NULL?

The fix in trunk allows for the socket to be NULL in ap_lingering_close() but not in ap_start_lingering_close() which is called by mpm_event only (supposedly), so there I added an ap_assert() to catch this unexpected situation (it will kill the process should that happen), but it won't help if any module could be run by mpm_event and set the socket to NULL..
Comment 11 acmondor 2021-10-12 17:32:05 UTC
Here is the additional info requested. First the the gdb info:

Core was generated by `/usr/sbin/apache2 -D INFO -D SSL -D PHP -D MPM_ITK -D STATUS -D SECURITY -d /us'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f26d402fdac in apr_socket_close (thesocket=0x0) at network_io/unix/sockets.c:213
213         return apr_pool_cleanup_run(thesocket->pool, thesocket, socket_cleanup);
(gdb) 
(gdb) bt
#0  0x00007f26d402fdac in apr_socket_close (thesocket=0x0) at network_io/unix/sockets.c:213
#1  0x0000563794dd8292 in ap_lingering_close (c=0x563797134010) at connection.c:159
#2  0x0000563794df5829 in child_main (child_num_arg=4, child_bucket=0) at prefork.c:655
#3  0x0000563794df5a87 in make_child (s=0x563796e4e888, slot=4) at prefork.c:756
#4  0x0000563794df5ae7 in startup_children (number_to_start=1) at prefork.c:774
#5  0x0000563794df6159 in prefork_run (_pconf=0x563796e253c8, plog=0x563796e52608, s=0x563796e4e888) at prefork.c:936
#6  0x0000563794d9ba43 in ap_run_mpm (pconf=0x563796e253c8, plog=0x563796e52608, s=0x563796e4e888) at mpm_common.c:95
#7  0x0000563794d915fc in main (argc=19, argv=0x7ffcb763ef38) at main.c:819
(gdb) 
(gdb) frame 1
#1  0x0000563794dd8292 in ap_lingering_close (c=0x563797134010) at connection.c:159
159             apr_socket_close(csd);
(gdb) print *(c->conn_config)
$1 = <incomplete type>
(gdb) print *(c)
$2 = {pool = 0x563797133da8, base_server = 0x563796f456b0, vhost_lookup_data = 0x563796f56b18, local_addr = 0x563797133e70, client_addr = 0x563797133f30, client_ip = 0x563797134418 "192.168.1.13", 
  remote_host = 0x0, remote_logname = 0x0, local_ip = 0x563797134408 "192.168.1.13", local_host = 0x0, id = 4, conn_config = 0x5637971340e0, notes = 0x563797134268, input_filters = 0x563797134440, 
  output_filters = 0x5637971344b0, sbh = 0x563797131ef0, bucket_alloc = 0x563797138e58, cs = 0x0, data_in_input_filters = 0, data_in_output_filters = 0, clogging_input_filters = 0, 
  double_reverse = 0, aborted = 0, keepalive = AP_CONN_UNKNOWN, keepalives = 0, log = 0x0, log_id = 0x0, current_thread = 0x563797131e10, master = 0x0, outgoing = 0}
(gdb) print c->aborted
$3 = 0
(gdb) x/100xw 0x5637971340e0
0x5637971340e0: 0x00000000      0x00000000      0x00000000      0x00000000
0x5637971340f0: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134100: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134110: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134120: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134130: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134140: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134150: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134160: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134170: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134180: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134190: 0x00000000      0x00000000      0x00000000      0x00000000
0x5637971341a0: 0x00000000      0x00000000      0x00000000      0x00000000
0x5637971341b0: 0x00000000      0x00000000      0x00000000      0x00000000
0x5637971341c0: 0x00000000      0x00000000      0x00000000      0x00000000
0x5637971341d0: 0x00000000      0x00000000      0x00000000      0x00000000
0x5637971341e0: 0x97134428      0x00005637      0x00000000      0x00000000
0x5637971341f0: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134200: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134210: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134220: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134230: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134240: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134250: 0x00000000      0x00000000      0x00000000      0x00000000
0x563797134260: 0x00000000      0x00000000      0x97133da8      0x00005637

Even after looking through the httpd code I couldn't figure how to get around that '<incomplete type>' from 'print *(c->conn_config)', so I just did a memory dump.


Here is the trace8 error log:

[Tue Oct 12 10:54:04.886976 2021] [ELF1] [default] [default] [core:trace5] [pid 11324:tid protocol.c(711): [client 192.168.1.13:41666] Request received from client: GET / HTTP/1.1
[Tue Oct 12 10:54:04.887549 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [setenvif:trace4] [pid 11324:tid util_expr_eval.c(859): [client 192.168.1.13:41666] Evaluation of expression from /home/www.mac003/config/request_denied.include:84 gave: 0
[Tue Oct 12 10:54:04.887585 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_request.c(435): [client 192.168.1.13:41666] Headers received from client:
[Tue Oct 12 10:54:04.887599 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_request.c(438): [client 192.168.1.13:41666]   Host: test001.acmondor.ca
[Tue Oct 12 10:54:04.887611 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_request.c(438): [client 192.168.1.13:41666]   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7
[Tue Oct 12 10:54:04.887624 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_request.c(438): [client 192.168.1.13:41666]   Accept: */*
[Tue Oct 12 10:54:04.887817 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11324:tid mod_authz_core.c(815): [client 192.168.1.13:41666] AH01626: authorization result of Require all granted: granted
[Tue Oct 12 10:54:04.887834 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11324:tid mod_authz_core.c(815): [client 192.168.1.13:41666] AH01626: authorization result of Require not env request_denied: neutral
[Tue Oct 12 10:54:04.887851 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11324:tid mod_authz_core.c(815): [client 192.168.1.13:41666] AH01626: authorization result of <RequireAll>: granted
[Tue Oct 12 10:54:04.887864 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11324:tid mod_authz_core.c(815): [client 192.168.1.13:41666] AH01626: authorization result of <RequireAny>: granted
[Tue Oct 12 10:54:04.887877 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [core:trace3] [pid 11324:tid request.c(360): [client 192.168.1.13:41666] request authorized without authentication by access_checker_ex hook: /
[Tue Oct 12 10:54:04.888102 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace3] [pid 11324:tid http_filters.c(1126): [client 192.168.1.13:41666] Response sent with status 200, headers:
[Tue Oct 12 10:54:04.888119 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace5] [pid 11324:tid http_filters.c(1135): [client 192.168.1.13:41666]   Date: Tue, 12 Oct 2021 16:54:04 GMT
[Tue Oct 12 10:54:04.888134 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace5] [pid 11324:tid http_filters.c(1138): [client 192.168.1.13:41666]   Server: Apache
[Tue Oct 12 10:54:04.888146 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_filters.c(956): [client 192.168.1.13:41666]   Last-Modified: Mon, 29 Jul 2019 20:18:21 GMT
[Tue Oct 12 10:54:04.888159 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_filters.c(956): [client 192.168.1.13:41666]   ETag: \\"2d-58ed797e5d022\\"
[Tue Oct 12 10:54:04.888171 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_filters.c(956): [client 192.168.1.13:41666]   Accept-Ranges: bytes
[Tue Oct 12 10:54:04.888183 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_filters.c(956): [client 192.168.1.13:41666]   Content-Length: 45
[Tue Oct 12 10:54:04.888194 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11324:tid http_filters.c(956): [client 192.168.1.13:41666]   Content-Type: text/html
[Tue Oct 12 10:54:04.888302 2021] [ELF1] [default] [core:trace6] [pid 11324:tid core_filters.c(810): [client 192.168.1.13:41666] writev_nonblocking: 256/256
[Tue Oct 12 10:54:05.533291 2021] [ELF1] [mac003] [core:trace4] [pid 11141:tid mpm_common.c(538): mpm child 11327 (gen 0/slot 3) started
[Tue Oct 12 10:54:06.534425 2021] [ELF1] [mac003] [core:notice] [pid 11141:tid AH00051: child pid 11148 exit signal Segmentation fault (11), possible coredump in /tmp
[Tue Oct 12 10:54:06.534478 2021] [ELF1] [mac003] [core:trace4] [pid 11141:tid mpm_common.c(538): mpm child 11148 (gen 0/slot 4) exited


I once again confirmed that the problem does away when mpm-itk is disabled. Here's what the error log looks like in that case:

[Tue Oct 12 11:00:43.710355 2021] [ELF1] [mac003] [core:debug] [pid 11878:tid log.c(1570): AH02639: Using SO_REUSEPORT: yes (1)
[Tue Oct 12 11:00:43.710372 2021] [ELF1] [mac003] [mpm_prefork:debug] [pid 11878:tid prefork.c(953): AH00165: Accept mutex: pthread (default: pthread)
[Tue Oct 12 11:00:55.470355 2021] [ELF1] [default] [default] [core:trace5] [pid 11881:tid protocol.c(711): [client 192.168.1.13:41770] Request received from client: GET / HTTP/1.1
[Tue Oct 12 11:00:55.471298 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [setenvif:trace4] [pid 11881:tid util_expr_eval.c(859): [client 192.168.1.13:41770] Evaluation of expression from /home/www.mac003/config/request_denied.include:84 gave: 0
[Tue Oct 12 11:00:55.471393 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_request.c(435): [client 192.168.1.13:41770] Headers received from client:
[Tue Oct 12 11:00:55.471408 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_request.c(438): [client 192.168.1.13:41770]   Host: test001.acmondor.ca
[Tue Oct 12 11:00:55.471418 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_request.c(438): [client 192.168.1.13:41770]   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7
[Tue Oct 12 11:00:55.471431 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_request.c(438): [client 192.168.1.13:41770]   Accept: */*
[Tue Oct 12 11:00:55.471553 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11881:tid mod_authz_core.c(815): [client 192.168.1.13:41770] AH01626: authorization result of Require all granted: granted
[Tue Oct 12 11:00:55.471570 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11881:tid mod_authz_core.c(815): [client 192.168.1.13:41770] AH01626: authorization result of Require not env request_denied: neutral
[Tue Oct 12 11:00:55.471580 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11881:tid mod_authz_core.c(815): [client 192.168.1.13:41770] AH01626: authorization result of <RequireAll>: granted
[Tue Oct 12 11:00:55.471590 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [authz_core:debug] [pid 11881:tid mod_authz_core.c(815): [client 192.168.1.13:41770] AH01626: authorization result of <RequireAny>: granted
[Tue Oct 12 11:00:55.471600 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [core:trace3] [pid 11881:tid request.c(360): [client 192.168.1.13:41770] request authorized without authentication by access_checker_ex hook: /
[Tue Oct 12 11:00:55.471845 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace3] [pid 11881:tid http_filters.c(1126): [client 192.168.1.13:41770] Response sent with status 200, headers:
[Tue Oct 12 11:00:55.471859 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace5] [pid 11881:tid http_filters.c(1135): [client 192.168.1.13:41770]   Date: Tue, 12 Oct 2021 17:00:55 GMT
[Tue Oct 12 11:00:55.471919 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace5] [pid 11881:tid http_filters.c(1138): [client 192.168.1.13:41770]   Server: Apache
[Tue Oct 12 11:00:55.471932 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_filters.c(956): [client 192.168.1.13:41770]   Last-Modified: Mon, 29 Jul 2019 20:18:21 GMT
[Tue Oct 12 11:00:55.471941 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_filters.c(956): [client 192.168.1.13:41770]   ETag: \\"2d-58ed797e5d022\\"
[Tue Oct 12 11:00:55.471952 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_filters.c(956): [client 192.168.1.13:41770]   Accept-Ranges: bytes
[Tue Oct 12 11:00:55.471962 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_filters.c(956): [client 192.168.1.13:41770]   Content-Length: 45
[Tue Oct 12 11:00:55.471971 2021] [ELF1] [test001.acmondor.ca] [test001.acmondor.ca] [http:trace4] [pid 11881:tid http_filters.c(956): [client 192.168.1.13:41770]   Content-Type: text/html
[Tue Oct 12 11:00:55.472135 2021] [ELF1] [default] [core:trace6] [pid 11881:tid core_filters.c(810): [client 192.168.1.13:41770] writev_nonblocking: 256/256

Will attach a file with the server info output.
Comment 12 acmondor 2021-10-12 17:34:59 UTC
Created attachment 38063 [details]
server info output
Comment 13 acmondor 2021-10-12 17:43:26 UTC
Created attachment 38064 [details]
server info output

The html output didn't come through properly, so here's the same info as plain text.
Comment 14 Ruediger Pluem 2021-10-12 18:42:48 UTC
Thanks. My understanding now is that this only happens with mpm-itk and the reason why this happens in this case seems to be clear from the analysis of Yann. His patch should fix this.
Comment 15 Ruediger Pluem 2021-10-12 18:46:17 UTC
(In reply to Yann Ylavic from comment #10)
> I checked r1894171 in trunk but it would be useful to hear from Jean in
> comment 2.
> 
> Jean, your stack trace does not seem to involve mod_itk but simply
> mod_prefork, am I correct? Do you know which other module sets the socket to
> NULL?

This remains somewhat a mystery. Hence the same data provided as for mpm-itk case would be very helpful

> 
> The fix in trunk allows for the socket to be NULL in ap_lingering_close()
> but not in ap_start_lingering_close() which is called by mpm_event only
> (supposedly), so there I added an ap_assert() to catch this unexpected
> situation (it will kill the process should that happen), but it won't help
> if any module could be run by mpm_event and set the socket to NULL..

I guess this should be fine. We may run into a lot of trouble with event MPM if people fiddle around with the socket on their own. Hence failing with the assert seems better then strange hard to debug other issues.
Comment 16 acmondor 2021-10-12 20:44:27 UTC
Created attachment 38065 [details]
error log with additional info from mpm_itk

Out of curiousity I added some additional error log statement to mod_itk to answer the question "why  stack trace does not seem to involve mod_itk". The result is shown in the attached log file. The lines with 'itk_fork_process:' are from the itk_fork_process function shown above in comment #3, their locations should be obvious.

From that, the reason that mod_itk is not in the stack trace is because its hook has run its course and returned to prefork_run in mpm-prefork.
Comment 17 acmondor 2021-10-12 20:57:07 UTC
Yann's patch works fine with httpd 2.4.51 on Gentoo.
Comment 18 Yann Ylavic 2021-10-13 09:49:09 UTC
(In reply to acmondor from comment #16)
> 
> From that, the reason that mod_itk is not in the stack trace is because its
> hook has run its course and returned to prefork_run in mpm-prefork.

Ah indeed, I misread the mpm_itk code and thought that ap_lingering_close() was also called explicitely in the parent process.

Thanks for testing, I'll propose a backport to 2.4.x ASAP.