64452 – modproxy.tmp Files filling up /tmp after upgrade to 2.4.43

Bug 64452 - modproxy.tmp Files filling up /tmp after upgrade to 2.4.43

Summary: modproxy.tmp Files filling up /tmp after upgrade to 2.4.43

Status:	RESOLVED FIXED

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	mod_proxy_http (show other bugs)
Version:	2.4.43
Hardware:	PC Linux

Importance:	P2 major (vote)
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:	FixedInTrunk

Depends on:
Blocks:

Reported:	2020-05-19 13:02 UTC by friesoft@gmail.com
Modified:	2020-06-26 10:27 UTC (History)
CC List:	1 user (show)

Attachments
gdb procedure (2.94 KB, text/plain) 2020-05-19 19:27 UTC, Yann Ylavic	Details
gdb with next (24.21 KB, text/plain) 2020-05-28 10:29 UTC, Michael Haas	Details
gdb trace (26.20 KB, text/plain) 2020-05-28 15:00 UTC, Michael Haas	Details
gdb eor_bucket_destroy (7.33 KB, text/plain) 2020-05-28 18:00 UTC, Michael Haas	Details
gdb dump-pool (19.89 KB, text/plain) 2020-05-29 12:04 UTC, Michael Haas	Details
Preserve EOS in spool_reqbody_cl() (673 bytes, patch) 2020-05-29 16:17 UTC, Yann Ylavic	Details \| Diff
Preserve EOS in spool_reqbody_cl() [v2] (838 bytes, patch) 2020-05-29 16:21 UTC, Yann Ylavic	Details \| Diff
Preserve EOS in spool_reqbody_cl() [v3] (561 bytes, patch) 2020-05-29 16:24 UTC, Yann Ylavic	Details \| Diff
Show Obsolete (2) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description friesoft@gmail.com 2020-05-19 13:02:43 UTC

After upgrading from Apache httpd 2.4.41 to 2.4.43 (RHEL 7, self compiled) we are encountering problems with /tmp partition filling up rapidly because of many "modproxy.tmp.*" files lingering.

The problem is encountered using ProxyPass with http as proxy target and doing POST requests (payload size doesn't really matter, but it is really nice posting an ISO image for testing, filling up /tmp).

We can not enable proxy-sendchunked because the application we are proxying to has problems with it as soon as we enable the flag. This was the recommendation of this article: https://access.redhat.com/solutions/3217891

Using systemd with PrivateTmp enabled the files are cleaned up on restart of the unit. If using "legacy" sysv init Scripts we need to stop httpd, clean up /tmp and start up again as httpd still holds a file handle open to every modproxy.tmp.* file.

This leads to gigabytes of /tmp usage after some days..

My guess would be that this change is the source of the problem as it was the only related change in a while:
https://github.com/apache/httpd/commit/f7d35dc166732bff0e55d1509202d76d53f8c270#diff-037f73ad4a026d77ac9c31a007388308

Comment 1 Yann Ylavic 2020-05-19 13:57:21 UTC

Any particular message in error_log file (like "child pid xxx exit signal..")?

Comment 2 friesoft@gmail.com 2020-05-19 14:01:56 UTC

No nothing in error log. Only as disk was already full:

[Tue May 12 07:17:10.399683 2020] [proxy_http:error] [pid 31956:tid 139971851650816] (20014)Internal error (specific information not available): [client 192.168.1.47:35042] AH01089: search for temporary directory failed, referer https://ourcooldomain.com/contextpath "000000000000000000000000097511ac-7cd4-5eba3156-bc7f0700-918e3e0ddcf9"

[Tue May 12 07:38:48.303085 2020] [proxy_http:error] [pid 31956:tid 139971960755968] (28)No space left on device: [client 192.168.1.47:34974] AH01091: write to temporary file /tmp/modproxy.tmp.E9nczB failed, referer https://ourcooldomain.com/contextpath "000000000000000000000000097511ac-7cd4-5eba3668-c2ffd700-0709374c64de"

Comment 3 Yann Ylavic 2020-05-19 14:10:51 UTC

Did you also look at the main error log? I don't know if RHEL uses a single error log file or one per vhost (plus the main one), so I ask..

Comment 4 friesoft@gmail.com 2020-05-19 14:26:55 UTC

This is from the "main" errorlog. As already stated we are using a self compiled Apache. There is nothing related in the vhost error logs..

Comment 5 Yann Ylavic 2020-05-19 14:54:13 UTC

I can only see a reason for the file to not be cleaned up: another request cleanup that runs before and fails (killing the cleanup chain). 

Which modules are in use?

Comment 6 Yann Ylavic 2020-05-19 14:57:00 UTC

Also, since you can reproduce easily (I can't), can you run a debugger with some instructions?

Comment 7 friesoft@gmail.com 2020-05-19 15:18:06 UTC

Here are the modules loaded (apachectl -M):
Loaded Modules:
 core_module (static)
 so_module (static)
 http_module (static)
 logio_module (shared)
 mime_magic_module (shared)
 expires_module (shared)
 headers_module (shared)
 unique_id_module (shared)
 proxy_module (shared)
 proxy_connect_module (shared)
 proxy_http_module (shared)
 proxy_ajp_module (shared)
 ssl_module (shared)
 vhost_alias_module (shared)
 rewrite_module (shared)
 unixd_module (shared)
 authz_core_module (shared)
 authz_host_module (shared)
 authz_groupfile_module (shared)
 dir_module (shared)
 authz_user_module (shared)
 setenvif_module (shared)
 alias_module (shared)
 log_config_module (shared)
 authn_core_module (shared)
 authn_file_module (shared)
 mime_module (shared)
 socache_shmcb_module (shared)
 socache_dbm_module (shared)
 slotmem_shm_module (shared)
 auth_basic_module (shared)
 include_module (shared)
 env_module (shared)
 status_module (shared)
 mpm_event_module (shared)
 jk_module (shared)
 sm_module (shared)

- 

Yes what shall I do to debug the issue? Shall I enable some higher LogLevel too?

Comment 8 Yann Ylavic 2020-05-19 19:27:06 UTC

Created attachment 37252 [details]
gdb procedure

Here are the gdb steps for me to reach the cleanup point, does it work the same for you?

Comment 9 Yann Ylavic 2020-05-19 19:33:43 UTC

Comment on attachment 37252 [details]
gdb procedure

>(gdb) run
>Starting program: /path/to/httpd -f /path/to/httpd.conf -X
>[...]
># here httpd is waiting for input, run the from the browser or curl terminal
^typo:
# here httpd is waiting for input, send the request from the browser or curl from another terminal

Comment 10 friesoft@gmail.com 2020-05-20 06:58:19 UTC

Is it also possible to attach to a running httpd? I don't have root permission on the server so I can only start apache using sudo with our systemd unit. Is it sufficient to connect to the parent pid or do I need to connect to the right worker pid?

Comment 11 Ruediger Pluem 2020-05-20 07:26:15 UTC

(In reply to Yann Ylavic from comment #5)
> I can only see a reason for the file to not be cleaned up: another request
> cleanup that runs before and fails (killing the cleanup chain). 

Why should a failing cleanup kill the cleanup chain? If the cleanups are run by apr_pool_clear or apr_pool_destroy no return code is checked and the cleanup functions in the chain are all executed regardless of their return value.

Comment 12 Yann Ylavic 2020-05-20 08:27:34 UTC

(In reply to Bernhard Friedreich from comment #10)
> Is it also possible to attach to a running httpd? I don't have root
> permission on the server so I can only start apache using sudo with our
> systemd unit. Is it sufficient to connect to the parent pid or do I need to
> connect to the right worker pid?

You could attach to a worker pid, but I wouldn't recommend that on a production server anyway, it's going to catch all the requests passing through which will be hard to debug and disturb your production.

Can you run "strace" on a worker pid? Something like "strace -p <pid> 2>/tmp/strace.out", but it may require sudo too..
If it works, let it run a little while to see if unlink("/tmp/modproxy.tmp.xxx") gets called (and succeeds).

Comment 13 Yann Ylavic 2020-05-20 08:43:18 UTC

(In reply to Ruediger Pluem from comment #11)
> Why should a failing cleanup kill the cleanup chain? If the cleanups are run
> by apr_pool_clear or apr_pool_destroy no return code is checked and the
> cleanup functions in the chain are all executed regardless of their return
> value.

Yeah, I don't know why I thought that..

I can't see any reason for that file stay then, it should be cleaned up with the request (i.e. always), and there seems to be no child crash.

(In reply to Bernhard Friedreich from comment #0)
> 
> Using systemd with PrivateTmp enabled the files are cleaned up on restart of
> the unit. If using "legacy" sysv init Scripts we need to stop httpd, clean
> up /tmp and start up again as httpd still holds a file handle open to every
> modproxy.tmp.* file.

Is there something special on your system that prevent opened file to be removed? It shouldn't be an issue on unix systems usually.

Comment 14 friesoft@gmail.com 2020-05-20 11:22:19 UTC

(In reply to Yann Ylavic from comment #13)

> Is there something special on your system that prevent opened file to be
> removed? It shouldn't be an issue on unix systems usually.

Between updating from 2.4.41 and 2.4.43 nothing changed on those servers so I guess it has nothing to do wit anything special on our servers..

 
(In reply to Yann Ylavic from comment #12)
> You could attach to a worker pid, but I wouldn't recommend that on a
> production server anyway, it's going to catch all the requests passing
> through which will be hard to debug and disturb your production.
>

I can already reproduce the problem on one of our test environments so I should be able to do this.. worst case I can even also get root there.


(In reply to Yann Ylavic from comment #12)
> Can you run "strace" on a worker pid? Something like "strace -p <pid>
> 2>/tmp/strace.out", but it may require sudo too..
> If it works, let it run a little while to see if
> unlink("/tmp/modproxy.tmp.xxx") gets called (and succeeds).

I'll try this later but I guess it has nothing to do with unlink not working and more to do that those files simply weren't created <= 2.4.41..

Comment 15 friesoft@gmail.com 2020-05-22 10:00:31 UTC

I can now reproduce the problem in my local vm with a more or less stock centos7 and our self compiled apache.

Running gdb the interesting thing is that file_cleanup is reached for apr-tmp but never for modproxy.tmp. Sadly there seems to be a mismatch in line numbers to source - even if the same versions (apr 1.7.0 and apr-util 1.6.1) are used..

Breakpoint 1, ap_proxy_http_prefetch (url=<optimized out>, uri=0x7fffbc010d18, req=<optimized out>) at mod_proxy_http.c:807
807             rv = spool_reqbody_cl(req, &bytes);
(gdb)
Continuing.

Breakpoint 2, apr_file_mktemp (fp=fp@entry=0x7fffe4a63958, template=0x7fffbc00ed80 "/tmp/apr-tmp.XXXXXX", flags=flags@entry=0, p=p@entry=0x7fffbc009f18) at file_io/unix/mktemp.c:177
177     {
(gdb) c
Continuing.

Breakpoint 3, apr_unix_file_cleanup (thefile=0x7fffbc00ed98) at file_io/unix/open.c:80
80          rv = file_cleanup(file, 0);
(gdb) c
Continuing.

Breakpoint 2, apr_file_mktemp (fp=fp@entry=0x7fffe4a63b30, template=0x7fffbc00ee50 "/tmp/modproxy.tmp.XXXXXX", flags=flags@entry=0, p=p@entry=0x7fffbc009f18) at file_io/unix/mktemp.c:177
177     {
(gdb) c
Continuing.

Comment 16 friesoft@gmail.com 2020-05-22 10:26:22 UTC

I've also tried via strace using those arguments:

strace -o /root/process_dump -ff /path/to/apache/bin/httpd -f /path/to/apache/conf/httpd.conf -X

The only unlinks I could find where those:
[root@devbox1 ~]# grep -Hrn "unlink" process_dump*
process_dump.12069:5441:unlink("/path/to/apachelogs/jk-runtime-status.12069.lock") = 0
process_dump.12069:5442:unlink("/path/to/apachelogs/jk-runtime-status.12069") = 0
process_dump.12135:156:unlink("/tmp/apr-tmp.gYvqb8")           = 0

This is the mode using which modproxy.tmp File was opened
open("/tmp/modproxy.tmp.My4PnF", O_RDWR|O_CREAT|O_EXCL, 0600) = 23

Comment 17 friesoft@gmail.com 2020-05-22 11:41:15 UTC

I've just reconfirmed in my local vm that the modproxy.tmp files are cleaned up using httpd 2.4.41 but not with 2.4.43.

Only differences:
httpd: 2.4.41
mod_jk: 1.2.46

vs

httpd: 2.4.43
mod_jk: 1.2.48

Common:
APR: 1.7.0
APR-UTIL: 1.6.1

As the problem happens using ProxyPass on http and has nothing to do with mod_jk this has to be some bug in the new 2.4.43 version..

Comment 18 friesoft@gmail.com 2020-05-22 11:45:53 UTC

Thats how our httpd is compiled:

./configure \
  --prefix=/opt \
  --enable-so \
  --disable-userdir \
  --enable-cache=shared \
  --enable-cgi=shared \
  --enable-expires=shared \
  --enable-headers=shared \
  --enable-logio=shared \
  --enable-mem-cache=shared \
  --enable-mime-magic=shared \
  --enable-nonportable-atomics=yes \
  --enable-proxy=shared \
  --enable-proxy-http=shared \
  --enable-rewrite=shared \
  --enable-ssl=shared \
  --enable-unique-id=shared \
  --enable-usertrack=shared \
  --enable-vhost-alias=shared \
  --enable-mpms-shared='event worker' \
  --with-included-apr \
  --with-included-apr-util \
  --with-ssl=/opt/openssl-1.1.1f

Comment 19 friesoft@gmail.com 2020-05-22 12:29:57 UTC

Looks like this bug can be closed.

The problems seems to occur from a behavior change in 2.4.43 which is problematic for the CA SSO WebAgent module.
As soon as I enable the mod_sm module modproxy.tmp files are created (and never deleted). With the module disabled those files are never even created. Tried it using a fedora iso image for something really big => no modproxy.tmp file.

Thanks for your help and sorry for wasting your time..

Comment 20 friesoft@gmail.com 2020-05-27 10:01:56 UTC

Looks like the problem also happens on other httpd instances without this CA SSO WebAgent module.
One possible source of the problem in this new case could be the modsecurity module but this is still to be verified.

Looks like there is an incompatible change in 2.4.43 which causes problems with some modules..

Comment 21 Michael Haas 2020-05-27 22:29:44 UTC

With mod_security it is the same 2.4.41 deletes the tmp file with 2.4.43 the file is left in use on the filesystem.

here the gdb output from 2.4.41+2.4.43

=> 2.4.41

(gdb) break apr_file_mktemp
Breakpoint 2 at 0x7ffff74f6a64: file file_io/unix/mktemp.c, line 182.
(gdb) break file_cleanup
Breakpoint 3 at 0x7ffff74f6b6f: file file_io/unix/open.c, line 31.
(gdb) continue
Continuing.

Breakpoint 2, apr_file_mktemp (fp=0x7fffa9781740, template=0x7fffa4036228 "/tmp/apr-tmp.XXXXXX", flags=0, p=0x7fffa401f8a8) at file_io/unix/mktemp.c:182
182                            APR_FOPEN_DELONCLOSE : flags;
(gdb) continue
Continuing.

Breakpoint 3, file_cleanup (file=0x7fffa4036240, is_child=0) at file_io/unix/open.c:31
31          apr_status_t rv = APR_SUCCESS;
(gdb) print *file
$1 = {pool = 0x7fffa401f8a8, filedes = 22, fname = 0x7fffa40362b8 "/tmp/apr-tmp.Z3okA8", flags = 2375, eof_hit = 0, is_pipe = 0, timeout = -1, buffered = 0,
  blocking = BLK_UNKNOWN, ungetchar = -1, buffer = 0x0, bufpos = 0, bufsize = 0, dataRead = 0, direction = 0, filePtr = 0, thlock = 0x0}
(gdb) continue
Continuing.

Breakpoint 2, apr_file_mktemp (fp=0x7fffa9781838, template=0x7fffa40362f8 "/tmp/modproxy.tmp.XXXXXX", flags=0, p=0x7fffa401f8a8) at file_io/unix/mktemp.c:182
182                            APR_FOPEN_DELONCLOSE : flags;
(gdb) continue
Continuing.

Breakpoint 3, file_cleanup (file=0x7fffa4036318, is_child=0) at file_io/unix/open.c:31
31          apr_status_t rv = APR_SUCCESS;
(gdb) print *file
$2 = {pool = 0x7fffa401f8a8, filedes = 22, fname = 0x7fffa4036390 "/tmp/modproxy.tmp.qCDiSk", flags = 2375, eof_hit = 0, is_pipe = 0, timeout = -1, buffered = 0,
  blocking = BLK_UNKNOWN, ungetchar = -1, buffer = 0x0, bufpos = 0, bufsize = 0, dataRead = 0, direction = 0, filePtr = 0, thlock = 0x0}
(gdb) next
32          int fd = file->filedes;
(gdb) next
37          file->filedes = -1;
(gdb) next
39          if (close(fd) == 0) {
(gdb) next
41              if (!is_child && (file->flags & APR_FOPEN_DELONCLOSE)) {
(gdb) next
42                  unlink(file->fname);
(gdb) next
45              if (file->thlock) {


=> 2.4.43


(gdb) break apr_file_mktemp
Breakpoint 2 at 0x7ffff74f6a64: file file_io/unix/mktemp.c, line 182.
(gdb) break file_cleanup
Breakpoint 3 at 0x7ffff74f6b6f: file file_io/unix/open.c, line 31.
(gdb) continue
Continuing.

Breakpoint 2, apr_file_mktemp (fp=0x7fffaa783730, template=0x7fffa4036310 "/tmp/apr-tmp.XXXXXX", flags=0, p=0x7fffa401f898) at file_io/unix/mktemp.c:182
182                            APR_FOPEN_DELONCLOSE : flags;
(gdb) continue
Continuing.

Breakpoint 3, file_cleanup (file=0x7fffa4036328, is_child=0) at file_io/unix/open.c:31
31          apr_status_t rv = APR_SUCCESS;
(gdb) print *file
$1 = {pool = 0x7fffa401f898, filedes = 23, fname = 0x7fffa40363a0 "/tmp/apr-tmp.Vn2UKe", flags = 2375, eof_hit = 0, is_pipe = 0, timeout = -1, buffered = 0,
  blocking = BLK_UNKNOWN, ungetchar = -1, buffer = 0x0, bufpos = 0, bufsize = 0, dataRead = 0, direction = 0, filePtr = 0, thlock = 0x0}
(gdb) continue
Continuing.

Breakpoint 2, apr_file_mktemp (fp=0x7fffaa783828, template=0x7fffa40363e0 "/tmp/modproxy.tmp.XXXXXX", flags=0, p=0x7fffa401f898) at file_io/unix/mktemp.c:182
182                            APR_FOPEN_DELONCLOSE : flags;
(gdb) continue
Continuing.


any ideas?

Comment 22 Rainer Jung 2020-05-27 23:23:32 UTC

Could it be this

https://github.com/SpiderLabs/ModSecurity/pull/2049

mod_security bug?

Commkits are:

https://github.com/SpiderLabs/ModSecurity/commit/d33f4ebc3fc3dcfbbc54113c76d2f3ed5a950598

and

https://github.com/SpiderLabs/ModSecurity/commit/18b47dd5ff99cb96e2d7535b8b50d35e2066b1d9

I haven't checked, whether the file removal is a cleanup in the global pool, but I remember that this modsecurity bug orrupted cleanup lists. Was mod_security also used in the original setup?

Regards,

Rainer

Comment 23 Ruediger Pluem 2020-05-28 07:14:07 UTC

The mod_proxy temporary files are created from the request pool (r->pool).

Comment 24 friesoft@gmail.com 2020-05-28 08:35:02 UTC

(In reply to Rainer Jung from comment #22)
> Was mod_security also used in the original setup?

Do you mean my setup with "original setup"? 
No we aren't using mod_security.

So we now have multiple modules affected by this problem...

Anything else we can do to help nail down the issue?

Comment 25 Yann Ylavic 2020-05-28 08:54:49 UTC

Bernhard, could you please "step" into apr_file_mktemp when reaching the breakpoint (file "/tmp/modproxy.tmp.XXXXXX"), and once in hit "next" for the entire apr_file_mktemp function?

Comment 26 Michael Haas 2020-05-28 10:29:35 UTC

Created attachment 37273 [details]
gdb with next

Comment 27 Yann Ylavic 2020-05-28 12:37:59 UTC

Thanks Michael, unfortunately there are multiple threads being scheduled in your gdb trace, like here:
>504                     apr_bucket_read(e, &data, &bytes_read, APR_BLOCK_READ);
>(gdb) next
>[Switching to Thread 0x7fffb2794700 (LWP 125295)]
>
>Breakpoint 3, file_cleanup (file=0x7fffe81112b8, is_child=0) at file_io/unix/open.c:31
So the rest does not correspond.


If you are using the gdb procedure from attachment 37252 [details], at this point where the spool_reqbody_cl breakpoint is hit:
>Thread 5 "httpd" hit Breakpoint 1, spool_reqbody_cl (req=0x7fffe00044d0, bytes_spooled=0x7ffff59d9910) at mod_proxy_http.c:432
>432	    apr_pool_t *p = req->p;

instead of:
>(gdb) break apr_file_mktemp
>Breakpoint 2 at 0x7ffff7ea1a7f: file file_io/unix/mktemp.c, line 182.
>(gdb) break file_cleanup
>Breakpoint 3 at 0x7ffff7ea1b8a: file file_io/unix/open.c, line 31.

please do this:
(gdb) delete 1 # no need for spool_reqbody_cl breakpoint anymore
(gdb) set scheduler-locking on
(gdb) break apr_file_mktemp thread 5
Breakpoint 2 at ...
(gdb) break file_cleanup thread 5
Breakpoint 3 ...

Note that "thread 5" here corresponds to "Thread 5" hit above at spool_reqbody_cl, it may be a different number in your case.
Now apr_file_mktemp and file_cleanup breakpoints will be hit only for this thread, no more "[Switching to Thread]".

Then you can:
(gdb) continue

Once you hit the apr_file_mktemp breakpoint (for "/tmp/modproxy.tmp.XXXXXX") you don't need to "step" in since the previous trace shows it already, you can simply "continue", but if the file_cleanup breakpoint is not hit (like it happens for Bernhard) it could be interesting to restart the procedure and do "next" from here until the end of the request (note that it can be a lot of "next", but the last command in gdb can be repeated by simply hitting enter, so you don't need to type "next" more than once, then just enter..).


Thanks!

Comment 28 Yann Ylavic 2020-05-28 13:04:26 UTC

(In reply to Michael Haas from comment #21)
> => 2.4.43
> [...]
> Breakpoint 2, apr_file_mktemp (fp=0x7fffaa783828, template=0x7fffa40363e0
> "/tmp/modproxy.tmp.XXXXXX", flags=0, p=0x7fffa401f898) at file_io/unix/mktemp.c:182
> 182                            APR_FOPEN_DELONCLOSE : flags;

I don't see how file_cleanup is not called from this point, unless the request is not destroyed finally.

Can you (or Bernhard) add a breakpoint into ap_process_request_after_handler from here, like this ("xxx" is the thread number as explained in comment 27):
(gdb) break ap_process_request_after_handler thread xxx
(gdb) continue

Once the ap_process_request_after_handler breakpoint is hit, please "step" in the ap_pass_brigade call there, and then do "next" but for this line:
        return next->frec->filter_func.out_func(next, bb);
where "step" is needed to enter each filter.

Thanks.

Comment 29 Yann Ylavic 2020-05-28 13:08:34 UTC

(In reply to Yann Ylavic from comment #28)
> 
> Once the ap_process_request_after_handler breakpoint is hit, please "step"
> in the ap_pass_brigade call there,

Actually, once in ap_process_request_after_handler, it's better to:
(gdb) break ap_pass_brigade thread xxx
(gdb) continue

> and then do "next" but for this line:
>         return next->frec->filter_func.out_func(next, bb);
> where "step" is needed to enter each filter.

Still relevant.

Comment 30 Michael Haas 2020-05-28 15:00:07 UTC

Created attachment 37274 [details]
gdb trace

made the trace, hopefully i got the commands in the proper order.
I also can't recreate the problem with small files (i need to upload a 16MB file). In Production there are also much smaller files.
I was only able to reproduce with curl => apache+mod_sec+mod_proxy => netcat

Comment 31 Yann Ylavic 2020-05-28 16:13:29 UTC

(In reply to Michael Haas from comment #30)
> made the trace, hopefully i got the commands in the proper order.

Yes, thanks a lot. You just missed to "step" in the second call to:
        return next->frec->filter_func.out_func(next, bb);
but it was really easy to miss after all the steps and diving into httpd/AP internals :)

That was instructive though and I could see that the only ap_core_output_filter is involved.

Let's do it differently this time if you wish. Same procedure until the ap_process_request_after_handler breakpoint is hit, then instead of:
>(gdb) break ap_pass_brigade thread xxx
>(gdb) continue
please do:
(gdb) break eor_bucket_destroy thread xxx
(gdb) continue

And once in eor_bucket_destroy is reached (if ever), please do:
(gdb) backtrace
and then "next" until the end of the function, and "continue" eventually to see if some breakpoint fires (if so "backtrace" and "next"s again).

Comment 32 Yann Ylavic 2020-05-28 16:21:54 UTC

(In reply to Yann Ylavic from comment #31)
> And once in eor_bucket_destroy is reached (if ever), please do:
> (gdb) backtrace
> and then "next" until the end of the function

In eor_bucket_destroy there is a call to:
            apr_pool_destroy(r->pool);

It is the root of the request cleanup, and that's where the breakpoint to file_cleanup should fire when you "next" through this call. If it does not happen, let's see...

Comment 33 Michael Haas 2020-05-28 18:00:57 UTC

Created attachment 37277 [details]
gdb eor_bucket_destroy

no breakpoint after first backtrace

Comment 34 Yann Ylavic 2020-05-28 20:27:12 UTC

OK, how strange :/

Please copy this gdb init file (https://svn.apache.org/viewvc/httpd/httpd/trunk/.gdbinit?view=co) to your $HOME directory (i.e. "/root" if you run gdb as root), such that gdb loads httpd debugging scripts at startup.

Now when the breakpoint for apr_file_mktemp fires (for "/tmp/modproxy.tmp.XXXXXX"), do "next" until the function returns to spool_reqbody_cl and then type:

(gdb) dump_pool_and_children r->pool
[shows interesting output]
(gdb) break eor_bucket_destroy thread xxx
(gdb) continue

When the eor_bucket_destroy breakpoint fires, go "next" up to (but not including):
            apr_pool_destroy(r->pool);
and then:
(gdb) dump_pool_and_children r->pool
[shows interesting output, to compare]

Please post the results here.

Comment 35 Yann Ylavic 2020-05-28 20:43:46 UTC

If you downloaded the script already, please redo because there was a bug in dump_pool_and_children which I just fixed.

Comment 36 Michael Haas 2020-05-29 12:04:59 UTC

Created attachment 37280 [details]
gdb dump-pool

Comment 37 Yann Ylavic 2020-05-29 14:02:53 UTC

Thanks. Something dropped apr_unix_file_cleanup from the list in between apr_file_mktemp and eor_bucket_destroy, without running it.

I think that the file is set aside when being forwarded to the backend (so the culprit would be apr_file_setaside), such that its lifetime is now the the one of the backend connection.

It still should be deleted at some point though, let me (in)validate this assumption..

Comment 38 Yann Ylavic 2020-05-29 16:17:00 UTC

Created attachment 37281 [details]
Preserve EOS in spool_reqbody_cl()

Could you please try this patch?

Actually the fix belongs in apr_file_mktemp (see r1878279), so in APR lib, though the bug is triggered by a change in mod_proxy_http.

This patch should work around the bug then..

Comment 39 Yann Ylavic 2020-05-29 16:21:00 UTC

Created attachment 37282 [details]
Preserve EOS in spool_reqbody_cl() [v2]

Small change to take an improbable setting into account.

Comment 40 Yann Ylavic 2020-05-29 16:24:15 UTC

Created attachment 37283 [details]
Preserve EOS in spool_reqbody_cl() [v3]

Yet another update to remove my debugging code..

Comment 41 Michael Haas 2020-05-29 16:54:27 UTC

Thanks Yann, no leftover tmp files anymore.

Comment 42 Yann Ylavic 2020-05-29 17:08:36 UTC

Thanks for helping with debug and tests Michael.

Patch checked in trunk (r1878280), will propose a backport to 2.4.x ASAP.

Comment 43 friesoft@gmail.com 2020-05-29 17:23:14 UTC

Thank you for the support and the lightning fast fix!
Thanks also to Michael for taking the time to debug the issue further :)
Looking forward to 2.4.44.

Comment 44 Yann Ylavic 2020-06-26 10:27:25 UTC

Backported to 2.4, will be in next release.