Bug 65662

Summary: Hanging processes with FIN_WAIT2 and CLOSE_WAIT
Product: Apache httpd-2
Component: mod_proxy
Version: 2.4.51
Hardware: PC
OS: Linux
Status: RESOLVED FIXED
Severity: normal
Priority: P2
Target Milestone: ---
Reporter: Piotr Staszeski <tech>
Assignee: Apache HTTPD Bugs Mailing List <bugs>
CC: lothar.webmin
Attachments: finwait2.txt
             Trace8 log filtered by the one PID serving the request; IP addresses/hostnames anonymised

Description Piotr Staszeski 2021-10-28 15:53:19 UTC
After upgrading from Apache 2.4.46 to 2.4.51 we have noticed multiple leftover processes for proxied connections, in the FIN_WAIT2 state towards the proxy and CLOSE_WAIT from the proxy to the external service. The process does nothing and appears to be hanging.

tcpdump attached 

We have tried adding "ProxyTimeout 120" to proxy.conf, but after a configuration reload nothing changed. During the ProxyTimeout test we found that FIN_WAIT2 already happened on the first webservice pass, which means the second test we had in mind, MaxConnectionsPerChild 100, makes no sense.
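
For context, a minimal forward-proxy vhost of the kind being tested might look roughly like this (a sketch only; the actual configuration is not part of this report, and the port, AllowCONNECT list and client network are illustrative placeholders):

    Listen 8088
    <VirtualHost *:8088>
        ProxyRequests On
        # only allow CONNECT tunnels to HTTPS backends
        AllowCONNECT 443
        # the directive added for the test described above
        ProxyTimeout 120
        <Proxy *>
            Require ip 10.144.18.0/24
        </Proxy>
    </VirtualHost>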
Comment 1 Piotr Staszeski 2021-11-21 17:48:15 UTC
Hello, is there any news about this bug?
Comment 2 Eric Covener 2021-11-22 14:28:39 UTC
Can you post backtraces of the hung process?

https://httpd.apache.org/dev/debugging.html#backtrace
Comment 3 Piotr Staszeski 2021-11-22 14:48:17 UTC
gdb  httpd  27036 --batch --eval-command "where" --eval-command "thread apply all bt" --eval-command "detach"  --eval-command "quit" 

warning: Could not load shared library symbols for 22 libraries, e.g. /httpd/lib/libssl.so.1.1.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f4e0ac06f47 in accept4 () from /lib64/libc.so.6
#0  0x00007f4e0ac06f47 in accept4 () from /lib64/libc.so.6
#1  0x00007f4e0b757847 in ?? ()
#2  0x0000000001b7ff88 in ?? ()
#3  0x000000000000001a in ?? ()
#4  0x0000000001b99e28 in ?? ()
#5  0x0000000001b99408 in ?? ()
#6  0x0000000000000080 in ?? ()
#7  0x0000000001b99e28 in ?? ()
#8  0x0000000000e4e1c0 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f4e0c751740 (LWP 27036)):
#0  0x00007f4e0ac06f47 in accept4 () from /lib64/libc.so.6
#1  0x00007f4e0b757847 in ?? ()
#2  0x0000000001b7ff88 in ?? ()
#3  0x000000000000001a in ?? ()
#4  0x0000000001b99e28 in ?? ()
#5  0x0000000001b99408 in ?? ()
#6  0x0000000000000080 in ?? ()
#7  0x0000000001b99e28 in ?? ()
#8  0x0000000000e4e1c0 in ?? ()
#9  0x0000000000000000 in ?? ()
[Inferior 1 (process 27036) detached]
Comment 4 Piotr Staszeski 2021-11-22 14:49:21 UTC
pstack 27036
#0  0x00007f4e0ac060c3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00007f4e0b758a59 in ?? ()
#2  0x00007ffcdd3b13fc in ?? ()
#3  0x00007ffcdd3b14a4 in ?? ()
#4  0x00007ffcdd3b14a8 in ?? ()
#5  0x0000000001bb9b98 in ?? ()
#6  0x0000000001bb9cd8 in ?? ()
#7  0x0000000001bb9b80 in ?? ()
#8  0xffffffffffffffff in ?? ()
#9  0x00007ffcdd3b14a4 in ?? ()
#10 0x0000000001bb9ca8 in ?? ()
#11 0x0000000001bb9cd8 in ?? ()
#12 0x00007ffcdd3b14a8 in ?? ()
#13 0x00007f4e092b1721 in ?? ()
#14 0x0000000001b99a98 in ?? ()
#15 0x0000000001bace78 in ?? ()
#16 0x0000000001b994c0 in ?? ()
#17 0x00007ffcdd3b1540 in ?? ()
#18 0x0000000001bb9cd8 in ?? ()
#19 0x00007f4e0b753283 in ?? ()
#20 0x0000000001bb7e20 in ?? ()
#21 0x0000000001b994c0 in ?? ()
#22 0x0000000001bb9ca0 in ?? ()
#23 0x0000000001bb9b80 in ?? ()
#24 0x0000000001bb9b38 in ?? ()
#25 0xffffffffffffffff in ?? ()
#26 0x00007ffcdd3b14a4 in ?? ()
#27 0x00007ffcdd3b14a8 in ?? ()
#28 0x0000000001b99040 in ?? ()
#29 0x0000000001bb9c40 in ?? ()
#30 0x0000000001b99040 in ?? ()
#31 0x0000000001bb9af8 in ?? ()
#32 0x0000000001b994c0 in ?? ()
#33 0x0000000001b99040 in ?? ()
#34 0x00007ffcdd3b15b0 in ?? ()
#35 0x0000000001bb7e20 in ?? ()
#36 0x0000000000440910 in ?? () at util_filter.c:593
#37 0x00007f4e09094f35 in ?? ()
#38 0x0000000001b7bc68 in ?? ()
#39 0x00007f4e0b7477d7 in ?? ()
#40 0x0000000000000000 in ?? ()
Comment 5 Piotr Staszeski 2021-11-22 14:59:26 UTC
The new Bug 65689 looks similar to this one.
Comment 6 Eric Covener 2021-11-22 15:06:44 UTC
Does your distro provide debug symbols in a separate package for httpd and glibc? It would be helpful to apply them and re-run.
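
On RHEL/CentOS-style systems the symbols usually live in separate *-debuginfo packages; something along these lines may work (a sketch; package names vary by distro, and a custom-built httpd like the one hinted at in comment 3 would need its own unstripped build instead):

    # yum-utils provides debuginfo-install on RHEL/CentOS 7
    debuginfo-install httpd glibc apr apr-util
    # then re-capture the backtrace with symbols
    gdb httpd <pid> --batch -ex "thread apply all bt full" -ex "detach" -ex "quit"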
Comment 7 Yann Ylavic 2021-11-22 15:12:03 UTC
(In reply to Piotr Staszeski from comment #5)
> new Bug 65689 looks similar to this one

Do you also use mod_proxy_connect or mod_proxy_wstunnel?
Comment 8 Piotr Staszeski 2021-11-22 15:14:28 UTC
(In reply to Yann Ylavic from comment #7)
> (In reply to Piotr Staszeski from comment #5)
> > new Bug 65689 looks similar to this one
> 
> Do you also use mod_proxy_connect or mod_proxy_wstunnel?

mod_proxy_connect
Comment 9 Yann Ylavic 2021-11-22 15:19:40 UTC
*** Bug 65689 has been marked as a duplicate of this bug. ***
Comment 10 Yann Ylavic 2021-11-22 15:26:55 UTC
> *** Bug 65689 has been marked as a duplicate of this bug. ***

Re-asking here my question from https://bz.apache.org/bugzilla/show_bug.cgi?id=65689#c4 then:

> Also the behavior has changed, with apache 2.4.46 the connection was closed
> smoothly:
> apache -> target ACK
> target -> apache FIN,ACK
> apache -> target ACK
> apache -> target FIN,ACK
> 
> With apache 2.4.51 the last FIN,ACK is not sent anymore but we receive 
> target -> apache RST, ACK after 10 sec.

What happens to the connection on the client side when the target/backend server initiates connection close like the above?

Could you please attach the logs with LogLevel trace8 for this scenario?
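
For reference, a per-module LogLevel override is usually enough to capture such traces without flooding the rest of the error log (a sketch; adjust the module list to what is actually loaded):

    # in the server or vhost doing the proxying
    LogLevel warn proxy:trace8 proxy_connect:trace8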
Comment 11 Yann Ylavic 2021-11-22 15:38:45 UTC
Please see the discussion in bug 65631 too about connection (half-)closing in mod_proxy connect, and how it depends on both sides to close the connections before proxying stops.
Comment 12 Yann Ylavic 2021-11-22 15:55:04 UTC
(In reply to Piotr Staszeski from comment #0)
> We have tried to add "ProxyTimeout 120" to proxy.conf, but after
> configuration reload there were no changes.

There is r1894290 to honor ProxyTimeout in mod_proxy_connect, but it is not in 2.4 yet.
Comment 13 Yann Ylavic 2021-11-22 16:01:04 UTC
(In reply to Piotr Staszeski from comment #0)
> 
> tcpdump attached 

Missing attachment? If you attach the pcap, please make sure that it includes both the client and backend connections.
Comment 14 Piotr Staszeski 2021-11-22 16:17:16 UTC
Created attachment 38098 [details]
finwait2.txt
Comment 15 Piotr Staszeski 2021-11-22 16:18:26 UTC
(In reply to Yann Ylavic from comment #13)
> (In reply to Piotr Staszeski from comment #0)
> > 
> > tcpdump attached 
> 
> Missing attachment? If you attach the pcap, please make sure that it
> includes both the client and backend connections.

Unfortunately I do not have the client side, as it is out of my control. Please note that not all connections end in this state. I will try to reproduce locally, but it may take some time.
Comment 16 Yann Ylavic 2021-11-22 16:57:36 UTC
(In reply to Piotr Staszeski from comment #14)
> Created attachment 38098 [details]
> finwait2.txt

If I read this correctly, we have:
client IP: 10.144.18.82
proxy IP: 10.19.8.160
backend IP: 192.168.6.84

>13:37:35.268129 IP 192.168.6.84.443 > 10.19.8.160.32998: Flags [F.], seq 4325, ack 759, win 14558, options [nop,nop,TS val 2815083039 ecr 1936729975], length 0
^ here the backend initiates connection close to the proxy

>13:37:35.268234 IP 10.19.8.160.8088 > 10.144.18.82.38247: Flags [F.], seq 4385, ack 869, win 505, options [nop,nop,TS val 3015137517 ecr 734631932], length 0
^ here the proxy initiates connection close to the client

>13:37:35.308773 IP 10.19.8.160.32998 > 192.168.6.84.443: Flags [.], ack 4326, win 501, options [nop,nop,TS val 1936735020 ecr 2815083039], length 0
^ here the proxy acks the FIN from the backend

>13:37:35.309014 IP 10.144.18.82.38247 > 10.19.8.160.8088: Flags [.], ack 4386, win 501, options [nop,nop,TS val 734636936 ecr 3015137517], length 0
^ here the client acks the FIN from the proxy

And now both the proxy and the backend wait for the client's FIN to fully terminate the connections (the connection is only half closed). Until then, there may still be data sent by the client that the proxy should forward to the backend.
Comment 17 Lothar 2021-11-23 08:07:19 UTC
Created attachment 38099 [details]
Trace8 log filtered by the one PID serving the request; IP addresses/hostnames anonymised

The last log entry related to this connection was at 08:35:32.
The connection status switched from ESTABLISHED to CLOSE_WAIT at 08:35:32.
At 08:35:42 the connection was closed by the remote system with a TCP RST packet.
I replaced the IP addresses and hostnames because it is a production system.

If you need more info, I am at your disposal.
Thanks a lot! KR, Lothar
Comment 18 Piotr Staszeski 2021-11-23 08:11:04 UTC
(In reply to Yann Ylavic from comment #16)
> (In reply to Piotr Staszeski from comment #14)
> > Created attachment 38098 [details]
> > finwait2.txt
> 
> If I read this correctly, we have:
> client IP: 10.144.18.82
> proxy IP: 10.19.8.160
> backend IP: 192.168.6.84

Yes, that's correct.
In 2.4.46 those hung connections were eventually closed, I guess due to a timeout (?); after the upgrade to 2.4.51 nothing happens to them.
Comment 19 Piotr Staszeski 2021-11-23 11:21:23 UTC
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33
33          return INLINE_SYSCALL (accept4, 4, fd, addr.__sockaddr__, addr_len, flags);
#0  0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33
#1  0x00007f0f80da3847 in apr_socket_accept (new=new@entry=0x7fffa08ee138, sock=0xd10350, connection_context=0xd87dd8) at network_io/unix/sockets.c:268
#2  0x0000000000495d33 in ap_unixd_accept (accepted=0x7fffa08ee1d0, lr=0xd10310, ptrans=<optimized out>) at unixd.c:303
#3  0x00007f0f7fb35a48 in child_main (child_num_arg=child_num_arg@entry=56, child_bucket=child_bucket@entry=0) at prefork.c:632
#4  0x00007f0f7fb35cf0 in make_child (s=0xd149e0, slot=56) at prefork.c:756
#5  0x00007f0f7fb36ac6 in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:860
#6  prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1053
#7  0x000000000043e84e in ap_run_mpm (pconf=pconf@entry=0xceb138, plog=0xd1c398, s=0xd149e0) at mpm_common.c:95
#8  0x0000000000436f8b in main (argc=5, argv=0x7fffa08ee5c8) at main.c:819

Thread 1 (Thread 0x7f0f81d9d740 (LWP 20245)):
#0  0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33
#1  0x00007f0f80da3847 in apr_socket_accept (new=new@entry=0x7fffa08ee138, sock=0xd10350, connection_context=0xd87dd8) at network_io/unix/sockets.c:268
#2  0x0000000000495d33 in ap_unixd_accept (accepted=0x7fffa08ee1d0, lr=0xd10310, ptrans=<optimized out>) at unixd.c:303
#3  0x00007f0f7fb35a48 in child_main (child_num_arg=child_num_arg@entry=56, child_bucket=child_bucket@entry=0) at prefork.c:632
#4  0x00007f0f7fb35cf0 in make_child (s=0xd149e0, slot=56) at prefork.c:756
#5  0x00007f0f7fb36ac6 in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:860
#6  prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1053
#7  0x000000000043e84e in ap_run_mpm (pconf=pconf@entry=0xceb138, plog=0xd1c398, s=0xd149e0) at mpm_common.c:95
#8  0x0000000000436f8b in main (argc=5, argv=0x7fffa08ee5c8) at main.c:819
[Inferior 1 (process 20245) detached]
Comment 20 Piotr Staszeski 2021-11-23 11:22:11 UTC
pstack 20245
#0  0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33
#1  0x00007f0f80da3847 in apr_socket_accept (new=new@entry=0x7fffa08ee138, sock=0xd10350, connection_context=0xd87dd8) at network_io/unix/sockets.c:268
#2  0x0000000000495d33 in ap_unixd_accept (accepted=0x7fffa08ee1d0, lr=0xd10310, ptrans=<optimized out>) at unixd.c:303
#3  0x00007f0f7fb35a48 in child_main (child_num_arg=child_num_arg@entry=56, child_bucket=child_bucket@entry=0) at prefork.c:632
#4  0x00007f0f7fb35cf0 in make_child (s=0xd149e0, slot=56) at prefork.c:756
#5  0x00007f0f7fb36ac6 in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:860
#6  prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1053
#7  0x000000000043e84e in ap_run_mpm (pconf=pconf@entry=0xceb138, plog=0xd1c398, s=0xd149e0) at mpm_common.c:95
#8  0x0000000000436f8b in main (argc=5, argv=0x7fffa08ee5c8) at main.c:819
Comment 21 Yann Ylavic 2021-11-23 17:27:01 UTC
(In reply to Lothar from comment #17)
> Created attachment 38099 [details]
> Trace8 log filtered by the one PID serving the request; IP addresses/hostnames
> anonymised
> 
> The last log entry related to this connection was at 08:35:32.
> The connection status switched from ESTABLISHED to CLOSE_WAIT at 08:35:32.
> At 08:35:42 the connection was closed by the remote system with a TCP RST packet.

So below are the traces when the backend (10.1.1.1:443) initiates the shutdown with the proxy, which in turn shuts down its side with the client (10.0.0.1:51469):

>[Tue Nov 23 08:35:32.217620 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4880): [client 10.0.0.1:51469] proxy: CONNECT: polling (client=31, origin=31)
>[Tue Nov 23 08:35:32.217637 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4903): [client 10.0.0.1:51469] AH10215: proxy: CONNECT: woken up, 1 result(s)
>[Tue Nov 23 08:35:32.217648 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4911): [client 10.0.0.1:51469] proxy: CONNECT: #0: origin: 1/31
>[Tue Nov 23 08:35:32.217658 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4812): [client 10.0.0.1:51469] proxy: CONNECT: origin input ready
>[Tue Nov 23 08:35:32.217703 2021] [proxy:trace2] [pid 35586448] proxy_util.c(4648): (70014)End of file found: [client 10.0.0.1:51469] ap_proxy_transfer_between_connections complete (from 10.1.1.1:443)
>[Tue Nov 23 08:35:32.217718 2021] [proxy:trace3] [pid 35586448] proxy_util.c(4840): [client 10.0.0.1:51469] proxy: CONNECT: origin read shutdown
>[Tue Nov 23 08:35:32.217729 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4880): [client 10.0.0.1:51469] proxy: CONNECT: polling (client=35, origin=0)
>[Tue Nov 23 08:35:32.217745 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4903): [client 10.0.0.1:51469] AH10215: proxy: CONNECT: woken up, 1 result(s)
>[Tue Nov 23 08:35:32.217755 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4911): [client 10.0.0.1:51469] proxy: CONNECT: #0: client: 4/35
>[Tue Nov 23 08:35:32.217766 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4945): [client 10.0.0.1:51469] proxy: CONNECT: client output ready
>[Tue Nov 23 08:35:32.217777 2021] [proxy:trace3] [pid 35586448] proxy_util.c(4968): [client 10.0.0.1:51469] proxy: CONNECT: client write shutdown
>[Tue Nov 23 08:35:32.217813 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4880): [client 10.0.0.1:51469] proxy: CONNECT: polling (client=31, origin=0)

Same as comment 16, the proxy is now waiting (indefinitely) for the client to shut down before shutting down the connection to the backend.
So regarding the states of the connections on the proxy system, the client<>proxy connection is in FIN_WAIT2 and the proxy<>backend connection is in CLOSE_WAIT.
The question is: why doesn't the client close (or reset) the connection when it receives the FIN?
Until then the proxy cannot consider the communication complete and can only wait for that to happen; unfortunately mod_proxy_connect does not set up a timeout for its connections (hence the indefinite wait).
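
For anyone reproducing this, on the proxy host the two halves of a stuck tunnel can be listed with ss (illustrative; -p needs root to map the sockets to the httpd PIDs):

    ss -tanp state fin-wait-2    # client<>proxy side
    ss -tanp state close-wait    # proxy<>backend side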

Timeout handling has been added to mod_proxy_connect in trunk (r1894290) and will probably be in the next 2.4 version, but in the current 2.4 releases I can't think of a way to abort the connection before the client's FIN/RST; this needs the r1894290 patch.

Possibly mod_proxy could also handle a setting like "SetEnv proxy-nohalfclose" (à la proxy-nokeepalive) to issue a full close (for both read and write on both the client and backend connections) whenever one side sends a FIN.
Note that the half-close change was introduced in 2.4.47 (2.4.47 itself was never released; the change first shipped in 2.4.48):

  *) mod_proxy: Improve tunneling loop to support half closed connections and
     pending data draining (for protocols like rsync). PR 61616.

Before this change mod_proxy was not working correctly for some protocols, so it really is a fix; though for backward compatibility there could be an opt-out (for those who don't want to wait for the timeout).
Comment 22 Yann Ylavic 2021-11-23 17:28:00 UTC
*** Bug 65689 has been marked as a duplicate of this bug. ***
Comment 23 Yann Ylavic 2021-11-25 14:01:54 UTC
In addition to r1894290 to handle ProxyTimeout in mod_proxy_connect, I have also checked in r1895304, which allows configuring something like "SetEnv proxy-nohalfclose" to make mod_proxy_connect stop tunneling as soon as a FIN is received on either side (like before 2.4.47).

Could you please try to apply the patch [1] and add "SetEnv proxy-nohalfclose" somewhere in your VirtualHost to see if it works as expected for you?

[1] https://patch-diff.githubusercontent.com/raw/apache/httpd/pull/278.patch
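
For reference, with that patch applied the opt-out would sit in the proxying vhost roughly like this (a sketch; the vhost and AllowCONNECT list are illustrative):

    <VirtualHost *:8088>
        ProxyRequests On
        AllowCONNECT 443
        # revert to the pre-2.4.47 behaviour: stop tunneling as soon as either side sends a FIN
        SetEnv proxy-nohalfclose
    </VirtualHost>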
Comment 24 Lothar 2021-11-30 06:19:38 UTC
Hi Yann,
now there are no more connections in the CLOSE_WAIT state.
The connection between client and server still stays in FIN_WAIT_2 for a while, but that is OK for us.
Thanks a lot for your help!
kr, Lothar
Comment 25 Graham Leggett 2021-12-10 13:26:47 UTC
Backported to v2.4.52.