Summary: | Hunging processes with FIN_WAIT2 and CLOSE_WAIT | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Piotr Staszeski <tech> |
Component: | mod_proxy | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | lothar.webmin |
Priority: | P2 | ||
Version: | 2.4.51 | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: |
finwait2.txt
Trace8 Log filtered by one PID, serving the Request IP-Addresses/hostnames anonymised |
Description
Piotr Staszeski
2021-10-28 15:53:19 UTC
Hello, Is there any news about this bug ? Can you post backtraces of the hung process? https://httpd.apache.org/dev/debugging.html#backtrace gdb httpd 27036 --batch --eval-command "where" --eval-command "thread apply all bt" --eval-command "detach" --eval-command "quit" warning: Could not load shared library symbols for 22 libraries, e.g. /httpd/lib/libssl.so.1.1. Use the "info sharedlibrary" command to see the complete listing. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". 0x00007f4e0ac06f47 in accept4 () from /lib64/libc.so.6 #0 0x00007f4e0ac06f47 in accept4 () from /lib64/libc.so.6 #1 0x00007f4e0b757847 in ?? () #2 0x0000000001b7ff88 in ?? () #3 0x000000000000001a in ?? () #4 0x0000000001b99e28 in ?? () #5 0x0000000001b99408 in ?? () #6 0x0000000000000080 in ?? () #7 0x0000000001b99e28 in ?? () #8 0x0000000000e4e1c0 in ?? () #9 0x0000000000000000 in ?? () Thread 1 (Thread 0x7f4e0c751740 (LWP 27036)): #0 0x00007f4e0ac06f47 in accept4 () from /lib64/libc.so.6 #1 0x00007f4e0b757847 in ?? () #2 0x0000000001b7ff88 in ?? () #3 0x000000000000001a in ?? () #4 0x0000000001b99e28 in ?? () #5 0x0000000001b99408 in ?? () #6 0x0000000000000080 in ?? () #7 0x0000000001b99e28 in ?? () #8 0x0000000000e4e1c0 in ?? () #9 0x0000000000000000 in ?? () [Inferior 1 (process 27036) detached] pstack 27036 #0 0x00007f4e0ac060c3 in __epoll_wait_nocancel () from /lib64/libc.so.6 #1 0x00007f4e0b758a59 in ?? () #2 0x00007ffcdd3b13fc in ?? () #3 0x00007ffcdd3b14a4 in ?? () #4 0x00007ffcdd3b14a8 in ?? () #5 0x0000000001bb9b98 in ?? () #6 0x0000000001bb9cd8 in ?? () #7 0x0000000001bb9b80 in ?? () #8 0xffffffffffffffff in ?? () #9 0x00007ffcdd3b14a4 in ?? () #10 0x0000000001bb9ca8 in ?? () #11 0x0000000001bb9cd8 in ?? () #12 0x00007ffcdd3b14a8 in ?? () #13 0x00007f4e092b1721 in ?? () #14 0x0000000001b99a98 in ?? () #15 0x0000000001bace78 in ?? () #16 0x0000000001b994c0 in ?? () #17 0x00007ffcdd3b1540 in ?? () #18 0x0000000001bb9cd8 in ?? () #19 0x00007f4e0b753283 in ?? () #20 0x0000000001bb7e20 in ?? () #21 0x0000000001b994c0 in ?? () #22 0x0000000001bb9ca0 in ?? () #23 0x0000000001bb9b80 in ?? () #24 0x0000000001bb9b38 in ?? () #25 0xffffffffffffffff in ?? () #26 0x00007ffcdd3b14a4 in ?? () #27 0x00007ffcdd3b14a8 in ?? () #28 0x0000000001b99040 in ?? () #29 0x0000000001bb9c40 in ?? () #30 0x0000000001b99040 in ?? () #31 0x0000000001bb9af8 in ?? () #32 0x0000000001b994c0 in ?? () #33 0x0000000001b99040 in ?? () #34 0x00007ffcdd3b15b0 in ?? () #35 0x0000000001bb7e20 in ?? () #36 0x0000000000440910 in ?? () at util_filter.c:593 #37 0x00007f4e09094f35 in ?? () #38 0x0000000001b7bc68 in ?? () #39 0x00007f4e0b7477d7 in ?? () #40 0x0000000000000000 in ?? () Does your distro provide debug symbols in a separate package for httpd and glibc? It would be helpful to apply them and re-run. (In reply to Piotr Staszeski from comment #5) > new Bug 65689 looks similar to this one Do you also use mod_proxy_connect or mod_proxy_wstunnel? (In reply to Yann Ylavic from comment #7) > (In reply to Piotr Staszeski from comment #5) > > new Bug 65689 looks similar to this one > > Do you also use mod_proxy_connect or mod_proxy_wstunnel? mod_proxy_connect *** Bug 65689 has been marked as a duplicate of this bug. *** > *** Bug 65689 has been marked as a duplicate of this bug. *** Re-asking here my question from https://bz.apache.org/bugzilla/show_bug.cgi?id=65689#c4 then: > Also the behavior has changed, with apache 2.4.46 the connection was closed > smoothly: > apache -> target ACK > target -> apache FIN,ACK > apache -> target ACK > apache -> target FIN,ACK > > With apache 2.4.51 the last FIN,ACK is not sent anymore but we receive > target -> apache RST, ACK after 10 sec. What happens to the connection on the client side when the target/backend server initiates connection close like the above? Could you please attach the logs with LogLevel trace8 for this scenario? Please see the discussion in bug 65631 too about connection (half-)closing in mod_proxy connect, and how it depends on both sides to close the connections before proxying stops. (In reply to Piotr Staszeski from comment #0) > We have tried to add "ProxyTimeout 120" to proxy.conf, but after > configuration reload there were no changes. The is r1894290 to honor ProxyTimeout in mod_proxy_connect, but it's not in 2.4 yet. (In reply to Piotr Staszeski from comment #0) > > tcpdump attached Missing attachment? If you attach the pcap, please make sure that it includes both the client and backend connections. Created attachment 38098 [details]
finwait2.txt
(In reply to Yann Ylavic from comment #13) > (In reply to Piotr Staszeski from comment #0) > > > > tcpdump attached > > Missing attachment? If you attach the pcap, please make sure that it > includes both the client and backend connections. unfortunately I do not have the client side as it's out of my control. Please not that not all connections ends in this state. I'll try to reproduce locally but it may take some time. (In reply to Piotr Staszeski from comment #14) > Created attachment 38098 [details] > finwait2.txt If I read this correctly, we have: client IP: 10.144.18.82 proxy IP: 10.19.8.160 backend IP: 192.168.6.84 >13:37:35.268129 IP 192.168.6.84.443 > 10.19.8.160.32998: Flags [F.], seq 4325, ack 759, win 14558, options [nop,nop,TS val 2815083039 ecr 1936729975], length 0 ^ here the backend initiates connection close to the proxy >13:37:35.268234 IP 10.19.8.160.8088 > 10.144.18.82.38247: Flags [F.], seq 4385, ack 869, win 505, options [nop,nop,TS val 3015137517 ecr 734631932], length 0 ^ here the proxy initiates connection close to the client >13:37:35.308773 IP 10.19.8.160.32998 > 192.168.6.84.443: Flags [.], ack 4326, win 501, options [nop,nop,TS val 1936735020 ecr 2815083039], length 0 ^ here the proxy acks the FIN from the backend >13:37:35.309014 IP 10.144.18.82.38247 > 10.19.8.160.8088: Flags [.], ack 4386, win 501, options [nop,nop,TS val 734636936 ecr 3015137517], length 0 ^ here the client acks the FIN from the proxy And now both the proxy and backend wait for the client FIN that fully terminates the connections (the connection is half closed only). Until this, there may still be data sent by the client that the proxy should forward to the backend. Created attachment 38099 [details]
Trace8 Log filtered by one PID, serving the Request IP-Addresses/hostnames anonymised
The Last Logentry related to this connection was at 08:35:32.
The Connection Status swtiched from Established to CLOSE_WAIT on 08:35:32
at 08:35:42 the Connection was closed by Remote System with TCP RST package.
I replaced the IP-Adresses and hostnames, because it is an productive system.
If you need more info I am at your disposal
Thanks a lot! KR, Lothar
(In reply to Yann Ylavic from comment #16) > (In reply to Piotr Staszeski from comment #14) > > Created attachment 38098 [details] > > finwait2.txt > > If I read this correctly, we have: > client IP: 10.144.18.82 > proxy IP: 10.19.8.160 > backend IP: 192.168.6.84 yes, thats correct. In 2.4.46 those hanged connections had been closed, I guess due to timeout (?), after upgrade to 2.4.51 nothing happens with them. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". 0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33 33 return INLINE_SYSCALL (accept4, 4, fd, addr.__sockaddr__, addr_len, flags); #0 0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33 #1 0x00007f0f80da3847 in apr_socket_accept (new=new@entry=0x7fffa08ee138, sock=0xd10350, connection_context=0xd87dd8) at network_io/unix/sockets.c:268 #2 0x0000000000495d33 in ap_unixd_accept (accepted=0x7fffa08ee1d0, lr=0xd10310, ptrans=<optimized out>) at unixd.c:303 #3 0x00007f0f7fb35a48 in child_main (child_num_arg=child_num_arg@entry=56, child_bucket=child_bucket@entry=0) at prefork.c:632 #4 0x00007f0f7fb35cf0 in make_child (s=0xd149e0, slot=56) at prefork.c:756 #5 0x00007f0f7fb36ac6 in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:860 #6 prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1053 #7 0x000000000043e84e in ap_run_mpm (pconf=pconf@entry=0xceb138, plog=0xd1c398, s=0xd149e0) at mpm_common.c:95 #8 0x0000000000436f8b in main (argc=5, argv=0x7fffa08ee5c8) at main.c:819 Thread 1 (Thread 0x7f0f81d9d740 (LWP 20245)): #0 0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33 #1 0x00007f0f80da3847 in apr_socket_accept (new=new@entry=0x7fffa08ee138, sock=0xd10350, connection_context=0xd87dd8) at network_io/unix/sockets.c:268 #2 0x0000000000495d33 in ap_unixd_accept (accepted=0x7fffa08ee1d0, lr=0xd10310, ptrans=<optimized out>) at unixd.c:303 #3 0x00007f0f7fb35a48 in child_main (child_num_arg=child_num_arg@entry=56, child_bucket=child_bucket@entry=0) at prefork.c:632 #4 0x00007f0f7fb35cf0 in make_child (s=0xd149e0, slot=56) at prefork.c:756 #5 0x00007f0f7fb36ac6 in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:860 #6 prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1053 #7 0x000000000043e84e in ap_run_mpm (pconf=pconf@entry=0xceb138, plog=0xd1c398, s=0xd149e0) at mpm_common.c:95 #8 0x0000000000436f8b in main (argc=5, argv=0x7fffa08ee5c8) at main.c:819 [Inferior 1 (process 20245) detached] pstack 20245 #0 0x00007f0f80252f47 in accept4 (fd=3, addr=addr@entry=..., addr_len=addr_len@entry=0x7fffa08ee050, flags=flags@entry=524288) at ../sysdeps/unix/sysv/linux/accept4.c:33 #1 0x00007f0f80da3847 in apr_socket_accept (new=new@entry=0x7fffa08ee138, sock=0xd10350, connection_context=0xd87dd8) at network_io/unix/sockets.c:268 #2 0x0000000000495d33 in ap_unixd_accept (accepted=0x7fffa08ee1d0, lr=0xd10310, ptrans=<optimized out>) at unixd.c:303 #3 0x00007f0f7fb35a48 in child_main (child_num_arg=child_num_arg@entry=56, child_bucket=child_bucket@entry=0) at prefork.c:632 #4 0x00007f0f7fb35cf0 in make_child (s=0xd149e0, slot=56) at prefork.c:756 #5 0x00007f0f7fb36ac6 in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:860 #6 prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1053 #7 0x000000000043e84e in ap_run_mpm (pconf=pconf@entry=0xceb138, plog=0xd1c398, s=0xd149e0) at mpm_common.c:95 #8 0x0000000000436f8b in main (argc=5, argv=0x7fffa08ee5c8) at main.c:819 (In reply to Lothar from comment #17) > Created attachment 38099 [details] > Trace8 Log filtered by one PID, serving the Request IP-Addresses/hostnames > anonymised > > The Last Logentry related to this connection was at 08:35:32. > The Connection Status swtiched from Established to CLOSE_WAIT on 08:35:32 > at 08:35:42 the Connection was closed by Remote System with TCP RST package. So below are the traces when the backend (10.1.1.1:443) initiates the shut down with the proxy which in turn shuts down with the client (10.0.0.1:51469) : >[Tue Nov 23 08:35:32.217620 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4880): [client 10.0.0.1:51469] proxy: CONNECT: polling (client=31, origin=31) >[Tue Nov 23 08:35:32.217637 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4903): [client 10.0.0.1:51469] AH10215: proxy: CONNECT: woken up, 1 result(s) >[Tue Nov 23 08:35:32.217648 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4911): [client 10.0.0.1:51469] proxy: CONNECT: #0: origin: 1/31 >[Tue Nov 23 08:35:32.217658 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4812): [client 10.0.0.1:51469] proxy: CONNECT: origin input ready >[Tue Nov 23 08:35:32.217703 2021] [proxy:trace2] [pid 35586448] proxy_util.c(4648): (70014)End of file found: [client 10.0.0.1:51469] ap_proxy_transfer_between_connections complete (from 10.1.1.1:443) >[Tue Nov 23 08:35:32.217718 2021] [proxy:trace3] [pid 35586448] proxy_util.c(4840): [client 10.0.0.1:51469] proxy: CONNECT: origin read shutdown >[Tue Nov 23 08:35:32.217729 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4880): [client 10.0.0.1:51469] proxy: CONNECT: polling (client=35, origin=0) >[Tue Nov 23 08:35:32.217745 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4903): [client 10.0.0.1:51469] AH10215: proxy: CONNECT: woken up, 1 result(s) >[Tue Nov 23 08:35:32.217755 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4911): [client 10.0.0.1:51469] proxy: CONNECT: #0: client: 4/35 >[Tue Nov 23 08:35:32.217766 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4945): [client 10.0.0.1:51469] proxy: CONNECT: client output ready >[Tue Nov 23 08:35:32.217777 2021] [proxy:trace3] [pid 35586448] proxy_util.c(4968): [client 10.0.0.1:51469] proxy: CONNECT: client write shutdown >[Tue Nov 23 08:35:32.217813 2021] [proxy:trace8] [pid 35586448] proxy_util.c(4880): [client 10.0.0.1:51469] proxy: CONNECT: polling (client=31, origin=0) Same as comment 16, the proxy is now waiting (indefinitely) for the client to shut down before shutting down the connection to the backend. So regarding the states of the connections on the proxy system, the client<>proxy connection is in FIN_WAIT2 and the proxy<>backend connection is in CLOSE_WAIT. The question is: why doesn't the client close (or reset) the connection when it receives the FIN? Until then the proxy cannot consider that the communication is complete and can only wait for some time that it happens, unfortunately proxy_connect does not set up a timeout for its connections (thus the infinite wait). The timeout handling has been added to mod_proxy_connect in trunk (r1894290) and will probably be in the next 2.4 versions, but in the current 2.4 releases I can't think of a way to abort the connection before the client's FIN/RST, this needs to be patched with r1894290. Possibly mod_proxy could also handle a setting like "SetEnv proxy-nohalfclose" (à la proxy-nokeepalive) to issue a full close (for both read and write on both the client and backend connections) whenever one side sends a FIN. Note that the half-close change was introduced in 2.4.47 (released in 2.4.48): *) mod_proxy: Improve tunneling loop to support half closed connections and pending data draining (for protocols like rsync). PR 61616. Before this mod_proxy was not working correctly for some protocols, so it's a fix really though for backward compatibility there could be an opt-out (for those that don't want to wait for the timeout..). *** Bug 65689 has been marked as a duplicate of this bug. *** In addition to r1894290 to handle ProxyTimeout in mod_proxy_connect, I have also checked in r1895304 which allows to configure something like "SetEnv proxy-nohalfclose" for mod_proxy_connect to leave as soon as a FIN is received on either side (like before 2.4.47). Could you please try to apply the patch [1] and add "SetEnv proxy-nohalfclose" somewhere in your VirtualHost to see if works as expected for you? [1] https://patch-diff.githubusercontent.com/raw/apache/httpd/pull/278.patch Hi Yann, now there are no more connections in "CLOSE_WAIT"-state. The connection between Client and Server still is in "FIN_WAIT_2" state for a while, but this is OK for us. Thanks a lot for your help! kr, Lothar Backported to v2.4.52. |