Bug 61616 - mod_proxy_connect: stall and connection loss on bi-directional traffic
Summary: mod_proxy_connect: stall and connection loss on bi-directional traffic
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_connect
Version: 2.4.25
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-14 10:09 UTC by Carsten Wolff
Modified: 2019-08-02 22:31 UTC
CC List: 2 users



Attachments
default vhost with proxy_connect conf (302 bytes, text/plain)
2017-10-14 10:09 UTC, Carsten Wolff
Details
rsync daemon config (262 bytes, text/plain)
2017-10-14 10:10 UTC, Carsten Wolff
Details
script for reproducing the bug (300 bytes, text/plain)
2017-10-14 10:11 UTC, Carsten Wolff
Details
Handle POLLOUT in proxy_connect (14.76 KB, patch)
2017-10-14 20:06 UTC, Yann Ylavic
Details | Diff
Handle POLLOUT in proxy_connect (v2) (14.97 KB, patch)
2017-10-14 20:31 UTC, Yann Ylavic
Details | Diff
Handle POLLOUT in proxy_connect (v3) (16.10 KB, patch)
2017-10-15 08:30 UTC, Yann Ylavic
Details | Diff
File read buffer and flush threshold tuning (35.86 KB, patch)
2017-10-15 12:46 UTC, Yann Ylavic
Details | Diff
trace6 error log of a single test (83.80 KB, application/octet-stream)
2017-10-16 13:49 UTC, Carsten Wolff
Details
Handle POLLOUT in proxy_connect (v4) (18.74 KB, patch)
2017-10-17 08:42 UTC, Yann Ylavic
Details | Diff
shortening the poll timeout for send_brigade_blocking (3.11 KB, patch)
2017-10-17 08:46 UTC, Carsten Wolff
Details | Diff
Handle POLLOUT in proxy_connect (v5) (19.28 KB, patch)
2017-10-17 09:07 UTC, Yann Ylavic
Details | Diff
suppress poll with long timeout in filter code (10.05 KB, patch)
2017-10-19 12:35 UTC, Carsten Wolff
Details | Diff
suppress poll with long timeout in filter code (v2) (8.08 KB, patch)
2017-10-19 15:28 UTC, Carsten Wolff
Details | Diff
Handle POLLOUT in proxy_connect (v6) (22.14 KB, patch)
2017-10-21 13:00 UTC, Yann Ylavic
Details | Diff
Read buffer size and flush threshold tuning (42.25 KB, patch)
2017-10-22 12:52 UTC, Yann Ylavic
Details | Diff
Handle POLLOUT in proxy_connect (v7) (22.51 KB, patch)
2017-11-10 10:01 UTC, Yann Ylavic
Details | Diff
Handle POLLOUT in proxy_connect (v8) (22.65 KB, patch)
2017-11-10 10:26 UTC, Yann Ylavic
Details | Diff
Read buffer size and flush threshold tuning (v2) (43.68 KB, patch)
2017-12-20 13:57 UTC, Yann Ylavic
Details | Diff
Handle POLLOUT in proxy_connect (v9) (25.88 KB, patch)
2017-12-20 14:01 UTC, Yann Ylavic
Details | Diff
Handle POLLOUT in proxy_connect (v10) (25.85 KB, patch)
2017-12-20 14:18 UTC, Yann Ylavic
Details | Diff
pcaps of a stall in rsync handshake (Test4) (3.89 KB, application/octet-stream)
2018-01-02 22:12 UTC, Carsten Wolff
Details
fixes a busy loop trying to shutdown a disconnected socket (1.34 KB, patch)
2018-02-22 09:17 UTC, Carsten Wolff
Details | Diff
Handle POLLOUT in proxy_connect (v11) (26.72 KB, patch)
2018-02-23 17:34 UTC, Yann Ylavic
Details | Diff
rsync timeout: pcap and httpd log (with v11) (238.17 KB, application/octet-stream)
2018-02-23 17:46 UTC, Yann Ylavic
Details
try draining input buffers (10.01 KB, patch)
2018-03-15 21:42 UTC, Carsten Wolff
Details | Diff
patch for fixing the problem of remaining data in input ctx (4.54 KB, patch)
2018-03-16 09:23 UTC, Mohamed El Housseine
Details | Diff
patch for fixing the problem of remaining data in input ctx v2 (3.15 KB, patch)
2018-03-16 13:51 UTC, Mohamed El Housseine
Details | Diff
Handle POLLOUT in proxy_connect (v12) (27.90 KB, patch)
2018-03-18 19:39 UTC, Yann Ylavic
Details | Diff
Incremental patch from v11 to v12 (7.30 KB, patch)
2018-03-18 19:40 UTC, Yann Ylavic
Details | Diff
log of a single process entering a busy loop (4.92 KB, application/octet-stream)
2018-03-19 15:25 UTC, Carsten Wolff
Details
patch for fixing the problem of remaining data in input ctx v3 (4.20 KB, patch)
2018-03-19 17:29 UTC, Mohamed El Housseine
Details | Diff
Handle POLLOUT in proxy_connect (v13) (27.75 KB, patch)
2018-03-20 14:05 UTC, Yann Ylavic
Details | Diff

Description Carsten Wolff 2017-10-14 10:09:36 UTC
Created attachment 35421 [details]
default vhost with proxy_connect conf

When proxy_connect is used for traffic that is less of a request-response nature, the send buffer on the backend socket can fill up. In that case, when the writev() in libapr comes back with EAGAIN, it blocks in poll() on the backend socket with a 300-second timeout. No read from either of the two sockets can happen anymore in that situation. So, depending on the application protocol tunneled through CONNECT, the backend application will then fill up the receive buffer at proxy_connect's backend socket and stall trying to send indefinitely. Now proxy_connect and the backend application are in a deadlock.
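
Reduced to a minimal sketch (plain POSIX, not the actual httpd/APR code; the function name and the hard-coded 300000 ms are only illustrative of the default Timeout), the blocking path looks like this:

#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Minimal sketch of the blocking send path (illustrative only).  Once
 * writev() on the backend socket returns EAGAIN, the worker sits in
 * poll() for up to 300 seconds waiting for POLLOUT on that single
 * socket; neither socket is read meanwhile, so the peer fills the
 * receive buffer and both ends deadlock. */
static ssize_t send_all_blocking(int backend_fd, struct iovec *vec, int nvec)
{
    ssize_t n = writev(backend_fd, vec, nvec);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        struct pollfd pfd = { .fd = backend_fd, .events = POLLOUT };
        if (poll(&pfd, 1, 300000) <= 0)   /* nothing else runs meanwhile */
            return -1;                    /* timeout -> connection is closed */
        n = writev(backend_fd, vec, nvec);
    }
    return n;
}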

A concrete setup to reproduce it (using debian stretch):

rsync -------------------------> apache ---------------> rsync --daemon
        CONNECT localhost:873

I will attach configuration and a script to this report and can also provide a qemu image, if desired.

In this setup the stall looks like this:

----------------------------- rsync ----------------------------------------

# RSYNC_PROXY=localhost:80 rsync -rcP rsync://localhost:873/testsrc/ /testdst/
[..]
5269
              0   0%    0.00kB/s    0:00:00  
[ .. 5 minute delay .. ]
rsync: connection unexpectedly closed (403374822 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [receiver=3.1.2]
rsync: connection unexpectedly closed (242925 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [generator=3.1.2]

----------------------------- netstat --------------------------------------

# netstat -pnt | grep -v :22
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    

tcp6       0         0 ::1:43642               ::1:80                  ESTABLISHED 21638/rsync
tcp6  169814         0 ::1:80                  ::1:43642               ESTABLISHED 20628/apache2
tcp6  5974823  2608576 ::1:43798               ::1:873                 ESTABLISHED 20628/apache2
tcp6  809935   2760413 ::1:873                 ::1:43798               ESTABLISHED 21639/rsync

----------------------------- strace --------------------------------------

[..]
read(10, "\24\372\342\1\16\255\223\375G\213k\251\215\360\357}\311\275\202\5jQ>\341>\370 jr\357,\4"..., 8000) = 8000    <---- fd 10 is client socket
writev(11, [{iov_base= [..] ., iov_len=7424}], 16) = -1 EAGAIN (Resource temporarily unavailable)                      <---- fd 11 is backend socket
poll([{fd=11, events=POLLOUT}], 1, 300000)
 <detached ...>

----------------------------- gdb -----------------------------------------

(gdb) bt
#0  0x00007f37a88c4690 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f37a8dc5088 in apr_poll () from target:/usr/lib/x86_64-linux-gnu/libapr-1.so.0
#2  0x00005572f5f781cd in send_brigade_blocking (c=0x7f37a95ad9b0, bytes_written=0x7f37a95ade68, bb=0x7f37a95a1068, s=0x7f37a95a6c28) at core_filters.c:747
#3  ap_core_output_filter (f=0x7f37a95add88, new_bb=0x7f37a95a1068) at core_filters.c:542
#4  0x00007f37a58fac77 in ap_proxy_transfer_between_connections (r=r@entry=0x7f37a95a50a0, c_i=0x7f37a95ad290, c_o=0x7f37a95ad9b0, bb_i=<optimized out>, bb_o=0x7f37a95a1068, name=0x7f37a56e39dd "client", sent=0x0, 
    bsize=8192, after=1) at proxy_util.c:3836
#5  0x00007f37a56e31c3 in proxy_connect_handler (r=0x7f37a95a50a0, worker=<optimized out>, conf=<optimized out>, url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized out>) at mod_proxy_connect.c:421
Comment 1 Carsten Wolff 2017-10-14 10:10:43 UTC
Created attachment 35422 [details]
rsync daemon config
Comment 2 Carsten Wolff 2017-10-14 10:11:36 UTC
Created attachment 35423 [details]
script for reproducing the bug
Comment 3 Carsten Wolff 2017-10-14 17:49:16 UTC
One more remark: I selected version 2.4.25 in the bug header because I used that (Debian) version for the reproducer. Affected versions are actually all versions, including the 2.2 branch, the 2.4 branch and trunk.
Comment 4 Yann Ylavic 2017-10-14 20:06:13 UTC
Created attachment 35424 [details]
Handle POLLOUT in proxy_connect

Could you please try this patch on latest 2.4.x?
Comment 5 Yann Ylavic 2017-10-14 20:31:49 UTC
Created attachment 35425 [details]
Handle POLLOUT in proxy_connect (v2)

Small update to the patch to keep POLLOUT until all data are out.
Please test this one instead.
Comment 6 Carsten Wolff 2017-10-15 06:17:44 UTC
First of all, thanks for the very quick turnaround!

Unfortunately, the patch doesn't fix the problem. Actually, I have played with similar approaches for changing proxy_connect_handler() before, like doing a poll() for POLLOUT on the backend upon every POLLIN event from the client before actually calling ap_proxy_transfer_between_connections(), or like preferring the direction "backend to client" when both sockets issue POLLIN events.
But no change to the poll() logic in mod_proxy_connect.c worked, because that is not where the poll() blocks. The whole situation can build up during a single call of ap_proxy_transfer_between_connections(), when the rsync daemon keeps sending long enough without reading while the rsync client is also sending. In that case Apache always blocks in apr/network_io/unix/sendrecv.c without a chance to return to mod_proxy for another poll on the other socket.
Is there a chance to parallelize the two data directions through threading or something?
Comment 7 Yann Ylavic 2017-10-15 08:30:55 UTC
Created attachment 35426 [details]
Handle POLLOUT in proxy_connect (v3)

We have to stop reading on the other side when writing blocks on one side.
How does this new version work?
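
At the poll() level, the idea can be sketched like this (an illustrative sketch of the principle only, not the patch itself):

#include <poll.h>

/* Sketch of the event bookkeeping the patch aims for (illustrative).
 * pfd[0] is the client socket, pfd[1] the backend socket.  When a write
 * towards side 'out' would block, stop watching the opposite side for
 * input and wait for the blocked side to become writable instead, so the
 * handler never sleeps inside the output filter. */
static void wait_writable(struct pollfd pfd[2], int out)
{
    int in = !out;
    pfd[in].events &= ~POLLIN;    /* stop reading until we can flush */
    pfd[out].events |= POLLOUT;   /* wake up when the blocked side drains */
}

static void writable_again(struct pollfd pfd[2], int out)
{
    int in = !out;
    pfd[out].events &= ~POLLOUT;
    pfd[in].events |= POLLIN;     /* resume reading the other side */
}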
Comment 8 Carsten Wolff 2017-10-15 10:04:52 UTC
(In reply to Yann Ylavic from comment #7)
> We have to stop reading on the other side when writing blocks on one side.
> How does this new version work?

Not better, I'm afraid:

Thread 2 (Thread 0x7f534bfff700 (LWP 31402)):
#0  0x00007f5354d316ad in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f5355232088 in apr_poll () from target:/usr/lib/x86_64-linux-gnu/libapr-1.so.0
#2  0x000055df6ab6c17d in send_brigade_blocking (c=0x7f53559b6a98, bytes_written=0x7f53559b6fe0, bb=0x7f53559b3ee0, s=0x7f53559b3c48) at core_filters.c:747
#3  ap_core_output_filter (f=0x7f53559b6e70, new_bb=0x7f53559b3ee0) at core_filters.c:542
#4  0x00007f5351d60da7 in ap_proxy_transfer_between_connections (r=r@entry=0x7f53559b20a0, c_i=0x7f53559b6338, c_o=0x7f53559b6a98, bb_i=<optimized out>, bb_o=0x7f53559b3ee0, name=name@entry=0x7f5351b49cee "client", 
    sent=0x0, bsize=8192, after=1) at proxy_util.c:3839
#5  0x00007f5351b495d9 in proxy_connect_handler (r=0x7f53559b20a0, worker=<optimized out>, conf=<optimized out>, url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized out>) at mod_proxy_connect.c:468
Comment 9 Yann Ylavic 2017-10-15 12:46:47 UTC
Created attachment 35427 [details]
File read buffer and flush threshold tuning

Thanks for testing Carsten.

I think I see what happens now: the issue is that whenever send_brigade_blocking() is entered, it won't return before all the data is flushed, regardless of the remaining threshold.

This new patch is mostly unrelated (it was mainly written some time ago to allow for better control of the outgoing traffic), but the changes in the core output filtering could address this case (see should_block() and how it's (re)used to bail out without blocking while we remain under the flush max threshold).
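
Reduced to a sketch (illustrative only; the real should_block() works on brigades and set-aside buckets, and THRESHOLD_MAX_BUFFER here stands in for the configured limit), the decision is roughly:

#include <apr.h>

#define THRESHOLD_MAX_BUFFER (64 * 1024)   /* stand-in for the real limit */

/* Illustrative sketch: if no explicit flush was requested and the pending
 * bytes stay under the maximum buffering threshold, set them aside and
 * return to the caller instead of poll()ing; only block once the backlog
 * would otherwise grow without bound. */
static int should_block(apr_off_t bytes_pending, int flush_requested)
{
    if (flush_requested) {
        return 1;                          /* FLUSH bucket: drain everything */
    }
    return bytes_pending > THRESHOLD_MAX_BUFFER;
}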

I haven't committed this patch to trunk yet because it was made for 2.4.x, and some parts differ quite a bit in trunk, which makes the forward port non-trivial.

Could you please test it and see if it helps for this case too?
It should be applied in addition to attachment 35426 [details] (v3) because proxy_connect still needs to do the right thing regarding POLLOUT.
Comment 10 Carsten Wolff 2017-10-16 11:14:32 UTC
(In reply to Yann Ylavic from comment #9)
> Could you please test it and see if it helps for this case too?
> It should be applied in addition to attachment 35426 [details] (v3) because
> proxy_connect needs to do the right thing regarding POLLOUT still.

I have now tested these two patches together. The result is unchanged. strace and backtrace look the same: send_brigade_blocking() calls apr_poll() with a timeout of 300 seconds. After the timeout it comes back with APR_TIMEUP, which results in a socket close.
So it seems that should_flush() sets flush_upto to a true value and thus has no influence on the situation.
Comment 11 Yann Ylavic 2017-10-16 11:29:21 UTC
Could you run the test with LogLevel trace6 and provide the error_log?
Comment 12 Carsten Wolff 2017-10-16 13:49:35 UTC
Created attachment 35429 [details]
trace6 error log of a single test

(In reply to Yann Ylavic from comment #11)
> Could you run the test with LogLevel trace6 and provide the error_log?

Here you are. It should be a clean log of just one test case.
Comment 13 Yann Ylavic 2017-10-17 08:42:06 UTC
Created attachment 35430 [details]
Handle POLLOUT in proxy_connect (v4)

Thanks for the traces, I missed that proxy_connect was FLUSHing data.

How about this v4 (to replace attachment 35426 [details] (v3), in addition to attachment 35427 [details] still)?
Comment 14 Carsten Wolff 2017-10-17 08:46:03 UTC
Created attachment 35431 [details]
shortening the poll timeout for send_brigade_blocking

I tried this patch on top of your patches, but unfortunately it just leads to the connection loss happening more quickly:

[Tue Oct 17 09:08:16.947301 2017] [core:trace6] [pid 13571:tid 140023972112128] core_filters.c(878): (11)Resource temporarily unavailable: [client ::1:44128] writev_nonblocking: 62656/65536 bytes
[Tue Oct 17 09:08:16.947332 2017] [core:trace6] [pid 13571:tid 140023972112128] core_filters.c(407): [client ::1:44128] core_output_filter: flushing because of max threshold
[Tue Oct 17 09:08:16.947339 2017] [core:trace6] [pid 13571:tid 140023972112128] core_filters.c(878): (11)Resource temporarily unavailable: [client ::1:44128] writev_nonblocking: 0/68416 bytes
[Tue Oct 17 09:08:16.947344 2017] [core:trace6] [pid 13571:tid 140023972112128] core_filters.c(407): [client ::1:44128] core_output_filter: flushing because of max threshold
[Tue Oct 17 09:08:16.947350 2017] [core:trace1] [pid 13571:tid 140023972112128] core_filters.c(560): (70007)The timeout specified has expired: [client ::1:44128] core_output_filter: writing data to the network
[Tue Oct 17 09:08:16.947356 2017] [proxy:error] [pid 13571:tid 140023972112128] (70007)The timeout specified has expired: [client ::1:44128] AH03307: ap_proxy_transfer_between_connections: error on backend - ap_pass_brigade
[Tue Oct 17 09:08:16.947362 2017] [proxy:trace2] [pid 13571:tid 140023972112128] proxy_util.c(3843): (70007)The timeout specified has expired: [client ::1:44128] ap_proxy_transfer_between_connections complete
[Tue Oct 17 09:08:16.947368 2017] [proxy_connect:trace2] [pid 13571:tid 140023972112128] mod_proxy_connect.c(532): [client ::1:44128] finished with poll() - cleaning up
[Tue Oct 17 09:08:16.947390 2017] [core:trace6] [pid 13571:tid 140023972112128] core_filters.c(418): [client ::1:873] core_output_filter: flushing because of FLUSH bucket
Comment 15 Yann Ylavic 2017-10-17 09:07:40 UTC
Created attachment 35432 [details]
Handle POLLOUT in proxy_connect (v5)

Please use v5 instead, v4 can't work because it does not play well with EAGAIN.
Comment 16 Carsten Wolff 2017-10-17 11:06:33 UTC
(In reply to Yann Ylavic from comment #15)
> Created attachment 35432 [details]
> Handle POLLOUT in proxy_connect (v5)
> 
> Please use v5 instead, v4 can't work because it does not play well with
> EAGAIN.

With v5 + "File read buffer and flush threshold tuning", proxy_connect gets stuck very early on:



[Tue Oct 17 12:55:53.597014 2017] [core:trace5] [pid 9813:tid 140438873556736] protocol.c(645): [client ::1:44136] Request received from client: CONNECT localhost:873 HTTP/1.0
[Tue Oct 17 12:55:53.597118 2017] [http:trace4] [pid 9813:tid 140438873556736] http_request.c(433): [client ::1:44136] Headers received from client:
[Tue Oct 17 12:55:53.597268 2017] [authz_core:debug] [pid 9813:tid 140438873556736] mod_authz_core.c(809): [client ::1:44136] AH01626: authorization result of Require host localhost: granted
[Tue Oct 17 12:55:53.597278 2017] [authz_core:debug] [pid 9813:tid 140438873556736] mod_authz_core.c(809): [client ::1:44136] AH01626: authorization result of <RequireAny>: granted
[Tue Oct 17 12:55:53.597282 2017] [core:trace3] [pid 9813:tid 140438873556736] request.c(304): [client ::1:44136] request authorized without authentication by access_checker_ex hook: localhost:873
[Tue Oct 17 12:55:53.597291 2017] [proxy_connect:trace1] [pid 9813:tid 140438873556736] mod_proxy_connect.c(141): [client ::1:44136] canonicalising URL localhost:873
[Tue Oct 17 12:55:53.597312 2017] [proxy:trace2] [pid 9813:tid 140438873556736] proxy_util.c(1970): [client ::1:44136] *: found forward proxy worker for localhost:873
[Tue Oct 17 12:55:53.597318 2017] [proxy:debug] [pid 9813:tid 140438873556736] mod_proxy.c(1227): [client ::1:44136] AH01143: Running scheme localhost handler (attempt 0)
[Tue Oct 17 12:55:53.597322 2017] [proxy_connect:trace1] [pid 9813:tid 140438873556736] mod_proxy_connect.c(190): [client ::1:44136] serving URL localhost:873
[Tue Oct 17 12:55:53.597329 2017] [proxy_connect:debug] [pid 9813:tid 140438873556736] mod_proxy_connect.c(207): [client ::1:44136] AH01019: connecting localhost:873 to localhost:873
[Tue Oct 17 12:55:53.597372 2017] [proxy_connect:trace1] [pid 9813:tid 140438873556736] mod_proxy_connect.c(233): [client ::1:44136] connecting to remote proxy localhost on port 873
[Tue Oct 17 12:55:53.597384 2017] [proxy:trace2] [pid 9813:tid 140438873556736] proxy_util.c(2078): [client ::1:44136] CONNECT: fam 10 socket created to connect to localhost
[Tue Oct 17 12:55:53.597466 2017] [proxy_connect:trace3] [pid 9813:tid 140438873556736] mod_proxy_connect.c(311): [client ::1:44136] connection complete to [::1]:873 (localhost)
[Tue Oct 17 12:55:53.597483 2017] [proxy_connect:trace1] [pid 9813:tid 140438873556736] mod_proxy_connect.c(333): [client ::1:44136] Returning 200 OK
[Tue Oct 17 12:55:53.597493 2017] [core:trace6] [pid 9813:tid 140438873556736] core_filters.c(418): [client ::1:44136] core_output_filter: flushing because of FLUSH bucket
[Tue Oct 17 12:55:53.597522 2017] [core:trace6] [pid 9813:tid 140438873556736] core_filters.c(878): [client ::1:44136] writev_nonblocking: 76/76 bytes
[Tue Oct 17 12:55:53.597522 2017] [proxy_connect:trace2] [pid 9813:tid 140438873556736] mod_proxy_connect.c(356): [client ::1:44136] setting up poll()
[Tue Oct 17 12:55:53.597651 2017] [proxy:trace2] [pid 9813:tid 140438873556736] proxy_util.c(3844): [client ::1:44136] ap_proxy_transfer_between_connections complete
[Tue Oct 17 12:55:53.980879 2017] [proxy:debug] [pid 13857:tid 140438968014016] proxy_util.c(1779): AH00925: initializing worker proxy:forward shared
[Tue Oct 17 12:55:53.980922 2017] [proxy:debug] [pid 13857:tid 140438968014016] proxy_util.c(1821): AH00927: initializing worker proxy:forward local
[Tue Oct 17 12:55:53.980933 2017] [proxy:debug] [pid 13857:tid 140438968014016] proxy_util.c(1856): AH00930: initialized pool in child 13857 for (*) min=0 max=25 smax=25



Thread 2 (Thread 0x7fba792bc700 (LWP 9815)):
#0  0x00007fba7df1b0f3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fba7e411201 in impl_pollset_poll (pollset=0x7fba7eb93f78, timeout=<optimized out>, num=0x7fba792b9aa4, descriptors=0x7fba792b9ab0) at ./poll/unix/epoll.c:266
#2  0x00007fba7ad29198 in proxy_connect_handler (r=0x7fba7eb920a0, worker=<optimized out>, conf=<optimized out>, url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized out>) at mod_proxy_connect.c:401



# netstat -pnt | grep -v :22
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6      14      0 ::1:35806               ::1:873                 ESTABLISHED 9813/apache2        
tcp6       0      0 ::1:44136               ::1:80                  ESTABLISHED 13855/rsync         
tcp6       0      0 ::1:873                 ::1:35806               ESTABLISHED 13856/rsync         
tcp6       0      0 ::1:80                  ::1:44136               ESTABLISHED 9813/apache2
Comment 17 Carsten Wolff 2017-10-19 12:35:48 UTC
Created attachment 35442 [details]
suppress poll with long timeout in filter code

Hi again,

this patch, on top of https://bz.apache.org/bugzilla/attachment.cgi?id=35427 and https://bz.apache.org/bugzilla/attachment.cgi?id=35432, fixes it for me (tm).

I'm not sure if this is an acceptable way of doing it for you, and I am sure the patch needs cleanup, but I wanted to let you know immediately since I have no time for the rest of the day.
Comment 18 Carsten Wolff 2017-10-19 15:28:18 UTC
Created attachment 35443 [details]
suppress poll with long timeout in filter code (v2)

Here is a cleanup of the patch; I hope it is easier to read.
Comment 19 Yann Ylavic 2017-10-21 12:47:39 UTC
Thanks for the patch Carsten, but there can't be an exception that lets send_brigade_blocking() skip blocking when the conditions are met; the core output filter must control the memory buckets in flight (set aside).

Please see new proposal below.
Comment 20 Yann Ylavic 2017-10-21 13:00:35 UTC
Created attachment 35446 [details]
Handle POLLOUT in proxy_connect (v6)

This new patch gives more control (flags) to the caller of ap_proxy_transfer_between_connections(), which can now stop its loop when the output filter chain starts buffering data.

It also refactors proxy_connect_handler() to have a symmetric loop, because the logic had to be duplicated on both sides (the more logic we add, the less readable it becomes).

I did more testing with it and it seems to work as expected here.

I also have an update for attachment 35427 [details] but this patch is actually barely related (the above patch should work on its own for the proxy connect case).

I can provide it here too if you wish, because it allows transferring more than 8K bytes at a time (a current limitation in mod_proxy_connect).
Comment 21 Carsten Wolff 2017-10-21 14:39:45 UTC
(In reply to Yann Ylavic from comment #19)
> Thanks for the patch Carsten, but there can't be an exception for
> send_brigade_blocking() to block when the conditions are met, the core
> output filter must control the memory buckets in flight (set aside).

Sure, using the timeout value as a criterion in send_brigade_blocking is ugly. I see there's an APR_NONBLOCK_READ flag one can pass to ap_get_brigade(). Maybe there should be an APR_NONBLOCK_WRITE flag one could pass to ap_pass_brigade() (and further down to send_brigade_blocking) to make absolutely sure the poll() there is never hit?

(In reply to Yann Ylavic from comment #20)
> Created attachment 35446 [details]
> Handle POLLOUT in proxy_connect (v6)
> 
> This new patch gives more control (flags) to the caller of
> ap_proxy_transfer_between_connections(), which can now stop its loop when
> the output filter chain starts buffering data.
>
> It also refactors proxy_connect_handler() to have a symetric loop because
> the logic had to be duplicated on both sides (the more we add logic, the
> less it becomes readable).

Much nicer indeed, you also choose the right socket on POLLOUT now, which was wrong in v5.

> I made more testing with it and it seems to work as expected here..

It gives me a hard time to reproduce the original issue. :-) But, it now blocks at the end of the transfer in mod_proxy_connect's poll() for 300 seconds before finishing successfully. See backtrace and log below.

> I also have an update for attachment 35427 [details] but this patch is
> actually barely related (the above patch should work on its own for the
> proxy connect case).
> 
> I can provide it here too if you wish because it allows to transfer more
> than 8K bytes at a time (a current limitation in mod_proxy_connect).

This might be interesting for performance of connections on loopback, where the segment size is 64k.

Current backtrace:
Thread 2 (Thread 0x7ffa5f0b9700 (LWP 5797)):
#0  0x00007ffa63d180f3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffa6420e201 in impl_pollset_poll (pollset=0x7ffa64990f78, timeout=<optimized out>, num=0x7ffa5f0b6a94, descriptors=0x7ffa5f0b6aa0) at ./poll/unix/epoll.c:266
#2  0x00007ffa60b263b7 in proxy_connect_handler (r=0x7ffa6498f0a0, worker=<optimized out>, conf=<optimized out>, url=<optimized out>, proxyname=<optimized out>, 
    proxyport=<optimized out>) at mod_proxy_connect.c:442


Log tail of the end of the transfer:
[Sat Oct 21 16:28:29.475904 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:28:29.476140 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [client ::1:33340] writev_nonblocking: 8192/8192 bytes
[Sat Oct 21 16:28:29.476241 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:28:29.476510 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [client ::1:33340] writev_nonblocking: 8192/8192 bytes
[Sat Oct 21 16:28:29.476685 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:28:29.476985 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [client ::1:33340] writev_nonblocking: 8192/8192 bytes
[Sat Oct 21 16:33:46.176810 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.178932 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.179110 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [remote ::1:873] writev_nonblocking: 4/4 bytes
[Sat Oct 21 16:33:46.186064 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [client ::1:33340] writev_nonblocking: 4272/4272 bytes
[Sat Oct 21 16:33:46.199852 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.200032 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [remote ::1:873] writev_nonblocking: 5/5 bytes
[Sat Oct 21 16:33:46.200317 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.200365 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [client ::1:33340] writev_nonblocking: 5/5 bytes
[Sat Oct 21 16:33:46.200793 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.200898 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [remote ::1:873] writev_nonblocking: 7/7 bytes
[Sat Oct 21 16:33:46.200920 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.200952 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [client ::1:33340] writev_nonblocking: 6/6 bytes
[Sat Oct 21 16:33:46.242005 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.242141 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [client ::1:33340] writev_nonblocking: 25/25 bytes
[Sat Oct 21 16:33:46.242524 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.244044 2017] [core:trace6] [pid 5795:tid 140713313146624] core_filters.c(878): [remote ::1:873] writev_nonblocking: 5/5 bytes
[Sat Oct 21 16:33:46.256132 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.256195 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): (70014)End of file found: [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.256236 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.256245 2017] [proxy:trace2] [pid 5795:tid 140713313146624] proxy_util.c(3845): (70014)End of file found: [client ::1:33340] ap_proxy_transfer_between_connections complete
[Sat Oct 21 16:33:46.256368 2017] [proxy_connect:trace2] [pid 5795:tid 140713313146624] mod_proxy_connect.c(558): [client ::1:33340] finished with poll() - cleaning up
Comment 22 Yann Ylavic 2017-10-22 12:22:57 UTC
(In reply to Carsten Wolff from comment #21)
> 
> Maybe there should be a APR_NONBLOCK_WRITE flag one could
> pass to ap_pass_brigade() (and further down to send_brigade_blocking) to
> absolutely make sure, the poll() there is never hit?

When the poll() is hit in send_brigade_blocking() it means the core output filter ought to block, because we don't want unbounded buffering.

Suppose a handler keeps sending data out; when the socket becomes full we must stop buffering sooner or later (i.e. above THRESHOLD_MAX_BUFFER, or something configurable, which is the point of attachment 35427 [details], among other parameters).

Otherwise it's unbounded memory usage.
Comment 23 Yann Ylavic 2017-10-22 12:27:36 UTC
(In reply to Carsten Wolff from comment #21)
> 
> It gives me a hard time to reproduce the original issue. :-) But, it now
> blocks at the end of the transfer in mod_proxy_connect's poll() for 300
> seconds before finishing successfully. See backtrace and log below.

It looks like the socket is not really closed (before the timeout) on one end or the other (proxy_connect won't close it on its own).

What's the output if you force "#define DEBUGGING" somewhere before proxy_connect_handler() in mod_proxy_connect.c?
Comment 24 Yann Ylavic 2017-10-22 12:52:02 UTC
Created attachment 35447 [details]
Read buffer size and flush threshold tuning

Allows configuring ReadBufferSize, FlushMinThreshold, FlushMaxThreshold and FlushMaxPipelined (instead of hard-coded values).

For your proxy_connect case, where large files may be rsync-ed, something like the below does not look absurd:

# Read more at once than the default 8K
ReadBufferSize 65536
# Don't block below this (i.e. more than ReadBufferSize)
FlushMaxThreshold 131072
# Always flush (non-blocking) when asked to write (i.e. don't retain data)
FlushMinThreshold 1
Comment 25 Yann Ylavic 2017-10-22 13:18:44 UTC
(In reply to Yann Ylavic from comment #24)
> Created attachment 35447 [details]
> Read buffer size and flush threshold tuning

Also note that send_brigade_blocking() is now a bit misnamed; it's rather send_brigade_maybe_blocking(), since it first attempts to write non-blocking in any case, and that might prevent the poll() if !should_block() afterward...
Comment 26 Carsten Wolff 2017-10-22 19:41:39 UTC
I added this just at the end of the poll() loop:
        ap_log_rerror(APLOG_MARK, APLOG_TRACE2, 0, r,
            "@END: "
            "[%s: POLLIN: %d POLLOUT: %d POLLHUP: %d data_in_filters: %d] "
            "[%s: POLLIN: %d POLLOUT: %d POLLHUP: %d data_in_filters: %d]",
            conns[0].name,
            conns[0].pfd.reqevents & APR_POLLIN && 1,
            conns[0].pfd.reqevents & APR_POLLOUT && 1,
            conns[0].pfd.reqevents & APR_POLLHUP && 1,
            conns[0].c->data_in_output_filters,
            conns[1].name,
            conns[1].pfd.reqevents & APR_POLLIN && 1,
            conns[1].pfd.reqevents & APR_POLLOUT && 1,
            conns[1].pfd.reqevents & APR_POLLHUP && 1,
            conns[1].c->data_in_output_filters
        );


The whole long log looks like this sample:

[Sun Oct 22 21:31:00.885835 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(452): [client ::1:33400] AH01024: woke from poll(), i=1
[Sun Oct 22 21:31:00.885891 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(483): [client ::1:33400] AH01025: client was readable
[Sun Oct 22 21:31:00.885946 2017] [proxy:trace2] [pid 26472:tid 140206004721408] proxy_util.c(3845): [client ::1:33400] ap_proxy_transfer_between_connections complete
[Sun Oct 22 21:31:00.885960 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(517): [client ::1:33400] backend wait writable
[Sun Oct 22 21:31:00.885978 2017] [proxy_connect:trace2] [pid 26472:tid 140206004721408] mod_proxy_connect.c(569): [client ::1:33400] @END: [client: POLLIN: 0 POLLOUT: 0 POLLHUP: 0 data_in_filters: 0] [backend: POLLIN: 1 POLLOUT: 1 POLLHUP: 0 data_in_filters: 1]
[Sun Oct 22 21:31:00.885992 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(452): [client ::1:33400] AH01024: woke from poll(), i=1
[Sun Oct 22 21:31:00.886002 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(528): [client ::1:33400] backend was writable
[Sun Oct 22 21:31:00.886068 2017] [core:trace6] [pid 26472:tid 140206004721408] core_filters.c(878): [remote ::1:873] writev_nonblocking: 8/8 bytes
[Sun Oct 22 21:31:00.886108 2017] [proxy_connect:trace2] [pid 26472:tid 140206004721408] mod_proxy_connect.c(569): [client ::1:33400] @END: [client: POLLIN: 1 POLLOUT: 0 POLLHUP: 0 data_in_filters: 0] [backend: POLLIN: 1 POLLOUT: 0 POLLHUP: 0 data_in_filters: 0]
[Sun Oct 22 21:31:00.888170 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(452): [client ::1:33400] AH01024: woke from poll(), i=1
[Sun Oct 22 21:31:00.888229 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(483): [client ::1:33400] AH01025: backend was readable
[Sun Oct 22 21:31:00.888271 2017] [proxy:trace2] [pid 26472:tid 140206004721408] proxy_util.c(3845): [client ::1:33400] ap_proxy_transfer_between_connections complete
[Sun Oct 22 21:31:00.888307 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(517): [client ::1:33400] client wait writable
[Sun Oct 22 21:31:00.888329 2017] [proxy_connect:trace2] [pid 26472:tid 140206004721408] mod_proxy_connect.c(569): [client ::1:33400] @END: [client: POLLIN: 1 POLLOUT: 1 POLLHUP: 0 data_in_filters: 1] [backend: POLLIN: 0 POLLOUT: 0 POLLHUP: 0 data_in_filters: 0]
[Sun Oct 22 21:31:00.888344 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(452): [client ::1:33400] AH01024: woke from poll(), i=1
[Sun Oct 22 21:31:00.888354 2017] [proxy_connect:debug] [pid 26472:tid 140206004721408] mod_proxy_connect.c(528): [client ::1:33400] client was writable
[Sun Oct 22 21:31:00.888450 2017] [core:trace6] [pid 26472:tid 140206004721408] core_filters.c(878): [client ::1:33400] writev_nonblocking: 12/12 bytes
[Sun Oct 22 21:31:00.888492 2017] [proxy_connect:trace2] [pid 26472:tid 140206004721408] mod_proxy_connect.c(569): [client ::1:33400] @END: [client: POLLIN: 1 POLLOUT: 0 POLLHUP: 0 data_in_filters: 0] [backend: POLLIN: 1 POLLOUT: 0 POLLHUP: 0 data_in_filters: 0]


Two things are peculiar:
- we no longer request APR_POLLHUP events. Is that intentional?
- It seems like when ap_proxy_transfer_between_connections returns, "data_in_output_filters" is always true on the outgoing connection and we're always entering the "write wait" path.
Comment 27 Yann Ylavic 2017-10-23 12:40:09 UTC
The logs from comment 21 and comment 26 look normal, no timeout triggered, proxy_connect_handler() seems to be waiting for sockets to be ready to do its job.

(In reply to Carsten Wolff from comment #26)
> I added this just at the end of the poll() loop:
>         ap_log_rerror(APLOG_MARK, APLOG_TRACE2, 0, r,
>             "@END: "
>             "[%s: POLLIN: %d POLLOUT: %d POLLHUP: %d data_in_filters: %d] "
>             "[%s: POLLIN: %d POLLOUT: %d POLLHUP: %d data_in_filters: %d]",
>             conns[0].name,
>             conns[0].pfd.reqevents & APR_POLLIN && 1,
>             conns[0].pfd.reqevents & APR_POLLOUT && 1,
>             conns[0].pfd.reqevents & APR_POLLHUP && 1,
>             conns[0].c->data_in_output_filters,
>             conns[1].name,
>             conns[1].pfd.reqevents & APR_POLLIN && 1,
>             conns[1].pfd.reqevents & APR_POLLOUT && 1,
>             conns[1].pfd.reqevents & APR_POLLHUP && 1,
>             conns[1].c->data_in_output_filters
>         );
> 
> 
> The whole long log looks like this sample:

Events are fine here.

> 
> Two things are peculiar:
> - we no longer request APR_POLLHUP events. Is that intentional?

POLLHUP is a returned event only (usually ignored when requested), though APR might require it when the pollset is implemented with select() (will look at this); that shouldn't be the case on Linux, hence not for your testing...


> - It seems like when ap_proxy_transfer_between_connections returns,
> "data_in_output_filters" is always true on the outgoing connection and we're
> always entering the "write wait" path.

Yes, ap_proxy_transfer_between_connections() used to forward everything it reads until EAGAIN (non-blocking on the read side only); that's why the core output filter ended up blocking when the outgoing socket didn't keep up with the incoming rate.
I changed this in attachment 35446 [details] with a flag (AP_PROXY_TRANSFER_UNBUFFERED) to tell ap_proxy_transfer_between_connections() to stop when buffering happens on output too (i.e. data_in_output_filters), such that in this case mod_proxy_connect can use POLLOUT before trying again.
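
In caller terms, the idea looks roughly like the following illustrative helper (not the actual patch code; the real bookkeeping lives in proxy_connect_handler(), which keeps per-side apr_pollfd_t structures as in the logging snippet of comment 26):

#include "httpd.h"
#include "apr_poll.h"

/* Illustrative helper (not the patch): after a transfer towards c_out, if
 * the output filter chain of c_out kept data buffered, switch the poll
 * interest so the loop waits for c_out to become writable and stops
 * reading the other side until that data has been flushed. */
static apr_status_t rearm_for_pollout(apr_pollset_t *pollset,
                                      conn_rec *c_out,
                                      apr_pollfd_t *pfd_in,
                                      apr_pollfd_t *pfd_out)
{
    if (!c_out->data_in_output_filters) {
        return APR_SUCCESS;                /* nothing buffered, keep reading */
    }

    apr_pollset_remove(pollset, pfd_out);  /* re-add with the new reqevents */
    pfd_out->reqevents = APR_POLLOUT;      /* wait for room to write out */
    apr_pollset_add(pollset, pfd_out);

    apr_pollset_remove(pollset, pfd_in);
    pfd_in->reqevents &= ~APR_POLLIN;      /* pause reading this side */
    apr_pollset_add(pollset, pfd_in);

    return APR_SUCCESS;
}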
Comment 28 Carsten Wolff 2017-10-24 18:37:12 UTC
Thanks Yann for your patience and all your work! I have some last observations to share:

- When combining https://bz.apache.org/bugzilla/attachment.cgi?id=35427 and https://bz.apache.org/bugzilla/attachment.cgi?id=35446, I still get frequent stalls at the end of the transfer. Also, I observed a higher CPU usage by apache compared to e.g. apache-2.4.27 release. During transfer, the apache process used ~50% of one core in my test setup. For reference, I also ran the same test with squid3 as a connect proxy and it used ~9% of one core.

- When combining https://bz.apache.org/bugzilla/attachment.cgi?id=35447 and https://bz.apache.org/bugzilla/attachment.cgi?id=35446, I can no longer produce any stall during or at the end of the transfer. Also, the CPU usage by apache is reduced to ~14%

- When combining https://bz.apache.org/bugzilla/attachment.cgi?id=35447 and https://bz.apache.org/bugzilla/attachment.cgi?id=35446 and applying the settings you suggested in comment#24, everything still works and the CPU usage drops to ~8%.

I will now have to run tests on my target platform, which is a bit more complicated, so it will take a few days, but I suppose we can assume this bug is fixed now. Thanks again!
Comment 29 Carsten Wolff 2017-11-09 19:45:50 UTC
Hi, unfortunately, using patch #35446 and patch #35447 there is another problem when I change the scenario by making the client connection go through SSL. In that case, the transfer stalls very quickly within the proxy_connect poll loop:


[Thu Nov 09 20:34:28.029653 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(246): [client 127.0.0.1:56996] AH01019: connecting localhost:873 to localhost:873
[Thu Nov 09 20:34:28.029745 2017] [proxy_connect:trace1] [pid 19603:tid 139754421921536] mod_proxy_connect.c(272): [client 127.0.0.1:56996] connecting to remote proxy localhost on port 873
[Thu Nov 09 20:34:28.029776 2017] [proxy:trace2] [pid 19603:tid 139754421921536] proxy_util.c(2080): [client 127.0.0.1:56996] CONNECT: fam 10 socket created to connect to localhost
[Thu Nov 09 20:34:28.029877 2017] [proxy_connect:trace3] [pid 19603:tid 139754421921536] mod_proxy_connect.c(355): [client 127.0.0.1:56996] connection complete to [::1]:873 (localhost)
[Thu Nov 09 20:34:28.029905 2017] [proxy_connect:trace1] [pid 19603:tid 139754421921536] mod_proxy_connect.c(377): [client 127.0.0.1:56996] Returning 200 OK
[Thu Nov 09 20:34:28.029921 2017] [ssl:trace6] [pid 19603:tid 139754421921536] ssl_engine_io.c(859): [client 127.0.0.1:56996] ssl_filter_write: 76 bytes
[Thu Nov 09 20:34:28.029946 2017] [ssl:trace6] [pid 19603:tid 139754421921536] ssl_engine_io.c(219): [client 127.0.0.1:56996] bio_filter_out_write: 105 bytes
[Thu Nov 09 20:34:28.029967 2017] [ssl:trace4] [pid 19603:tid 139754421921536] ssl_engine_io.c(2220): [client 127.0.0.1:56996] OpenSSL: write 105/105 bytes to BIO#7f1b040021b0 [mem: 7f1b040025a3] (BIO dump follows)
[Thu Nov 09 20:34:28.029982 2017] [core:trace6] [pid 19603:tid 139754421921536] core_filters.c(458): [client 127.0.0.1:56996] core_output_filter: flushing because of FLUSH bucket
[Thu Nov 09 20:34:28.030012 2017] [core:trace6] [pid 19603:tid 139754421921536] core_filters.c(857): [client 127.0.0.1:56996] writev_nonblocking: 105/105 bytes
[Thu Nov 09 20:34:28.030027 2017] [proxy_connect:trace2] [pid 19603:tid 139754421921536] mod_proxy_connect.c(399): [client 127.0.0.1:56996] setting up poll()
[Thu Nov 09 20:34:28.032931 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(454): [client 127.0.0.1:56996] AH01024: woke from poll(), i=1
[Thu Nov 09 20:34:28.032971 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(485): [client 127.0.0.1:56996] AH01025: client was readable
[Thu Nov 09 20:34:28.033024 2017] [ssl:trace4] [pid 19603:tid 139754421921536] ssl_engine_io.c(2220): [client 127.0.0.1:56996] OpenSSL: read 5/5 bytes from BIO#7f1b04002270 [mem: 7f1b040025a3] (BIO dump follows)
[Thu Nov 09 20:34:28.033041 2017] [ssl:trace4] [pid 19603:tid 139754421921536] ssl_engine_io.c(2220): [client 127.0.0.1:56996] OpenSSL: read 38/38 bytes from BIO#7f1b04002270 [mem: 7f1b040025a8] (BIO dump follows)
[Thu Nov 09 20:34:28.033060 2017] [proxy:debug] [pid 19603:tid 139754421921536] proxy_util.c(3803): [client 127.0.0.1:56996] AH03306: ap_proxy_transfer_between_connections: read 14 bytes from client
[Thu Nov 09 20:34:28.033089 2017] [proxy:trace2] [pid 19603:tid 139754421921536] proxy_util.c(3847): [client 127.0.0.1:56996] ap_proxy_transfer_between_connections complete
[Thu Nov 09 20:34:28.033098 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(519): [client 127.0.0.1:56996] backend wait writable
[Thu Nov 09 20:34:28.033112 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(454): [client 127.0.0.1:56996] AH01024: woke from poll(), i=1
[Thu Nov 09 20:34:28.033120 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(530): [client 127.0.0.1:56996] backend was writable
[Thu Nov 09 20:34:28.033159 2017] [core:trace6] [pid 19603:tid 139754421921536] core_filters.c(857): [remote ::1:873] writev_nonblocking: 14/14 bytes
[Thu Nov 09 20:34:28.033755 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(454): [client 127.0.0.1:56996] AH01024: woke from poll(), i=1
[Thu Nov 09 20:34:28.033782 2017] [proxy_connect:debug] [pid 19603:tid 139754421921536] mod_proxy_connect.c(485): [client 127.0.0.1:56996] AH01025: backend was readable
[Thu Nov 09 20:34:28.033817 2017] [proxy:debug] [pid 19603:tid 139754421921536] proxy_util.c(3803): [client 127.0.0.1:56996] AH03306: ap_proxy_transfer_between_connections: read 14 bytes from backend
[Thu Nov 09 20:34:28.033835 2017] [ssl:trace4] [pid 19603:tid 139754421921536] ssl_engine_io.c(1674): [client 127.0.0.1:56996] coalesce: have 0 bytes, adding 14 more
[Thu Nov 09 20:34:28.033848 2017] [proxy:trace2] [pid 19603:tid 139754421921536] proxy_util.c(3847): [client 127.0.0.1:56996] ap_proxy_transfer_between_connections complete
[Thu Nov 09 20:34:28.582512 2017] [proxy:debug] [pid 19638:tid 139754523858112] proxy_util.c(1781): AH00925: initializing worker proxy:forward shared
[Thu Nov 09 20:34:28.582666 2017] [proxy:debug] [pid 19638:tid 139754523858112] proxy_util.c(1823): AH00927: initializing worker proxy:forward local
[Thu Nov 09 20:34:28.582706 2017] [proxy:debug] [pid 19638:tid 139754523858112] proxy_util.c(1858): AH00930: initialized pool in child 19638 for (*) min=0 max=25 smax=25
[Thu Nov 09 20:34:28.582732 2017] [proxy:debug] [pid 19638:tid 139754523858112] proxy_util.c(1781): AH00925: initializing worker proxy:forward shared
[Thu Nov 09 20:34:28.582746 2017] [proxy:debug] [pid 19638:tid 139754523858112] proxy_util.c(1823): AH00927: initializing worker proxy:forward local
[Thu Nov 09 20:34:28.582772 2017] [proxy:debug] [pid 19638:tid 139754523858112] proxy_util.c(1858): AH00930: initialized pool in child 19638 for (*) min=0 max=25 smax=25
[Thu Nov 09 20:39:28.134117 2017] [proxy_connect:error] [pid 19603:tid 139754421921536] (70007)The timeout specified has expired: [client 127.0.0.1:56996] AH01023: polling
[Thu Nov 09 20:39:28.134322 2017] [proxy_connect:trace2] [pid 19603:tid 139754421921536] mod_proxy_connect.c(560): [client 127.0.0.1:56996] finished with poll() - cleaning up
[Thu Nov 09 20:39:28.134870 2017] [ssl:trace4] [pid 19603:tid 139754421921536] ssl_engine_io.c(1734): [client 127.0.0.1:56996] coalesce: passing on 14 bytes
Comment 30 Carsten Wolff 2017-11-10 06:43:23 UTC
(In reply to Carsten Wolff from comment #29)
> Hi, unfortunately, using patch#35446 and patch#35447, there is another
> problem, when I change the scenario by making the client connection through
> SSL. In that case, the transfer stalls very quickly within the proxy_connect
> poll loop

When I comment out the coalescing filter in ssl_io_filter_init(), it works again. So it seems we need a way for mod_proxy_connect to make mod_ssl not use that coalescing filter. Is there an accepted way of passing flags from module to module, or would this need to go through the core code?
Comment 31 Yann Ylavic 2017-11-10 10:01:20 UTC
Created attachment 35512 [details]
Handle POLLOUT in proxy_connect (v7)

Good catch, it makes no sense to try to coalesce TCP packets!
This v7 simply removes the mod_ssl coalescing filter from the chain before entering mod_proxy_connect's loop.

The change between v6 and v7 looks like this:

+     /* we are now acting as a tunnel - the input/output filter stacks should
+-     * not contain any non-connection filters.
++     * not contain any non-connection or coalescing filters.
+      */
++    ap_remove_output_filter_byhandle(c->output_filters,
++                                     "SSL/TLS Coalescing Filter");
+     r->output_filters = c->output_filters;

We'll probably want to use a helper from mod_ssl for this in the final patch, and avoid the magic name, but for now it should work.
Comment 32 Yann Ylavic 2017-11-10 10:26:45 UTC
Created attachment 35513 [details]
Handle POLLOUT in proxy_connect (v8)

Well, same for the backend connection I guess, hence this v8.
Comment 33 Carsten Wolff 2017-11-10 15:49:39 UTC
Tests with v8 look good with SSL. Thanks! :-)
Comment 34 Carsten Wolff 2017-12-07 12:31:57 UTC
Hi Yann,

I have hit another problem with these patches: they break mod_jk (ajp13). The AJP code in mod_jk uses the ap_rwrite() function from server/protocol.c, which now sometimes breaks its semantics and doesn't finish writing the whole buffer passed to it. The only other user of ap_rwrite() I could find is mod_lua, which I haven't tested. Any idea?
Comment 35 Carsten Wolff 2017-12-14 22:39:53 UTC
Hum. It seems to be even worse. Now that proxy_connect works for my case with these patches, everything else seems to be broken; even the normal case of a GET for static files produces incomplete files.
To test this, I just put a handful of 500k files into the document root and linked them from index.html. When I now do "wget -r", it recognizes partial content at connection close by the server:

--2017-12-14 23:37:48--  http://192.168.122.11/1.html
Connecting to 192.168.122.11:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 512000 (500K) [text/html]
Saving to: ‘192.168.122.11/1.html’

192.168.122.11/1.html                                   16%[===================>                                                                                                       ]  84,57K  --.-KB/s    in 0s      

2017-12-14 23:37:48 (316 MB/s) - Connection closed at byte 86604. Retrying.

--2017-12-14 23:37:49--  (try: 2)  http://192.168.122.11/1.html
Connecting to 192.168.122.11:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 512000 (500K), 425396 (415K) remaining [text/html]
Saving to: ‘192.168.122.11/1.html’

192.168.122.11/1.html                                   33%[++++++++++++++++++++====================>                                                                                  ] 169,09K  --.-KB/s    in 0,001s  

2017-12-14 23:37:49 (99,2 MB/s) - Connection closed at byte 173153. Retrying.
Comment 36 Yann Ylavic 2017-12-19 11:46:05 UTC
Sorry for the delay Carsten, just to clarify, you are running httpd with attachment 35513 [details] [Handle POLLOUT in proxy_connect (v8)] and attachment 35447 [details] [ Read buffer size and flush threshold tuning], no other patch right?

While attachment 35513 [details] alone should work for the proxy_connect case, performance is better with the tunings from attachment 35447 [details], still right?

If this is the case, I'd suggest using attachment 35513 [details] only for now, that's less performance (than expected?) but the same as the original proxy_connect ones.

I'm looking at what could cause regressions in attachment 35447 [details], but this patch is work in progress anyway and not really related to this PR I think.
Comment 37 Carsten Wolff 2017-12-19 23:00:51 UTC
(In reply to Yann Ylavic from comment #36)
> Sorry for the delay Carsten, just to clarify, you are running httpd with
> attachment 35513 [details] [Handle POLLOUT in proxy_connect (v8)] and
> attachment 35447 [details] [ Read buffer size and flush threshold tuning],
> no other patch right?

Almost. I added these two patches to packages from debian sid (https://packages.debian.org/sid/apache2) and to the SuSE package from obs (https://build.opensuse.org/package/show/Apache/apache2), so there are some distribution patches in there, but nothing essential at this time.

> While attachment 35513 [details] alone should work for the proxy_connect
> case, performance is better with the tunings from attachment 35447 [details],
> still right?

Yes, the single-transfer test runs for about the same time, but CPU load is lower with attachment 35447 [details], which promises better performance with concurrency.

> If this is the case, I'd suggest using attachment 35513 [details] only for
> now, that's less performance (than expected?) but the same as the original
> proxy_connect ones.

I did a quick test without attachment 35447 [details] and it seems you're right, the regression seems to be in that patch. I will do more thorough testing tomorrow.

> I'm looking at what could cause regressions in attachment 35447 [details],
> but this patch is work in progress anyway and not really related to this PR
> I think.

Running the client (wget) on localhost, I cannot reproduce the error. Running it from another machine, it's there 100% of the time. The logs of both cases look exactly the same, except for the last two lines:

GOOD:
[Mon Dec 18 13:36:59.042282 2017] [core:trace6] [pid 3507:tid 140042357954304] core_filters.c(449): [client 192.168.122.11:43236] core_output_filter: flushing because of max threshold
[Mon Dec 18 13:36:59.042875 2017] [core:trace6] [pid 3507:tid 140042357954304] core_filters.c(857): [client 192.168.122.11:43236] writev_nonblocking: 512276/512276 bytes
[Mon Dec 18 13:36:59.042929 2017] [core:trace6] [pid 3507:tid 140042357954304] core_filters.c(458): [client 192.168.122.11:43236] core_output_filter: flushing because of FLUSH bucket

BAD:
[Mon Dec 18 13:37:21.164185 2017] [core:trace6] [pid 3537:tid 140042248849152] core_filters.c(449): [client 192.168.122.12:51428] core_output_filter: flushing because of max threshold
[Mon Dec 18 13:37:21.164499 2017] [core:trace6] [pid 3537:tid 140042248849152] core_filters.c(857): (11)Resource temporarily unavailable: [client 192.168.122.12:51428] writev_nonblocking: 86880/512276 bytes
[Mon Dec 18 13:37:21.164548 2017] [core:trace6] [pid 3537:tid 140042248849152] core_filters.c(449): [client 192.168.122.12:51428] core_output_filter: flushing because of max threshold
[Mon Dec 18 13:37:21.164566 2017] [core:trace1] [pid 3537:tid 140042248849152] core_filters.c(571): (70007)The timeout specified has expired: [client 192.168.122.12:51428] core_output_filter: writing data to the network
Comment 38 Carsten Wolff 2017-12-19 23:35:14 UTC
(In reply to Carsten Wolff from comment #37)
> (In reply to Yann Ylavic from comment #36)
> > If this is the case, I'd suggest using attachment 35513 [details] only for
> > now, that's less performances (than expected?) but the same as the original
> > proxy_connect ones.
> 
> I did a quick test without attachment 35447 [details] and it seems you're
> right, the regression seems to be in that patch. I will do more thorough
> testing tomorrow.

Still, with just attachment 35513 [details], the rsync transfer through proxy_connect often stalls at the very end of the transfer. This disappeared when I also used attachment 35447 [details] and set FlushMinThreshold 1. That's the main reason why I used both patches.

The end of the log of such a stall:

[Wed Dec 20 00:23:54.200810 2017] [ssl:trace6] [pid 24662:tid 140235413370624] ssl_engine_io.c(219): [client 192.168.122.12:33064] bio_filter_out_write: 1317 bytes
[Wed Dec 20 00:23:54.200815 2017] [ssl:trace4] [pid 24662:tid 140235413370624] ssl_engine_io.c(2220): [client 192.168.122.12:33064] OpenSSL: write 1317/1317 bytes to BIO#7f8b080021b0 [mem: 7f8b080025a3] (BIO dump follows)
[Wed Dec 20 00:23:54.200820 2017] [proxy:trace2] [pid 24662:tid 140235413370624] proxy_util.c(3845): [client 192.168.122.12:33064] ap_proxy_transfer_between_connections complete
[Wed Dec 20 00:23:54.200835 2017] [core:trace6] [pid 24662:tid 140235413370624] core_filters.c(857): [client 192.168.122.12:33064] writev_nonblocking: 1317/1317 bytes
[Wed Dec 20 00:28:54.301391 2017] [proxy_connect:error] [pid 24662:tid 140235413370624] (70007)The timeout specified has expired: [client 192.168.122.12:33064] AH01023: polling
[Wed Dec 20 00:28:54.301651 2017] [proxy_connect:trace2] [pid 24662:tid 140235413370624] mod_proxy_connect.c(562): [client 192.168.122.12:33064] finished with poll() - cleaning up
Comment 39 Yann Ylavic 2017-12-20 13:57:47 UTC
Created attachment 35620 [details]
Read buffer size and flush threshold tuning (v2)

Thanks for the details.

I was able to reproduce it, and the issue came from the socket timeout being set to zero in ap_core_output_filter() and later used for apr_poll() in send_brigade_blocking() (i.e. poll() was asked for immediate POLLOUT availability and timed out otherwise, e.g. when not on localhost...).

Fixed in this v2 by passing the (original) timeout to send_brigade_blocking().
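
Reduced to an illustrative sketch (not the patch text; the real change is inside core_filters.c, which hands the original timeout down to send_brigade_blocking()), the fix has this shape:

#include "apr_network_io.h"
#include "apr_poll.h"

/* Illustrative sketch of the v2 fix: remember the socket's configured
 * timeout before forcing it to 0 for the non-blocking write attempts, and
 * use that saved value for the blocking poll(), instead of the 0 that made
 * apr_poll() return APR_TIMEUP immediately on a non-local network. */
static apr_status_t wait_for_writable(apr_socket_t *s, apr_pool_t *p)
{
    apr_interval_time_t saved_timeout = 0;
    apr_pollfd_t pfd;
    apr_int32_t nsds = 0;

    apr_socket_timeout_get(s, &saved_timeout);  /* e.g. 300 seconds */
    apr_socket_timeout_set(s, 0);               /* non-blocking write attempts */

    pfd.p = p;
    pfd.desc_type = APR_POLL_SOCKET;
    pfd.desc.s = s;
    pfd.reqevents = APR_POLLOUT;
    pfd.client_data = NULL;
    return apr_poll(&pfd, 1, &nsds, saved_timeout); /* not the 0 on the socket */
}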
Comment 40 Yann Ylavic 2017-12-20 14:01:28 UTC
Created attachment 35621 [details]
Handle POLLOUT in proxy_connect (v9)

While at it, I also updated this patch for a better ap_proxy_transfer_between_connections() interface regarding the flags parameter.

Would you mind testing this one with your update rather than v8?
Thanks!
Comment 41 Yann Ylavic 2017-12-20 14:18:51 UTC
Created attachment 35623 [details]
Handle POLLOUT in proxy_connect (v10)

Argh, new v10, since v9 changed the semantics of ap_proxy_transfer_between_connections() regarding EAGAIN (which it can't do, since it's a public API).
Comment 42 Carsten Wolff 2018-01-02 22:12:30 UTC
Created attachment 35639 [details]
pcaps of a stall in rsync handshake (Test4)

Hello Yann, I wish you a happy new year!

I did some more testing on different combinations of these patches. Here are the results (I will call patch 35620 "v2" and patch 35623 "v10"):

Test1 (only v10 applied):
    - wget for static contents: OK
    - rsync through SSL+proxy_connect: sometimes still stalls at the very end of the transfer (just like in comment#38)

Test2 (v10 and v2 applied):
    - wget for static contents: OK
    - rsync through SSL+proxy_connect: often stalls at the beginning of the transfer (during the rsync protocol handshake)

Test3 (v10 and v2 applied, configuration from comment#24 added):
    - wget for static contents: OK
    - rsync through SSL+proxy_connect: OK (can't reproduce in the scenario I used so far ... but ...)

Because I was under the impression that timing plays a role in that code, I decided to repeat Test3 under different network conditions and used netem(1) to change the characteristics of the virtual network:

Test4 (v10 and v2 applied, configuration from comment#24 added, netem):
    - wget for static contents: OK
    - rsync through SSL+proxy_connect: often stalls(2)(3) at the beginning of the transfer (during the rsync protocol handshake)

So I think it's safe to say that the bug in attachment 35447 [details] has been fixed by "v2". But, unfortunately, stalls in SSL+proxy_connect can still be provoked under certain conditions. Next, I tried without SSL:

Test5 (v10 and v2 applied, configuration from comment#24 added, netem, no SSL): 
    - rsync through proxy_connect: OK (cannot seem to reproduce the stalls)

I have a feeling I should repeat Tests 1 and 2 without SSL to see if SSL is necessary for all the stalls to be reproduced. I'll let you know.

(1)
tc qdisc add dev vnet0 root handle 1: netem delay 100ms 50ms 25%
tc qdisc add dev vnet0 parent 1:1 pfifo limit 1000

(2) log of stall in Test4
[Thu Dec 28 09:06:01.752182 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(503): [client 192.168.122.12:33950] client wait writable
[Thu Dec 28 09:06:01.752210 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(456): [client 192.168.122.12:33950] AH01024: woke from poll(), i=1
[Thu Dec 28 09:06:01.752227 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(531): [client 192.168.122.12:33950] client was writable
[Thu Dec 28 09:06:01.752316 2017] [core:trace6] [pid 9674:tid 139811025307392] core_filters.c(740): [client 192.168.122.12:33950] writev_nonblocking: 69/69 bytes
[Thu Dec 28 09:06:01.894674 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(456): [client 192.168.122.12:33950] AH01024: woke from poll(), i=1
[Thu Dec 28 09:06:01.894840 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(487): [client 192.168.122.12:33950] AH01025: client was readable
[Thu Dec 28 09:06:01.895319 2017] [ssl:trace4] [pid 9674:tid 139811025307392] ssl_engine_io.c(2220): [client 192.168.122.12:33950] OpenSSL: read 5/5 bytes from BIO#7f282c002270 [mem: 7f282c002bc3] (BIO dump follows)
[Thu Dec 28 09:06:01.895393 2017] [ssl:trace4] [pid 9674:tid 139811025307392] ssl_engine_io.c(2220): [client 192.168.122.12:33950] OpenSSL: read 64/64 bytes from BIO#7f282c002270 [mem: 7f282c002bc8] (BIO dump follows)
[Thu Dec 28 09:06:01.895495 2017] [proxy:trace2] [pid 9674:tid 139811025307392] proxy_util.c(3845): [client 192.168.122.12:33950] ap_proxy_transfer_between_connections: output filters full
[Thu Dec 28 09:06:01.895531 2017] [proxy:trace2] [pid 9674:tid 139811025307392] proxy_util.c(3866): (70008)Partial results are valid but processing is incomplete: [client 192.168.122.12:33950] ap_proxy_transfer_between_connections finished
[Thu Dec 28 09:06:01.895557 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(503): [client 192.168.122.12:33950] backend wait writable
[Thu Dec 28 09:06:01.895737 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(456): [client 192.168.122.12:33950] AH01024: woke from poll(), i=1
[Thu Dec 28 09:06:01.895785 2017] [proxy_connect:debug] [pid 9674:tid 139811025307392] mod_proxy_connect.c(531): [client 192.168.122.12:33950] backend was writable
[Thu Dec 28 09:06:01.895915 2017] [core:trace6] [pid 9674:tid 139811025307392] core_filters.c(740): [remote ::1:873] writev_nonblocking: 9/9 bytes

(3) bt of stall in Test4
Thread 3 (Thread 0x7f284bfff700 (LWP 9827)):
#0  0x00007f2855f3f0f3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f2856435201 in impl_pollset_poll (pollset=0x7f2856b1e028, timeout=<optimized out>, num=0x7f284bffca34, descriptors=0x7f284bffca40) at ./poll/unix/epoll.c:266
#2  0x00007f2852d4c4b5 in proxy_connect_handler (r=0x7f2856b1c0a0, worker=<optimized out>, conf=<optimized out>, url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized out>) at mod_proxy_connect.c:446
#3  0x00007f2852f5b3fc in proxy_run_scheme_handler (r=r@entry=0x7f2856b1c0a0, worker=0x7f2856c638d8, conf=conf@entry=0x7f2856c29320, url=0x7f2856b1d706 "localhost:873", proxyhost=proxyhost@entry=0x0, 
    proxyport=proxyport@entry=0) at mod_proxy.c:2888
#4  0x00007f2852f5c371 in proxy_handler (r=0x7f2856b1c0a0) at mod_proxy.c:1230
Comment 43 Carsten Wolff 2018-01-08 12:25:57 UTC
(In reply to Carsten Wolff from comment #42)
> I have a feeling I should repeat Tests 1 and 2 without SSL to see if SSL is
> necessary for all the stalls to be reproduced. I'll let you know.

So I did:

Test5 (only v10 applied, no SSL):
    - rsync through proxy_connect: OK

Test6 (only v10 applied, no SSL, netem):
    - rsync through proxy_connect: OK

Test7 (v10 and v2 applied, no SSL):
    - rsync through proxy_connect: OK

Test8 (v10 and v2 applied, no SSL, netem):
    - rsync through proxy_connect: OK

So it definitely seems to be necessary that the frontend connection uses SSL for the stalls to be triggered.
Comment 44 Carsten Wolff 2018-02-22 09:17:04 UTC
Created attachment 35737 [details]
fixes a busy loop trying to shutdown a disconnected socket

(In reply to Carsten Wolff from comment #43)
> So it definitely seems to be necessary that the frontend connection uses SSL
> for the stalls to be triggered.

I need to retract that statement. With "v10 and v2 applied + configuration from comment#24", another problem arises when many concurrent connections are made through proxy_connect: now and then a process (prefork MPM) gets stuck in a busy loop, doing this over and over again:

shutdown(30, 1 /* send */)              = -1 ENOTCONN (Transport endpoint is not connected)
epoll_ctl(31, EPOLL_CTL_DEL, 30, {0, {u32=0, u64=0}}) = 0
epoll_wait(31, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=359204392, u64=140724962657832}}}, 2, 60000) = 1
epoll_ctl(31, EPOLL_CTL_DEL, 30, {0, {u32=0, u64=0}}) = -1 ENOENT (No such file or directory)
epoll_ctl(31, EPOLL_CTL_ADD, 30, {EPOLLOUT, {u32=359204480, u64=140724962657920}}) = 0
shutdown(29, 1 /* send */)              = -1 ENOTCONN (Transport endpoint is not connected)
epoll_ctl(31, EPOLL_CTL_DEL, 29, {0, {u32=0, u64=0}}) = 0
epoll_wait(31, {{EPOLLOUT|EPOLLHUP, {u32=359204480, u64=140724962657920}}}, 2, 60000) = 1
epoll_ctl(31, EPOLL_CTL_DEL, 29, {0, {u32=0, u64=0}}) = -1 ENOENT (No such file or directory)
epoll_ctl(31, EPOLL_CTL_ADD, 29, {EPOLLOUT, {u32=359204392, u64=140724962657832}}) = 0
shutdown(30, 1 /* send */)

The attached patch avoids the busy loop by setting "done" when apr_socket_shutdown() returns an error.
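
In rough form, the idea looks like this (a hedged sketch with illustrative names such as shutdown_or_finish(), not the literal patch):

#include "httpd.h"
#include "http_log.h"
#include "apr_network_io.h"

/* Hypothetical helper: half-close a tunnel socket and report whether the
 * proxy_connect poll loop should stop (i.e. set "done"). */
static int shutdown_or_finish(apr_socket_t *sock, request_rec *r)
{
    apr_status_t rv = apr_socket_shutdown(sock, APR_SHUTDOWN_WRITE);

    if (rv != APR_SUCCESS) {
        /* Typically ENOTCONN: the peer is already gone, so re-adding the
         * socket to the pollset and waiting for POLLOUT can only produce
         * the busy loop shown in the strace above. */
        ap_log_rerror(APLOG_MARK, APLOG_DEBUG, rv, r,
                      "socket shutdown failed, finishing tunnel loop");
        return 1; /* caller sets done = 1 */
    }
    return 0;
}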

Furthermore, when testing this patch with many concurrent rsync transfers, I noticed that some rsync clients still stall at some point (with and without SSL!). The busy loop occurs when these clients die.

The test is very simple:
# cat reproduce-endlesss-poll-loop.sh 
#!/bin/bash
mkdir /testdst
cd /testdst
for i in {1..100}; do
        dd if=/dev/urandom of=$i bs=10M count=1 >/dev/null &
done
wait

for i in {1..100}; do
        # w/ stunnel
        #RSYNC_PROXY=localhost:874 rsync -rcP rsync://localhost:873/testsrc/$i /testdst/$i &
        # no-ssl
        RSYNC_PROXY=192.168.122.11:80 rsync -rcP rsync://localhost:873/testsrc/$i /testdst/$i &
done
wait
Comment 45 Yann Ylavic 2018-02-23 17:34:34 UTC
Created attachment 35738 [details]
Handle POLLOUT in proxy_connect (v11)

An update to the patch to better/fully handle half-closed connections and improve logging (trace8).

Bailing out on shutdown failure is not necessary anymore; I can't reproduce stalls or busy loops with this patch and your rsync scenario.

When running the rsync loop indefinitely, it happens once in a while that rsync and rsyncd stop communicating, but the proxy doesn't look at fault here: it poll()s appropriately for reads on both sides, yet neither seems to want to send its remaining data.

I took a pcap and httpd log of the issue (will attach in next message), and the proxy can't do better here.
Comment 46 Yann Ylavic 2018-02-23 17:46:03 UTC
Created attachment 35739 [details]
rsync timeout: pcap and httpd log (with v11)

I used three different (linux) boxes for rsync, the proxy and rsyncd, traces were taken on the proxy.

We can see in the log file that the proxy is poll()ing appropriately on both sides before Timeout (set to 10).

The pcap shows rsyncd sending data at the (timeout/premature) end of the connection, which is an error message (on the socket???).
Something's fishy there...
Comment 47 Carsten Wolff 2018-03-04 01:35:31 UTC
(In reply to Yann Ylavic from comment #46)
> The pcap shows rsynd sending data at the (timeout/premature) end of
> connection which is an error message (on the socket???).
> Something's fishy there...

Nothing fishy with that, AFAIK. rsync tries very hard to get error messages through to the user (behind the client). From a TCP perspective there's nothing wrong with sending data after receiving a FIN. The client may have "SHUT_WR" the socket and would still be able to receive that message and present it to the user. Of course the proxy has closed the socket at that point and thus sends an RST. I think this part is actually fine.

(In reply to Yann Ylavic from comment #45)
> By running the rsync loop indefinitely, it happens once in a while that
> rsync and rsyncd stop communicating, but the proxy doesn't look at fault
> here, it poll()s appropriately for reads on both sides but none seems to
> want to send its remaining data.

I can reproduce this, and I see something in my capture which is hard to see in yours, because you use all zeroes for the file contents (I use /dev/urandom).

First of all, if I try with a much longer timeout than 10 seconds, then, after minutes, the transfer succeeds without error!

During the "stall" I see :
- in netstat: all involved socket buffers are empty
- in trace8 log: proxy_connect is polling for POLLIN on both sockets
Conclusion: All data sent by rsync server and client has been read by the proxy at that point.

What I can see from my pcap:
- The last data bytes the rsync server sends _before_ the stall are in packet 6717.
- A tail chunk of these bytes is _not_ being sent to the client before the stall.
- After minutes, the rsync client sends an empty message (\x00\x00\x00\x07 == len 0 MSG_DATA), which the proxy sends to the server, which causes the server to respond with the same empty message (maybe some sort of keepalive in rsync), which causes POLLIN on the proxy backend. Only after that, the proxy sends a data segment (packet 8252) to the client containing the remainder of packet 6717 plus the new message.

Conclusion: Because all socket buffers are empty during the stall and proxy_connect polls for POLLIN, the delayed data must sit in the Apache process during that time. Maybe under some circumstance c_o->data_in_output_filters being false is a lie (or simply not enough) and the data has been set aside somewhere without proxy_connect noticing?

You can find the PCAP here: https://drive.google.com/file/d/1Rf4fI1MBIVZ8AF0XI49xDiPrryHgKaZz/view?usp=sharing
Comment 48 Mohamed El Housseine 2018-03-15 12:30:15 UTC
After running many tests with the data provided by Carsten, I was able to figure out where the unsent-but-read data sits during the stall.
For simplification I will reduce the proxy flow to the following steps:
 
1- proxy_connect_handler calls ap_proxy_transfer_between_connections after every successful POLLIN.
 
The flow of ap_proxy_transfer_between_connections consists of:
 
2- call ap_get_brigade to read the incoming data from the backend connection into bb_i
3- call ap_proxy_buckets_lifetime_transform to copy bb_i into bb_o
4- call ap_pass_brigade to write the data of bb_o to the client connection
5- repeat steps 2-4 as long as steps 2 and 4 return APR_SUCCESS
 
During this flow the following situations may occur:
 
#a: ap_core_input_filter is called from ap_get_brigade in step 2. It takes as its last parameter readbytes (the maximum amount of data to read) and tries to read the requested amount of data from the socket by calling the APR function socket_bucket_read through apr_bucket_read in a loop (http://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/server/core_filters.c?annotate=1734391#l235 and http://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/server/core_filters.c?annotate=1734391#l284).
 
#b: socket_bucket_read allocates a new bucket and _overwrites_ the requested len with APR_BUCKET_BUFF_SIZE, before passing it to apr_socket_recv.
 
#c: Because of this, it is possible that socket_bucket_read reads more data from the socket into the bucket than ap_core_input_filter was instructed to get.
 
#d: In this case the read data will be split into several buckets by ap_core_input_filter. The first one will be returned through bb_i and the rest will be kept in in_ctx->b. (http://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/server/core_filters.c?annotate=1734391#l305 ff.)
 
#e: ap_pass_brigade in step 4 then tries to write the data of bb_o to the target connection. If that write comes back incomplete, the flag data_in_output_filters will be set to true, in which case the loop in step 5 ends.
 
#f: If this all happens at the very end of the data transfer and the socket doesn't receive any more data to cause a POLLIN, step 1 will never happen again, and the data in in_ctx->b will not get sent before rsync decides to send an (application-level) keepalive (after minutes).
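
To make #b-#f concrete, here is a heavily simplified sketch of the loop from steps 2-5 (illustrative code under my reading of the 2.4.x sources, not the real ap_proxy_transfer_between_connections()):

#include "httpd.h"
#include "util_filter.h"
#include "mod_proxy.h"

/* Sketch only: error handling and flushing are trimmed. */
static apr_status_t transfer_sketch(request_rec *r,
                                    conn_rec *c_i, conn_rec *c_o,
                                    apr_bucket_brigade *bb_i,
                                    apr_bucket_brigade *bb_o,
                                    apr_off_t readbytes)
{
    apr_status_t rv;

    do {
        apr_brigade_cleanup(bb_i);

        /* Step 2: the core input filter may pull a full APR_BUCKET_BUFF_SIZE
         * read off the socket but hand back only readbytes of it, keeping
         * the tail in its private in_ctx->b (situations #b-#d). */
        rv = ap_get_brigade(c_i->input_filters, bb_i, AP_MODE_READBYTES,
                            APR_NONBLOCK_READ, readbytes);
        if (rv != APR_SUCCESS) {
            break;
        }

        /* Step 3: adjust bucket lifetimes so the data can cross connections. */
        ap_proxy_buckets_lifetime_transform(r, bb_i, bb_o);

        /* Step 4: pass to the other connection; when its output filters are
         * full, data_in_output_filters gets set (situation #e). */
        rv = ap_pass_brigade(c_o->output_filters, bb_o);
        apr_brigade_cleanup(bb_o);
    } while (rv == APR_SUCCESS && !c_o->data_in_output_filters);

    /* Situation #f: if we stop here while in_ctx->b still holds the tail of
     * the last socket read, nothing triggers another POLLIN, so that tail
     * stays in the process until the peer happens to send more data. */
    return rv;
}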
Comment 49 Yann Ylavic 2018-03-15 16:26:52 UTC
Thanks Mohamed for the nice analysis. Yes, the issue is that there may still be data in the input filter chain (which we haven't read yet) when the output filter chain becomes full and we abort the transfer (APR_INCOMPLETE).

I had already figured this out and was working on a new patch (got interrupted by the $day job...). It does not work yet, but I will link it here when it passes my testing.

The idea is to "drain" the input chain by not clearing POLLOUT (on the other side) until done.
Comment 50 Carsten Wolff 2018-03-15 17:05:45 UTC
(In reply to Yann Ylavic from comment #49)
> The idea is to "drain" the input chain by not clearing POLLOUT (on the other
> side) until done.

How would that work? After POLLOUT we only call ap_core_output_filter, which does not access f->ctx->in_ctx, where the data hides. And we can't extract the data by accessing the ctx from mod_proxy_connect, because the ctx structures are private. I don't see an elegant solution without modifying the core filters.
Comment 51 Yann Ylavic 2018-03-15 17:13:39 UTC
Yes sorry, I made a shortcut. POLLOUT + a "drain" flag to indicate that the POLLIN code should be entered in this case too...
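
Something along these lines, as a sketch only (the flag and helper names are hypothetical, not the actual patch code):

#include "httpd.h"

/* Hypothetical helper: one ap_proxy_transfer_between_connections()-style
 * pass from "in" to "out"; assumed to return APR_SUCCESS when everything
 * readable was forwarded and APR_INCOMPLETE when "out"'s filters are full. */
apr_status_t transfer_once(request_rec *r, conn_rec *in, conn_rec *out);

/* Called when "out" becomes writable (POLLOUT): if the drain flag is set,
 * enter the POLLIN code path too, because data may still sit in "in"'s
 * input filter chain even though its socket is idle. */
static void on_other_side_writable(request_rec *r, conn_rec *in,
                                   conn_rec *out, int *drain)
{
    if (*drain) {
        apr_status_t rv = transfer_once(r, in, out);

        /* A successful return implies no pending input remains, so the
         * drain flag (and the POLLOUT interest on "out") can be cleared;
         * an incomplete return keeps both until the next POLLOUT. */
        *drain = APR_STATUS_IS_INCOMPLETE(rv);
    }
}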
Comment 52 Carsten Wolff 2018-03-15 21:42:53 UTC
Created attachment 35772 [details]
try draining input buffers

This patch goes on top of https://bz.apache.org/bugzilla/attachment.cgi?id=35738 and https://bz.apache.org/bugzilla/attachment.cgi?id=35620&action=edit

It's late and there needs to be more testing, but I wanted to let you know that I seem to have good success with this patch.
Comment 53 Carsten Wolff 2018-03-15 22:06:28 UTC
BTW,

(In reply to Mohamed El Housseine from comment #48)
> #b: socket_bucket_read allocates a new bucket and _overwrites_ the requested
> len with APR_BUCKET_BUFF_SIZE, before passing it to apr_socket_recv.

Shouldn't this be considered a bug in APR? I would understand if the function just clamped to min(len, APR_BUCKET_BUFF_SIZE) and might read less than requested. But reading more than the caller requested, while returning only as much as was requested and not telling the caller about that fact, is very unintuitive, especially if one is used to the behavior of POSIX read().
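
For comparison, the clamping I would have expected looks roughly like this (a sketch only, not APR's actual socket_bucket_read()):

#include <apr_buckets.h>
#include <apr_network_io.h>

/* Hypothetical clamped read: never request more than the caller asked for. */
static apr_status_t clamped_socket_read(apr_socket_t *sock,
                                        apr_bucket_alloc_t *list,
                                        char **buf, apr_size_t *len)
{
    apr_size_t want = (*len < APR_BUCKET_BUFF_SIZE)
                      ? *len : APR_BUCKET_BUFF_SIZE;
    apr_status_t rv;

    *buf = apr_bucket_alloc(want, list);
    rv = apr_socket_recv(sock, *buf, &want);
    *len = want; /* at most min(len, APR_BUCKET_BUFF_SIZE) bytes */
    return rv;
}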
Comment 54 Mohamed El Housseine 2018-03-16 09:23:11 UTC
Created attachment 35774 [details]
patch for fixing the problem of remaining data in input ctx

Hi Yann,
Hi Carsten,

I could fix that problem with the attached patch. There was no need for big changes: after successfully sending all data in the output filters, I check whether there is remaining data in the input filters and send it as well.
Comment 55 Mohamed El Housseine 2018-03-16 13:51:23 UTC
Created attachment 35776 [details]
patch for fixing the problem of remaining data in input ctx v2

Hi Yann again,

I made some improvements to keep the changes as minimal as possible.
Comment 56 Yann Ylavic 2018-03-18 19:39:10 UTC
Created attachment 35783 [details]
Handle POLLOUT in proxy_connect (v12)

This v12 implements the "drain" flag I mentioned earlier.

It's a less intrusive change than attachment 35776 [details] (or attachment 35772 [details]) with regard to the core filters (self-contained in the mod_proxy code).

It also keeps the ap_proxy_transfer_between_connections() logic in one place, to avoid duplicating code (and error handling, which seems to be missing in attachment 35776 [details]).

Probably more importantly, it passes my testing :)
Does it work for you?
Comment 57 Yann Ylavic 2018-03-18 19:40:20 UTC
Created attachment 35784 [details]
Incremental patch from v11 to v12
Comment 58 Mohamed El Housseine 2018-03-19 10:41:49 UTC
Hi Yann,
thank you for your feedback!

> It also keeps the ap_proxy_transfer_between_connections() logic in one
> place, to avoid duplicating code (and error handling, which seems to be
> missing in attachment 35776 [details]).

The error handling is not missing; I analyzed all possible returns of ap_proxy_transfer_between_connections. It can return:
APR_SUCCESS
APR_STATUS_IS_INCOMPLETE
APR_STATUS_IS_EOF
ERROR

In its second call (for handling remaining data in input filters) it can return:

APR_SUCCESS
APR_STATUS_IS_INCOMPLETE
ERROR

APR_STATUS_IS_EOF can't be returned, because we are handling remaining (existing) data.

Also, if the second call returns something other than APR_SUCCESS or APR_STATUS_IS_INCOMPLETE, it must be handled as an error.

To ensure this, extending the if-condition for handling data_in_output_filters with "&& !APR_STATUS_IS_EAGAIN(rv)" should be enough.


Maybe I missed another case?
Comment 59 Mohamed El Housseine 2018-03-19 11:10:45 UTC
(In reply to Mohamed El Housseine from comment #58)

sorry
> For ensuring this, extending the if-condition for handling
> data_in_output_filters with "&& !APR_STATUS_IS_EAGAIN(rv)" must be enought.

I mean with "&& !APR_STATUS_IS_INCOMPLETE(rv)"
Comment 60 Yann Ylavic 2018-03-19 12:05:59 UTC
Hi Mohamed,

(In reply to Mohamed El Housseine from comment #58)
> 
> In its second call (for handling remaining data in input filters) it can
> return:
> 
> APR_SUCCESS
> APR_STATUS_IS_INCOMPLETE
> ERROR
> 
> APR_STATUS_IS_EOF can't be returned, because we are handling remaining
> (existing) data.
The issue is that ap_proxy_transfer_between_connections() is a loop, and once it has consumed pending data it also consumes the socket (with all of its possible error codes) until the other side is full.
So I think we need the same logic as in the POLLIN case...

Also, the httpd code in trunk now handles pending data very differently (no c->data_in_{out,in}put_filters anymore, the whole filter chain is involved), while our patches until now are 2.4.x (and even 2.4.29) specific.
For anything to be accepted/upstreamed in the httpd project, it must first go into trunk, that's the next step.

When writing these patches I had that in mind, because I'd like to share as much code as possible between mod_proxy's _connect, _wstunnel and even _http (which I'd like to see handling websocket traffic too). So the proxy_connect loop is (expectedly) meant to be a shared helper (along with common things in struct proxy_connect_conn).

So the goal (for me) is a wider refactoring, which I wanted to make work in 2.4.x first (outside of the new trunk "concepts") so that I have both something to compare against and possibly a backport plan (without the full monty ;)

Thanks anyway for your analyses and proposals that help a lot!
Comment 61 Carsten Wolff 2018-03-19 13:22:09 UTC
Hi Yann,

(In reply to Yann Ylavic from comment #56)
> Created attachment 35783 [details]
> Handle POLLOUT in proxy_connect (v12)
> 
> Does it work for you?

Unfortunately not; it creates busy loops again when I:
- start 100 transfers "simultaneously"
- quickly interrupt the rsync clients (SIGINT)

strace repeats:

epoll_wait(15, [{EPOLLOUT|EPOLLHUP, {u32=4292461128, u64=140737485849160}}], 2, 300000) = 1
epoll_ctl(15, EPOLL_CTL_ADD, 13, {EPOLLOUT, {u32=4292461032, u64=140737485849064}}) = 0
epoll_ctl(15, EPOLL_CTL_DEL, 14, 0x7fffffd9c0b0) = 0
shutdown(14, SHUT_WR)                   = -1 ENOTCONN (Transport endpoint is not connected)

Just like in comment#44 (patch v10). Maybe apr_socket_shutdown != APR_SUCCESS should be handled after all.
Comment 62 Yann Ylavic 2018-03-19 14:35:45 UTC
(In reply to Carsten Wolff from comment #61)
> 
> unfortunately not, it creates busy loops again, when I:
> - start 100 transfers "simultaniously"
> - quickly interrupt the rsync clients (SIGINT)

Hmm, I can't reproduce, can you please provide the error log when this happens?
Comment 63 Yann Ylavic 2018-03-19 14:36:30 UTC
(In reply to Yann Ylavic from comment #62)
> (In reply to Carsten Wolff from comment #61)
> > 
> > unfortunately not, it creates busy loops again, when I:
> > - start 100 transfers "simultaniously"
> > - quickly interrupt the rsync clients (SIGINT)
> 
> Hmm, I can't reproduce, can you please provide the error log when this
> happens?
LogLevel trace8 ;)
Comment 64 Carsten Wolff 2018-03-19 15:11:22 UTC
(In reply to Yann Ylavic from comment #62)
> Hmm, I can't reproduce, can you please provide the error log when this
> happens?

Will do. In the meantime, another observation: because your patch v12 is constrained to the module code and thus cannot determine whether there still is data in the input filter, it just assumes, whenever it cannot write, that there may still be data in the other side's input filter. This means that you call transfer_between_connections every time this happens, while our patch only does that when there actually is data in the input filters. Since transfer_between_connections does not only drain data from its input filter but also reads some more data, we thought you may potentially buffer a lot more data in application memory and allow the sending endpoint to send a lot more, even though you cannot be sure that you will be able to write it again. A simple test with "watch free -m" seems to confirm that: during my test (100 transfers à 10MiB), the application memory with your patch goes up to about 560MiB on my machine, while ours just reaches about 350MiB.
Comment 65 Carsten Wolff 2018-03-19 15:25:04 UTC
Created attachment 35786 [details]
log of a single process entering a busy loop

(In reply to Yann Ylavic from comment #63)
> (In reply to Yann Ylavic from comment #62)
> > (In reply to Carsten Wolff from comment #61)
> > > 
> > > unfortunately not, it creates busy loops again, when I:
> > > - start 100 transfers "simultaniously"
> > > - quickly interrupt the rsync clients (SIGINT)
> > 
> > Hmm, I can't reproduce, can you please provide the error log when this
> > happens?
> LogLevel trace8 ;)

Here's the log.
Comment 66 Mohamed El Housseine 2018-03-19 17:27:24 UTC
(In reply to Yann Ylavic from comment #60)
Hi Yann again,
Thank you very much for your detailed reply.
 
> The issue is that ap_proxy_transfer_between_connections() is a loop, and
> once it has consumed pending data it also consumes the socket (with all of
> its possible error codes) until the other side is full.
> So I think we need the same logic than in the POLLIN case...

I see; an improved version will be uploaded. It handles the remaining case: APR_STATUS_IS_EOF.
 
> Also, the httpd code in trunk now handles pending data very differently (no
> c->data_in_{out,in}put_filters anymore, the whole filter chain is involved),
> while our patches until now are 2.4.x (and even 2.4.29) specific.
> For anything to be accepted/upstreamed in the httpd project, it must first
> go into trunk, that's the next step.

As long as this patch goes on top of proxy_connect (v11) and v11 is still not merged into trunk, unfortunately, there is no way for my patch to get into trunk :-(
Comment 67 Mohamed El Housseine 2018-03-19 17:29:36 UTC
Created attachment 35787 [details]
patch for fixing the problem of remaining data in input ctx v3

As already mentioned, the patch for remaining data in input filters, v3.
Comment 68 Yann Ylavic 2018-03-19 17:43:54 UTC
You don't seem to be using attachment 35620 [details] here (hence FlushMinThreshold).

If this is the case, you at least need to:
  #define THRESHOLD_MIN_WRITE 1
in "server/core_filters.c".

Otherwise, the core output filter is buffering up to 4K which is not suitable for mod_proxy_connect.
Comment 69 Yann Ylavic 2018-03-19 17:47:45 UTC
(In reply to Yann Ylavic from comment #68)
> You don't seem to be using attachment 35620 [details] here (hence
> FlushMinThreshold).
This relates to comment 65, not Mohamed's patch.
Comment 70 Yann Ylavic 2018-03-20 14:05:00 UTC
Created attachment 35789 [details]
Handle POLLOUT in proxy_connect (v13)

Previous patch (v12) was bogus, indeed.

This new one should work regardless of THRESHOLD_MIN_WRITE (or FlushMinThreshold from attachment 35620 [details]), i.e. it's really self-contained.

It does not use or play with "data_in_input_filters", because the core filter can't be responsible for clearing it (other filters might retain data too; suppose it's used by third-party modules...).

So I kept my "drain" flag, set according to ap_proxy_transfer_between_connections(), since a successful return from there implies that there is no pending data. It helps reduce pressure on buffering, as you noted, but mainly it avoids blocking on the write side of ap_proxy_transfer_between_connections() by progressively accumulating data there, should it not "follow" the read side (which would defeat the purpose of checking data_in_output_filters anyway).

I tried to keep the best of our respective proposals (no buffering nor code duplication...); if it works, we should find an agreement, right? :)
Comment 71 Carsten Wolff 2018-05-21 20:35:06 UTC
(In reply to Yann Ylavic from comment #70)
> I tries to keep the best of our respective proposals (no buffering nor code
> duplication...), if it works we should find an agreement right? :)

Hi Yann,

I apologize for the long silence. We have had this patch running together with https://bz.apache.org/bugzilla/attachment.cgi?id=35620 for a while now and it seems to work well. The occasional "broken pipe" messages from ap_pass_brigade are probably caused by real connection losses. Thank you for your persistence in this matter!
Comment 72 Tom Drake 2019-02-05 01:16:11 UTC
I believe that I am encountering this same issue. 

I had reported my issue on server fault here:
https://serverfault.com/questions/949259/apache-buffers-tomcat-responses-and-closes-connection-to-tomcat-when-request-si

The way I have prevented the connection failure from happening is to eliminate the bi-directional communication; my proxied web-service consumes all of the input and processes all of it before writing content back out. While this prevents the error from occurring when there is too much data, it adds an undesirable latency to the web-service.

I'm interested in finding out if and when the current patch will be released.
Comment 73 Carsten Wolff 2019-02-06 08:54:00 UTC
Hi Tom,

(In reply to Tom Drake from comment #72)
> I believe that I am encountering this same issue. 

that is unlikely. You're using mod_proxy_ajp, which does not share much with mod_proxy_connect. Also, looking at mod_proxy_ajp.c, the behavior you're seeing seems to be intentional (it only flushes data to the client explicitly if the backend has no more data to read, i.e. if a poll times out after 10ms [see also bug#37100]).
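
Roughly, that behavior looks like this (a sketch with hypothetical names, not the actual mod_proxy_ajp.c code):

#include "httpd.h"
#include "util_filter.h"
#include "apr_poll.h"

/* Sketch only: flush buffered response data to the client when the backend
 * stays quiet for the flush-wait interval (~10ms by default). */
static void maybe_flush(request_rec *r, apr_pollfd_t *backend_pfd,
                        apr_bucket_brigade *output_brigade,
                        apr_interval_time_t flush_wait)
{
    apr_int32_t n;
    apr_status_t rv = apr_poll(backend_pfd, 1, &n, flush_wait);

    if (APR_STATUS_IS_TIMEUP(rv)) {
        /* Backend has nothing more to read for now: flush explicitly. */
        apr_bucket *e = apr_bucket_flush_create(r->connection->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(output_brigade, e);
        ap_pass_brigade(r->output_filters, output_brigade);
    }
}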
Comment 74 Tom Drake 2019-08-02 22:31:40 UTC
Carsten, thanks for your reply.
Do you have any suggestions for me as to how to get around the issue I'm experiencing?

Thanks.