Bug 65886 - Regression: Reverse proxy with websocket forcing 60 second timeout regardless of config
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy
Version: 2.4.51
Hardware: PC Mac OS X 10.1
Importance: P3 major
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: FixedInTrunk, PatchAvailable
Depends on:
Blocks:
 
Reported: 2022-02-15 23:16 UTC by Archie Cobbs
Modified: 2022-03-07 15:25 UTC

Description Archie Cobbs 2022-02-15 23:16:44 UTC
We have a reverse proxy from Apache to a Tomcat instance. Apache is configured like this:

    # Proxy tomcat
    SetEnv              proxy-initial-not-pooled
    ProxyRequests       Off
    ProxyPreserveHost   On

    # Set timeout higher than Vaadin's "heartbeatInterval"
    ProxyTimeout        300

    # Proxy "/pcom" through to tomcat (Vaadin application)
    ProxyPass                       /pcom/              http://127.0.0.1:8080/pcom/ retry=0
    ProxyPassReverse                /pcom/              http://127.0.0.1:8080/pcom/
    #ProxyPassReverseCookieDomain    127.0.0.1:8080     SERVERNAME
    #ProxyPassReverseCookiePath      /pcom               /pcom

    # Proxy "/VAADIN" through to tomcat (Vaadin application)
    ProxyPass                       /VAADIN/            http://127.0.0.1:8080/VAADIN/ retry=0
    ProxyPassReverse                /VAADIN/            http://127.0.0.1:8080/VAADIN/

    # Proxy "/static" through to tomcat
    ProxyPass                       /static/            http://127.0.0.1:8080/static/ retry=0
    ProxyPassReverse                /static/            http://127.0.0.1:8080/static/

    # Proxy "/api" through to tomcat (REST API)
    ProxyPass                       /api/               http://127.0.0.1:8080/ws/api/ retry=0
    ProxyPassReverse                /api/               http://127.0.0.1:8080/ws/api/

    # Make websocket requests work properly for the above mappings as well
    RewriteEngine       On

    RewriteCond         %{HTTP:Upgrade} websocket [NC]
    RewriteCond         %{HTTP:Connection} upgrade [NC]
    RewriteRule         ^/pcom/ "ws://127.0.0.1:8080%{REQUEST_URI}" [P]

    RewriteCond         %{HTTP:Upgrade} websocket [NC]
    RewriteCond         %{HTTP:Connection} upgrade [NC]
    RewriteRule         ^/VAADIN/ "ws://127.0.0.1:8080%{REQUEST_URI}" [P]

    RewriteCond         %{HTTP:Upgrade} websocket [NC]
    RewriteCond         %{HTTP:Connection} upgrade [NC]
    RewriteRule         ^/static/ "ws://127.0.0.1:8080%{REQUEST_URI}" [P]

    RewriteCond         %{HTTP:Upgrade} websocket [NC]
    RewriteCond         %{HTTP:Connection} upgrade [NC]
    RewriteRule         ^/api/ "ws://127.0.0.1:8080%{REQUEST_URI}" [P]


Note that ProxyTimeout is set to 5 minutes because the client web application only polls for new data every ~3 minutes.

This setup has worked for a long time, most recently with Apache 2.4.43. After upgrading to 2.4.51, we started getting disconnects in the web application, so something broke somewhere in the range (2.4.43, 2.4.51].

After turning up the logging level, it became apparent that the proxy timeout being used in 2.4.51 was actually 60 seconds instead of 5 minutes, even though our configuration had not changed.

You can see it happen here. Note the line "proxy: websocket: tunnel running (timeout 60.000000)" at 16:28:11.114445, and then the timeout firing exactly 60 seconds later at "AH10213: proxy: websocket: polling timed out (client=31, origin=31)":

    [Tue Feb 15 16:28:11.111553 2022] [proxy_wstunnel:trace5] [pid 8825] mod_proxy_wstunnel.c(340): [client 127.0.0.1:47232] handler fallback
    [Tue Feb 15 16:28:11.111555 2022] [proxy_http:trace1] [pid 8825] mod_proxy_http.c(1875): [client 127.0.0.1:47232] HTTP: serving URL ws://127.0.0.1:8080/pcom/PUSH?v-uiId=4&v-pushId=7f29d183-3c6a-4c8c-b672-001261fc162f&X-Atmosphere-tracking-id=0&X-Atmosphere-Framework=2.2.13.vaadin5-javascript&X-Atmosphere-Transport=websocket&X-Atmosphere-TrackMessageSize=true&Content-Type=application/json;%20charset=UTF-8&X-atmo-protocol=true
    [Tue Feb 15 16:28:11.111561 2022] [proxy:debug] [pid 8825] proxy_util.c(2588): [client 127.0.0.1:47232] AH00944: connecting ws://127.0.0.1:8080/pcom/PUSH?v-uiId=4&v-pushId=7f29d183-3c6a-4c8c-b672-001261fc162f&X-Atmosphere-tracking-id=0&X-Atmosphere-Framework=2.2.13.vaadin5-javascript&X-Atmosphere-Transport=websocket&X-Atmosphere-TrackMessageSize=true&Content-Type=application/json;%20charset=UTF-8&X-atmo-protocol=true to 127.0.0.1:8080
    [Tue Feb 15 16:28:11.111580 2022] [proxy:debug] [pid 8825] proxy_util.c(2811): [client 127.0.0.1:47232] AH00947: connected /pcom/PUSH?v-uiId=4&v-pushId=7f29d183-3c6a-4c8c-b672-001261fc162f&X-Atmosphere-tracking-id=0&X-Atmosphere-Framework=2.2.13.vaadin5-javascript&X-Atmosphere-Transport=websocket&X-Atmosphere-TrackMessageSize=true&Content-Type=application/json;%20charset=UTF-8&X-atmo-protocol=true to 127.0.0.1:8080
    [Tue Feb 15 16:28:11.114299 2022] [proxy_http:trace3] [pid 8825] mod_proxy_http.c(1198): [client 127.0.0.1:47232] Status from backend: 101
    [Tue Feb 15 16:28:11.114321 2022] [proxy_http:trace4] [pid 8825] mod_proxy_http.c(820): [client 127.0.0.1:47232] Headers received from backend:
    [Tue Feb 15 16:28:11.114325 2022] [proxy_http:trace4] [pid 8825] mod_proxy_http.c(842): [client 127.0.0.1:47232] Upgrade: websocket
    [Tue Feb 15 16:28:11.114331 2022] [proxy_http:trace4] [pid 8825] mod_proxy_http.c(842): [client 127.0.0.1:47232] Connection: upgrade
    [Tue Feb 15 16:28:11.114335 2022] [proxy_http:trace4] [pid 8825] mod_proxy_http.c(842): [client 127.0.0.1:47232] Sec-WebSocket-Accept: BBZpglr2TI0qkORtYXDeNDu3PN4=
    [Tue Feb 15 16:28:11.114338 2022] [proxy_http:trace4] [pid 8825] mod_proxy_http.c(842): [client 127.0.0.1:47232] Sec-WebSocket-Extensions: permessage-deflate
    [Tue Feb 15 16:28:11.114341 2022] [proxy_http:trace4] [pid 8825] mod_proxy_http.c(842): [client 127.0.0.1:47232] Date: Tue, 15 Feb 2022 22:28:10 GMT
    [Tue Feb 15 16:28:11.114352 2022] [proxy_http:trace2] [pid 8825] mod_proxy_http.c(1376): [client 127.0.0.1:47232] HTTP: received interim 101 response (policy: n/a)
    [Tue Feb 15 16:28:11.114367 2022] [ssl:trace6] [pid 8825] ssl_engine_io.c(856): [client 127.0.0.1:47232] ssl_filter_write: 193 bytes
    [Tue Feb 15 16:28:11.114392 2022] [ssl:trace6] [pid 8825] ssl_engine_io.c(215): [client 127.0.0.1:47232] bio_filter_out_write: 215 bytes
    [Tue Feb 15 16:28:11.114403 2022] [ssl:trace4] [pid 8825] ssl_engine_io.c(2370): [client 127.0.0.1:47232] OpenSSL: write 215/215 bytes to BIO#56310e553f60 [mem: 56310e56aaa3] 
    [Tue Feb 15 16:28:11.114424 2022] [core:trace6] [pid 8825] core_filters.c(812): [client 127.0.0.1:47232] writev_nonblocking: 215/215
    [Tue Feb 15 16:28:11.114433 2022] [proxy_http:debug] [pid 8825] mod_proxy_http.c(1480): [client 127.0.0.1:47232] AH10239: HTTP: tunneling protocol websocket
    [Tue Feb 15 16:28:11.114445 2022] [proxy:trace1] [pid 8825] proxy_util.c(4874): [client 127.0.0.1:47232] AH10212: proxy: websocket: tunnel running (timeout 60.000000)
    [Tue Feb 15 16:28:11.114451 2022] [proxy:trace5] [pid 8825] proxy_util.c(4980): [client 127.0.0.1:47232] proxy: websocket: client resume writable
    [Tue Feb 15 16:28:11.114458 2022] [proxy:trace2] [pid 8825] proxy_util.c(4653): [client 127.0.0.1:47232] ap_proxy_transfer_between_connections complete (from 127.0.0.1:8080)
    [Tue Feb 15 16:28:11.114465 2022] [proxy:trace5] [pid 8825] proxy_util.c(4980): [client 127.0.0.1:47232] proxy: websocket: origin resume writable
    [Tue Feb 15 16:28:11.114474 2022] [ssl:trace4] [pid 8825] ssl_engine_io.c(2379): [client 127.0.0.1:47232] OpenSSL: I/O error, 5 bytes expected to read on BIO#56310e557030 [mem: 56310e570b13]
    [Tue Feb 15 16:28:11.114478 2022] [proxy:trace2] [pid 8825] proxy_util.c(4653): (11)Resource temporarily unavailable: [client 127.0.0.1:47232] ap_proxy_transfer_between_connections complete (to 127.0.0.1:8080)
    [Tue Feb 15 16:28:11.115073 2022] [ssl:trace6] [pid 8825] ssl_engine_io.c(856): [client 127.0.0.1:47232] ssl_filter_write: 44 bytes
    [Tue Feb 15 16:28:11.115090 2022] [ssl:trace6] [pid 8825] ssl_engine_io.c(215): [client 127.0.0.1:47232] bio_filter_out_write: 66 bytes
    [Tue Feb 15 16:28:11.115093 2022] [ssl:trace4] [pid 8825] ssl_engine_io.c(2370): [client 127.0.0.1:47232] OpenSSL: write 66/66 bytes to BIO#56310e553f60 [mem: 56310e56aaa3] 
    [Tue Feb 15 16:28:11.115110 2022] [core:trace6] [pid 8825] core_filters.c(812): [client 127.0.0.1:47232] writev_nonblocking: 66/66
    [Tue Feb 15 16:28:11.115117 2022] [proxy:trace2] [pid 8825] proxy_util.c(4653): [client 127.0.0.1:47232] ap_proxy_transfer_between_connections complete (from 127.0.0.1:8080)
    [Tue Feb 15 16:29:11.124429 2022] [proxy:trace2] [pid 8825] proxy_util.c(4896): [client 127.0.0.1:47232] AH10213: proxy: websocket: polling timed out (client=31, origin=31)
    [Tue Feb 15 16:29:11.124641 2022] [ssl:trace6] [pid 8825] ssl_engine_io.c(215): [client 127.0.0.1:47232] bio_filter_out_write: 24 bytes
    [Tue Feb 15 16:29:11.124647 2022] [ssl:trace4] [pid 8825] ssl_engine_io.c(2370): [client 127.0.0.1:47232] OpenSSL: write 24/24 bytes to BIO#56310e553f60 [mem: 56310e56aaa3] 
    [Tue Feb 15 16:29:11.124650 2022] [ssl:trace6] [pid 8825] ssl_engine_io.c(156): [client 127.0.0.1:47232] bio_filter_out_write: flush
    [Tue Feb 15 16:29:11.124671 2022] [core:trace6] [pid 8825] core_filters.c(812): [client 127.0.0.1:47232] writev_nonblocking: 24/24
    [Tue Feb 15 16:29:11.124675 2022] [ssl:trace3] [pid 8825] ssl_engine_kernel.c(2223): [client 127.0.0.1:47232] OpenSSL: Write: SSL negotiation finished successfully
    [Tue Feb 15 16:29:11.124677 2022] [ssl:trace6] [pid 8825] ssl_engine_io.c(156): [client 127.0.0.1:47232] bio_filter_out_write: flush


The web client displays the error "Disconnected from server" exactly 60 seconds after connecting.

Where is this 60-second value coming from?

We tried adding "timeout=300" to the ProxyPass lines (even though this timeout is supposed to alread default to ProxyTimeout) but that didn't help.

So we don't even have a workaround for this bug yet, other than downgrading.
Comment 1 Yann Ylavic 2022-02-16 00:21:12 UTC
A possible workaround is "ProxyWebsocketFallbackToProxyHttp off" (https://httpd.apache.org/docs/2.4/mod/mod_proxy_wstunnel.html#proxywebsocketfallbacktoproxyhttp).

But since 2.4.52 websocket tunneling uses the smallest of Timeout and ProxyTimeout, you may want to try 2.4.52 (and then add "Timeout 300" to your configuration as you already tried in 2.4.51).
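
In configuration terms, a minimal sketch of the two options (directive names as documented; the values are only examples):

    # Option 1: keep the upgrade in mod_proxy_wstunnel (no fallback to mod_proxy_http)
    ProxyWebsocketFallbackToProxyHttp off

    # Option 2: raise the front-end Timeout so it no longer caps the tunnel at 60s
    # (ProxyTimeout is already 300 in the configuration above)
    Timeout             300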
Comment 2 Archie Cobbs 2022-02-16 00:45:11 UTC
> But since 2.4.52 websocket tunneling uses the smallest of Timeout and
> ProxyTimeout, you may want to try 2.4.52 (and then add "Timeout 300"
> to your configuration as you already tried in 2.4.51).

Wha? Isn't it wrong to make a completely undocumented and backward-incompatible change like that in a patch release?
Comment 3 Archie Cobbs 2022-02-16 01:02:51 UTC
FWIW, I can confirm that adding "Timeout 300" does work around the problem:

  AH10212: proxy: websocket: tunnel running (timeout 300.000000)

But this still seems like a bug to me.

Or at least a poor design decision.

Why should ProxyTimeout's DEFAULT value (Timeout) ever override its CONFIGURED value (ProxyTimeout) ? I definitely don't get it.
Comment 4 Ruediger Pluem 2022-02-16 07:53:16 UTC
I agree that the need to increase Timeout is not a good solution. Maybe we should change the logic to choose the maximum of the two timeouts (front-end socket, backend socket) instead of the minimum, since backend timeouts can be configured more selectively (per worker if needed) than front-end timeouts, and backend timeouts typically reflect the application's requirements.
Comment 5 Ruediger Pluem 2022-02-16 07:59:43 UTC
Something like the following patch:

Index: modules/proxy/mod_proxy_http.c
===================================================================
--- modules/proxy/mod_proxy_http.c	(revision 1898054)
+++ modules/proxy/mod_proxy_http.c	(working copy)
@@ -1485,10 +1485,10 @@
                 return HTTP_INTERNAL_SERVER_ERROR;
             }
 
-            /* Set timeout to the lowest configured for client or backend */
+            /* Set timeout to the highest configured for client or backend */
             apr_socket_timeout_get(backend->sock, &backend_timeout);
             apr_socket_timeout_get(ap_get_conn_socket(c), &client_timeout);
-            if (backend_timeout >= 0 && backend_timeout < client_timeout) {
+            if (backend_timeout >= 0 && backend_timeout > client_timeout) {
                 tunnel->timeout = backend_timeout;
             }
             else {
Index: modules/proxy/proxy_util.c
===================================================================
--- modules/proxy/proxy_util.c	(revision 1898054)
+++ modules/proxy/proxy_util.c	(working copy)
@@ -4718,10 +4718,10 @@
     tunnel->origin->pfd->desc.s = ap_get_conn_socket(c_o);
     tunnel->origin->pfd->client_data = tunnel->origin;
 
-    /* Defaults to the smallest timeout of both connections */
+    /* Defaults to the biggest timeout of both connections */
     apr_socket_timeout_get(tunnel->client->pfd->desc.s, &timeout);
     apr_socket_timeout_get(tunnel->origin->pfd->desc.s, &tunnel->timeout);
-    if (timeout >= 0 && (tunnel->timeout < 0 || tunnel->timeout > timeout)) {
+    if (timeout >= 0 && (tunnel->timeout < 0 || tunnel->timeout < timeout)) {
         tunnel->timeout = timeout;
     }
Comment 6 Ruediger Pluem 2022-02-16 08:00:56 UTC
Another option instead of increasing the Timeout is

ProxyWebsocketFallbackToProxyHttp off

but in this case no timeout is applied to the websocket connection at all and the proxy waits until one side closes the connection.
Comment 7 Yann Ylavic 2022-02-16 09:28:36 UTC
(In reply to Ruediger Pluem from comment #5)
> Something like the following patch:

+1
Comment 8 Yann Ylavic 2022-02-16 10:35:19 UTC
(In reply to Archie Cobbs from comment #3)
> 
> Why should ProxyTimeout's DEFAULT value (Timeout) ever override its
> CONFIGURED value (ProxyTimeout) ? I definitely don't get it.

The rationale is/was that since Timeout is the one applied on the client connection side, and ProxyTimeout (or the ProxyPass timeout= parameter) the one(s) on the backend side, we'd use the smaller of the two when tunneling between the client and backend, because if either side times out the tunnel should be terminated. But this was probably a poor choice, because in most cases it forces existing configurations to be changed so that both timeouts are set to the same (highest) value. Poor choices happen...

As Rüdiger said, note that in 2.4.43 no timeout was enforced in mod_proxy for websocket tunneling (neither ProxyTimeout nor Timeout, so the timeout was actually infinite); your configuration worked regardless of ProxyTimeout, probably thanks to the heartbeat. The above change was made precisely to honor a timeout while tunneling without an ad hoc mechanism.
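
Roughly, with your values (Timeout left at its 60s default, ProxyTimeout 300), the effective idle timeout on the websocket tunnel comes out as follows (a summary of the behavior discussed in this report, not an exhaustive changelog):

    Timeout         60     # front-end (client) socket timeout, the documented default
    ProxyTimeout    300    # backend socket timeout

    # 2.4.43: no timeout enforced on the tunnel (effectively infinite)
    # 2.4.51: the front-end Timeout wins (60s), as observed in this report
    # 2.4.52: the smaller of the two, still 60s here
    # with the proposed change (comment 5): the larger of the two, i.e. 300s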

Let's change this again for 2.4.53.
Comment 9 Ruediger Pluem 2022-02-16 12:35:50 UTC
(In reply to Yann Ylavic from comment #7)
> (In reply to Ruediger Pluem from comment #5)
> > Something like the following patch:
> 
> +1

r1898127
Comment 10 Yann Ylavic 2022-02-16 12:57:11 UTC
(In reply to Ruediger Pluem from comment #9)
> (In reply to Yann Ylavic from comment #7)
> > (In reply to Ruediger Pluem from comment #5)
> > > Something like the following patch:
> > 
> > +1
> 
> r1898127

Thanks! Since the mod_proxy_http hunk of your patch in comment 5 is 2.4-only, how about we simply remove the tunnel->timeout setting there in 2.4 (on backport), given that it's already what the ap_proxy_tunnel_create() call just above will do?
Comment 11 Archie Cobbs 2022-02-16 15:41:57 UTC
Thanks guys, appreciate the fast response.

I have two updates and a new question related to this.


The updates are:

1 - Adding "ProxyWebsocketFallbackToProxyHttp off" does in fact work
around the problem.

2 - Previously I said that setting "Timeout 300" also works around the
problem, but that is not entirely accurate. What that does is simply
change how long it takes for the particular problem I'm seeing to occur,
from 60 seconds to 5 minutes. This is as you would probably expect.


Here's my question:

> As Rüdiger said, note that in 2.4.43 no timeout was enforced in
> mod_proxy for websocket tunneling (neither ProxyTimeout nor Timeout,
> so the timeout was actually infinite); your configuration worked
> regardless of ProxyTimeout, probably thanks to the heartbeat. The
> above change was made precisely to honor a timeout while tunneling
> without an ad hoc mechanism.

OK this makes sense. Basically my web application is using websockets
and sending a heartbeat every ~3 minutes. This websocket connection
was previously being proxied by mod_proxy to Tomcat, and mod_proxy
was not applying any particular timeout to that connection, so it
would stay up indefinitely.

But here's my question: isn't a websocket connection SUPPOSED to stay
up indefinitely?

Or at least, shouldn't any "timeout" value that is applied to that
connection be an IDLE timeout rather than an ABSOLUTE timeout (measured
from when the connection was started)?

In other words, it seems to me that whatever mod_proxy was doing is
more correct, at least in the scenario of a proxied websocket connection,
than what mod_proxy_wstunnel is doing.

Can you explain to me what I'm missing? Thanks.
Comment 12 Ruediger Pluem 2022-02-16 15:53:36 UTC
(In reply to Archie Cobbs from comment #11)

> 
> Or at least, shouldn't any "timeout" value that is applied to that
> connection be an IDLE timeout rather than an ABSOLUTE timeout (measured
> from when the connection was started)?

It is an idle timeout. Connections with activity stay up.
Comment 13 Archie Cobbs 2022-02-16 18:58:36 UTC
> > Or at least, shouldn't any "timeout" value that is applied to that
> > connection be an IDLE timeout rather than an ABSOLUTE timeout (measured
> > from when the connection was started)?
> 
> It is an idle timeout. Connections with activity stay up.

Yes, you're right.

What Vaadin appears to be doing is creating more than one websocket connection,
and sending its heartbeat messages on only one of them.

Actually that does make sense - the purpose of the Vaadin heartbeat is to let the
server know that the client is still alive, not to keep the websocket connection(s)
from going idle.

Philosophically, timeouts are appropriate for a request/response world, but the
websockets world is a completely different beast. With websockets, it really should
be left up to the two endpoints to decide if/when the connection has "timed out"
and should be torn down; in general, any intervening proxy should not impose a
timeout, because it has no special knowledge of what the two endpoint applications
are doing with their TCP connection.

So I'd say that there is another subtle problem here, which is that there really
should be two separate configuration variables for configuring (a) the proxy
timeout for regular connections and (b) the proxy timeout for websocket
connections.
Comment 14 Archie Cobbs 2022-02-23 02:40:01 UTC
> So I'd say that there is another subtle problem here, which is that there really
> should be two separate configuration variables for configuring (a) the proxy
> timeout for regular connections and (b) the proxy timeout for websocket
> connections.

Any thoughts on this question?

Right now the only way to have different timeout values for normal vs. websockets
is to say "ProxyWebsocketFallbackToProxyHttp off" which seems like the wrong tool.
Comment 15 Ruediger Pluem 2022-02-23 07:17:00 UTC
(In reply to Archie Cobbs from comment #14)
> > So I'd say that there is another subtle problem here, which is that there really
> > should be two separate configuration variables for configuring (a) the proxy
> > timeout for regular connections and (b) the proxy timeout for websocket
> > connections.
> 
> Any thoughts on this question?
> 
> Right now the only way to have different timeout values for normal vs.
> websockets
> is to say "ProxyWebsocketFallbackToProxyHttp off" which seems like the wrong
> tool.


You can configure an individual timeout per wss:// backend worker. Setting it to an arbitrarily high number should give you what you need.
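
For example, something along these lines (a sketch, untested; the worker URL must match the ws:// URL targeted by your RewriteRule [P], and 86400 is just an arbitrarily large value in seconds):

    <Proxy "ws://127.0.0.1:8080/pcom/">
        ProxySet timeout=86400
    </Proxy>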
Comment 16 Ruediger Pluem 2022-02-23 08:01:53 UTC
Proposed for backport as https://github.com/apache/httpd/pull/295
Comment 17 Archie Cobbs 2022-02-24 15:19:35 UTC
> You can configure an individual timeout per wss:// backend worker.
> Setting it to an arbitrarily high number should give you what you need.

Thanks - I didn't realize that.
Comment 18 Ruediger Pluem 2022-03-03 10:57:10 UTC
(In reply to Ruediger Pluem from comment #16)
> Proposed for backport as https://github.com/apache/httpd/pull/295

Backported as r1898559. Will be part of the next release.