Bug 65294

Summary: Reverse proxy regression for websockets application
Product: Apache httpd-2 Reporter: Eric Hameleers <alien>
Component: AllAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: regression    
Priority: P2    
Version: 2.4.47   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: ap_filter_should_yield() prevents flushing
ap_filter_should_yield() prevents flushing

Description Eric Hameleers 2021-05-07 20:04:30 UTC
The problem:
Web applications using websockets fail to refresh their content in the webbrowser when the application is hidden behind an Apache reverse proxy.
Since when does this happen:
This happens in Apache httpd-2.4.47. Everything was working fine with Apache httpd-2.4.46 and earlier.
Examples:
The two applications that I use which show this behavior: NoVNC (https://novnc.com/) and Otto (http://ottoaudiojukebox.com/)
Operating System:
Slackware Linux 14.2 (64-bit)

Detailed description follows. It focuses on NoVNC since I think more people use this than the Otto Jukebox.

Relevant httpd configuration:
    ProxyRequests Off
    ProxyVia on
    ProxyAddHeaders On
    ProxyPreserveHost On
    <Proxy *>
        Require all granted
    </Proxy>
    RewriteEngine On
    RewriteCond %{REQUEST_URI}  ^/remco                         [NC]
    RewriteCond %{QUERY_STRING} transport=polling               [NC]
    RewriteRule /(.*)           http://127.0.0.1:6080/$1        [P,L]
    RewriteCond %{HTTP:Connection}      Upgrade                 [NC]
    RewriteCond %{HTTP:Upgrade}         =websocket              [NC]
    RewriteRule /(.*)                   ws://127.0.0.1:6080/$1  [P,L]
    ProxyPassReverse /websockify ws://localhost:6080/websockify retry=2
    <Location /remco/>
        Require all granted
        ProxyPass http://localhost:6080/
        ProxyPassReverse http://localhost:6080/
    </Location>

Observed behavior:
Login to the backend server via reverse proxy and NoVNC server works fine.
The first mouse click triggers a screen refresh (for instance, new window brought into focus).
Subsequent mouse clicks or key presses *do* get through to the backend VNC server: by having a VNC client connected as well I can observe that all keypresses and mouse clicks are received and handled properly by the VNC server.
However, the NoVNC web client in the browser does not refresh the screen. This makes the application un-usable since interaction with the backend is impossible using the NoVNC browser client.
There is exactly one reproducible case where I can get some interactive behavior: when I click-drag the mouse across a window containing text, this text is highlighted, and in this case, the NoVNC client inside the browser *does* refresh the window content, i.e. it shows the highlight.
Click-dragging the mouse over surfaces *not* containing text, i.e. not triggering a text highlight, are not inducing the NoVNC client to redraw the window.

The un-modified configuration above works flawlessly with a httpd-2.4.46 reverse proxy but fails with httpd-2.4.47.

Slackware maintainer Patrick Volkerding supplied me with a custom httpd-2.4.47 package for testing, where all updates to proxy_wstunnel since 2.4.46 have been reverted; because we thought that the wstunnel module might be where the root cause lies, it has had a lot of code changes.
But this modified httpd-2.4.47 package shows exactly the same broken behavior as the unmodified 2.4.47 code. We have no idea where the root cause lies.

If you need more information, please let me know.
Comment 1 Eric Covener 2021-05-07 20:07:24 UTC
This is likely the regression that blocked the announce of 2.4.47.

Would you be able to test with this short patch:
http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/http/http_filters.c?r1=1889341&r2=1889340&pathrev=1889341
Comment 2 Eric Hameleers 2021-05-07 21:49:22 UTC
Hi Eric,
Thanks for the fast response. Unfortunately the patch did not change the regression behavior and the problem persists.

Looking at the release download page and the public mirrors, it seems that you formally consider 2.4.46 the current latest release, but making signed 2.4.47 source tarballs available in that same download location was reason for us to believe that 2.4.47 was a sanctioned release as well. Which is why Slackware upgraded to 2.4.47.
Perhaps you should retract the 2.4.47 source tarballs from your mirrors.

Regards, Eric
Comment 3 Christophe JAILLET 2021-05-08 10:54:17 UTC
(In reply to Eric Hameleers from comment #2)
> Thanks for the fast response. Unfortunately the patch did not change the
> regression behavior and the problem persists.

Well, another thing to look at before 2.4.48.

> Perhaps you should retract the 2.4.47 source tarballs from your mirrors.

Done now.


Thanks for the report.
Comment 4 Eric Covener 2021-05-08 14:13:23 UTC
This release also makes mod_proxy_wstunnel defer to mod_proxy_http for tunneling.  It looks like there is no way to opt-out to run a quick test.

If you can patch again, the feedback would help:

Index: modules/proxy/mod_proxy_wstunnel.c
===================================================================
--- modules/proxy/mod_proxy_wstunnel.c	(revision 1889680)
+++ modules/proxy/mod_proxy_wstunnel.c	(working copy)
@@ -23,11 +23,12 @@

 static int proxy_wstunnel_check_trans(request_rec *r, const char *url)
 {
+#if 0
     if (fallback_to_mod_proxy_http) {
         ap_log_rerror(APLOG_MARK, APLOG_TRACE5, 0, r, "check_trans fallback");
         return DECLINED;
     }
-
+#endif
     if (ap_cstr_casecmpn(url, "ws:", 3) != 0
             && ap_cstr_casecmpn(url, "wss:", 4) != 0) {


running both versions w/ loglevel trace8 would be helpful.
Comment 5 Eric Hameleers 2021-05-08 15:30:13 UTC
Unfortunately the new patch also does not make a difference in behavior, the NoVNC application remains broken.
This application runs in a subdirectory of an actively used internet-facing server so increasing the loglevel to trace8 would generate massive amounts of data some of which I am not willing to share.
Comment 6 Eric Covener 2021-05-08 16:04:37 UTC
I setup a simpler config with:

- download novnc
- launch with ./utils/launch.sh --vnc some-lan-system-with-vnc-up:5901

httpd.conf append:

ProxyPass /websockify ws://localhost:6080/websockify retry=2
ProxyPass /  http://127.0.0.1:6080/

and then accessed e.g.

http://localhost/vnc.html?host=localhost&port=80

And VNC'ed to to the system on my LAN but nothing stalled (highlighting text in terminal and firefox and opened/closed windows).  Maybe another

Stopping the proxy broke the connection and I could see the upgrade in the access log.  Maybe the above helps someone recreate and sees a stall.
Comment 7 Eric Hameleers 2021-05-09 13:31:38 UTC
I could not repeat this, I kept losing interactivity when running httpd-2.4.47.

I have uploaded trace8 logs of a couple of seconds clicking and typing in NoVNC browser client, for both httpd-2.4.46 (where everything works fine) and httpd-2.4.47 (where interactive screen refresh is unpredictable and needs to be triggered by mouse click-drag most of the times): https://orwell.alienbase.nl/files/

This is my current vhost configuration, I created a separate vhost just for testing the reverse-proxying to my novnc backend. The remainder of the configuration is for letsencypt so that I can access the page with https:

    Alias /remco /usr/local/noVNC-1.2.0
    <Location /remco/>
         AllowOverride All
        AuthBasicAuthoritative off
        AuthUserFile /xxxxxxxx
        AuthType Basic
        AuthName "Hello Server"
        require valid-user
    </Location>
    ProxyPass /websockify ws://127.0.0.1:6080/websockify retry=2
    ProxyPassReverse /websockify ws://127.0.0.1:6080/websockify retry=2
    ProxyPass /remco/ http://localhost:6080/
    ProxyPassReverse /remco/ http://localhost:6080/
Comment 8 Yann Ylavic 2021-05-10 09:48:53 UTC
(In reply to Eric Hameleers from comment #7)
> https://orwell.alienbase.nl/files/

This link is password protected.

Since (Real)VNC requires a license, I tried with TightVNC as a server, still novnc with firefox and chrome, but like Eric could not reproduce the issue.

There must be something special with your setup, we need the traces at this point..
Comment 9 Eric Hameleers 2021-05-10 19:53:49 UTC
Sorry about the password protection, I have removed that. Could you please try https://orwell.alienbase.nl/files/ again ?
Comment 10 Yann Ylavic 2021-05-11 07:53:31 UTC
Created attachment 37853 [details]
ap_filter_should_yield() prevents flushing

How does this patch work?
Comment 11 Yann Ylavic 2021-05-11 08:01:20 UTC
Created attachment 37854 [details]
ap_filter_should_yield() prevents flushing

Better version of the patch.
Comment 12 Eric Hameleers 2021-05-11 12:25:25 UTC
Hi Yann,

That patch fixes one issue for me, thanks! NoVNC browser client is again responsive and I can't find a regression now.

However the nodejs based OTTO Jukebox (http://ottojukebox.com/) is still suffering as before.
I have added two tracelogs for this application to https://orwell.alienbase.nl/files/ , one for running under httpd-2.4.46 and doing just fine (jukebox_httpd-2.4.46_trace8_20210511.log.gz), the other a log of a failing page under httpd-2.4.47 (jukebox_httpd-2.4.47_trace8_20210511.log.gz)

The expected result is that OTTO displays a page with the currently playing song and the songs in the queue as well as a jukebox interface allowing song selection, but under httpd-2.4.47 all I get is a circular graphic which is the indication that the page is initializing and the browser is waiting for information from the backend.

Cheers, Eric
Comment 13 Yann Ylavic 2021-05-11 16:17:56 UTC
It looks like the jukebox_httpd-2.4.47 traces are not from a 2.4.47, could it be an upgrade issue or some patches applied?
Comment 14 Eric Hameleers 2021-05-11 18:49:36 UTC
My apologies, you are correct. I guess I did too much fiddling with upgrades/downgrades and enabling/disabling tracelogs for my own good...

In any case, https://orwell.alienbase.nl/files/jukebox_httpd-2.4.47_trace8_20210511.log.gz is now a new http-2.4.47 based trace log (I checked the server identifier in the log before uploading).

Cheers, Eric
Comment 15 Yann Ylavic 2021-05-11 21:17:20 UTC
It seems that 2.4.46 and 2.4.47 don't send the same "/socket.io/1/websocket/..." request to the backend since the 2.4.47 one fails with the being closed immediately:
[Tue May 11 20:43:30.895647 2021] [proxy_http:error] [pid 16563:tid 140399326754560] (20014)Internal error (specific information not available): [client 192.168.0.3:54103] AH01102: error reading status line from remote server 192.168.0.11:8778

Could you please add:
    LoadModule dumpio_module modules/mod_dumpio.so
    DumpIOInput on
    DumpIOoutput on
to the configuration so that we can compare what's being sent?
Comment 16 Eric Hameleers 2021-05-13 09:38:15 UTC
Hi Yann,

https://orwell.alienbase.nl/files/jukebox_httpd-2.4.47_trace8_dumpio_20210513.log.gz is a log with your dumpio statements added to the configuration of the OTTO jukebox.
Comment 17 Yann Ylavic 2021-05-13 17:03:36 UTC
So you have a SetEnv[If] or a RewriteRule setting "force-proxy-request-1.0" somewhere, right? This is the only way (I can think of) to make mod_proxy send this "GET /socket.io/1/websocket/UVTDmeB8ZH0F6kQpcaGG HTTP/1.0" request.

However protocol/WebSocket Upgrade is not compatible with HTTP/1.0, from https://datatracker.ietf.org/doc/html/rfc7230#section-6.7:
  A server MUST ignore an Upgrade header field that is received in an HTTP/1.0 request.
So mod_proxy_http won't send an "Upgrade: websocket" header with an HTTP/1.0 request, and the backend server without this header seems to simply close the connection immediately (with no response).

In 2.4.46 and earlier, where websocket tunneling was handled by mod_proxy_wstunnel, there was some minimal HTTP protocol verification ony. Now (from 2.4.47) that all the upgrade forwarding mechanisms are handled by mod_proxy_http, there is more conformance with the HTTP protocol like this one, and notably/mostly the possibility to not upgrade/tunnel the connection if the origin server does not accept the upgrade (i.e. mod_proxy_http will continue with the HTTP protocol on subsequent requests/responses, until/unless the upgrade is really accepted by the origin server).

In upcoming 2.4.48 there will be an opt-out (ProxyWebsocketFallbackToProxyHttp off) to let mod_proxy_wstunnel handle websocket tunneling like before 2.4.47 (i.e. disable the fallback to mod_proxy_http), but it's really safer to have mod_proxy_http do all the necessary HTTP checks actually
So for your case I'd suggest that you simply do not "force-proxy-request-1.0", if possible (and it should be since Upgrade is not meant to be compatible with HTTP/1.0..).
Comment 18 Eric Hameleers 2021-05-14 12:33:07 UTC
Hi Yann,

I did not think I had this configured at first... but my memory is failing me again. When I grepped my httpd configuration I found indeed:

"SetEnv force-proxy-request-1.0 1"

Once I removed that line, the problem was gone also. Thanks for the detailed explanation of the changes in proxy handling.

This means, both my issues are fixed now.

Thanks!