Bug 57436

Summary: Option to strip extra data after websocket request headers
Product: Apache httpd-2 Reporter: Micha Lenk <micha>
Component: mod_proxy_wstunnel Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: enhancement CC: a.abfalterer
Priority: P2 Keywords: PatchAvailable
Version: 2.5-HEAD   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Add option to strip extra data after websocket request headers

Description Micha Lenk 2015-01-12 14:03:36 UTC
Created attachment 32365 [details]
Add option to strip extra data after websocket request headers

I am using mod_proxy_wstunnel to use Apache httpd as a reverse proxy to
tunnel websocket connections from the client to a given websocket server.

Problem description (What am I trying to solve?)
************************************************

I observed a client that sends some extra data (i.e. an additional CRLF) after
the request headers in the request used to establish the websocket connection.
Without Apache in between, the communication between client and backend looks
like this:

CLIENT                                BACKEND
  | ------ [    HTTP request   ] ------> |
  |          incl. extra* CRLF           |
  |                                      |
  | <----- [    HTTP response  ] ------- |
  |  "HTTP/1.1 101 Switching Protocols"  |
  |                                      |
  | ------ [ 1st websocket frame ] ----> |
  |                                      |

* Please note that the client sends an extra CRLF after the mandatory empty
  line used to terminate the request headers.
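
To make the footnote concrete, here is a minimal sketch of what such a request
might look like on the wire and where the surplus bytes end up. The request
path, host and key are illustrative assumptions, not taken from the observed
client, and split_request() is a hypothetical helper.

```python
# Hypothetical handshake request; path, host and key are illustrative only.
raw = (
    b"GET /chat HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Upgrade: websocket\r\n"
    b"Connection: Upgrade\r\n"
    b"Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    b"Sec-WebSocket-Version: 13\r\n"
    b"\r\n"  # mandatory empty line terminating the request headers
    b"\r\n"  # extra CRLF sent by the client
)


def split_request(buf: bytes):
    """Split a buffer into (request incl. terminator, leftover bytes)."""
    head, sep, rest = buf.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete request headers")
    return head + sep, rest


request, leftover = split_request(raw)
print(repr(leftover))
```

A strict header parser stops at the first empty line, so the second CRLF is
left over as unread data on the connection.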

With Apache httpd as a reverse proxy, the communication between client and
backend looks like this:

CLIENT                              Apache                              BACKEND
  | ---- [    HTTP request   ] -----> |                                      |
  |        incl. extra* CRLF          |                                      |
  |                                  [A]                                     |
  |                                   | ------ [   HTTP request ] ---------> |
  |                                   |       without extra CRLF             |
  |                                   |                                      |
  |                                   | <------- [ HTTP response ] --------- |
  |                                   |  "HTTP/1.1 101 Switching Protocols"  |
  |                                   |                                      |
  | <--- [    HTTP response  ] ------ |                                      |
  |       101 Switching Protocols     |                                      |
  |                                  [B]                                     |
  |                                   | -------- [ extra CRLF ] -----------> |
  |                                   |    received from client before [A]   |
  | ---- [ 1st websocket frame ] ---> |                                     [C]
  |                                   | ----- [ 1st websocket frame ] -----> |
  |                                   |                                      |


The extra CRLF sent by the client is not required by the HTTP specification,
so Apache httpd is right to parse the HTTP request without it in [A]. But
because the client has already sent the extra CRLF, it lingers in the input
filter chain. Eventually mod_proxy_wstunnel enters tunneling mode, i.e. it
forwards all data seen on the client connection to the backend connection and
vice versa. As a result, the backend response "HTTP/1.1 101 Switching
Protocols" is forwarded to the client, and in [B] the extra CRLF is forwarded
to the backend. Next, the client sends a websocket frame, which is also
forwarded to the backend.

From the backend's point of view, the key difference is that without Apache
the extra CRLF is received before the switch to the websocket protocol. The
backend server happens to read request headers leniently enough to swallow the
extra CRLF, so everything works fine. With Apache configured as a reverse
proxy, however, the extra CRLF is received after the switch to the websocket
protocol. The outcome is a confused websocket parser on the backend server and
a broken websocket connection.
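
To illustrate why a stray CRLF confuses a websocket parser, the following
sketch decodes the first two bytes of a frame according to the base framing
layout of RFC 6455. It is not the backend's actual parser; the helper names
are hypothetical, and only the two checks shown are modelled.

```python
KNOWN_OPCODES = {0x0, 0x1, 0x2, 0x8, 0x9, 0xA}  # opcodes defined by RFC 6455


def parse_frame_header(data: bytes) -> dict:
    """Decode the two fixed header bytes of a websocket frame."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    b0, b1 = data[0], data[1]
    return {
        "fin": bool(b0 & 0x80),
        "rsv": (b0 >> 4) & 0x07,
        "opcode": b0 & 0x0F,
        "masked": bool(b1 & 0x80),
        "payload_len": b1 & 0x7F,
    }


def header_problems(hdr: dict) -> list:
    """Return reasons a server would reject this client-to-server header."""
    problems = []
    if hdr["opcode"] not in KNOWN_OPCODES:
        problems.append("reserved opcode 0x%X" % hdr["opcode"])
    if not hdr["masked"]:
        problems.append("client frame is not masked")
    return problems


def build_client_text_frame(payload: bytes,
                            mask: bytes = b"\x01\x02\x03\x04") -> bytes:
    """Build a short masked text frame (payload under 126 bytes)."""
    if len(payload) >= 126 or len(mask) != 4:
        raise ValueError("sketch only supports short payloads, 4-byte mask")
    masked = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return bytes([0x81, 0x80 | len(payload)]) + mask + masked


frame = build_client_text_frame(b"hello")
good = parse_frame_header(frame)            # what the backend should see
bad = parse_frame_header(b"\r\n" + frame)   # what it sees behind the proxy
print(header_problems(good), header_problems(bad))
```

Read as a frame header, CR (0x0D) decodes to FIN=0 with the reserved opcode
0xD, and LF (0x0A) decodes to an unmasked frame with a payload length of 10,
so the parser rejects the stream or loses frame alignment.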


Fix approach
************

The proposed patch introduces a new subprocess environment variable
"proxy-wstunnel-strip-extra-data" that causes mod_proxy_wstunnel to strip any
extra data received on the client socket after receiving the request headers
and before connecting to the backend server.
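
The idea can be sketched outside of httpd as follows. This is an analogy
under stated assumptions, not the attached patch: in httpd the leftover bytes
sit in the input filter chain, whereas this sketch drains a plain socket, and
strip_extra_data() is a hypothetical name.

```python
import socket


def strip_extra_data(sock: socket.socket, chunk_size: int = 4096) -> bytes:
    """Discard bytes already pending on sock without waiting for more."""
    discarded = bytearray()
    old_timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        while True:
            try:
                chunk = sock.recv(chunk_size)
            except BlockingIOError:
                break  # nothing more pending right now
            if not chunk:
                break  # peer closed the connection
            discarded += chunk
    finally:
        sock.settimeout(old_timeout)  # restore the previous blocking mode
    return bytes(discarded)


# Usage: a local socket pair stands in for the client connection.
client, proxy_side = socket.socketpair()
client.sendall(b"\r\n")                  # extra CRLF after the headers
dropped = strip_extra_data(proxy_side)   # done before tunneling starts
client.close()
proxy_side.close()
```

Draining only what is already pending keeps the first websocket frame intact
as long as the client waits for the 101 response before sending it.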