Bug 64437

Summary:	Renegotiation issue when using mod_proxy and OpenSSL 1.1.1
Product:	Apache httpd-2	Reporter:	Armin Abfalterer <a.abfalterer>
Component:	mod_proxy	Assignee:	Apache HTTPD Bugs Mailing List <bugs>
Status:	NEW ---
Severity:	normal
Priority:	P2
Version:	2.4.43
Target Milestone:	---
Hardware:	PC
OS:	Linux
Attachments:	network dump when accessing directly to s_server with firefox network dump when accessing s_server via mod_proxy Use buffered data upon proxy renegotiation 37291: Use buffered data upon proxy renegotiation (v2)

Description Armin Abfalterer 2020-05-14 09:24:56 UTC

We ran into an issue with renegotiation when using mod_proxy and OpenSSL 1.1.1. 

The proxy engine re-acts incorrectly by trying to re-connect the backend server upon initiated renegotiation, finally resulting in "AH01997: SSL handshake failed: sending 502".

Although the issue occurs with a self-compiled httpd binary (based on 2.4.43) and with proprietary server software, we are able to reproduce the issue on CentoOS 8.

We use following virtual host configuration, e.g. placed into /etc/httpd/conf.d/sclient.conf. Note that /etc/httpd/conf/httpd.conf and /etc/httpd/conf.d/ssl.conf have not been changed.

--------
<VirtualHost *:443>

  ServerName sserver

  LogLevel trace7

  SSLEngine on
  SSLProxyEngine on
  SSLCipherSuite EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH
  SSLProtocol All -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
  SSLHonorCipherOrder On
  SSLCertificateFile /etc/pki/tls/certs/localhost.crt
  SSLCertificateKeyFile /etc/pki/tls/private/localhost.key

  <Location />
    ProxyPass https://sserver:10016/
    ProxyPassReverse https://sserver:10016/
  </Location>

</VirtualHost>
--------

As backend server we use openssl s_server with the "-www" option, known to trigger renegotiation when requesting the resource /reneg.

openssl s_server -port 10016 -www -tls1_2 -cert sserver.pem

When requesting https://sserver:10016/reneg (directly) with firefox, s_server shows a status page - thus renegotiation works correctly.

However, when requesting https://sserver:443/reneg (mod_proxy gets involved) the status 502 is delivered by httpd. The error_log shows following

[Wed May 13 13:08:35.055113 2020] [proxy_http:trace3] [pid 21557:tid 139722668078848] mod_proxy_http.c(1403): [client 127.0.0.1:52862] Status from backend: 200
[Wed May 13 13:08:35.055123 2020] [proxy_http:trace4] [pid 21557:tid 139722668078848] mod_proxy_http.c(1053): [client 127.0.0.1:52862] Headers received from backend:
[Wed May 13 13:08:35.055142 2020] [ssl:trace4] [pid 21557:tid 139722668078848] ssl_engine_io.c(2230): [remote 127.0.0.1:10016] OpenSSL: I/O error, 5 bytes expected to read on BIO#7f13bc0454f0 [mem: 7f13bc05daa3]
[Wed May 13 13:08:35.055147 2020] [ssl:trace3] [pid 21557:tid 139722668078848] ssl_engine_kernel.c(2231): [remote 127.0.0.1:10016] OpenSSL: Exit: error in SSLv3/TLS write client hello
[Wed May 13 13:08:35.055152 2020] [ssl:info] [pid 21557:tid 139722668078848] [remote 127.0.0.1:10016] AH02003: SSL Proxy connect failed
[Wed May 13 13:08:35.055160 2020] [ssl:info] [pid 21557:tid 139722668078848] [remote 127.0.0.1:10016] AH01998: Connection closed to child 0 with abortive shutdown (server sclient:443)
[Wed May 13 13:08:35.055210 2020] [ssl:info] [pid 21557:tid 139722668078848] [remote 127.0.0.1:10016] AH01997: SSL handshake failed: sending 502

The logs show that headers can be received from the backend, but then httpd erroneously tries to connect again.

In the following some version information about the system used for reproduction:

[root@localhost ~]# cat /etc/centos-release
CentOS Linux release 8.1.1911 (Core)

[root@localhost ~]# /usr/sbin/httpd -V
Server version: Apache/2.4.37 (centos)
Server built:   Dec 23 2019 20:45:34
Server's Module Magic Number: 20120211:83
Server loaded:  APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT="/etc/httpd"
 -D SUEXEC_BIN="/usr/sbin/suexec"
 -D DEFAULT_PIDLOG="run/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

[root@localhost ~]# yum info httpd
Name         : httpd
Version      : 2.4.37
Release      : 16.module_el8.1.0+256+ae790463
Architecture : x86_64
Size         : 5.4 M
Source       : httpd-2.4.37-16.module_el8.1.0+256+ae790463.src.rpm

[root@localhost ~]# yum info mod_ssl
Name         : mod_ssl
Epoch        : 1
Version      : 2.4.37
Release      : 16.module_el8.1.0+256+ae790463
Architecture : x86_64
Size         : 308 k
Source       : httpd-2.4.37-16.module_el8.1.0+256+ae790463.src.rpm

[root@localhost ~]# yum info openssl
Name         : openssl
Epoch        : 1
Version      : 1.1.1c
Release      : 2.el8_1.1
Architecture : x86_64
Size         : 1.2 M
Source       : openssl-1.1.1c-2.el8_1.1.src.rpm

Comment 1 Armin Abfalterer 2020-05-14 09:30:32 UTC

Created attachment 37243 [details]
network dump when accessing directly to s_server with firefox

Comment 2 Armin Abfalterer 2020-05-14 09:31:01 UTC

Created attachment 37244 [details]
network dump when accessing s_server via mod_proxy

Comment 3 Armin Abfalterer 2020-06-05 12:29:42 UTC

Ok, here some more information...

The error message "SSL Proxy connect failed" is produced here 
https://github.com/apache/httpd/blob/2.4.43/modules/ssl/ssl_engine_io.c#L1244

By tracking down the code path starting from SSL_connect() one finishes in OpenSSL's ssl3_read_bytes(), more precisely at this position

https://github.com/openssl/openssl/blob/OpenSSL_1_1_1g/ssl/record/rec_layer_s3.c#L1727

The comment in the code  "At this point, we were expecting handshake data, but have application data." exactly reflects the dump of attachment 3744.

Anyway, I still not get it what what's going wrong. As already mentioned, doing the same without mod_proxy works fine.

So, any hint in the right direction is highly appreciated!

Comment 4 Yann Ylavic 2020-06-05 16:26:20 UTC

I think it has to do with Post Handshake Authentication (PHA) with TLS 1.3, which is not handled by mod_proxy. Switching to OpenSSL 1.1.1 enables TLS 1.3.

Does "SSLProxyProtocol All -TLSv1.3" help?

Comment 5 Armin Abfalterer 2020-06-05 18:12:22 UTC

(In reply to Yann Ylavic from comment #4)
> I think it has to do with Post Handshake Authentication (PHA) with TLS 1.3,
> which is not handled by mod_proxy. Switching to OpenSSL 1.1.1 enables TLS
> 1.3.
> 
> Does "SSLProxyProtocol All -TLSv1.3" help?

Please note that when reproducing it on CentOS we ran s_server with the -tls1_2 option. So it seems not to be related to TLS 1.3/PHA only.

Comment 6 Yann Ylavic 2020-06-06 17:18:40 UTC

Created attachment 37291 [details]
Use buffered data upon proxy renegotiation

Could you please try this patch on top of 2.4.43 (or 2.4.x)?

Comment 7 Armin Abfalterer 2020-06-10 07:38:04 UTC

I applied the patch to 2.4.43. Now, I get status 200 and there is no error in the error_log.

However, the network dumps still look different when requesting the URI directly or when mod_proxy is involved. I don't attach new dumps as the look the same as the ones already attached.

That there is still a difference in behavior can be shown by staring s_server with the -msg option, e.g. like following

openssl s_server -port 10016 -www -tls1_2 -cert sserver.pem -msg | grep -E "TLS 1.|reneg"

The output when requesting the URI /reneg directly from s_server looks following

<<< TLS 1.3, Handshake [length 0200], ClientHello
>>> TLS 1.2, Handshake [length 0041], ServerHello
>>> TLS 1.2, Handshake [length 0412], Certificate
>>> TLS 1.2, Handshake [length 012c], ServerKeyExchange
>>> TLS 1.2, Handshake [length 0004], ServerHelloDone
<<< TLS 1.2, Handshake [length 0025], ClientKeyExchange
<<< TLS 1.2, Handshake [length 0010], Finished
>>> TLS 1.2, Handshake [length 00aa], NewSessionTicket
>>> TLS 1.2, ChangeCipherSpec [length 0001]
>>> TLS 1.2, Handshake [length 0010], Finished
SSL_renegotiate -> 1
>>> TLS 1.2, Handshake [length 0004], HelloRequest
<<< TLS 1.3, Handshake [length 0200], ClientHello
>>> TLS 1.2, Handshake [length 0041], ServerHello
>>> TLS 1.2, Handshake [length 0412], Certificate
>>> TLS 1.2, Handshake [length 012c], ServerKeyExchange
>>> TLS 1.2, Handshake [length 0004], ServerHelloDone
<<< TLS 1.2, Handshake [length 0025], ClientKeyExchange
<<< TLS 1.2, Handshake [length 0010], Finished
>>> TLS 1.2, Handshake [length 00aa], NewSessionTicket
>>> TLS 1.2, ChangeCipherSpec [length 0001]
>>> TLS 1.2, Handshake [length 0010], Finished

When mod_proxy is involved, the output of s_server looks following

<<< TLS 1.3, Handshake [length 0200], ClientHello
>>> TLS 1.2, Handshake [length 0041], ServerHello
>>> TLS 1.2, Handshake [length 0412], Certificate
>>> TLS 1.2, Handshake [length 012c], ServerKeyExchange
>>> TLS 1.2, Handshake [length 0004], ServerHelloDone
<<< TLS 1.2, Handshake [length 0025], ClientKeyExchange
<<< TLS 1.2, Handshake [length 0010], Finished
>>> TLS 1.2, Handshake [length 00aa], NewSessionTicket
>>> TLS 1.2, ChangeCipherSpec [length 0001]
>>> TLS 1.2, Handshake [length 0010], Finished
SSL_renegotiate -> 1
>>> TLS 1.2, Handshake [length 0004], HelloRequest

In the later case it looks like as renegotiation is not finished correctly.

Comment 8 Yann Ylavic 2020-06-10 15:11:45 UTC

I filled a PR for openssl 2 days ago (see [1]).

openssl s_server with "/reneg" does not behave correctly (IMHO), it assumes that the request is a single line, which is never true for HTTP/1.x requests for instance.
So long as the request is a single line (like an HTTP/0.9 "GET /\r\n"), there is nothing buffered by s_server (which it can BIO_gets() immediately), thus the handshake completes.
But otherwise, s_server will send its HelloRequest to start the renegotiation, read the *buffered* second line (which does not trigger the handshake), and go straight to the HTTP response (application data).

On the mod_proxy side (as SSL client), after sending the request to the backend and while reading the response header (line by line), it first gets the HelloRequest and then sends its ClientHello to satisfy the renegotitation, and then by reading the next step of the handshake it gets the application data directly (full response header + body, which it buffers).
Now thanks to [2] (as you noted in comment 3), application data are "kind of valid" when the client is in ClientHello state, so the first response line is read successfully and returned. But the connection somehow is still in ClientHello state (hence PR [1] to ask for clarification), so mod_proxy still thinks it is in the middle of a handshake (per SSL_is_init_finished() returning 0), thus when reading the second response line it thinks it should start/continue the handshake, which fails because the connection has been closed by s_server already.

So what my patch does is that it always tries to read from the buffered data before asking for complete handshake to the openssl layer, and let openssl handle renegotiations on its own (if any), thus the reponse which was already fully buffered is returned and mod_proxy can proceed with the response to the client. The connection closed is not an error anymore (that's the end of the body for a response without Content-Length), and everything seems to work, but still the SSL client connection state remains in an exit/error path all that long.

So I'm not sure it's the correct fix, unless the openssl team clarifies this ClientHello => application data non-transition but no-failure either case.
What is a SSL client supposed to do while !SSL_is_init_finished(), handshake or read? I'm a bit nervous about the patch's SSL_total_renegotiations() != 0 workaround / bypass...


[1] https://github.com/openssl/openssl/pull/12081
[2] https://github.com/openssl/openssl/blob/OpenSSL_1_1_1g/ssl/record/rec_layer_s3.c#L1727

Comment 9 Yann Ylavic 2020-06-10 23:19:38 UTC

Created attachment 37302 [details]
37291: Use buffered data upon proxy renegotiation (v2)

This v2 should be more acceptable.

The check for SSL_is_init_finished() is replaced by !SSL_in_defore() which concerns the first handshake only (we shouldn't enter ssl_io_filter_handshake() for any other case anyway).

Also it fixes the EOF error condition (SSL_ERROR_ZERO_RETURN) in ssl_io_input_read(), otherwise the final close-notify can still break the response completeness and lead to 502.

Comment 10 Armin Abfalterer 2020-06-11 15:30:36 UTC

I've applied your new patch (v2) to 2.4.43 and did the test with s_server. The renegotiation works IF s_server is patched with your proposed change. Otherwise the behavior is still the same as mentioned in comment #7.

I've tested the v2 patch also against our "proprietary" server software. The observation there is that renegotiation sometimes works... but most often not. 

As the behavior looks to me like a timing-issue (handshake data sometimes delay, sometimes not) I've changed the server by repeatably calling SSL_do_handshake() as long as SSL_renegotiate_pending() is true, additionally by peeking for data. Before the change SSL_do_handshake() was called again only on SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE.

Note that the server uses non-blocking sockets.


SSL_renegotiate(ssl);

while (goahead) {
  SSL_do_handshake(ssl);

  if (SSL_renegotiate_pending(ssl)) {
    char peekbuf[1];
    SSL_peek(ssl, peekbuf, 0); 
    goahead = 1;
  }
  
  // also repeat on SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE
  ...

  // otherwise we are finished with renegotiation
  ...
}

Doing so the renegotiation always works.

Whether the workaround is right or not remains to be seen, but in my opinion it clearly shows what the problem is about.