Bug 67938 - Tomcat mishandles large client hello messages
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 10
Classification: Unclassified
Component: Connectors
Version: 10.1.15
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ------
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-10-27 19:24 UTC by Aaron Ogburn
Modified: 2023-11-03 19:13 UTC

Description Aaron Ogburn 2023-10-27 19:24:20 UTC
A Java client application that previously ran on Java 11 began seeing handshake failures with Tomcat 10.1 after the client moved to Java 17. OpenJDK engineers reviewed the issue and, based on the evidence gathered so far and a static code analysis, we think there is a problem in how Apache Tomcat handles TLS handshakes containing large Client Hello messages. We know that versions 10.1.9 to 10.1.15 are affected, but have not looked into other major releases.

What follows is a high-level overview of the events that are happening, in our understanding, when the failure manifests:

1) The TLS client sends a Client Hello packet to resume a TLS 1.3 session. The packet is so large (26,660 bytes) that it has to be split into 2 TLS record messages. This splitting occurs at the TLS level, above any possible TCP fragmentation. The first TLS record has a length of 16,372 bytes and the second a length of 10,298 bytes (5 bytes of each TLS record are for the header, and the rest accounts for the Client Hello payload).

2) The method org.apache.tomcat.util.net.SecureNioChannel::handshake handles the incoming connection on the TLS server side [1]. In particular, org.apache.tomcat.util.net.SecureNioChannel::processSNI is called first to peek at the incoming data and check, for example, whether the SNI TLS extension is present [2].

3) The most relevant outcomes of the org.apache.tomcat.util.net.SecureNioChannel::processSNI call are:
 3.1) The SNI TLS extension is deemed not present. This was probably decided here [3] because the Client Hello didn't fit into a single TLS record. In any case, the client did not send SNI.
 3.2) A new SSLEngine instance is created for the incoming connection.
 3.3) The netInBuffer ByteBuffer is filled with bytes from the first TLS record sent by the client, and might include some but not all the bytes from the second TLS record. This is because netInBuffer is initialized to a default size of 16,921 bytes, and both TLS records total 26,670 bytes. netInBuffer is expanded to sslEngine.getSession().getPacketBufferSize() after a read from the network [4] but in practice, because there was no data passed to the SSLEngine yet, this is probably 16,709 bytes (max record size, taken from SSLRecord.maxRecordSize). Expanding to a smaller length has no effect. As a result, netInBuffer has a likely size of 16,921 bytes and is completely full of data.
 3.4) netInBuffer is assumed to be in a write-ready state at this point, which means that position is set to the end of the filled data, limit is set to capacity, and more bytes can be appended. However, if it's completely full as assumed in #3.3, position would then be equal to limit (which is, in turn, equal to capacity) and more bytes cannot be appended.

4) When returning from org.apache.tomcat.util.net.SecureNioChannel::processSNI to org.apache.tomcat.util.net.SecureNioChannel::handshake, the field sniComplete is set to true, reflecting that no further calls to ::processSNI are needed for this connection. Execution moves to org.apache.tomcat.util.net.SecureNioChannel::handshakeUnwrap because the initial state for an SSLEngine is NEED_UNWRAP [5].

5) Once in org.apache.tomcat.util.net.SecureNioChannel::handshakeUnwrap, the "netInBuffer.position() == netInBuffer.limit()" condition evaluates to true [6] and the ByteBuffer::clear method is called on netInBuffer. Position is set to 0 and limit to capacity. As a result, any write to netInBuffer will overwrite unprocessed data. This unprocessed data is the first TLS record and part of the second TLS record; how much of it is lost depends on how much is written.

6) More bytes are read into netInBuffer here [7]. The bytes read are probably the remainder of the second TLS record (we know that they come after the TLS record header and that there are at least 5 of them), and the overwrite occurs as anticipated in #5. The data in netInBuffer is now corrupt.

7) The netInBuffer buffer is flipped to a read-ready state [8]. Thus, limit is set to the last position after the overwrite and position is set to 0.

8) netInBuffer is passed to the SSLEngine for unwrapping. The SSLEngine finds data at the beginning of the buffer that does not correspond to the beginning of a TLS record, and fails throwing the exception shown in the server log.
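
The sequence in steps 3.3 to 8 can be replayed with plain java.nio buffers. The following is only a minimal sketch under stated assumptions, not Tomcat's code: the class and method names (LargeClientHelloReplay, record, replay) are hypothetical, the record payload is filler, and the sizes are the ones quoted in this report (records of 16,372 and 10,298 bytes, a 16,921-byte netInBuffer). With those numbers the first read takes 16,921 of the 26,670 bytes on the wire, leaving 9,749 bytes for the second read.

```java
import java.nio.ByteBuffer;

public class LargeClientHelloReplay {
    static final byte HANDSHAKE = 0x16;   // TLS record content type "handshake"
    static final int HEADER_LEN = 5;      // type (1) + version (2) + length (2)

    /** Builds one TLS-record-shaped byte array: a 5-byte header plus filler payload. */
    static byte[] record(int totalLen) {
        byte[] rec = new byte[totalLen];          // filler payload bytes stay 0x00
        int payloadLen = totalLen - HEADER_LEN;
        rec[0] = HANDSHAKE;
        rec[1] = 3;
        rec[2] = 3;                               // legacy record version 0x0303
        rec[3] = (byte) (payloadLen >> 8);
        rec[4] = (byte) payloadLen;
        return rec;
    }

    /** Replays steps 3.3 to 8 and returns the first byte the SSLEngine would be handed. */
    static byte replay() {
        // The two records from step 1, back to back, as the client put them on the wire.
        ByteBuffer wire = ByteBuffer.allocate(16_372 + 10_298);
        wire.put(record(16_372)).put(record(10_298)).flip();

        ByteBuffer netInBuffer = ByteBuffer.allocate(16_921);

        // Step 3.3: the first read fills the buffer (record 1 plus 549 bytes of record 2).
        byte[] firstRead = new byte[netInBuffer.remaining()];
        wire.get(firstRead);
        netInBuffer.put(firstRead);

        // Step 5: position == limit is taken to mean "nothing left to process".
        if (netInBuffer.position() == netInBuffer.limit()) {
            netInBuffer.clear();                  // position = 0, limit = capacity
        }

        // Step 6: the rest of record 2 (9,749 bytes) lands on top of the start of record 1.
        byte[] secondRead = new byte[wire.remaining()];
        wire.get(secondRead);
        netInBuffer.put(secondRead);

        // Step 7: flip to read mode.
        netInBuffer.flip();

        // Step 8: the engine expects a record header at index 0.
        return netInBuffer.get(0);
    }

    public static void main(String[] args) {
        System.out.println("starts with a handshake record header: "
                + (replay() == HANDSHAKE));
    }
}
```

In this sketch the check in step 5 cannot tell a buffer that has been fully consumed in read mode from one that is completely full in write mode, since position equals limit in both cases. That ambiguity is what allows the unprocessed record to be overwritten.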

We think that this error may not show up consistently due to network/OS timing conditions. Different JDK releases, server configurations and TLS protocol versions may also affect the length of the Client Hello message and have an impact on reproducibility. The Client Hello messages for resumption are large in the analyzed client application with OpenJDK 17 because a large resumption ticket is passed, but large messages (spanning multiple TLS records) are compliant with the standard and should be handled appropriately. A backport for OpenJDK 17 is also being pursued to reduce the message size in this case. Workarounds in this particular case have included keeping the Java client app on Java 11, limiting the client app to TLSv1.2, or setting "jdk.tls.client.enableSessionTicketExtension=false" on the client. Nonetheless, this looks like a flaw in Tomcat's handling of large Client Hello messages that should be addressed here, whether they arise from a circumstance like the one above or from something else.
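
For reference, two of the client-side workarounds mentioned above could be applied programmatically along these lines. This is a hedged sketch rather than the code used by the affected application: the class and method names (ClientWorkarounds, disableSessionTickets, tls12Only) are hypothetical, and it assumes the system property is set before the client's first TLS use (passing -Djdk.tls.client.enableSessionTicketExtension=false on the command line is the safer equivalent).

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

public class ClientWorkarounds {
    /** Workaround: turn off the client session ticket extension via the system property. */
    static void disableSessionTickets() {
        System.setProperty("jdk.tls.client.enableSessionTicketExtension", "false");
    }

    /** Workaround: SSLParameters that restrict a client socket or engine to TLSv1.2. */
    static SSLParameters tls12Only() {
        try {
            SSLParameters params = SSLContext.getDefault().getDefaultSSLParameters();
            params.setProtocols(new String[] {"TLSv1.2"});
            return params;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        disableSessionTickets();
        SSLParameters params = tls12Only();
        // Apply with SSLSocket.setSSLParameters(params) or SSLEngine.setSSLParameters(params).
        System.out.println(String.join(",", params.getProtocols()));
    }
}
```

Either measure only avoids the large Client Hello on the client side; it does not change how the server handles one.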
Comment 1 Aaron Ogburn 2023-10-27 19:30:32 UTC
A backport (https://bugs.openjdk.org/browse/JDK-8318950) is being pursued to reduce the message size from a client in such a case on OpenJDK 17.  But a Tomcat level fix may still be required in the end for a large message in some other scenario.
Comment 3 Aaron Ogburn 2023-10-27 21:16:49 UTC
Credit and thanks to Francisco Ferrari and Martin Balao from the OpenJDK engineering team for their analysis leading to this report.
Comment 4 Stephen Higgs 2023-11-01 16:54:17 UTC
Reproducer Steps
================

This reproducer creates an artificially large ClientHello that causes Tomcat to respond with an SSL alert on TLS 1.3 session resumption.  In this test case, a certificate extension with a very long string value is added to the server's certificate.  Wireshark analysis shows the ClientHello preshared key identity can become very large with a large certificate.  Mutual authentication also increases the size of the identity.

In the following test, the first openssl call will succeed while the second one will fail.


Step 1 - generate a large certificate
-------------------------------------

$ cat openssl.cnf 
[req]
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no

[req_distinguished_name]
C   = NA
ST  = NA
L   = NA
O   = NA
OU  = NA
CN  = localhost

[req_ext]
subjectAltName = @alternate_names

[alternate_names]
DNS.1 = localhost
DNS.2 = *.localhost

[ v3_ca ]
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid:always,issuer
basicConstraints = critical,CA:true
subjectAltName = @alternate_names
keyUsage = digitalSignature, keyEncipherment
2.999 = ASN1:UTF8String:LONGSTRING


$ sed "s/LONGSTRING/$(printf '%.0sx' {0..16000})/g" ./openssl.cnf > openssl-long.cnf

$ cat create-cert.sh 
#!/bin/bash

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 7 -nodes -config ./openssl-long.cnf -extensions v3_ca
openssl pkcs12 -inkey key.pem -in cert.pem -export -out keystore.p12 -password pass:changeit -name my
keytool -importkeystore -srckeystore keystore.p12 -destkeystore keystore.jks -srcstoretype PKCS12 -deststoretype jks -deststorepass changeit -srcstorepass changeit

$ ./create-cert.sh


Step 2 - install cert and start Tomcat
--------------------------------------


$ grep --after-context 8 "<Connector.*8443" conf/server.xml 
    <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxThreads="150" SSLEnabled="true"
               maxParameterCount="1000"
               >
        <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
        <SSLHostConfig protocols="all" >
            <Certificate certificateKeystoreFile="conf/keystore.jks" type="RSA" />
        </SSLHostConfig>
    </Connector>


$ cp $CERT_DIR/keystore.jks conf/keystore.jks

$ bin/catalina.sh run

Step 3 - test
-------------

$ cat test.sh 
#!/bin/bash

echo -en "GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n" | openssl s_client -connect localhost:8443 -sess_out session -tls1_3 -quiet -CAfile=cert.pem
echo -en "GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n" | openssl s_client -connect localhost:8443 -sess_in session -tls1_3 -quiet -CAfile=cert.pem

$ ./test.sh 
...
003E54FCFD7E0000:error:0A000438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error:ssl/record/rec_layer_s3.c:1586:SSL alert number 80
Comment 5 Mark Thomas 2023-11-03 17:13:06 UTC
Many thanks for the clear, reproducible test case. I am able to reproduce this.

I haven't confirmed the analysis but it looks right.

I'm looking at potential fixes now.
Comment 6 Mark Thomas 2023-11-03 19:13:41 UTC
Fixed in:
- 11.0.x for 11.0.0-M14 onwards
- 10.1.x for 10.1.16 onwards
-  9.0.x for  9.0.83 onwards
-  8.5.x for  8.5.96 onwards