Bug 65131

Summary: OpenSSLEngine errors on a connection affect other connections
Product: Tomcat 9 Reporter: frrajott
Component: ConnectorsAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 9.0.34   
Target Milestone: -----   
Hardware: PC   
OS: Linux   

Description frrajott 2021-02-08 17:29:16 UTC
This issue was previously reported in https://bz.apache.org/bugzilla/show_bug.cgi?id=62054 but was discarded because it was not reproducible. However, I have been able to reproduce it.

The setup is:
tomcat 9.0.34
HTTPNIO connector, with OpenSSLEngine implementation
openssl 1.0.2u

To reproduce, configure a secure connector with a self-signed certificate.

Then, in a loop, access the secure connector using curl, but ignore the self-signed certificate errors (-k), like this:
for i in {1..1000} ; do curl -q -k https://hostname >/dev/null 2>&1; echo $?; done

The exit code should always be zero if the connection is successful.

While the test is running, access the same interface, but let curl fail the connection by letting it verify the certificate:
curl https://hostname
curl: (60) Peer's certificate issuer has been marked as not trusted by the user.

The test will start reporting connection errors (exit code 52). Run both in a tight loop to get many errors.

The following stack trace is visible in tomcat logs:

FINE: OpenSSL error: [336462231] message: [error:140E0197:SSL routines:SSL_shutdown:shutdown while in init]
Feb 08, 2021 11:41:19 AM org.apache.coyote.http11.Http11Processor service
FINE: Error parsing HTTP request header
javax.net.ssl.SSLException: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init
        at org.apache.tomcat.util.net.openssl.OpenSSLEngine.checkLastError(OpenSSLEngine.java:946)
        at org.apache.tomcat.util.net.openssl.OpenSSLEngine.pendingReadableBytesInSSL(OpenSSLEngine.java:631)
        at org.apache.tomcat.util.net.openssl.OpenSSLEngine.unwrap(OpenSSLEngine.java:558)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:629)
        at org.apache.tomcat.util.net.SecureNioChannel.read(SecureNioChannel.java:617)
        at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1229)
        at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.read(NioEndpoint.java:1141)
        at org.apache.coyote.http11.Http11InputBuffer.fill(Http11InputBuffer.java:761)
        at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:356)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:260)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at com.broadsoft.xsp.platform.InstrumentedExecutor$Tracker.run(InstrumentedExecutor.java:413)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


My feeling is that this is related to the same problem that led to this fix: https://github.com/apache/tomcat/commit/ebab25a3576a315ca676db7ce2366e53f9dcf311
It looks like, in this case, there are multiple errors waiting in the openssl error queue, and just reading one error is not sufficient to clear the queue.
Therefore, SSL errors on a connection are permitted to propagate to other connections as long as they are processed on the same thread.


As an aside, I also noticed the following NPE stacktrace in the logs, but I'm not sure if it's related. It may be a different scenario that caused it. I think it's related to SecureNioChannel.processSNI returning/throwing before creating the SSLEngine.

Feb 08, 2021 11:48:00 AM org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper doClose
SEVERE: Failed to close channel
java.lang.NullPointerException
        at org.apache.tomcat.util.net.SecureNioChannel.close(SecureNioChannel.java:534)
        at org.apache.tomcat.util.net.SecureNioChannel.close(SecureNioChannel.java:560)
        at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doClose(NioEndpoint.java:1172)
        at org.apache.tomcat.util.net.SocketWrapperBase.close(SocketWrapperBase.java:396)
        at org.apache.tomcat.util.net.NioEndpoint$Poller.cancelledKey(NioEndpoint.java:684)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at com.broadsoft.xsp.platform.InstrumentedExecutor$Tracker.run(InstrumentedExecutor.java:413)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Comment 1 Remy Maucherat 2021-02-09 12:06:35 UTC
Ok, I can see the error check could still be a bit shaky, a loop could possibly be used in clearLastError (documentation of the OpenSSL API is here: https://www.openssl.org/docs/manmaster/man3/ERR_get_error.html ). Not sure why clearLastError is called in unwrap however, maybe Mark can comment on that one. The TODO is accurate: ideally every SSL call should check the error.

BTW, I doubt this reproduces anything, ab works just fine, so curl isn't going to be doing anything special beyond that.

The NPE in SecureNioChannel.close is possible, it would be a cosmetic error but the code could be more defensive.
Comment 2 Remy Maucherat 2021-02-11 09:42:23 UTC
I cannot reproduce the issue, please test with the most recent 9.0 release.

Since this better matches the OpenSSL documentation, I am ok conceptually to modify clearLastError to do:
while (SSL.getLastErrorNumber() != SSL.SSL_ERROR_NONE);
Comment 3 frrajott 2021-02-11 17:08:24 UTC
I have reproduced the problem with tomcat version 9.0.43:


Feb 11, 2021 11:49:43 AM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: Loaded Apache Tomcat Native library [1.2.26] using APR version [1.4.8].
Feb 11, 2021 11:49:43 AM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true].
Feb 11, 2021 11:49:43 AM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
Feb 11, 2021 11:49:43 AM org.apache.catalina.core.AprLifecycleListener initializeSSL
INFO: OpenSSL successfully initialized [OpenSSL 1.0.2u  20 Dec 2019]

....


Feb 11, 2021 12:05:01 PM org.apache.tomcat.util.net.openssl.OpenSSLEngine checkLastError
FINE: OpenSSL error: [336462231] message: [error:140E0197:SSL routines:SSL_shutdown:shutdown while in init]
Feb 11, 2021 12:05:01 PM org.apache.coyote.http11.Http11Processor service
FINE: Error parsing HTTP request header
javax.net.ssl.SSLException: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init
        at org.apache.tomcat.util.net.openssl.OpenSSLEngine.checkLastError(OpenSSLEngine.java:951)
        at org.apache.tomcat.util.net.openssl.OpenSSLEngine.pendingReadableBytesInSSL(OpenSSLEngine.java:636)
        at org.apache.tomcat.util.net.openssl.OpenSSLEngine.unwrap(OpenSSLEngine.java:558)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:629)
        at org.apache.tomcat.util.net.SecureNioChannel.read(SecureNioChannel.java:637)
        at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1320)
        at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.read(NioEndpoint.java:1232)
        at org.apache.coyote.http11.Http11InputBuffer.fill(Http11InputBuffer.java:791)
        at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:359)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:261)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:887)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1684)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at com.broadsoft.xsp.platform.InstrumentedExecutor$Tracker.run(InstrumentedExecutor.java:413)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Comment 4 Mark Thomas 2021-02-22 12:02:24 UTC
I can't recreate this using the provided test case, Tomcat Native 1.2.26 and the latest OpenSSL 1.1.1 source code.

I'll note that Tomcat 1.2.26 won't compile with OpenSSL 1.0.2 so I do wonder exactly what versions were in used in the tests described here.

I'll also note that OpenSSL 1.0.2 is no longer supported.

Please retest with the latest OpenSSL 1.1.1 release.
Comment 5 Remy Maucherat 2021-02-22 12:09:11 UTC
(In reply to Mark Thomas from comment #4)
> I can't recreate this using the provided test case, Tomcat Native 1.2.26 and
> the latest OpenSSL 1.1.1 source code.
> 
> I'll note that Tomcat 1.2.26 won't compile with OpenSSL 1.0.2 so I do wonder
> exactly what versions were in used in the tests described here.
> 
> I'll also note that OpenSSL 1.0.2 is no longer supported.
> 
> Please retest with the latest OpenSSL 1.1.1 release.

Should I still replace clearLastError with this ?

    private static void clearLastError() {
        while (SSL.getLastErrorNumber() != SSL.SSL_ERROR_NONE);
    }

This does not seem unreasonable given the docs ...
Comment 6 Mark Thomas 2021-02-22 16:53:05 UTC
Yes, the OP is correct that the clearLastError call in unwrap is there is part of this fix:
https://github.com/apache/tomcat/commit/ebab25

I agree that using a loop looks to be the right approach here.
Comment 7 Remy Maucherat 2021-02-24 09:55:23 UTC
Ok, so clearLastError will now align with the OpenSSL documentation using the loop. The change will be in 10.0.3, 9.0.44 and 8.5.64 [along with the NPE fix]. We could not reproduce, so you will have to verify this is fixed.
Comment 8 frrajott 2021-02-24 13:33:20 UTC
Thanks, Remy. I'll update this bug report when I get the chance to test the latest tomcat version.