Bug 58566

Summary: NoSuchMethodError and JVM crash when running HTTPS test with Tomcat 7 (o.a.t.util.net.TestSsl)
Product: Tomcat Native Reporter: Konstantin Kolinko <knst.kolinko>
Component: LibraryAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 1.2.0   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Stdout/Stderr output of Apache Ant
JVM crash log

Description Konstantin Kolinko 2015-10-31 01:47:07 UTC
Tomcat Native 1.2.0 is unusable for HTTPS with Tomcat 7, 8 due to missing APIs on java side.

Apparently it can be used for plain HTTP and AJP, as other Tomcat 7 tests (using plain HTTP) pass successfully.

To reproduce:
===============
1) Run org.apache.tomcat.util.net.TestSsl test standalone with Tomcat 7, by placing the following line into build.properties

test.entry=org.apache.tomcat.util.net.TestSsl

I was testing Tomcat 7 trunk (at r1711498)

2) I was running Ant with JAVA_HOME=JDK 8u66 (32-bit) on Windows 7.

Notes:
=========
1) Ant build with this single test was running for 17 minutes. Two tests were run

- [testSimpleSsl] printed NoSuchMethodError and hung for 17 minutes
- [testKeyPass] crashed the JVM.

2) The "TEST-org.apache.tomcat.util.net.TestSsl.APR.txt" after JVM crash is effectively empty.

You need to catch console output of Ant by redirecting it into a file. The console output has NoSuchMethodError and other messages.

Errors:
========
1)
    [junit] SEVERE: Failed to initialize connector [Connector[HTTP/1.1-auto-1]]
    [junit] org.apache.catalina.LifecycleException: Failed to initialize component [Connector[HTTP/1.1-auto-1]]
    [junit]     at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:106)
    [junit]     at org.apache.catalina.core.StandardService.initInternal(StandardService.java:559)
    [junit]     at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
    [junit]     at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:821)
    [junit]     at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
    [junit]     at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:139)
    [junit]     at org.apache.catalina.startup.Tomcat.start(Tomcat.java:339)
    [junit]     at org.apache.catalina.startup.TomcatBaseTest$TomcatWithFastSessionIDs.start(TomcatBaseTest.java:793)
    [junit]     at org.apache.tomcat.util.net.TestSsl.testSimpleSsl(TestSsl.java:61)
...
    [junit] Caused by: java.lang.NoSuchMethodError: sniCallBack
    [junit]     at org.apache.tomcat.jni.SSLContext.make(Native Method)
    [junit]     at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:573)
    [junit]     at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:651)
    [junit]     at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:434)
    [junit]     at org.apache.catalina.connector.Connector.initInternal(Connector.java:978)
    [junit]     at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
    [junit]     ... 33 more

2) From JVM crash file:

[[[
Stack: [0x17440000,0x17490000],  sp=0x1748f504,  free space=317k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xe26df]
C  [tcnative-1.dll+0x9d73]
C  [tcnative-1.dll+0xc905d]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.apache.tomcat.jni.SSLSocket.handshake(J)I+0
j  org.apache.tomcat.util.net.AprEndpoint.setSocketOptions(J)Z+106
j  org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run()V+34
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run()V+4
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
]]]


Source code
============
From quick look into native implementation of SSLContext.make() (sslcontext.c),

[[[
    /* Cache Java side SNI callback if not already cached */
    if (ssl_context_class == NULL) {
        ssl_context_class = (*e)->NewGlobalRef(e, o);
        sni_java_callback = (*e)->GetStaticMethodID(e, ssl_context_class,
                                                    "sniCallBack", "(JLjava/lang/String;)J");
    }
]]]

1) There is no test for success of GetStaticMethodID() call.
2) There is a race condition. Testing that ssl_context_class is not NULL does not mean that sni_java_callback has already been initialized.

As stacktrace above shows, SSLContext.make() is called from AprEndpoint.bind(), so a race here is unlikely unless someone initializes several connectors concurrently at the same time. It explains the JVM crash in the tests, as the first call results in NoSuchMethodError, but the second call succeeds with ssl_context_class check and crashes later.
Comment 1 Konstantin Kolinko 2015-10-31 01:55:29 UTC
Created attachment 33235 [details]
Stdout/Stderr output of Apache Ant

Ant output. Testing Tomcat 7 trunk (at r1711498) with JDK 8u66
Comment 2 Konstantin Kolinko 2015-10-31 02:01:48 UTC
Created attachment 33236 [details]
JVM crash log
Comment 3 Mark Thomas 2015-10-31 18:52:28 UTC
I am not concerned about the potential race condition here.

In Tomcat, starting of multiple connectors is always single threaded.

In theory, connectors could be started in parallel via the embedded API but even then the likelihood of a race is low and, as it happens, the fix for the method not being present should prevent any crashes if the race occurs.

I have a fix for this that enables 1.2.x to work with current 7.0.x trunk. I'll be committing that fix shortly.
Comment 4 Mark Thomas 2015-10-31 19:03:00 UTC
Fixed in trunk for 1.2.1
Comment 5 Konstantin Kolinko 2015-11-02 03:19:20 UTC
(In reply to Mark Thomas from comment #3)
> I am not concerned about the potential race condition here.
> 
> In Tomcat, starting of multiple connectors is always single threaded.
> 
> In theory, connectors could be started in parallel via the embedded API but
> even then the likelihood of a race is low and, as it happens, the fix for
> the method not being present should prevent any crashes if the race occurs.

There is also JMX API, StandardService.addConnector(...). There is no indication in java code that SSLContext.make() should be called by a single thread only (e.g. method not marked as synchronized).

There is a method that is certainly called only once - the method called by APRLifecycleListener, SSL.initialize(). I wonder whether this code can be moved there.

I agree that this is not a stopper. Usually connectors have <Connector bindOnInit="true"/> so there is a delay between initialization of all connectors and their start. The worst case - serving several requests without SNI - won't happen because all connectors will be initialized before starting them.

BTW, in sslcontext.c / SSLContext.make():

    /* Cache the byte[].class for performance reasons */
    clazz = (*e)->FindClass(e, "[B");
    byteArrayClass = (jclass) (*e)->NewGlobalRef(e, clazz);

The same lookup/caching of "[B" is present in ssl.c / SSL.initialize().


Review of r1711667:
(Comment typo already fixed by r1711675)

1. sslcontext.c / SSLContext.make(..)

>        /* Older Tomcat versions may not have this static method */
>        if ( JNI_TRUE == (*e)->ExceptionCheck(e) ) {
>            (*e)->ExceptionClear(e);
>        }

Other places (info.c, misc.c) just call
         if ((*e)->ExceptionCheck(e)) {

2. sslcontext.c / ssl_callback_ServerNameIndication()

> if (sni_java_callback != 0) {

1) sni_java_callback is jmethodID which is a pointer to a structure. So != NULL

2) If callback call is skipped, new_ssl_context variable is left uninitialized (random value) and "if (original_ssl_context != new_ssl_context) {" check is satisfied and goes on accessing random memory.

3) If sni_java_callback is unavailable, this method should return early. It does a lot of unneeded work.

4) Java method SSLContext.sniCallBack() implemented in Tomcat trunk by default returns 0

As such,  "if (original_ssl_context != new_ssl_context) {"  compares not-null pointer with 0 returned by the method.

5)
> SSL_set_SSL_CTX(ssl, J2P(new_ssl_context, SSL_CTX *));

Is "new_ssl_context" here a pointer to SSL_CTX ?

I cannot confirm it. From java code it looks that it is a pointer to tcn_ssl_ctxt_t structure, which field "ctx" is (SSL_CTX*).


Looking at Tomcat trunk code, the method 
in APREndpoint.bind():
> sslHostConfig.setOpenSslContext(Long.valueOf(ctx));

ctx is a pointer to tcn_ssl_ctxt_t structure, not a SSL_CTX pointer.

6) In SSLHostConfig.java in Tomcat trunk:

    public Object getOpenSslContext() {
        return openSslContext;
    }

s/Object/Long/, to match setter method and to avoid class casts later.
Comment 6 Mark Thomas 2015-11-02 13:59:37 UTC
Review comments all addressed.