Dear all: Last days, we found some jvm crash things when we use APR mode in tomcat. Version: Tomcat 9.0.65,APR 1.7.0 ,Tomcat Native library 1.2.33, OpenSSL 1.1.1n We used GDB to findout what happend. Show that crash is happend on this code , . one crash: org.apache.tomcat.util.buf.ByteBufferUtils#cleanDirectBuffer org.apache.tomcat.util.net.AprEndpoint.AprSocketWrapper#doClose another crash: org.apache.tomcat.util.net.AprEndpoint#setSocketOptions(org.apache.tomcat.util.net.SocketWrapperBase<java.lang.Long>) org.apache.tomcat.jni.SSLSocket#handshake To See some changelogs and code , I found there are some write lock an read lock to do socket close or somethine, but some codes doesnt have same code , Im not sure wether this code need to add write lock. like org.apache.tomcat.util.net.AprEndpoint.SocketProcessor#doRun sorry that I have an doc file to show stacktrace but due to safe reason , I cant add an attachment , If you neet more message ,I will try to add more details . And I am not a native speaker , hope that I made it clear
add crash stacktrace Thread 8117: (state = IN_NATIVE) - org.apache.tomcat.jni.SSLSocket.handshake(long) @bci=0 (Compiled frame; information may be imprecise) - org.apache.tomcat.util.net.AprEndpoint.setSocketOptions(org.apache.tomcat.util.net.SocketWrapperBase) @bci=169, line=738 (Compiled frame) - org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run() @bci=109, line=2105 (Compiled frame) - org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker) @bci=92, line=1191 (Compiled frame) - org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run() @bci=5, line=659 (Interpreted frame) - org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run() @bci=4, line=61 (Compiled frame) - java.lang.Thread.run() @bci=11, line=750 (Compiled frame) Thread 16878: (state = IN_VM) - sun.misc.Unsafe.freeMemory(long) @bci=0 (Compiled frame; information may be imprecise) - java.nio.DirectByteBuffer$Deallocator.run() @bci=17, line=94 (Compiled frame) - sun.misc.Cleaner.clean() @bci=12, line=143 (Compiled frame) - sun.reflect.GeneratedMethodAccessor72.invoke(java.lang.Object, java.lang.Object[]) @bci=36 (Compiled frame) - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame) - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame) - org.apache.tomcat.util.buf.ByteBufferUtils.cleanDirectBuffer(java.nio.ByteBuffer) @bci=24, line=124 (Compiled frame) - org.apache.tomcat.util.net.SocketBufferHandler.free() @bci=11, line=229 (Compiled frame) - org.apache.tomcat.util.net.AprEndpoint$AprSocketWrapper.doClose() @bci=76, line=2385 (Compiled frame) - org.apache.tomcat.util.net.SocketWrapperBase.close() @bci=34, line=429 (Compiled frame) - org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun() @bci=32, line=2145 (Compiled frame) - org.apache.tomcat.util.net.SocketProcessorBase.run() @bci=21, line=49 (Compiled frame) - org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker) @bci=92, line=1191 (Compiled frame) - org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run() @bci=5, line=659 (Interpreted frame) - org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run() @bci=4, line=61 (Compiled frame) - java.lang.Thread.run() @bci=11, line=750 (Compiled frame)
Note: The APR/Tomcat Native HTTP and AJP connectors are deprecated in Tomcat 9 and have been removed in Tomcat 10.1.x onwards. You have plenty of time before 9.0.x reaches End-Of-Life but you might want to switch to one of the alternatives sooner rather than later. The NIO connector with OpenSSLImplementation is probably a good option.
Can you reliably reproduce the issue? What OS are you using? I haven't seen so many crashes with APR for the past 10 years in such a short time frame. Especially #setSocketOptions() is weird.
To investigate this further we need a test case that reproduces the crash reliably enough for us to be able to debug it.
Reduce severity to normal.
Also, the native backtrace would be helpful (it should be found in the hs_pid_xxxx.txt file generated on crash). If you were able to inspect with gdb, anything you found in there would be helpful as well.
We found the way to reproduce this issue。 Upgrading Tomcat Native Library 1.2.23 to Tomcat Native Library 1.2.33, we set the tomcat in apr mode, In a high-concurrency scenario, the Tomcat process crashes. When line 476 is commented out in native/src/sslcontext.c, the JVM crash does not occur. We gauss the content in servername is not released in line 192 in native/src/sslcontext.c, causing the JVM to crash. I hope you reply as soon as possible. Here's our analysis, and I hope it's helpful. First, the apr allocates and releases the memory of internal objects through malloc_pool. The allocated objects do not need to be released. When the memory pool is destroyed, all the objects allocated in the memory pool are destroyed and released. At the same time, the cleanup callback function registered in the memory pool is called back. The local invoking method for the Tomcat to create and destroy a memory pool is Pool.create,Pool.destory. When Tomcat functions as an HTTPS server, the SSLContext (org.apache.tomcat.util.net.openssl.OpenSSLContext#OpenSSLContext) is initialized using SSContext.make (native invoking) during startup. During server certificate initialization, an APR memory pool is created, The memory pool will be destroyed when the certificate is released. Check the invoking code. It is found that the memory pool destruction method is invoked in the org.apache.tomcat.util.net.AprEndpoint#unbind method. In normal cases, the memory pool is released when the endpoint is destroyed (destory) or stopped. PR based on GitHub https://github.com/apache/tomcat-native/commit/d1885c202d987f3898ba72b7fc084413f0bdb1f9 In this modification, the callback function ssl_callback_ClientHello is registered when SSLContext.make is used. This function is invoked when the SSL handshake is performed inside SSLSocket.handshake each time the client requests. The memory is allocated on the memory pool for each call, storing the servername, which is not shown here. It is suspected that memory leakage occurs before the apr memory pool destruction or cleanup method is invoked.
log in jvm hs_err_.log --------------- T H R E A D --------------- Current thread (0x00002b7a41865800): JavaThread "catalina-exec-18000-120" daemon [_thread_in_native, id=16233, stack(0x00002b7b724e7000,0x00002b7b725e8000)] siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000 Registers: RAX=0x322e332e372e3632, RBX=0x00002b7a8c017ac0, RCX=0x0000000000000000, RDX=0x0000000000000030 RSP=0x00002b7b725e6580, RBP=0x00002b7b725e65f0, RSI=0x00000000ffffffff, RDI=0x0000000000000000 R8 =0xbbb9b201d5303400, R9 =0x00002b7a8c0bb940, R10=0x12656c79b7059708, R11=0xb71a9a122dd39dad R12=0x0000000000000000, R13=0x0000000738591598, R14=0x00002b7b725e6820, R15=0x00002b7a41865800 RIP=0x00002b7aac08aaf3, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000000 TRAPNO=0x000000000000000d Top of Stack: (sp=0x00002b7b725e6580) 0x00002b7b725e6580: 00002b7b725e6650 00002b7a8c14f330 0x00002b7b725e6590: 0000003000000000 0000000000000000 0x00002b7b725e65a0: 0000000000000000 00002b7a42262a80 0x00002b7b725e65b0: 00002b7aac8256a0 0000000000000030 Instructions: (pc=0x00002b7aac08aaf3) 0x00002b7aac08aad3: 10 05 00 00 bf 00 00 00 00 e8 17 36 f9 ff 48 89 0x00002b7aac08aae3: 83 e0 01 00 00 48 8b 45 98 48 8b 80 a8 00 00 00 0x00002b7aac08aaf3: 48 8b 80 e0 03 00 00 48 85 c0 0f 84 ee 00 00 00 0x00002b7aac08ab03: 48 8b 45 98 48 8b 80 10 05 00 00 48 8b 80 40 02 Register to memory mapping: RAX=0x322e332e372e3632 is an unknown value RBX=0x00002b7a8c017ac0 is an unknown value RCX=0x0000000000000000 is an unknown value RDX=0x0000000000000030 is an unknown value RSP=0x00002b7b725e6580 is pointing into the stack for thread: 0x00002b7a41865800 RBP=0x00002b7b725e65f0 is pointing into the stack for thread: 0x00002b7a41865800 RSI=0x00000000ffffffff is an unknown value RDI=0x0000000000000000 is an unknown value R8 =0xbbb9b201d5303400 is an unknown value R9 =0x00002b7a8c0bb940 is an unknown value R10=0x12656c79b7059708 is an unknown value R11=0xb71a9a122dd39dad is an unknown value R12=0x0000000000000000 is an unknown value R13=0x0000000738591598 is an oop java.util.concurrent.locks.ReentrantLock - klass: 'java/util/concurrent/locks/ReentrantLock' - ---- fields (total size 2 words): - private final 'sync' 'Ljava/util/concurrent/locks/ReentrantLock$Sync;' @12 a 'java/util/concurrent/locks/ReentrantLock$NonfairSync' (e70ad38b 385915b0) R14=0x00002b7b725e6820 is pointing into the stack for thread: 0x00002b7a41865800 R15=0x00002b7a41865800 is a thread Stack: [0x00002b7b724e7000,0x00002b7b725e8000], sp=0x00002b7b725e6580, free space=1021k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libssl.so.1.1+0x8aaf3] C [libssl.so.1.1+0x6e230] C [libssl.so.1.1+0x6d5f7] C [libssl.so.1.1+0x6d093] C [libssl.so.1.1+0x4ec81] SSL_do_handshake+0xfa C [libtcnative-1.so.0.2.36+0x2b540] Java_org_apache_tomcat_jni_SSLSocket_handshake+0x81 J 14729 org.apache.tomcat.jni.SSLSocket.handshake(J)I (0 bytes) @ 0x00002b7a45eebb41 [0x00002b7a45eeba80+0xc1] J 15421 C2 org.apache.tomcat.util.net.AprEndpoint.setSocketOptions(Lorg/apache/tomcat/util/net/SocketWrapperBase;)Z (419 bytes) @ 0x00002b7a469e46b8 [0x00002b7a469e4540+0x178] J 15420 C2 org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run()V (200 bytes) @ 0x00002b7a46a2f1a4 [0x00002b7a46a2f0e0+0xc4] J 15712% C2 org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(Lorg/apache/tomcat/util/threads/ThreadPoolExecutor$Worker;)V (187 bytes) @ 0x00002b7a45e8d014 [0x00002b7a45e8ce40+0x1d4] j org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run()V+5 J 10304 C1 org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run()V (25 bytes) @ 0x00002b7a45a274d4 [0x00002b7a45a273c0+0x114] J 10295 C1 java.lang.Thread.run()V (17 bytes) @ 0x00002b7a45a2cb14 [0x00002b7a45a2c9c0+0x154] v ~StubRoutines::call_stub V [libjvm.so+0x7778a5] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xc85 V [libjvm.so+0x77509f] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x26f V [libjvm.so+0x77564d] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x5d V [libjvm.so+0x82b1de] thread_entry(JavaThread*, Thread*)+0xae V [libjvm.so+0xc0adf7] JavaThread::thread_main_inner()+0x137 V [libjvm.so+0xa8b730] java_start(Thread*)+0x170 C [libpthread.so.0+0x96b4] start_thread+0xc4
(In reply to DigitalCat from comment #7) > When line 476 is commented out in native/src/sslcontext.c, the JVM crash > does not occur. > We guess the content in servername is not released in line 192 in > native/src/sslcontext.c, causing the JVM to crash. These line numbers do not match the current main branch of Tomcat. Are you able to: 1. Use sources from git/main (1.2.38-dev) 2. Show the patch which resolves the issue 3. Run the process long enough to ensure that you aren't leaking memory from elsewhere
The line numbers match up for the quoted version (1.2.33) and the relevant code is unchanged in 1.2.x. The description of a potential memory leak looks plausible but I am unable to reproduce it with 9.0.x and Tomcat Native 1.2.37. With 1,000,000+ requests I do not see the expected increase in any of the memory pools in a profiler. A test case that demonstrates the issue would be very helpful. Keep in mind that ideally it needs to be something committers can run locally.
Yes branch 1.2.X can match line no. We use wrk to reproduce this issue. "./wrk -t 20 -c 1000 -d 5s --latency --timeout 1s https://XXXXXX/V3/LoginRoute" wrk is an opensource project, you can try with this . Due to our company's security policy, I can not offer our test case . Sorry for that.
We are anticipating your reply
I'm unable to reproduce the crash but adding some debug logging did confirm that there is a memory leak here. I have a fix that I just need to clean up before I commit it.
Correcting Product etc
Note 2.0.x is also affected
Fixed in: - 2.0.x for 2.0.5 onwards - 1.2.x for 1.2.38 onwards