Bug 65553 - Orphaned thread by JNDIRealm / clearReferencesThreads reports memory leak
Summary: Orphaned thread by JNDIRealm / clearReferencesThreads reports memory leak
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 9.0.52
Hardware: PC All
: P2 minor (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-06 07:53 UTC by Thomas Hoffmann
Modified: 2021-09-28 07:29 UTC (History)
0 users



Attachments
Stacktrace on creation of orphaned thread (62.32 KB, image/png)
2021-09-06 07:53 UTC, Thomas Hoffmann
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Hoffmann 2021-09-06 07:53:40 UTC
Created attachment 38020 [details]
Stacktrace on creation of orphaned thread

Hello,
 
we are using the org.apache.catalina.realm.JNDIRealm for authentication of users against our windows AD.
When undeploying the application, we see the following  warning in our logs:
 
WARNING [Catalina-utility-1]  org.apache.catalina.loader.Webapp ClassLoaderBase.clearReferencesThreads The  web application [ROOT] appears to have started a thread named [Thread-106] but  has failed to stop it. This is very likely to create a memory leak. Stack trace  of thread:
 java.base@11.0.3/java.net.SocketInputStream.socketRead0(Native  Method)
 java.base@11.0.3/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
 java.base@11.0.3/java.net.SocketInputStream.read(SocketInputStream.java:168)
 java.base@11.0.3/java.net.SocketInputStream.read(SocketInputStream.java:140)
 java.base@11.0.3/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:448)
 java.base@11.0.3/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
 java.base@11.0.3/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1104)
 java.base@11.0.3/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:823)
 java.base@11.0.3/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
 java.base@11.0.3/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
 java.base@11.0.3/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
 java.naming@11.0.3/com.sun.jndi.ldap.Connection.run(Connection.java:832)
 java.base@11.0.3/java.lang.Thread.run(Thread.java:834)
 
The warning is not always shown but quite often.
 
Summary of the analysis of the problem: 
On tomcat startup, the worker-thread is running under the  tomcat classloader. But when a reconnect happens, the thread is running with  the classloader of the web application and gets thus reported.
 
The details:
Digging into the problem via remote debugging showed the  reason how this happens:
During startup, Tomcat is initializing the JNDIRealm. The  open-method of JNDIRealm is switching the classloader between bootstrap-CL and  tomcat-lib-CL, depending on the attribute "useContextClassLoader".
Afterwards the context-Object is created  (createDirContext). Within this LdapCtx, an LdapClient is used to communicate  with the AD-Server.
This LdapClient uses a com.sun.jndi.ldap.Connection for  TCP communication. This connection opens the reported Worker-Thread.
This can be seen at https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.naming/share/classes/com/sun/jndi/ldap/Connection.java  around line 243 --> worker = Obj.helper.createThread(this);
 
So far, so good.
 
Somehow, the com.sun.jndi.ldap.Connection is sometimes  closed and the thread dies. At least, the thread is not visible any more. Maybe because of a timeout on the AD-server side or something else happened.
If a new user accesses the site, the JNDIRealm is  authenticating the user.
This triggers the following chain (path is shortened):  JNDIRealm.getUserBySearch --> LdapCtx.dosearch --> LdapCtx.ensureOpen  --> LdapCtx.connect --> LdapClient.getInstance -->  Connection.<init>

A detailed stack is attached as an image.

This call chain creates a new com.sun.jndi.ldap.Connection and  thus a new thread. But this time, the thread is connected to the classloader of  the web-application.
On undeployment, the thread is thus reported to be orphaned.
 
It was tested with Tomcat 9.0.52, Windows 10, OpenJDK  11.0.12_7.
 
As the authentication is conducted within tomcat, before  the application is triggered, thus the application can't handle that error.

The flag "useContextClassLoader" is not taken into account during authentication.
During this process a re-connect can occur and create the thread with the wrong classloader.
 
Thanks in advance,
Thomas
Comment 1 Mark Thomas 2021-09-06 08:07:16 UTC
One view is that the JRE should not be doing this - or at least doing it in a way that doesn't run the risk of memory leaks in a Java EE / Jakarta EE environment. We have raised JRE bugs for issues like this in the past and they have been fixed.

The counter view is that this thread is not a one-off and is not short-lived (the typical cases for JRE bugs that have been fixed) and Tomcat has already acknowledged - with the useContextClassLoader flag - that it needs to be taking action to ensure that any threads are created with the expected class loader.

If the consensus is that Tomcat needs to handle this then we'd need to ensure that every version of authenticate() also set the class loader if required. We'd probably want to refactor the existing handling to make sure we don't try setting the class loader twice.

I am currently leaning towards handling this in Tomcat rather than raising a JRE bug.

Thoughts?
Comment 2 Michael Osipov 2021-09-06 08:23:12 UTC
I see the same in my application, though not with the JNDIRealm:

2021-08-31T20:25:13.679 INFORMATION [https-openssl-apr-8443-exec-67] org.apache.catalina.core.ApplicationContext.log HTMLManager: stop: Stopping web application '/smartld##001'
2021-08-31T20:25:13.712 INFORMATION [https-openssl-apr-8443-exec-67] org.apache.catalina.core.ApplicationContext.log org.tuckey.web.filters.urlrewrite.UrlRewriteFilter INFO: destroy called

> 2021-09-01T15:55:23.726 WARNUNG [deblndw028v.ad001.siemens.net-startStop-3] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [ket##001] appears to have started a thread named [Thread-9] but has
>  java.lang.Object.wait(Native Method)
>  java.lang.Object.wait(Object.java:502)
>  com.sun.jndi.ldap.Connection.pauseReader(Connection.java:840)
>  com.sun.jndi.ldap.Connection.run(Connection.java:983)
>  java.lang.Thread.run(Thread.java:748)

which comes right after a stop in the manager app or a container shutdown:
> 2021-09-01T15:55:19.615 INFORMATION [main] org.apache.catalina.core.StandardServer.await A valid shutdown command was received via the shutdown port. Stopping the Server instance.
> 2021-09-01T15:55:19.616 INFORMATION [main] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["https-openssl-apr-8443"]
> 2021-09-01T15:55:19.635 INFORMATION [main] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
> 2021-09-01T15:55:19.638 INFORMATION [deblndw028v.ad001.siemens.net-startStop-3] org.apache.catalina.core.ApplicationContext.log Destroying Spring FrameworkServlet 'dispatch'
> 2021-09-01T15:55:23.530 INFORMATION [deblndw028v.ad001.siemens.net-startStop-3] org.apache.catalina.core.ApplicationContext.log Closing Spring root WebApplicationContext


I think a solution within the JDK should be pursued first.
Comment 3 Thomas Hoffmann 2021-09-06 08:53:39 UTC
Hello Michael,
if the leak is reported without using JNDIRealm, it might be
that your application creates an LDAP-Query or InitialDirContext without closing it?
Maybe you can use a breakpoint in "com.sun.jndi.ldap.Connection" to see which code is causing the thread to start.
I opened the bug because Tomcat is causing the thread to start, out of the scope of the web-application. It can also happen that a web application causes this leak report but then the application might be able to handle it.
Comment 4 Michael Osipov 2021-09-06 09:23:52 UTC
(In reply to Thomas Hoffmann from comment #3)
> Hello Michael,
> if the leak is reported without using JNDIRealm, it might be
> that your application creates an LDAP-Query or InitialDirContext without
> closing it?
> Maybe you can use a breakpoint in "com.sun.jndi.ldap.Connection" to see
> which code is causing the thread to start.
> I opened the bug because Tomcat is causing the thread to start, out of the
> scope of the web-application. It can also happen that a web application
> causes this leak report but then the application might be able to handle it.

One of the reasons is https://github.com/spring-projects/spring-ldap/issues/489

I have reviewed other spots, but always close/release connections as well as naming enumerations. One reaons could be that I create test connection in startInternal and that could be subject to not the webapp classloader.
I guess I have missed something.
Comment 5 Mark Thomas 2021-09-16 08:20:46 UTC
OpenJDK bug created

https://bugs.openjdk.java.net/browse/JDK-8273874

We will still need to address this in Tomcat to cover the time until we can guarantee that the version of the JRE that Tomcat is running on has the appropriate fix.
Comment 6 Mark Thomas 2021-09-28 07:29:14 UTC
Work-around added in:
- 10.1.x for 10.1.0-M6 onwards
- 10.0.x for 10.0.12 onwards
- 9.0.x for 9.0.54 onwards
- 8.5.x for 8.5.72 onwards

It will need to stay in place for these versions. Once there a fix in the JRE and Tomcat's minimum JRE version is known to include the fix, then we can remove the work-around.