Bug 59797

Summary: Per thread error hash grows indefinitely
Product: Tomcat Native Reporter: Mark Thomas <markt>
Component: LibraryAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 1.2.9   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Patch to expose the thread cleaning method
Patch to clean the thread on every access
Patch that triggers cleaning via the thread pool
Native only patch to release ssl errors on thread exit
Native patch to handle releasing ssl errors on thread exit

Description Mark Thomas 2016-07-04 19:52:05 UTC
OpenSSL maintains a hash of recent errors per thread.

The way the tc-native code is written, a new entry is created for any thread that performs a TLS read or write.

The hash is never cleaned up so it grows indefinitely.

The greater the turn-over of threads (e.g. due to executor configuration or the use of async reads or writes on non-container threads), the greater the problem.

While the leak has been observed, there are no known instances (so far) of it growing to the point where it causes problems.,

For full details see this thread:
https://lists.apache.org/thread.html/6d61c7169e37841b594f10c99952a80a5bb65a42958d36dcc6f7b4ae@%3Cdev.tomcat.apache.org%3E
Comment 1 Mark Thomas 2016-07-04 19:55:28 UTC
Created attachment 34005 [details]
Patch to expose the thread cleaning method

For discussion...
Comment 2 Mark Thomas 2016-07-04 19:58:42 UTC
Created attachment 34006 [details]
Patch to clean the thread on every access
Comment 3 Mark Thomas 2016-07-04 19:59:09 UTC
Created attachment 34007 [details]
Patch that triggers cleaning via the thread pool
Comment 4 Remy Maucherat 2016-07-04 20:15:07 UTC
The fix for the issue doesn't look very pretty.
Comment 5 Mark Thomas 2016-07-04 20:20:25 UTC
Just adding some notes...

Fixing it in Tomcat means an additional native call (relatively slow) for every call to clean the current thread.

Fixing it in native means a call to clean after every read or write which is far more often than necessary any may cause contention for the hash write lock.

Fixing it in the thread pool is good in that the calls to clean are minimized. But this excludes calls from non-container threads.

The problem goes away with OpenSSL 1.1.0 but that is not stable yet and probably won't be designated an LTS release.

Given how invasive the fixes look to be, I think Id prefer to think about this some more to see if a cleaner fix can be found.
Comment 6 Konstantin Kolinko 2016-07-06 00:18:17 UTC
The thread pool patch has RAT / checkstyle issues.

The native side patch has the following fragment:

> #if (OPENSSL_VERSION_NUMBER < 0x10000000L) || defined(OPENSSL_USE_DEPRECATED)
>     ERR_remove_state(0);
> #elif (OPENSSL_VERSION_NUMBER < 0x10100000L)
>     ERR_remove_thread_state(NULL);
> #endif

I do not understand "|| defined(OPENSSL_USE_DEPRECATED)" part of the above condition. I think that ERR_remove_thread_state() is the correct cleanup function and shall be used as soon as it is available, so the OPENSSL_USE_DEPRECATED part of the condition should be removed.

Motivation:

1. Looking into implementation of ERR_remove_state() in crypto/err/err.c of openssl-1.0.2d  it ignores its argument. I think such implementation is bad and should not be invoked.

#ifndef OPENSSL_NO_DEPRECATED
void ERR_remove_state(unsigned long pid)
{
    ERR_remove_thread_state(NULL);
}
#endif

2. Deprecation of ERR_remove_state() is clearly mentioned in CHANGES file of openssl-1.0.2d as a change "between 0.9.8n and 1.0.0  [29 Mar 2010]". I see no reason to call the old function. Also OpenSSL 0.9.8 is no longer supported. Tomcat-Native 1.2.x requires OpenDDL 1.0.2+ (as documented in native/BUILDING file).


(In reply to Mark Thomas from comment #5)
> Fixing it in the thread pool is good in that the calls to clean are
> minimized. But this excludes calls from non-container threads.

ERR_remove_thread_state() can be called with an explicit thread id. If we know ids for threads that completed execution (and are gc'ed) we can perform cleanup for them from a different thread.

I think that monitoring of completed (gc'ed) threads can be performed with a WeakReference and a ReferenceQueue that is processed by a background thread.
Comment 7 Christopher Schultz 2016-07-06 21:04:32 UTC
(In reply to Konstantin Kolinko from comment #6)
> ERR_remove_thread_state() can be called with an explicit thread id. If we
> know ids for threads that completed execution (and are gc'ed) we can perform
> cleanup for them from a different thread.
> 
> I think that monitoring of completed (gc'ed) threads can be performed with a
> WeakReference and a ReferenceQueue that is processed by a background thread.

If this is indeed possible and not fragile, it should be done this way: tricking each thread into calling the cleanup() method will either be badly-performing, error-prone, or both.
Comment 8 Nate Clark 2017-01-05 22:17:13 UTC
Created attachment 34597 [details]
Native only patch to release ssl errors on thread exit

I took a look at how openssl implemented error releasing in 1.1, which now does it automatically and used that to implement similar behavior in tomcat-native.

On windows clean up is invoked on thread detach and on other platforms a thread local with a destructor is used. Whenever ERR_clear_error or ERR_get_error is called the thread local is set. When the thread exits the destructor will be invoked on any thread which does not have a NULL value for the thread local.

I tested this with openssl 1.0.2. I compiled tomcat native against 1.1.0 but I have not done any windows testing since I don't have a windows build system.
Comment 9 Nate Clark 2017-01-27 20:31:49 UTC
Created attachment 34686 [details]
Native patch to handle releasing ssl errors on thread exit

Same as previous patch but added error handling around creating the thread local.
Comment 10 Mark Thomas 2017-02-07 10:24:40 UTC
Thanks for the patch. It has been applied (with some changes on the Windows side - primarily to handle x64 and to avoid a crash on JVM termination) to 1.2.x and will be included in 1.2.11 onwards.