Bug 62892 - Memory leak when performing client certificate validation with OCSP
Summary: Memory leak when performing client certificate validation with OCSP
Alias: None
Product: Tomcat Native
Classification: Unclassified
Component: Library
Version: 1.2.17
Hardware: PC Linux
Importance: P2 critical
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2018-11-07 15:15 UTC by Sander Benschop
Modified: 2018-11-23 13:56 UTC
CC List: 0 users

Figures 1 & 2 (449.13 KB, image/png)
2018-11-07 15:15 UTC, Sander Benschop

Description Sander Benschop 2018-11-07 15:15:05 UTC
Created attachment 36251
Figures 1 & 2

We are using the Tomcat APR connector in our application to perform client-certificate validation with OCSP checks. We've noticed a gradual increase in the memory consumed by the Java process until the system runs out of memory and the OOM-killer we configured kills and restarts the process.

The application we created is queried frequently (every second, by two simultaneous clients). We have tested this with two types of client certificates from two different root CAs, both containing OCSP URLs: PKIoverheid (the root certificate of the Dutch national government) and Comodo. We first noticed the problem with the PKIoverheid certificates, which are larger than the Comodo certificates. Figure 1, which shows the available server memory, shows that with these larger PKIoverheid certificates the server runs out of memory every 2.5 to 3 hours. We then tried the same test with the smaller Comodo certificates (see figure 2), with the same result, except that it took longer (15 hours).

When we turned off client certificate validation, either by commenting out the call to X509_verify_cert in OpenSSL (which in turn calls Tomcat Native's SSL_callback_SSL_verify, which performs the OCSP checks) or by setting SSLVerifyClient to "none" and clientAuth to "false" in the APR connector, the server did not run out of memory and the graph of available memory flattened out.

I have tested this with the Apache Tomcat Native Library v1.2.17, Tomcat v9.0.12, APR v1.5.2 and JDK v1.8.0_181 running on an Ubuntu 16.04.5 server. On the JBoss JIRA I found a similar issue where someone used different versions but had the same problem: https://issues.jboss.org/browse/JWS-1140.
Comment 1 Sander Benschop 2018-11-12 08:08:28 UTC
I have further isolated the issue by replacing the verify_cb function 'SSL_callback_SSL_verify' (from the Tomcat Native Library) with a no-op function. When I do this, the available memory remains constant: our test server didn't run out of memory all weekend with the same polling frequency as before.
Comment 2 jfclere 2018-11-12 16:04:32 UTC
Replacing SSL_callback_SSL_verify() with a no-op disables all the OCSP checks, which is probably not what you want to do... But yes, that shows that the leak is somewhere in SSL_callback_SSL_verify().
Comment 3 Sander Benschop 2018-11-12 16:15:26 UTC
You are correct jfclere, I indeed only tried this in an attempt to isolate the cause of the leak. I should have been more clear in my previous comment :-)
Comment 4 jfclere 2018-11-12 22:42:54 UTC
The problem is OCSP_parse_url(); we have forgotten:
I will commit the fix tomorrow, testing it now.
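For context: OCSP_parse_url() hands back the responder's host, port and path as newly allocated strings, and the caller is responsible for releasing them with OPENSSL_free(); forgetting to do so leaks three small buffers on every OCSP check. The lines jfclere added are not reproduced in this thread, so the following is only a minimal, self-contained sketch of that ownership contract. `parse_url_sketch()`, `dup_range()` and `use_responder_url()` are hypothetical stand-ins (plain malloc()/free() instead of OpenSSL's allocator, and a deliberately naive parser), not Tomcat Native or OpenSSL code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n bytes of s into a new NUL-terminated string. */
static char *dup_range(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p != NULL) {
        memcpy(p, s, n);
        p[n] = '\0';
    }
    return p;
}

/*
 * Hypothetical stand-in for OCSP_parse_url(): on success (returns 1) the
 * caller owns *host, *port and *path and must free all three. On failure
 * (returns 0) nothing is left allocated. Only "http://host[:port][/path]".
 */
static int parse_url_sketch(const char *url, char **host, char **port, char **path)
{
    const char *prefix = "http://";
    assert(url != NULL);
    *host = *port = *path = NULL;
    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return 0;

    const char *h = url + strlen(prefix);
    const char *slash = strchr(h, '/');
    const char *end = slash ? slash : h + strlen(h);
    const char *colon = memchr(h, ':', (size_t)(end - h));

    *host = dup_range(h, (size_t)((colon ? colon : end) - h));
    *port = colon ? dup_range(colon + 1, (size_t)(end - colon - 1)) : dup_range("80", 2);
    *path = slash ? dup_range(slash, strlen(slash)) : dup_range("/", 1);
    if (*host == NULL || *port == NULL || *path == NULL) {
        free(*host);
        free(*port);
        free(*path);
        *host = *port = *path = NULL;
        return 0;
    }
    return 1;
}

/*
 * Caller pattern: every exit goes through one cleanup label, so the three
 * strings are released whether parsing failed or the work succeeded.
 * Returns the host name length (a placeholder result), or -1 on failure.
 */
static int use_responder_url(const char *url)
{
    char *host = NULL, *port = NULL, *path = NULL;
    int rv = -1;

    if (!parse_url_sketch(url, &host, &port, &path))
        goto cleanup;
    /* ... connect to host:port and POST the OCSP request to path ... */
    rv = (int)strlen(host);

cleanup:
    free(host); /* free(NULL) is a no-op, so this is safe on every path */
    free(port);
    free(path);
    return rv;
}
```

The single `cleanup:` label is a common C idiom here: because free(NULL) does nothing, the same three calls are correct whichever exit path is taken.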
Comment 5 jfclere 2018-11-13 09:26:43 UTC
Try with r1846499. I still have another memory leak but I can't find where it is.
Comment 6 Sander Benschop 2018-11-13 10:16:21 UTC
I was getting errors in the Python build script when running the buildconf file:

ImportError: No module named 'ConfigParser'

I also tried running buildcheck.sh, which reported that I didn't have Python installed, but I do:

sander:/tmp$ python
Python 2.7.12 (default, Dec  4 2017, 14:50:18)

So for now I've applied the patch you suggested to the downloaded sources of Tomcat Native Library 1.2.17 I was using.

Thank you for the fix! I will report back in a few hours.
Comment 7 jfclere 2018-11-13 13:11:23 UTC
ImportError: No module named 'ConfigParser'
That is because you are using Python 3, where the module is named configparser. You need an APR version that supports Python 3, or use Python 2.
Comment 8 Sander Benschop 2018-11-13 14:48:23 UTC
Ok, I will try again to build the code from SVN and see if it makes a difference, but right now the server still runs out of memory.

I have added the three lines of code you suggested in this place:


    if(apr_sock && ok) /* if ok == 0 we have already closed the socket */



    // Manually added code
    // End manually added code
    return ocsp_resp;

It seems that this does have a positive effect on memory usage: it now took 4.5 hours to run out of memory rather than 3, but the end result is still the same. I will report back when I've tried the exact commit from SVN.
Comment 9 jfclere 2018-11-13 15:14:35 UTC
OK I know that adding:
is not enough, but I am happy it helps ;-)
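Freeing on one code path does not stop a leak if other exit paths still skip the cleanup. One way to check this in a test harness, sketched here under the assumption that the code under test can be routed through wrapper allocators, is to count live allocations and assert the count returns to zero after each call, on success and on failure alike. `counted_malloc()`, `counted_free()` and `handle_request()` are hypothetical names, not Tomcat Native or OpenSSL APIs, and this is not a description of what r1846593 changed. (For OpenSSL's own allocations, CRYPTO_set_mem_functions() is the hook for installing custom allocators; it is not used here.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of blocks handed out by counted_malloc() and not yet freed. */
static long live_allocations = 0;

static void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_allocations++;
    return p;
}

static void counted_free(void *p)
{
    if (p == NULL)
        return; /* mirror free(NULL): nothing to release */
    assert(live_allocations > 0);
    live_allocations--;
    free(p);
}

/*
 * Hypothetical request handler with two allocations and an early error
 * exit. With leaky != 0 the error path returns without cleanup, which
 * models a fix that was applied on one path only.
 */
static int handle_request(int fail_midway, int leaky)
{
    int ok = 0;
    char *url = counted_malloc(64);
    char *resp = NULL;

    if (url == NULL)
        goto cleanup;
    strcpy(url, "http://ocsp.example.com/");
    resp = counted_malloc(256);
    if (resp == NULL || fail_midway) {
        if (leaky)
            return 0; /* BUG: skips cleanup, so url (and resp, if allocated) leak */
        goto cleanup;
    }
    ok = 1;

cleanup:
    counted_free(resp);
    counted_free(url);
    return ok;
}
```

With the leaky variant the counter stays above zero after the error path, which is the small-scale version of memory that keeps growing under steady OCSP traffic even though the success path looks clean.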
Comment 10 jfclere 2018-11-14 16:59:12 UTC
try with http://svn.apache.org/viewvc?rev=1846593&view=rev
I think I have fixed all the leaks now.
Comment 11 Sander Benschop 2018-11-16 06:18:52 UTC
It worked! I updated the code yesterday and the server still hasn't run out of memory. In the graph I can see that it stabilises nicely. Thank you so much jfclere! :-)