Bug 61818 - OCSP "SSLUseStapling on" completely blocking the server when something is off with the responder
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_ssl
Version: 2.4.29
Hardware: PC Mac OS X 10.1
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-26 12:57 UTC by Raffaele Sandrini
Modified: 2020-03-02 18:03 UTC (History)
CC List: 0 users



Attachments
Effect on workers & connections (203.14 KB, image/png)
2020-03-02 17:57 UTC, tomasz.konefal

Description Raffaele Sandrini 2017-11-26 12:57:08 UTC
This will be a somewhat fuzzy issue because I don't have much data. Please accept my apologies for that.

Today our production site went offline because it was impossible to connect to it using TLS. The httpd error log just showed this error: 

AH01941: stapling_renew_response: responder error

without any supporting information. There was no indication that a name could not be resolved or that an IP address could not be reached.

The server is using the event MPM and pretty quickly all slots were in status "R" and the server reported:

AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
and
AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.

Hence, the site was offline.

Our stapling configuration:

SSLUseStapling on
SSLStaplingResponderTimeout 5
SSLStaplingReturnResponderErrors off
SSLStaplingCache shmcb:/var/run/ocsp(128000)

I am not an expert, but from this configuration and the supporting documentation I conclude that this situation should never have happened. Even with the OCSP server unavailable, the server should simply have continued without stapling a response.

Hence, this bug report.

Note 1: The certificate in question is issued by the GoDaddy EV CA, and I personally could not confirm any issue with their OCSP service.

Note 2: At the same time, vhosts using Let's Encrypt certificates still worked with stapling enabled, which suggests that something was wrong on the GoDaddy side. However, as stated above, the error log did not indicate anything.
Comment 1 Christophe JAILLET 2017-11-26 15:20:51 UTC
This is odd.
All paths that lead to this error (AH01941) seem to have some additional information logged at APLOG_ERR level.
Comment 2 Raffaele Sandrini 2017-11-26 16:50:53 UTC
I just rescanned the log files, both the vhost-specific ones and the server log file, and I could not find any related messages other than the ones mentioned above (AH01941, AH00484 and AH03490).

Also to add, I restarted Apache several times and consistently got into that state until I eventually disabled OCSP stapling (setting "SSLUseStapling off").
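
For anyone repeating the log check described above, here is a minimal sketch of tallying httpd message codes in an error log. The function name count_ah_codes and the default log path are assumptions for illustration; neither comes from this report.

```shell
# Tally AHnnnnn message codes in an httpd error log, most frequent first.
# Output is one "CODE COUNT" pair per line.
# The default path is an assumption; pass your own ErrorLog location.
count_ah_codes() {
    log_file="${1:-/var/log/httpd/error_log}"
    grep -oE 'AH[0-9]{5}' "$log_file" \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{ print $2, $1 }'
}
```

Running count_ah_codes against each vhost log and the main server log makes it easy to see whether anything besides AH01941, AH00484 and AH03490 was logged around the outage.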
Comment 3 tomasz.konefal 2020-03-02 17:57:01 UTC
Created attachment 37055
Effect on workers & connections
Comment 4 tomasz.konefal 2020-03-02 18:03:48 UTC
One of our hosted sites has a certificate with crl.usertrust.com (151.139.128.14) as a CRL Distribution point and ocsp.usertrust.com (151.139.128.14) for OCSP in the Authority Information Access field.

We are able to reproduce symptoms like this when the above IP is blocked outbound from the web server.

Please see the above attached image indicating what happens to the worker threads and connection count when the block is enabled (~09h42) and later disabled (~09h52).

Unfortunately, there are no meaningful logs to go along with this.
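
The outbound block described above could be reproduced along these lines on a Linux host. This is only a sketch: the comment does not say how the block was applied, so the use of iptables, the DROP target, and the helper names (run, block_responder, unblock_responder) are all assumptions. Setting DRY_RUN prints the commands instead of executing them; executing them for real requires root.

```shell
# Run a command, or just print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Silently drop outbound traffic to the responder address given as $1.
# DROP is an assumption; it makes connections hang rather than fail fast.
block_responder() {
    run iptables -I OUTPUT -d "$1" -j DROP
}

# Remove the rule added by block_responder for the same address.
unblock_responder() {
    run iptables -D OUTPUT -d "$1" -j DROP
}
```

For example, `DRY_RUN=1 block_responder 151.139.128.14` prints the rule that would be inserted for the address mentioned in this comment, and unblock_responder with the same address removes it again.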