Bug 57121

Summary: ocsp stapling should not pass temporary server outages to clients
Product: Apache httpd-2 Reporter: Björn Jacke <bjoern>
Component: mod_ssl Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: major CC: chris, fabian, schnederle, thomas+bz.apache.org, thomassen, vincent-apache
Priority: P2    
Version: 2.4.6   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Björn Jacke 2014-10-21 10:36:11 UTC
These mod_ssl OCSP default values are in effect here:

SSLStaplingResponseMaxAge -1 (so the entries should be valid for much more than an hour)
SSLStaplingStandardCacheTimeout 3600 (so after one hour mod_ssl makes a new OCSP request)

Now I saw a case where, after one hour, mod_ssl tried to refresh the OCSP reply from the OCSP server, but I see in the proxy log that the OCSP server could not be reached. Instead of attaching the previous (still valid) OCSP reply to the server certificate sent to the clients, it attached a "try later" OCSP error in the reply to the client. As a result, the client (Firefox 33 here) displayed an error message saying that there is a problem with the OCSP status of the server certificate.

If mod_ssl still has an old but valid OCSP reply in the cache, it should never replace that with a "try later" OCSP error. Also, setting "SSLStaplingReturnResponderErrors off" is not an option because the site might have a must-staple policy defined.
Comment 1 Jeff Trawick 2014-11-22 19:43:56 UTC
SSLStaplingStandardCacheTimeout directly controls cache expiration, so once the 3600 seconds elapse the old response is no longer available. Thus, the previous response can't be used as a fallback if the responder can't be reached after the timeout expires.

SSLStaplingResponseMaxAge is a final sanity check on a response, so it doesn't help here.

Ideally a response would be refreshed well before cache expiration, without blocking other threads (which continue to use the previously cached response), and without removing a valid entry from the cache if a temporary communication error occurs.
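
The behaviour described above (refresh before expiry, and keep the valid entry when a temporary communication error occurs) could be sketched roughly as follows. This is only an illustration in Python, not mod_ssl code: `StaplingCache`, `CachedResponse`, `refresh_after` and the `fetch` callback are hypothetical names, and the sketch is single-threaded, so it does not model the non-blocking part.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedResponse:
    body: bytes         # opaque OCSP response bytes
    fetched_at: float   # time the response was obtained (seconds)
    next_update: float  # time after which the response is no longer valid


class StaplingCache:
    """Hypothetical cache that keeps a valid entry when a refresh fails."""

    def __init__(self, fetch: Callable[[float], CachedResponse],
                 refresh_after: float) -> None:
        self.fetch = fetch                  # raises OSError if the responder is unreachable
        self.refresh_after = refresh_after  # age at which a refresh is attempted
        self.entry: Optional[CachedResponse] = None

    def get(self, now: float) -> Optional[bytes]:
        entry = self.entry
        # Fresh enough: serve from the cache without contacting the responder.
        if entry is not None and now - entry.fetched_at < self.refresh_after:
            return entry.body
        try:
            self.entry = self.fetch(now)
            return self.entry.body
        except OSError:
            # Temporary outage: fall back to the old entry while it is still valid.
            if entry is not None and now < entry.next_update:
                return entry.body
            return None  # nothing valid left to staple
```

The point of the design is that a failed refresh never overwrites or evicts the old entry; only the response's own validity period decides when it stops being served.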
Comment 2 Björn Jacke 2016-11-02 11:02:46 UTC
So we still need a new parameter like "SSLStaplingRefreshTime" then. This is actually required to get an OCSP implementation that is stable enough not to cause problems with OCSP servers that don't have perfect availability, or to be able to enable stapling by default one day.
Comment 3 Fabian Wenk 2017-08-28 14:32:57 UTC
Instead of adding a new SSLStaplingRefreshTime, why not just use a fraction (e.g. half) of SSLStaplingStandardCacheTimeout to refresh?
But still, if it fails at refresh time, then it has to work when cache timeout is reached.

Maybe something like this could work better / be safer:
Try to refresh at half of SSLStaplingStandardCacheTimeout, if it fails try to refresh more often, e.g. every 1/10 of SSLStaplingStandardCacheTimeout until it succeeds and then SSLStaplingStandardCacheTimeout starts again at max. Or simply just try to refresh at every 1/10 of SSLStaplingStandardCacheTimeout.

To make fewer requests to the CA, set the default of SSLStaplingStandardCacheTimeout to 86400 (1 day), so with the 1/10 interval the refresh will happen every 2.4 hours.
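
As a rough illustration of the first variant (first attempt at half of the timeout, then a retry every 1/10 of it until one succeeds), here is a small Python sketch. `refresh_times` is a hypothetical helper, not an existing mod_ssl function, and the attempt outcomes are passed in by hand rather than coming from a real responder.

```python
from typing import List


def refresh_times(cache_timeout: int, outcomes: List[bool]) -> List[int]:
    """Return the times (seconds since the last successful fetch) at which
    refresh attempts are made, stopping after the first success.

    outcomes[i] is True if the i-th attempt reaches the responder.
    """
    retry_interval = cache_timeout // 10  # retry every 1/10 of the timeout
    t = cache_timeout // 2                # first attempt at half of the timeout
    times = []
    for ok in outcomes:
        times.append(t)
        if ok:
            break
        t += retry_interval
    return times


# With the current default of 3600 seconds and two failed attempts:
print(refresh_times(3600, [False, False, True]))
# With 86400 seconds the retry interval is 8640 seconds, i.e. 2.4 hours:
print(86400 // 10)
```

With this schedule there are five retries between the first attempt and the cache timeout, so a short responder outage would not have to reach the clients.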
Comment 4 Damien B 2020-01-31 16:53:38 UTC
This bug is still unsolved after nearly 5 years...

We experienced today an outage due to Digicert OCSP server failure.
The only solution was to disable OCSP stapling.

Luckily we were not using an SSL certificate with the OCSP Must-Staple option!

This should be considered as a high priority bug.

OCSP servers are not reliable and they can be down for several hours (like today) or even days (like in 2017 for Let's Encrypt).

Some people have even built an OCSP proxy to fix this behaviour and do the job instead of Apache:
https://community.letsencrypt.org/t/robust-ocsp-stapling-with-apache-httpd/87896
Comment 5 Damien B 2020-01-31 16:59:47 UTC
*** Bug 63231 has been marked as a duplicate of this bug. ***
Comment 6 Björn Jacke 2020-01-31 17:52:22 UTC
Apache with mod_ssl and OCSP is only reliably usable if you change some settings to something like this:

  SSLUseStapling          on
  SSLOCSPProxyURL http://your.proxy.if.you.have.one:3128/
  SSLStaplingResponderTimeout 4
  SSLStaplingReturnResponderErrors off
  SSLStaplingCache        shmcb:/var/run/ocsp(128000)
  SSLStaplingStandardCacheTimeout 172800
  SSLStaplingErrorCacheTimeout 60

As you can see, this has been a known problem for more than 5 years and would be simple to fix with different default values. What I can recommend to you instead: just stop using mod_ssl in Apache and use, for example, HAProxy as a reverse proxy that also does TLS termination.
Comment 7 Ruediger Pluem 2020-01-31 19:44:17 UTC
You might want to try the stapling support in mod_md, which also works for non-managed domains:
http://httpd.apache.org/docs/2.4/mod/mod_md.html#mdstapleothers
Comment 8 gmoniker 2020-03-18 19:15:59 UTC
Actually, some of the outage behaviour has already been changed in Apache trunk. However, if you run the trunk version with SSLStaplingFakeTryLater off and the OCSP cache runs out while the OCSP responder is out of action, then all new TLS connections with a staple request will be hung up immediately by Apache for the duration. I would also recommend running it with ReturnResponderErrors Off, to avoid shutting out some clients (only the latest trunk version after https://github.com/apache/httpd/commit/6289dfffa43b142bed34629967a4f1a4cf051171)

The 2.4 branch is only fully usable with OCSP stapling in mod_ssl if you use a separate caching proxy for responses.

Let me explain what happens in 2.4 (up to 2.4.41) if you run it with OCSP stapling On and without a separate reliable OCSP cache, in different settings, and why I say it is not usable in light of (inevitable) OCSP responder outages:

With ReturnResponderErrors off and FakeTryLater on, it will continue to run without performance problems when the OCSP responders are unreachable, but it will shut out Firefox users if they have no locally stored OCSP response. More importantly, with ReturnResponderErrors Off it will NEVER PASS ON ANY REVOCATION MESSAGE FROM THE ORIGINAL OCSP RESPONDER. So even if it is still performant and Firefox solved their problem, it still does NOT make any sense to run it with ReturnResponderErrors off. Combined with FakeTryLater off, you will run into the same problem as with the ReturnResponderErrors On setting.

If you run with ReturnResponderErrors On, then an outage of the OCSP responder when the cache runs out will let every new TLS connection with an OCSP staple request hang for the duration of the Responder Timeout setting in Apache. Apache request threads will also contend continuously for the stapling_refresh_mutex.


See also:
https://bugzilla.mozilla.org/show_bug.cgi?id=1323141
Comment 9 vincent-apache@vinc17.net 2020-04-04 00:38:28 UTC
(In reply to Damien B from comment #4)
> Some people have even built an OCSP proxy to fix this behaviour and do the job
> instead of Apache:
> https://community.letsencrypt.org/t/robust-ocsp-stapling-with-apache-httpd/87896

Unfortunately, this does not seem to solve the problem since Apache also has connection issues locally! I got an error because of a timeout when Apache connected to this local OCSP proxy: bug 64306.