Bug 60182

Summary: SSLStaplingFakeTryLater Deviates From Documented Behavior of Only Being Effective When SSLStaplingReturnResponderErrors is On
Product: Apache httpd-2 Reporter: Andrew Pietila <a.pietila>
Component: mod_ssl Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal CC: toscano.luca, vincent-apache
Priority: P2 Keywords: FixedInTrunk, PatchAvailable
Version: 2.4.41   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Andrew Pietila 2016-09-28 01:21:44 UTC
In modules/ssl/ssl_util_stapling.c, the following code is used to determine whether to throw an OCSP TryLater failure:

    *prsp = modssl_dispatch_ocsp_request(&uri, mctx->stapling_responder_timeout,
                                         req, conn, vpool);

    if (!*prsp) {
        ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(01941)
                     "stapling_renew_response: responder error");
        if (mctx->stapling_fake_trylater) {
            *prsp = OCSP_response_create(OCSP_RESPONSE_STATUS_TRYLATER, NULL);
        }
        else {
            goto done;
        }
    }

The mctx->stapling_fake_trylater field corresponds to the configuration option SSLStaplingFakeTryLater. Per < https://httpd.apache.org/docs/trunk/mod/mod_ssl.html#sslstaplingfaketrylater >:

Only effective if SSLStaplingReturnResponderErrors is also enabled.

However, the configuration variable SSLStaplingReturnResponderErrors is not referenced in the above code. As a result, the fake TryLater is sent whenever SSLStaplingFakeTryLater is enabled or absent from the configuration file (it defaults to on), regardless of whether SSLStaplingReturnResponderErrors is set. This causes connectivity issues with Firefox when, say, DNS resolution for the OCSP responder fails.
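
To make the mismatch concrete, here is a minimal sketch that models the two settings as plain booleans. The struct and function names are hypothetical and are not part of mod_ssl; the real values live in the server's mctx structure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two mod_ssl settings discussed here. */
struct stapling_cfg {
    bool fake_trylater;            /* SSLStaplingFakeTryLater */
    bool return_responder_errors;  /* SSLStaplingReturnResponderErrors */
};

/* Behaviour of the excerpt above: only fake_trylater is consulted. */
static bool trylater_sent_observed(const struct stapling_cfg *cfg)
{
    return cfg->fake_trylater;
}

/* Behaviour the documentation describes: both settings must be on. */
static bool trylater_sent_documented(const struct stapling_cfg *cfg)
{
    return cfg->fake_trylater && cfg->return_responder_errors;
}
```

With fake_trylater on and return_responder_errors off, the first function returns true and the second returns false, which is the gap this report is about.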
Comment 1 gmoniker 2020-02-23 10:54:51 UTC
This behaviour in Apache 2.4 is still ongoing in trunk and it is making it very difficult for server operators. 

On the one hand, authorities like the Dutch NCSC and privacy-focused customers are demanding that OCSP stapling be enabled; on the other hand, a bug in the Apache code makes it nearly impossible to keep stapling enabled while an OCSP responder is unreachable without an immediate connection refusal. The OCSP responders and the browser clients are completely outside the operator's control.

Setting FakeTryLater or leaving it on default would really help in this case, because the ResponderTimeOut is only experienced on one TLS connection setup by a client and then the TryLater is cached.

However this is not possible for two reasons:

1. A TryLater made up by Apache 2.4 is not marked as an unsuccessful response, so it is cached for the duration of the successful OCSP response cache timeout, which you would want to be long. This means that even a short unreachability of the OCSP responder causes a long unavailability of the OCSP staple. Setting FakeTryLater to off is not a real option either, because then each new TLS connection from a browser again has to suffer the ResponderTimeOut until the server can finally reach the OCSP responder, leading to a potentially huge increase in memory usage and slow website loading.

2. Because the faked TryLater is not marked as an unsuccessful response, it is served to clients even if ReturnResponderErrors is off. In itself this does not violate any RFC, and the RFC meaning of TryLater would seem to support this usage. But it is contrary to the expectation set by the documentation and to the natural-language interpretation of ReturnResponderErrors. It also specifically causes trouble for Firefox users. If Firefox in its default settings (for the past four years, anyway) doesn't have a cached OCSP response itself and receives a TryLater, it will refuse to load the site and give its users an incomprehensible message. This is not Apache's fault, but it is kind of difficult for the server operator to explain to customers.

So, for the server operator at the moment, it is kind of repulsive to activate OCSP stapling in Apache. Apache turns a DOS of the OCSP responder into a DOS of the website it serves. And the fix would really be very simple. It has even already been made in the 2.5 branch: just mark the faked TryLater as an unsuccessful response. Include *pok = false after the creation of the TryLater, just as in https://github.com/apache/httpd/commit/3bd26f8c6b3ed892e0e27747bb5fce1db360ffc1#diff-f4dcc9abff3ef58debc5e15da139ce3d

Then setting ReturnResponderErrors to off also works for the faked TryLater and simply gives no staple response, which is fine for all the major browsers as long as they are not set to require a mandatory OCSP response, which is not their default setting. Furthermore, the faked TryLater is only stored for SSLStaplingErrorCacheTimeout, which increases the time a valid OCSP response staple can be available. And only one TLS connection per error cache timeout has to wait for SSLStaplingResponderTimeout seconds, instead of every connection.
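
The effect of flagging the faked TryLater as unsuccessful can be sketched as follows. This is a simplified model under the assumptions stated in this comment, not the mod_ssl code itself; the struct and function names are hypothetical, and `ok` stands for the success flag that the fix sets to false.

```c
#include <stdbool.h>

/* Hypothetical model of the settings that govern a cached response. */
struct staple_policy {
    long standard_cache_timeout;   /* SSLStaplingStandardCacheTimeout, seconds */
    long error_cache_timeout;      /* SSLStaplingErrorCacheTimeout, seconds */
    bool return_responder_errors;  /* SSLStaplingReturnResponderErrors */
};

/* How long a response stays cached: unsuccessful ones expire sooner. */
static long cache_lifetime(const struct staple_policy *p, bool ok)
{
    return ok ? p->standard_cache_timeout : p->error_cache_timeout;
}

/* Whether a cached response is stapled into the handshake. */
static bool staple_is_sent(const struct staple_policy *p, bool ok)
{
    return ok || p->return_responder_errors;
}
```

In this model a faked TryLater with ok set to false is kept only for the error cache timeout and is not sent while return_responder_errors is off, so clients simply get no staple during the outage.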

With that fix, I believe setting ReturnResponderErrors to on is still not really workable, because it unnecessarily turns an OCSP responder DOS into a website DOS. But that's a different issue, and at least as an operator you have one workable configuration, even if it is not the default one.

1. Online Certificate Status Protocol - OCSP https://tools.ietf.org/html/rfc6960
2. Dutch governmental TLS guidelines, saying you have to activate OCSP stapling on the server or give a very good reason why not: https://www.ncsc.nl/binaries/ncsc/documenten/publicaties/2019/mei/01/ict-beveiligingsrichtlijnen-voor-transport-layer-security-tls/ICT-beveiligingsrichtlijnen-voor-Transport-Layer-Security-v2.pdf
Comment 2 Yann Ylavic 2020-02-27 13:28:48 UTC
Since the fake TryLater response will no longer be sent (only cached) unless ReturnResponderErrors is on, I wonder if we should add a new SSLStaplingReturnResponderErrors notfake (tristate off/on/notfake) to preserve compatibility. Or possibly the other way around, SSLStaplingFakeTryLater off/on/cache.

Thoughts ?
Comment 3 gmoniker 2020-02-27 20:30:29 UTC
@ Yann Ylavic , thanks for the suggestion.

I think in most website hosting operations there is not really much use in delivering TryLater responses, and probably not any of the other unsigned OCSP messages either. It is an example of Postel's principle: be conservative in what you send, do not send anything that might seem obscure. So, in those operations you would just want ReturnResponderErrors off. What could clients use an unsigned response for? Would a browser keep retrying a TLS connection to a webserver in the background if it got a TryLater and then immediately blank the site if it got a response with a retraction?

In some dedicated environments it may be useful for Apache to be a 'true' proxy, and then a TryLater seems to be semantically correct if Apache waited for a programmed timeout and couldn't reach the origin in that time; no need to consider that a 'fake' response.

So, in both cases for ReturnResponderErrors, FakeTryLater should just be on. In the "off" case for it to be cached for a short while, but NOT returned, and keep the server from retrying too often, and in the "on" case to note that it couldn't provide a signed response after waiting for it.

I commented on this Firefox bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1323141 to try and get that client to move, because they are just a bit too conservative about what is accepted, and together with the current Apache 2.4 behaviour this leads to unnecessary outages for their users.

If there is any additional programming time, it would be nice to work on making it as likely as possible that a staple can be returned. So, inspect the cache for OCSP responses that are due to be updated soon and try one or several times in advance, at different spacings, to get a new OCSP response. That would be a security benefit. Or maybe provide a longer timeout option when a certificate has a Must-Staple attribute.
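
One way such an early refresh could be scheduled is sketched below. Nothing here exists in mod_ssl; the function names, the lead time and the doubling retry spacing are all assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* True once the cached response is within lead_secs of expiring,
 * i.e. when a refresh attempt should start ahead of time. */
static bool refresh_due(time_t now, time_t expires_at, long lead_secs)
{
    return (expires_at - now) <= lead_secs;
}

/* Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ...
 * capped at max_secs so retries keep happening during a long outage. */
static long retry_delay(unsigned attempt, long base_secs, long max_secs)
{
    long delay = base_secs;

    while (attempt-- > 0 && delay < max_secs) {
        delay *= 2;
    }
    return delay < max_secs ? delay : max_secs;
}
```

A background task could call refresh_due() for each cached entry and, on failure, wait retry_delay() seconds before the next attempt, so that a short responder outage is absorbed before the cached staple expires.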
Comment 4 Ruediger Pluem 2020-02-28 10:26:43 UTC
Have you tried using the stapling provided by mod_md?
Comment 5 gmoniker 2020-02-28 11:33:03 UTC
@ Ruediger Pluem

Thanks for the suggestion. I didn't know that it could replace mod_ssl stapling. Will try to compare that.
Comment 6 Stefan Eissing 2020-02-28 11:37:56 UTC
You need the github version for that <https://github.com/icing/mod_md> or use the one from Apache subversion itself. The version of mod_md that supports stapling has not been part of an Apache httpd release.

Several distros, such as Fedora, package the github version of the module. So, it might be available. Check for a version >= 2.2.x

Documentation: <https://github.com/icing/mod_md#mdstapling>
Blog: <https://icing.github.io/mod_md/allthethingspromised.html>
Comment 7 Ruediger Pluem 2020-02-28 12:09:36 UTC
(In reply to Stefan Eissing from comment #6)
> You need the github version for that <https://github.com/icing/mod_md> or
> use the one from Apache subversion itself. The version of mod_md that
> supports stapling has not been part of an Apache httpd release.

Forgot that. Thanks for pointing it out. But it will be part of 2.4.42.
Comment 8 gmoniker 2020-03-03 01:01:10 UTC
Thank you all for the helpful comments.

I have finally succeeded in building an installable package of Apache 2.4.41 with mod_md 2.2.7 on Ubuntu 18. The server-status pages and certificate acquisition look good. Will do some more prodding.

However if a change to not emit the TryLater with ResponderErrors off could be made for 2.4.42 then that would nicely round off a rather nasty edge of the basic mod_ssl functions I think. For getting a robust SSL OCSP response, this mod_md is obviously much better.
Comment 9 gmoniker 2020-03-05 21:21:31 UTC
mod_md OCSP stapling looks good. Only the authoritative responses are sent, and the chances of being affected by an outage are much smaller than with the mod_ssl OCSP stapling. Too bad that it will take a while for this to land in LTS distributions.

About mod_ssl:
There is one other point in the mod_ssl module where I think ReturnResponderErrors off should have the effect of suppressing a probably invalid OCSP staple:
Comment 10 Stefan Eissing 2020-03-06 08:03:49 UTC
Thanks for the testing and feedback. Happy to hear that it works for you. Apart from borrowing a time machine, there is not much we can do about LTS.

The point you raise about mod_ssl returning possibly an error when the OCSP response could not be parsed (but was retrieved successfully): that seems to be a very rare case, might indicate someone tinkering or a broken OCSP responder.

For this alone, I would not really change the 2.4.x implementation of mod_ssl. To address the overall weaknesses, I think investing time into the mod_md stapling is better spent.
Comment 11 gmoniker 2020-03-06 20:17:28 UTC
So, then we have to accept that OCSP stapling in 2.4 mod_ssl is fundamentally broken?

I spent some more time looking at the mod_ssl stapling code. Unfortunately this did not improve my outlook of finding a robust stapling config for 2.4.

I had somewhat adopted the feeling that running with `ReturnResponderErrors off` and `FakeTryLater` would be a configuration that was nearly *good*. Just fix the sending of a TryLater when the OCSP responder is not reachable, and the site stays up when the OCSP responder is blocked from answering: all clients that I know of can reach the site and actually show it to the user, unless they are set to mandatory revocation checking and also cannot find another source of revocation info locally.

However, I have now noticed that if you run with `ReturnResponderErrors off` and an OCSP responder answers with an authoritative revocation, the code handles it as if it were an error that needs to be suppressed, and it stops the revocation from reaching the client. Well... That means running with responder errors off becomes pointless. If you never return a revocation, stapling is completely useless.
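
The difference between the behaviour described here and the behaviour being asked for can be sketched like this. It is a simplified model of the description in this comment, not the mod_ssl source; the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical classification of what the responder fetch produced. */
enum fetch_result {
    FETCH_GOOD,            /* signed response, certificate status good */
    FETCH_REVOKED,         /* signed response, certificate status revoked */
    FETCH_RESPONDER_ERROR  /* timeout, unparsable or unsuccessful response */
};

/* As described above: anything other than "good" counts as an error. */
static bool staple_sent_described(enum fetch_result r, bool return_errors)
{
    return r == FETCH_GOOD || return_errors;
}

/* Requested behaviour: an authoritative revocation always goes out,
 * and only genuine responder errors are subject to the setting. */
static bool staple_sent_requested(enum fetch_result r, bool return_errors)
{
    if (r == FETCH_GOOD || r == FETCH_REVOKED) {
        return true;
    }
    return return_errors;
}
```

With return_errors off, the first function suppresses FETCH_REVOKED while the second still sends it, which is the change argued for in this comment.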

So for 2.4 mod_ssl, two things must be fixed. Not send out a faketrylater AND NOT keep perfectly good revocations from going out. And sending out responses that can't be parsed as basic OCSP responses should also be stopped.

For the hosting operator with a run-of-the-mill production server, this leaves few options. Running with `ResponderErrors off` means that cosmetically it ticks the security boxes of delivering OCSP stapling, but it will never send out revocations it received, will cache an outage unnecessarily long, and will dupe Firefox users when the OCSP responder is blocked. Running with `ResponderErrors on` means that an OCSP responder that is blocked from responding also results in a much less responsive website, because for each new TLS connection it will try again to get an OCSP response cached. And in both settings, it will also return OCSP responses that can't be parsed by OpenSSL at all.

So, for the moment the hosting operator with Apache can only look to external OCSP caching proxies to have meaningful OCSP stapling, until such time as mod_md becomes available in version 2.2 or higher.

And incidentally, if I look at trunk, the situation is not improving. In trunk, a renewal failure will be translated into a TLS Fatal hangup. So, if you run with OCSP stapling enabled with just mod_ssl then if an OCSP responder is unreachable or produces garbage just when the cached response expired, then from that moment until an OCSP response becomes available, NO client will be able to reach the site.
Comment 12 gmoniker 2020-03-18 08:52:49 UTC
I have published a pull request with modifications that remove a couple of specific blocking problems from the mod_ssl implementation of OCSP stapling.

Yesterday there was a major outage of Globalsign OCSP responders. I was glad that we didn't have OCSP stapling active yet/anymore, but also sad that there really is no option to do it, even though many SSL operators have pages up that say that you can just enable it in Apache, which I feel is currently far from the truth.

Comment 13 Ruediger Pluem 2020-03-18 11:33:00 UTC
(In reply to gmoniker from comment #12)
> https://github.com/apache/httpd/pull/102

Thanks. Committed to trunk as r1875355 and r1875356.
Comment 14 gmoniker 2020-03-18 18:47:03 UTC
@ Ruediger Pluem

Thanks for merging this into trunk. A little unexpected, because I initially targeted them at the 2.4 branch. In trunk as it stands, it is not really an option to run with FakeTryLater off, because then a DOS of the OCSP responder is immediately fatal for any new TLS connection with an OCSP staple request once the cache runs out. Also, I would caution that with the present state of Firefox it is not an option to run with ReturnResponderErrors set to on. So that leaves only `SSLStaplingReturnResponderErrors off` and `SSLStaplingFakeTryLater on` as a somewhat robust OCSP stapling config for mod_ssl on its own.

I do realize that these patches do not exactly address the title issue of this bug. The changes are actually far more appropriate for https://bz.apache.org/bugzilla/show_bug.cgi?id=57121. So maybe they should be posted there for Fixed and PatchAvailable.

I am also curious: is there any chance of these changes being merged into a 2.4.42 version, perhaps? It would be a shame if they never reach the 2.4 branch, because then there really is no hope of even getting them into an Ubuntu 20 LTS, for example, as an SRU. mod_md 2.2+ would be an option to work on there.

Thanks for the follow-up.