Bug 64306 - Error "AH01977: failed reading line from OCSP server" with local OCSP proxy due to timeout
Summary: Error "AH01977: failed reading line from OCSP server" with local OCSP proxy d...
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_ssl (show other bugs)
Version: 2.4.38
Hardware: PC Linux
: P2 major (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-04 00:29 UTC by vincent-apache@vinc17.net
Modified: 2020-05-08 00:48 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description vincent-apache@vinc17.net 2020-04-04 00:29:46 UTC
Due to bug 57121 / bug 60182, I've been using a local OCSP proxy since 2020-03-18, as suggested in bug 57121 comment 4, using

        SSLOCSPProxyURL http://localhost:8888/

in the /etc/apache2/mods-available/ssl.conf file. I had no problems for a couple of weeks, but an hour ago, I got an OCSP error, and when I looked at the Apache error logs, I saw:

[Sat Apr 04 01:08:02.538928 2020] [ssl:error] [pid 25395:tid 140479203022592] (70007)The timeout specified has expired: [client 157.55.39.253:4878] AH01977: failed reading line from OCSP server
[Sat Apr 04 01:08:02.577255 2020] [ssl:error] [pid 25395:tid 140479203022592] [client 157.55.39.253:4878] AH01980: bad response from OCSP server: (none)
[Sat Apr 04 01:08:02.577385 2020] [ssl:error] [pid 25395:tid 140479203022592] AH01941: stapling_renew_response: responder error

According to the ocsp_proxy logs, Apache httpd connects to the proxy every hour, and the last one was:

Apr 04 01:07:54 joooj ocsp_proxy[7642]: [info] connection from ::1:33004

I assume that this is this one that led to the timeout.

At this time, according to journalctl information, the machine was idle (one log message at 01:05:19, and the next one after ocsp_proxy at 01:08:16), so that there are external reasons for a timeout.
Comment 1 gmoniker 2020-04-08 07:46:26 UTC
At the face of it, this would be an error on the part of the OCSP proxy. In all likelihood it didn't respond timely to the OCSP query from Apache.

There are a couple of non trivial things that I saw in the stapling code of Apache that could go wrong with a proxy.

Apache reads the OCSP proxy answer in HTTP 1.0 fashion. So, if the OCSP proxy returns the answer, but doesn't cut the connection immediately at the end of it, there is a risk of Apache not receiving it before the timeout, even if it is in fact completely sent. It may also be that the OCSP proxy couldn't get a response from the origin OCSP responder itself and didn't have anything cached. If it just hangs up before Apache times out its request, it won't be a problem apart from the response not being stapled, but if it keeps Apache waiting, then you get a renewal error in Apache. So, an OCSP proxy needs to have a timeout itself, that is lower then that of the Apache timeout setting.

The second, but unlikely, thing is that a client may be sending unsupported stapling extensions in its own request. Apache passes through the stapling request extensions it receives from the client, as is required by OCSP stapling RFC. It may just so happen that a client that triggers the OCSP renewal in Apache sent an extension that the proxy you uses gets stuck with. But again this seems unlikely.
Comment 2 vincent-apache@vinc17.net 2020-04-08 08:58:40 UTC
First note that in the apache2 logs, I could see another OCSP error (tough I did not have the chance to see it on the client side):

[Sun Mar 29 04:03:05.876938 2020] [ssl:error] [pid 3644:tid 140478943012608] (70007)The timeout specified has expired: [client 66.249.79.246:37979] AH01977: failed reading line from OCSP server
[Sun Mar 29 04:03:05.877734 2020] [ssl:error] [pid 3644:tid 140478943012608] [client 66.249.79.246:37979] AH01980: bad response from OCSP server: (none)
[Sun Mar 29 04:03:05.877867 2020] [ssl:error] [pid 3644:tid 140478943012608] AH01941: stapling_renew_response: responder error

In the ocsp_proxy logs, the cache was refreshed less than 1 hour ago:

Mar 29 03:11:49 joooj ocsp_proxy[7642]: [info] refreshing ocspxy_b390a7d8c9af4ecd613c9f7cad5d7f41fd6930ea0x4aa43b32a0f5e7b491c2881d02818ca5
Mar 29 04:02:57 joooj ocsp_proxy[7642]: [info] connection from ::1:57056

and it was not refreshed again until Mar 30 03:12:25, which follows the ocsp_proxy documentation: "OCSP responses are stored in a redis db and are refreshed on daily basis, or hourly if the validity period is at half-time."

(In reply to gmoniker from comment #1)
> At the face of it, this would be an error on the part of the OCSP proxy. In
> all likelihood it didn't respond timely to the OCSP query from Apache.

I would have said that the code of the OCSP proxy is simple enough so that it should not have any issue.

> Apache reads the OCSP proxy answer in HTTP 1.0 fashion. So, if the OCSP
> proxy returns the answer, but doesn't cut the connection immediately at the
> end of it, there is a risk of Apache not receiving it before the timeout,
> even if it is in fact completely sent.

The code of the proxy is of the form:

  while (my $c = $daemon->accept) {
    info("connection from %s:%d", $c->peerhost, $c->peerport);
    REQ: while (my $r = $c->get_request) {
[...]
      $c->send_response($client_res);
    }
    debug("disconnecting %s:%d", $c->peerhost, $c->peerport);
    $c->close;

This means that the OCSP proxy disconnects immediately, unless for some reason Apache sends another request. I could enable debug messages to have more information of what really happens.

> It may also be that the OCSP proxy
> couldn't get a response from the origin OCSP responder itself and didn't
> have anything cached.

It appears from its source code that ocsp_proxy does not log anything in this case by default (only with debug messages), unless an error occurred. But it is very unlikely that this occurred since on March 29, the request by Apache was done less than 1 hour after the refresh. Moreover, on April 4, I had restarted the apache2 service after the issue in order to clear the error, and I had checked with:

$ openssl s_client -connect vinc17.net:443 -status
CONNECTED(00000003)
depth=2 C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
verify return:1
depth=1 C = FR, ST = Paris, L = Paris, O = Gandi, CN = Gandi Standard SSL CA 2
verify return:1
depth=0 CN = vinc17.net
verify return:1
OCSP response:
======================================
OCSP Response Data:
    OCSP Response Status: successful (0x0)
    Response Type: Basic OCSP Response
    Version: 1 (0x0)
    Responder Id: B390A7D8C9AF4ECD613C9F7CAD5D7F41FD6930EA
    Produced At: Apr  3 00:49:21 2020 GMT
    Responses:
    Certificate ID:
      Hash Algorithm: sha1
      Issuer Name Hash: A5E2344EF5763A9CE2F31E9B9807B0075727A5F9
      Issuer Key Hash: B390A7D8C9AF4ECD613C9F7CAD5D7F41FD6930EA
      Serial Number: 4AA43B32A0F5E7B491C2881D02818CA5
    Cert Status: good
    This Update: Apr  3 00:49:21 2020 GMT
    Next Update: Apr 10 00:49:21 2020 GMT
[...]

Since there was a refresh at Apr 03 03:15:08 CEST (i.e. UTC+2), this would also mean that the cache did not need an update.

> The second, but unlikely, thing is that a client may be sending unsupported
> stapling extensions in its own request. Apache passes through the stapling
> request extensions it receives from the client, as is required by OCSP
> stapling RFC. It may just so happen that a client that triggers the OCSP
> renewal in Apache sent an extension that the proxy you uses gets stuck with.
> But again this seems unlikely.

This does not seem to be the case with ocsp_proxy. Well, the only thing that matters to send its response is the cache key, which seems to depend only on issuerKeyHash and serialNumber. Anyway, according to the ocsp_proxy logs, only one cache key is refreshed (on a daily basis).
Comment 3 vincent-apache@vinc17.net 2020-04-24 00:51:18 UTC
The problem occurred again with ocsp_proxy started in debug mode, and I can see that the response was sent rather late (it is usually immediate):

Apr 22 23:01:14 joooj ocsp_proxy[824]: [info] connection from ::1:47608
Apr 22 23:01:22 joooj ocsp_proxy[824]: [debug] cache key is ocspxy_b390a7d8c9af4ecd613c9f7cad5d7f41fd6930ea0x4aa43b32a0f5e7b491c2881d02818ca5
Apr 22 23:01:23 joooj ocsp_proxy[824]: [debug] sending response
Apr 22 23:01:23 joooj ocsp_proxy[824]: [debug] disconnecting ::1:47608

Now, I don't know where the time has been spent between the first two lines. After a look at the ocsp_proxy, this currently doesn't rule out Apache, since I don't know how much time has been spent between the connection and the end of the request. I've added a debug message just after the get_request and restarted ocsp_proxy. This will tell us whether the time is spent in the request itself or its analysis by ocsp_proxy.
Comment 4 vincent-apache@vinc17.net 2020-05-08 00:48:21 UTC
On May 1:

May 01 01:04:24 joooj ocsp_proxy[715]: [info] connection from ::1:38408
May 01 01:04:29 joooj ocsp_proxy[715]: [debug] got request
May 01 01:04:34 joooj ocsp_proxy[715]: [debug] cache key is ocspxy_b390a7d8c9af4ecd613c9f7cad5d7f41fd6930ea0x4aa43b32a0f5e7b491c2881d02818ca5
May 01 01:04:35 joooj ocsp_proxy[715]: [debug] sending response
May 01 01:04:36 joooj ocsp_proxy[715]: [debug] disconnecting ::1:38408

I've eventually found with atop that ocsp_proxy was almost entirely swapped out shortly before (from 11 MB RSIZE to 3.6 MB RSIZE), and the cause was that I, as a normal user, had run a process that needed much memory. And it was back to 11 MB after the "disconnecting".

Given the fact that the kernel can swap out a small system process like ocsp_proxy because a non-system process needs much memory, I think that apache should take such a case into account, i.e. increase the default timeout. A 8-second timeout seems very low anyway, in particular for something that would yield an error and make the server unavailable (due to bug 57121). One generally sees timeouts from 1 to 3 minutes. And I don't see why a timeout value larger than 8 seconds should not be used here.