Bug 54357

Summary: Crash during restart or at startup in mod_ssl, in certinfo_free() function registered by ssl_stapling_ex_init()
Product: Apache httpd-2 Reporter: Rainer Jung <rainer.jung>
Component: mod_sslAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED FIXED    
Severity: normal CC: alex, alp, jorton, steve
Priority: P2 Keywords: FixedInTrunk
Version: 2.5-HEAD   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Proof of concept patch to address the issue
v2 Proof of concept patch to address the issue
v3 Proof of concept patch to address the issue
v3 Proof of concept patch to address the issue (with signed-of-by)
v4 Proof of concept patch to address the issue
v5 Proof of concept patch to address the issue
v6 Proof of concept patch to address the issue
Patch v7 - store stapling certinfo in a global hash, based on work by Alex Bligh
Patch v8 (based on work by Alex Bligh) - store stapling certinfo in a pconf-allocated hash

Description Rainer Jung 2012-12-28 20:51:07 UTC
Reason for crash: the stapling code in mod_ssl implements a free_func which it registers in OpenSSL as a function pointer. After restart, the function address registered during the original startup is still used, because the OpenSSL libs were not reloaded, but due to a different module load order, mod_ssl.so was loaded at another address. The function pointer points to memory now used for other stuff and calling the function there crashes.

It seems in my config the OpenSSL libs don't get reloaded during restart, so still hold the original reference.

Stack:

#0  0xfee71920 in ?? ()
#1  0xfec4c91c in int_free_ex_data (class_index=<optimized out>, obj=0x1cfc48, ad=0x1cfc60) at ex_data.c:522
#2  0xfec4c65c in CRYPTO_free_ex_data (class_index=class_index@entry=10, obj=obj@entry=0x1cfc48, ad=ad@entry=0x1cfc60) at ex_data.c:592
#3  0xfeced474 in x509_cb (operation=operation@entry=3, pval=pval@entry=0xffbfd68c, it=it@entry=0xfed9c358, exarg=exarg@entry=0x0) at x_x509.c:113
#4  0xfecf2a30 in asn1_item_combine_free (pval=0xffbfd68c, it=0xfed9c358, combine=0) at tasn_fre.c:173
#5  0xfecf2c38 in ASN1_item_free (val=0x1cfc48, it=0xfed9c358) at tasn_fre.c:71
#6  0xfe9ccf74 in ssl_pphrase_Handle (s=0xb8ee0, p=0xf3e08) at ssl_engine_pphrase.c:295
#7  0xfe9c13cc in ssl_init_Module (p=0x93c48, plog=<optimized out>, ptemp=0xf3e08, base_server=0xb8ee0) at ssl_engine_init.c:368
#8  0x0004ae0c in ap_run_post_config (pconf=pconf@entry=0x93c48, plog=0xeabd8, ptemp=0xf3e08, s=0xb8ee0) at config.c:105
#9  0x00065b98 in main (argc=5, argv=0xffbff90c) at main.c:765

The address 0xfee71920 after the restart lies between two memory segments used by the prefork MPM:

FEE60000      24K r-x--  /shared/build/autobuild/install/apache24-2.4.4_dev_r1422615_1.0.1csp1-X3.solaris10.sparc/modules/mod_mpm_prefork.so
FEE74000      16K rwx--  /shared/build/autobuild/install/apache24-2.4.4_dev_r1422615_1.0.1csp1-X3.solaris10.sparc/modules/mod_mpm_prefork.so


Directly after the original apache start it is in the region occupied by mod_ssl:

FEE50000     192K r-x--  /shared/build/autobuild/install/apache24-2.4.4_dev_r1422615_1.0.1csp1-X3.solaris10.sparc/modules/mod_ssl.so
FEE8E000      16K rwx--  /shared/build/autobuild/install/apache24-2.4.4_dev_r1422615_1.0.1csp1-X3.solaris10.sparc/modules/mod_ssl.so

The problematic code is in modules/ssl/ssl_util_stapling.c (trunk and 2.4):

 68 static int stapling_ex_idx = -1;
 69
 70 void ssl_stapling_ex_init(void)
 71 {
 72     if (stapling_ex_idx != -1)
 73         return;
 74     stapling_ex_idx = X509_get_ex_new_index(0, "X509 cached OCSP info", 0, 0,
 75                                             certinfo_free);
 76 }

It registers the function certinfo_free in OpenSSL via a function pointer. The function is implemented in mod_ssl (ssl_util_stapling.c) which can be loaded into another adress range after restart (when linked dynamically). Then the pointer is no longer valid:

AH00060: seg fault or similar nasty error detected in the parent process

Currently I don't know how o fix that:

- there seems to be no API to remove an index once registered.
- there seems to be no API to set the free_func to a new value for an existing index (maybe I didn't look thorougly enough)

Note that furthermore there's a slight order problem in ssl_init_Module: it first calls ssl_pphrase_Handle(), then ssl_stapling_ex_init(). After restart, the call to ssl_pphrase_Handle() already triggers the problem before the next ssl_stapling_ex_init() could fix it.

Modules and libs are build as DSOs, but it seems only the modules get reloaded but not the SSL libs.

Reproducible by running start-restart-stop in a loop. No requests need to be send. Another scenario is adding modules to the LoadModule list before restart.

Config used (a more complex config makes the crash happen more frequently):

Define base "/path/to/my/apache/installation"
Define user "myuser"
Define group "mygroup"
Define sslport "8443"
Define cert "snakeoil-rsa"

ServerRoot "${base}"
ServerName "www.snakeoil.dom"
Listen ${sslport}

LoadModule socache_shmcb_module modules/mod_socache_shmcb.so
LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule env_module modules/mod_env.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule session_module modules/mod_session.so
LoadModule session_crypto_module modules/mod_session_crypto.so
LoadModule ssl_module modules/mod_ssl.so
LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
LoadModule unixd_module modules/mod_unixd.so

<IfModule unixd_module>
  User ${user}
  Group ${group}
</IfModule>

<IfModule ssl_module>
  SSLPassPhraseDialog  builtin
  SSLSessionCache        "shmcb:run/ssl_scache.base(512000)"
  SSLSessionCacheTimeout  300
  SSLRandomSeed startup builtin
  SSLRandomSeed connect builtin
</IfModule>

<VirtualHost _default_:${sslport}>
  SSLEngine on
  SSLCertificateFile "conf/ssl.crt/${cert}.crt"
  SSLCertificateKeyFile "conf/ssl.key/${cert}.key"
</VirtualHost>
Comment 1 Stefan Fritsch 2013-01-01 15:55:12 UTC
(In reply to comment #0)
> Modules and libs are build as DSOs, but it seems only the modules get
> reloaded but not the SSL libs.

A workaround may be to add a function to apr-util that allows to unload all crypto drivers and call that function on restart. Or make apr-util unload the driver when the pool passed to apr_crypto_get_driver() is destroyed (doing refcounting to ensure that no user of the driver remains). This also requires changes in apu_dso.c, though.
Comment 2 Graham Leggett 2013-10-13 13:42:07 UTC
My understanding is that openssl implements reference counting itself to handle being configured two or more times.

If mod_ssl is registering a function pointer with openssl, it should be deregistering that pointer in a cleanup in a suitable pool. mod_ssl cannot assume it is the only consumer of openssl in the server.
Comment 3 Rainer Jung 2013-10-13 14:37:54 UTC
(In reply to Graham Leggett from comment #2)
> If mod_ssl is registering a function pointer with openssl, it should be
> deregistering that pointer in a cleanup in a suitable pool. mod_ssl cannot
> assume it is the only consumer of openssl in the server.

As I wrote originally:

- there seems to be no API to remove an index once registered.
- there seems to be no API to set the free_func to a new value for an existing index (maybe I didn't look thorougly enough)

So it is unclear to me how mod_ssl can deregister the pointer in OpenSSL. I didn't find am appropriate call in the OpenSSL API.
Comment 4 Alexander Pyhalov 2014-04-15 11:23:15 UTC
32-bit/x86 Apache 2.4 crashes at startup on OpenIndiana (illumos distribution) with similar stack trace when ssl is enabled. 64 bit apache 2.4 is not affected.
Apache 2.2 (both 32-bit and 64 bit) are not affected.

(gdb) bt
#0  0xfeaf9d40 in ?? ()
#1  0xfe99dd5c in int_free_ex_data () from /lib/libcrypto.so.1.0.0
#2  0xfe99daea in CRYPTO_free_ex_data () from /lib/libcrypto.so.1.0.0
#3  0xfea1df45 in x509_cb () from /lib/libcrypto.so.1.0.0
#4  0xfea22a39 in asn1_item_combine_free () from /lib/libcrypto.so.1.0.0
#5  0xfea22c33 in ASN1_item_free () from /lib/libcrypto.so.1.0.0
#6  0xfea1e132 in X509_free () from /lib/libcrypto.so.1.0.0
#7  0xfe524657 in ssl_pphrase_Handle (s=0x8117ab8, p=0x811bb60) at /export/home/alp/srcs/oi-userland/components/apache24/httpd-2.4.7/modules/ssl/ssl_engine_pphrase.c:116
#8  0xfe51183e in ssl_init_Module (p=0x80f2870, plog=0x8119b58, ptemp=0x811bb60, base_server=0x8117ab8) at /export/home/alp/srcs/oi-userland/components/apache24/httpd-2.4.7/modules/ssl/ssl_engine_init.c:54
#9  0x0809f33b in ap_run_post_config (pconf=0x80f2870, plog=0x8119b58, ptemp=0x811bb60, s=0x8117ab8) at /export/home/alp/srcs/oi-userland/components/apache24/httpd-2.4.7/server/config.c:87
#10 0x0807c5ff in main (argc=3, argv=0x8047d9c) at /export/home/alp/srcs/oi-userland/components/apache24/httpd-2.4.7/server/main.c:333
Comment 5 Alexander Pyhalov 2014-04-15 12:37:43 UTC
My issue was fixed after updating from 2.4.7 to apache 2.4.9.
Comment 6 Ruediger Pluem 2014-09-11 06:09:52 UTC
*** Bug 56919 has been marked as a duplicate of this bug. ***
Comment 7 Alex Bligh 2014-09-11 06:25:45 UTC
Note (as per 56519 which is a duplicate of this), this bug can be triggered at start as well as at reload / restart.
Comment 8 Alex Bligh 2014-09-11 14:40:09 UTC
In https://issues.apache.org/bugzilla/show_bug.cgi?id=56919 (now marked as a dupe of this bug), Kaspar Brand said:

> Getting rid of ex_data might be cleaner in the end, and was actually one
> of Joe's questions on the dev list in October 2009:
>
>https://mail-archives.apache.org/mod_mbox/httpd-dev/200910.mbox/%3C20091025200721.GA20714@redhat.com%3E

Joe Orton appears to have predicted this with remarkable prescience:

> 1) the use of an ex_data structure attached to the X509 * to store the 
> stapling-specific state seems unnecessary.  Was there a reason why you 
> did this rather than simply extending the modssl_pk_server_t structure? 
> (The ex_data indices have historically been a nightmare with mod_ssl due 
> to the fact that OpenSSL might get unloaded from memory during startup, 
> and any cached copies of the index values outside of OpenSSL may or may 
> not be reliable.  Global state == bad!)

I am certainly not an expert either in apache or OCSP but I agree this looks like the way to go.

I found the code a little incomprehensible as the ex_data stuff is pretty opaque as far as I'm concerned, but if this is the way people think it should be fixed I could take a look. Any hints about how this bit works would be appreciated.
Comment 9 Alex Bligh 2014-09-12 09:06:54 UTC
Created attachment 32010 [details]
Proof of concept patch to address the issue

Attached is a proof of concept patch to address the issue. This moves the storage of the stapling information to the modssl_pk_server_t structure, and out of X509 ex_data, which is the source of the issue. It thus has a server lifetime.

Please note this is COMPILE TESTED ONLY. IE I have not checked whether it actually works at all. Also note that I am almost entirely unfamiliar with OCSP.

I have assumed that one set of stapling information per certificate is required, not per certificate algorithm (i.e. we do not need an array).

Feedback appreciated.
Comment 10 Kaspar Brand 2014-09-14 08:08:27 UTC
(In reply to Alex Bligh from comment #9)
> I have assumed that one set of stapling information per certificate is
> required, not per certificate algorithm (i.e. we do not need an array).

Thank you for looking into this. A couple of comments:

- Your patch is against 2.4.7 I assume. In 2.4.8, major changes were done in the area of server certificate configuration (r1573360), i.e. there is no longer a fixed limit of three certificates (RSA/DSA/ECC) per SSL_CTX.

- OCSP stapling (RFC 6066, section 8) is a per-certificate feature, actually, so we need to make sure that we have a per-certificate store (not a per-modssl_pk_server_t one only) for the certinfo struct.

- Using an array which parallels the current "cert_files" in the modssl_pk_server_t struct might be sufficient for the time being, though I'm reluctant to say that it's futureproof. If RFC 6961 ("Multiple Certificate Status Request Extension") support becomes available in OpenSSL one day, there would be a need to have a certinfo struct attached to each certificate in the chain.

- It would be best to write the patch against trunk, after which it can be proposed for backport to 2.4.x.
Comment 11 Alex Bligh 2014-09-14 09:50:03 UTC
Yes, it's against 2.4.7.

Re it being per certificate: any idea what the best way is to get back to a certificate index from the information available in the callback? Or, given your third comment, if I build some sort of associative array, what would be the best key into it? Obviously (e.g.) the CN would not be an appropriate key.
Comment 12 Kaspar Brand 2014-09-14 16:59:53 UTC
(In reply to Alex Bligh from comment #11)
> if I build some sort of associative array, what would be the best key into it?

In a well-behaving X.509 PKI, a certificate is uniquely identified by the issuer name and serial number (RFC 5280, section 4.1.2.2). In the real world, this might not always be the case, so a hash over the certificate's DER encoding is more reliable (it is also what the stapling code in mod_ssl currently uses as a key for its cache). You can get at the certificate in the callback via SSL_get_certificate(), and then use X509_digest() to derive that key.
Comment 13 Alex Bligh 2014-09-14 20:15:19 UTC
Created attachment 32021 [details]
v2 Proof of concept patch to address the issue

Attached is a revised proof of concept patch to address the issue. This moves the storage of the stapling information to the modssl_pk_server_t structure, and out of X509 ex_data, which is the source of the issue. It thus has a server lifetime.

Please note this is COMPILE TESTED ONLY. IE I have not checked whether it actually works at all. Also note that I am almost entirely unfamiliar with OCSP.

This addresses the following comments:

- The patch is now against trunk

- Rather than storing one set of stapling info per modssl_pk_server_t, there is now a hash based on the SHA1 digest of the certificate data

- It's also even smaller than the previous attempt
Comment 14 Alex Bligh 2014-09-18 17:31:16 UTC
Any feedback on the above?
Comment 15 Kaspar Brand 2014-09-21 07:46:31 UTC
(In reply to Alex Bligh from comment #14)
> Any feedback on the above?

It's on the right track, I think. Comments about v2:

- Keying the array/hash on the cert's SHA-1 digest is fine. What it also means is that we don't need to store it on a per-vhost basis (modssl_pk_server_t), it's sufficient to put it into SSLModConfigRec (similar to stapling_cache, this also avoids duplicate storage in case a certificate is used for more than one vhost). I would suggest to rename it to "stapling_cert_info", furthermore, to make its specific purpose more explicit.

- Initialization, i.e. apr_hash_make(), is best done in ssl_engine_config.c (in ssl_config_global_create() in this case). You don't need to check for apr_hash_make() or apr_pcalloc() failures, since httpd configures a global OOM handler for these APR failures.

- In ssl_stapling_init_cert(), there's a simpler way of achieving memory management by APR: instead of

    if (aia) {
        /* Ugly: ensure memory managed by apr */
        char *uri;
        uri = sk_OPENSSL_STRING_pop(aia);
        cinf->uri = apr_pstrdup(p, uri);
        OPENSSL_free(uri);
    }

you can simply say:

    if (aia) {
        cinf->uri = apr_pstrdup(p, sk_OPENSSL_STRING_value(aia, 0));
        X509_email_free(aia);
    }

  (Make sure to free "aia" - the "1" in X509_get1_ocsp(x) means that you own the copy.)

- In stapling_get_cert_info(), I suggest to check for X509_digest() returning 1 before proceeding, e.g.:

    if ((x == NULL) || (X509_digest(x, EVP_sha1(), idx, NULL) != 1))
        return NULL;

- In the certinfo struct, I wonder if we still need to explicitly keep the "idx" member, as we have to calculate the SHA-1 hash on the fly in stapling_get_cert_info() anyway. Could you try to figure out if dropping "idx" from the struct is feasible? (BTW, instead of using "20", I would prefer SHA_DIGEST_LENGTH being used, and the "Cached info stored in certificate ex_info" comment should be reworded.)

- In ssl_engine_init.c, there's a second call of ssl_stapling_init_cert() which needs adjustment as well (for the OpenSSL 1.0.2 or later case).

- Also remove the ssl_stapling_ex_init() declaration from ssl_private.h.

Thanks for your work on getting this fixed, much appreciated.
Comment 16 Alex Bligh 2014-09-21 08:24:31 UTC
Kaspar,

Thanks for this.

I thought about moving to a global ssl stapling info hash. My concern about doing this was two fold.

1. Does the OCSP somehow requires a state per vhost - for instance what happens if 2 sites have a different stapling_force_url for the same certificate? As SSLStaplingForceURL is a vhost ctx directive, wouldn't this introduce a back compatibility problem?

2. Does this work with apache's threading model without further mutexes? Do I then need to worry about concurrent accesses to the stapling info in a way I didn't before?

3. If I make this change, presumably the check for ""ssl_stapling_init_cert: certificate already initialized!"" should go.

Mainly because of (1), I'm tempted to do this without the change in scope of the stapling info (i.e. to keep it at a vhost level).

Does that make sense?

Alex
Comment 17 Alex Bligh 2014-09-21 09:05:33 UTC
Created attachment 32035 [details]
v3 Proof of concept patch to address the issue

Attached is v3 of the proof of concept patch. Details below:

    (v3) Remove OCSP stapling info from X509 ex_data

    Remove OSCP stapling info from X509 ex_data, and manage it within
    normal APR pools with server lifecyle. This is to address BZ 54357
    and BZ 56919. Introduce a hash of stapling info indexed by the
    SHA1 hash of the certificate content.

    Note this code as been compile tested only at this stage and
    is submitted as a proof of concept.

    Changes since v2:
    * change stapling_info to stapling_cert_info
    * move init of stapling_cert_info hash to modssl_ctx_init_server
    * Drop unnecessary memory allocation failure checks
    * Simply extraction of uri string into apr memory management
    * Free aia structure
    * In stapling_get_cert_info check for X509_digest failure
    * Use SHA_DIGEST_LENGTH not hardcoded 20
    * Fix up second call to ssl_stapling_init_cert
    * Remove ssl_stapling_ex_init() declaration from ssl_private.h

    Changes NOT done since v2 that were suggested by Kaspar Brand

    * move stapling_cert_info to SSLModConfigRec. Reasons include the
      fact that SSLStaplingForceURL etc are per vHost.

    * Drop use of idx member from cert_info structure. Reason: the
      index to the apr hash structure has to be held somewhere for all
      the recorded certificates. Whilst we do calculate the SHA1 hash
      on the fly, the apr hash algorithm needs something to compare it
      to (i.e. the SHA1s of all the recorded certificates) so it can
      index the correct URI. As the SHA1 value needs to be stored
      somewhere for each certificate, it might as well be stored in
      the struct. It's possible I've misunderstood what Kaspar was
      suggesting here.
Comment 18 Alex Bligh 2014-09-21 09:09:56 UTC
Created attachment 32036 [details]
v3 Proof of concept patch to address the issue (with signed-of-by)

This time with Signed-Off-By line.
Comment 19 Kaspar Brand 2014-09-21 10:12:55 UTC
(In reply to Alex Bligh from comment #16)
> 1. Does the OCSP somehow requires a state per vhost - for instance what
> happens if 2 sites have a different stapling_force_url for the same
> certificate? As SSLStaplingForceURL is a vhost ctx directive, wouldn't this
> introduce a back compatibility problem?

No, it doesn't. stapling_renew_response() favors stapling_force_url over all certs for a given vhost, i.e. any OCSP URI from stapling_cert_info (i.e. the cert itself) is ignored in this case.

> 2. Does this work with apache's threading model without further mutexes? Do
> I then need to worry about concurrent accesses to the stapling info in a way
> I didn't before?

stapling_cert_info is basically an "init-once" thing - i.e., it's only populated at startup (in ssl_init_server_certs()) and therefore different from the stapling_cache.

> 3. If I make this change, presumably the check for ""ssl_stapling_init_cert:
> certificate already initialized!"" should go.

Yes, exactly, I was expecting you to figure this out.

(In reply to Alex Bligh from comment #17)
>     * Drop use of idx member from cert_info structure. Reason: the
>       index to the apr hash structure has to be held somewhere for all
>       the recorded certificates. Whilst we do calculate the SHA1 hash
>       on the fly, the apr hash algorithm needs something to compare it
>       to (i.e. the SHA1s of all the recorded certificates) so it can
>       index the correct URI. As the SHA1 value needs to be stored
>       somewhere for each certificate, it might as well be stored in
>       the struct. It's possible I've misunderstood what Kaspar was
>       suggesting here.

Previously the SHA-1 hash was used as a key for the OCSP response cache (mc->stapling_cache), and as there was ex_info attached to each certificate, it was convenient to have it only calculated once. But now that we always have to calculate the hash in stapling_cb()/stapling_get_cert_info() anyway, we could just pass it directly to stapling_cache_response() and stapling_get_cached_response() (these are the only places where cinf->idx is still used) and avoid the reduntant storage. I.e., you could merge the code from staping_get_cert_info() right into stapling_cb() and use a local "idx" which you would pass as an argument to stapling_get_cached_response() and stapling_renew_response() (which in turn would supply it to stapling_cache_response()).
Comment 20 Alex Bligh 2014-09-21 10:41:21 UTC
(In reply to Kaspar Brand from comment #19)
> (In reply to Alex Bligh from comment #16)
> > 1. Does the OCSP somehow requires a state per vhost - for instance what
> > happens if 2 sites have a different stapling_force_url for the same
> > certificate? As SSLStaplingForceURL is a vhost ctx directive, wouldn't this
> > introduce a back compatibility problem?
> 
> No, it doesn't. stapling_renew_response() favors stapling_force_url over all
> certs for a given vhost, i.e. any OCSP URI from stapling_cert_info (i.e. the
> cert itself) is ignored in this case.

Oh I see. So ssl_stapling_init_cert remains done once (or more) per
vhost, but we use a per SSL server cache. We keep the check for no
stapling URI and no force URI *on that server*, then if the cert
has no URI we error if any of the vhosts using that cert have no
force URI.

Presumably I then need to use the global SSL state's pool rather than
the server pool, which means I can drop the changes to pass pool in
and use (I presume) myModConfig(s)->pPool or similar.


> (In reply to Alex Bligh from comment #17)
> >     * Drop use of idx member from cert_info structure. Reason: the
> >       index to the apr hash structure has to be held somewhere for all
> >       the recorded certificates. Whilst we do calculate the SHA1 hash
> >       on the fly, the apr hash algorithm needs something to compare it
> >       to (i.e. the SHA1s of all the recorded certificates) so it can
> >       index the correct URI. As the SHA1 value needs to be stored
> >       somewhere for each certificate, it might as well be stored in
> >       the struct. It's possible I've misunderstood what Kaspar was
> >       suggesting here.
> 
> Previously the SHA-1 hash was used as a key for the OCSP response cache
> (mc->stapling_cache), and as there was ex_info attached to each certificate,
> it was convenient to have it only calculated once. But now that we always
> have to calculate the hash in stapling_cb()/stapling_get_cert_info() anyway,
> we could just pass it directly to stapling_cache_response() and
> stapling_get_cached_response() (these are the only places where cinf->idx is
> still used) and avoid the reduntant storage. I.e., you could merge the code
> from staping_get_cert_info() right into stapling_cb() and use a local "idx"
> which you would pass as an argument to stapling_get_cached_response() and
> stapling_renew_response() (which in turn would supply it to
> stapling_cache_response()).

I get the feeling I'm being dumb here.

Yes, I understand that the code does not use cinf->idx /afterwards/ save
where you point out and if it did it could simply use the locally generated
idx. Or to put it another way, before my changes we could have locally
generated idx with an X.509 digest in stapling_cb()/stapling_get_cert_info().

However, now we are using stapling_cert_info, that no longer holds true (I
think). That's because we need to store the SHA1 of each certificate somewhere
anyway, because those SHA1 values (now) form the key to the stapling_cert_info
hash. IE when we create the local X.509 SHA1, we then need to go through
each of the cert_info structures within the apr hash to determine which ones
match, so we can get access to the OTHER fields within cert_info. Therefore
the SHA1 of each certificate needs to be stored somewhere with a reference
to the relevant cert_info structure, and as good a place as any is to store
it within the cert_info structure itself - partly because that's minimum
code change, and partly because frankly where else are you going to store it?
I suppose I am assuming (possible wrongly) that apr_hash_set isn't actually
making a copy of the key here.

Again, perhaps I'm missing the point here.
Comment 21 Alex Bligh 2014-09-21 10:58:39 UTC
Created attachment 32037 [details]
v4 Proof of concept patch to address the issue

v4 Proof of concept patch to address the issue:

    (v4) Remove OCSP stapling info from X509 ex_data

    Remove OSCP stapling info from X509 ex_data, and manage it within
    normal APR pools with SSLModConfigRec lifecyle. This is to address
    BZ 54357 and BZ 56919. Introduce a hash of stapling info indexed
    by the SHA1 hash of the certificate content.

    Note this code as been compile tested only at this stage and
    is submitted as a proof of concept.

    Changes since v3:
    * move stapling_cert_info to SSLModConfigRec

    Changes since v2:
    * change stapling_info to stapling_cert_info
    * move init of stapling_cert_info hash to modssl_ctx_init_server
    * Drop unnecessary memory allocation failure checks
    * Simply extraction of uri string into apr memory management
    * Free aia structure
    * In stapling_get_cert_info check for X509_digest failure
    * Use SHA_DIGEST_LENGTH not hardcoded 20
    * Fix up second call to ssl_stapling_init_cert
    * Remove ssl_stapling_ex_init() declaration from ssl_private.h

    Changes NOT done since v3 that were suggested by Kaspar Brand

    * Drop use of idx member from cert_info structure. Reason: the
      index to the apr hash structure has to be held somewhere for all
      the recorded certificates. Whilst we do calculate the SHA1 hash
      on the fly, the apr hash algorithm needs something to compare it
      to (i.e. the SHA1s of all the recorded certificates) so it can
      index the correct URI. As the SHA1 value needs to be stored
      somewhere for each certificate, it might as well be stored in
      the struct. It's possible I've misunderstood what Kaspar was
      suggesting here.
Comment 22 Kaspar Brand 2014-09-21 17:23:11 UTC
(In reply to Alex Bligh from comment #20)
> However, now we are using stapling_cert_info, that no longer holds true (I
> think). That's because we need to store the SHA1 of each certificate
> somewhere
> anyway, because those SHA1 values (now) form the key to the
> stapling_cert_info
> hash.

Ah yes, you're right. I didn't realize that the stapling_cert_info hash is populated with apr_hash_set()...

> I suppose I am assuming (possible wrongly) that apr_hash_set isn't actually
> making a copy of the key here.

... and your assumption is correct, of course (it's indeed a straightforward way of keeping the idx around).

Two comments about v4: 

- In ssl_stapling_init_cert, we're potentially allocating/"leaking" unneeded certinfo structs, I think. If a specific certificate is configured for more than one vhost (or at restart), we're constructing "cinf" even if we might not really need it later for apr_hash_set. Is it possible to first check with apr_hash_get if there's already a suitable entry in stapling_cert_info, and return early in this case? (Needs a local idx[] for temporary storage of the X509_digest result, I guess.)

- It is actually safe to use

        cinf->uri = apr_pstrdup(p, sk_OPENSSL_STRING_value(aia, 0));

in ssl_stapling_init_cert (without checking the intermediate result of sk_OPENSSL_STRING_value), as aia will only be non-NULL if the stack includes at least one element (and furthermore, apr_pstrdup has a non-NULL check for the second argument).
Comment 23 Alex Bligh 2014-09-21 17:49:12 UTC
Created attachment 32039 [details]
v5 Proof of concept patch to address the issue

Kaspar,

Try the attached v5 patch which should address both your issues.

Alex

Remove OCSP stapling info from X509 ex_data, and manage it within
normal APR pools with SSLModConfigRec lifecyle. This is to address
BZ 54357 and BZ 56919. Introduce a hash of stapling info indexed
by the SHA1 hash of the certificate content.

Note this code as been compile tested only at this stage and
is submitted as a proof of concept.

Changes since v4:
* simplify (again) URI extraction
* check certificate does not exist in hash first

Changes since v3:
* move stapling_cert_info to SSLModConfigRec

Changes since v2:
* change stapling_info to stapling_cert_info
* move init of stapling_cert_info hash to modssl_ctx_init_server
* Drop unnecessary memory allocation failure checks
* Simplify extraction of uri string into apr memory management
* Free aia structure
* In stapling_get_cert_info check for X509_digest failure
* Use SHA_DIGEST_LENGTH not hardcoded 20
* Fix up second call to ssl_stapling_init_cert
* Remove ssl_stapling_ex_init() declaration from ssl_private.h
Comment 24 Alex Bligh 2014-09-21 18:02:16 UTC
Hmm - all this talk of combining the hash entry across different vhosts (i.e. moving stapling_cert_info from modssl_pk_server_t to SSLModConfigRec) assumes that
    issuer = stapling_get_issuer(mctx, x);
returns the same information irrespective of the mctx.

In theory it would I presume be possible for the same certificate on one vhost to use a different set of extra_certs than on another, which in theory could result in a different value of 'issuer' per vhost, and hence in theory a different cert_info structure. Is this in practice an issue?
Comment 25 Kaspar Brand 2014-09-22 05:32:38 UTC
(In reply to Alex Bligh from comment #24)
> In theory it would I presume be possible for the same certificate on one
> vhost to use a different set of extra_certs than on another, which in theory
> could result in a different value of 'issuer' per vhost, and hence in theory
> a different cert_info structure. Is this in practice an issue?

For the stapling part, it's not an issue. What stapling_get_issuer() really needs from the issuing CA is its public key (for creating the request's OCSP_CERTID, see RFC 6960 section 4.1.1, "issuerKeyHash"). Even if there are multiple versions of an issuer cert (e.g. with different validities, alternative signature algorithms etc.), the issuerKeyHash will always be the same for all certificates signed by this key (otherwise the signature of the server cert would fail verification).

Or put differently, the members of the certinfo structure ("idx", "cid", "uri") for a specific cert only depend on the cert itself and the public key of its issuer, nothing else.

v5 is very close I think. These lines in ssl_stapling_init_cert()

    cinf = apr_hash_get(mc->stapling_cert_info, idx, sizeof(idx));
    if (cinf) {
        /* It's already in the hash. However, it may not have a uri
         * If not, check we have a force URL */
        if (!cinf->uri && !mctx->stapling_force_url) {
            ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(02218)
                         "ssl_stapling_init_cert: no responder URL");
            return 0;
        }
        return 1;
    }

can be shortened to

    if ((cinf = apr_hash_get(mc->stapling_cert_info, idx, sizeof(idx))))
        return 1;

as the case of cinf already existing in the hash, but cinf->uri being undefined is a "can't happen", really: remember that "idx" is a SHA-1 hash over the full DER encoding of the certificate (which of course includes the OCSP URI in the AIA extension), so unless we have a collision with the DER encoding of two completely distinct certificates (which would also be very bad for the response cache, just as an aisde), it's impossible to reach the situation where "cinf && !cinf->uri" if we already have a hash entry. (Also, note that every log message needs a unique APLOGNO, so we would have to assign a new one if we kept the message.)
Comment 26 Alex Bligh 2014-09-22 06:33:55 UTC
> v5 is very close I think. These lines in ssl_stapling_init_cert()
> 
>     cinf = apr_hash_get(mc->stapling_cert_info, idx, sizeof(idx));
>     if (cinf) {
>         /* It's already in the hash. However, it may not have a uri
>          * If not, check we have a force URL */
>         if (!cinf->uri && !mctx->stapling_force_url) {
>             ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(02218)
>                          "ssl_stapling_init_cert: no responder URL");
>             return 0;
>         }
>         return 1;
>     }
> 
> can be shortened to
> 
>     if ((cinf = apr_hash_get(mc->stapling_cert_info, idx, sizeof(idx))))
>         return 1;
> 
> as the case of cinf already existing in the hash, but cinf->uri being
> undefined is a "can't happen", really:

I think you might be wrong about that.

Let's say two vhosts have the same certificate. The first vhost this
comes across calls ssl_stapling_init_cert, and this calls X509_get1_ocsp.
X509_get1_ocsp reveals this certificate has no URI, however on this
vhost, SSLStaplingForceURL is set at a vhost level. The certificate's
info then gets added to the hash. The second vhost then gets processed.
The SHA is the same (obviously) so what the above section does is
to check that on the second vhost, SSLStaplingForceURL is also set.
If we don't make this check, then on the first vhost will be OK
because it has SSLStaplingForceURL set, and the second vhost will
be OK although it does not have SSLStaplingForceURL set. That means
that when it gets to deal with the stapling for the second vhost,
there will be no URL. IE it's purpose is to look for an illegal
configuration and return 0 in that case.

Someone who knows how to test OCSP should really test this code works
as opposed to merely compiles. I have no idea how to do that.
Comment 27 Alex Bligh 2014-09-22 07:50:26 UTC
(missed the bit re the log numbers).

It's actually exactly the same log message as further down the same function. If that's an issue, I could easily refactor the check into a function of its own so the log message appears only once.
Comment 28 Kaspar Brand 2014-09-22 20:08:03 UTC
(In reply to Alex Bligh from comment #26)
> If we don't make this check, then on the first vhost will be OK
> because it has SSLStaplingForceURL set, and the second vhost will
> be OK although it does not have SSLStaplingForceURL set.

I see. You're right, I didn't think of this case. Maybe it would be good to log mctx->sc->vhost_id in the message as well, so that it's easier to identify for which vhost the problem actually exists (we might even consider switching to ssl_log_xerror, which logs more cert details, but this would mean passing in ptemp to ssl_stapling_init_cert).

Using the same APLOGNO() for two different places in the codeshould be avoided, i.e. instead of factoring it out, we can just use a new number for the first occurrence (I can deal with this when checking in, just put in APLOGNO() for the time being).

> Someone who knows how to test OCSP should really test this code works
> as opposed to merely compiles. I have no idea how to do that.

If you have a cert from a publicly-trusted CA, chances are very high that it also includes an OCSP URI, so configuring the SSLStaplingCache and adding "SSLUseStapling on" will turn on stapling (if your cert lacks an OCSP URI, you could at least try to force a "fake" SSLStaplingForceURL and see if it is properly taken into account). Then, openssl s_client and its "-status" option can be used to connect to the server and request a stapled OCSP response (which will be dumped if provided by the server, otherwise you get "OCSP response: no response sent"). I will also give it a try later this week.
Comment 29 Alex Bligh 2014-09-22 20:24:45 UTC
Created attachment 32043 [details]
v6 Proof of concept patch to address the issue

Kaspar,

Here's a v6 of the patch that does the refactor - prepared before I read your email properly. Feel free to use v5 instead, but it may save you an APLOGNO patch and make it easier to backport.

Re your comment on logging, isn't this why one passes 's' to ap_log_error, so that it can log the server that has the issue?

Alex
Comment 30 Kaspar Brand 2014-09-24 12:29:20 UTC
Created attachment 32053 [details]
Patch v7 - store stapling certinfo in a global hash, based on work by Alex Bligh

Hmm, ok, so after another closer look, I think I found another issue with the existing code, actually: IINM, with the current certinfo_free code, we actually leak the OCSP_CERTID stored cinf->cid (which gets allocated by OCSP_cert_to_id).

I came to this conclusion when I was restructuring ssl_stapling_init_cert() a bit more, and am attaching my current version. I did some limited "real-world" testing, but more testing and further reviews are welcome and appreciated, of course.

Joe and Steve: I've Cc'ed you in the hope that you could share your insights and opinion on the currently suggested approach.
Comment 31 Alex Bligh 2014-09-24 13:18:23 UTC
(In reply to Kaspar Brand from comment #30)
> Created attachment 32053 [details]
> Patch v7 - store stapling certinfo in a global hash, based on work by Alex
> Bligh
> 
> Hmm, ok, so after another closer look, I think I found another issue with
> the existing code, actually: IINM, with the current certinfo_free code, we
> actually leak the OCSP_CERTID stored cinf->cid (which gets allocated by
> OCSP_cert_to_id).
> 
> I came to this conclusion when I was restructuring ssl_stapling_init_cert()
> a bit more, and am attaching my current version. I did some limited
> "real-world" testing, but more testing and further reviews are welcome and
> appreciated, of course.

Ah yes. Your v7 is still leaking it on server restart.

Whilst we could put in some form of pool handler, that is rather tiresome and I worry about lifetime issues. I had been trying to keep SSL objects out of the cert_info structure.

Do we need something like:
   apr_pool_cleanup_register(p, cid, OCSP_CERTID_free, apr_pool_cleanup_null);

in there?
Comment 32 Kaspar Brand 2014-09-24 16:38:19 UTC
(In reply to Alex Bligh from comment #31)
> Ah yes. Your v7 is still leaking it on server restart.

On restart? How exactly? I don't follow yet, but perhaps I'm missing the obvious.
Comment 33 Alex Bligh 2014-09-24 18:11:14 UTC
(In reply to Kaspar Brand from comment #32)
> (In reply to Alex Bligh from comment #31)
> > Ah yes. Your v7 is still leaking it on server restart.
> 
> On restart? How exactly? I don't follow yet, but perhaps I'm missing the
> obvious.

Kaspar,

Here's what I think will happen.

When ssl_stapling_init_cert is run, it does:
 cid = OCSP_cert_to_id(NULL, x, issuer);

Your new patch (v7) implies that this actually allocates something that needs to be deallocated (as opposed to merely returning a pointer to an existing object). I didn't realise that (because the old code was never freeing cid under any circumstances). This is presumably why you have inserted:
  OCSP_CERTID_free(cid);
if aia is NULL and there is no stapling URL.

But if this is true, we have an issue when the server is restarted as follows:

If the stapling info is correct, then the cinf struct will have a reference to the allocated OSCP_CERTID (cid), and this will be inserted into the hash table. This is used later on stapling callbacks.

But when the server apr pool is freed (on a restart), it will free the hash table of cinf entries and the cinf entries themselves, but cinf->cid will not be freed (i.e. OCSP_CERTID_free() will not be called), because it is not allocated in an apr pool and we haven't registered a cleanup handler for it.

Therefore, when the restart occurs, the hash table will be entry and it will call OCSP_cert_to_id again for each certificate, allocating another OSCP_CERTID structure (and anything beneath that). As far as I can tell, this will be leaked on each restart.

I think:
  apr_pool_cleanup_register(p, cid, OCSP_CERTID_free, apr_pool_cleanup_null);

or similar somewhere around the hash_set will fix this.

Alex
Comment 34 Kaspar Brand 2014-09-25 04:58:33 UTC
(In reply to Alex Bligh from comment #33)
> I didn't realise that (because the old code was never freeing cid under any circumstances).

Yes, that's what was referring to in comment 30 with the issue in the current certinfo_free code.

> This is presumably why you have
> inserted:
>   OCSP_CERTID_free(cid);
> if aia is NULL and there is no stapling URL.

That's because we're aborting early, and we don't want to store any certinfo in this case - i.e., its part of the cleanup of temporary stuff.

> But when the server apr pool is freed (on a restart),
[...]
> Therefore, when the restart occurs, the hash table will be entry and it will
> call OCSP_cert_to_id again for each certificate, allocating another
> OSCP_CERTID structure (and anything beneath that). As far as I can tell,
> this will be leaked on each restart.

This doesn't happen, and is probably the reason you thought v7 would leak. The SSLModConfigRec ("mc") survives restarts, and the stapling_cert_info hash is not cleared. Put differently, we only add certinfo for a specific certificate once in the lifetime of the process - if apr_hash_set() for certificate X was called at startup, then it's skipped if certificate X is encountered again in any of the additional rounds (in fact, this also the reason I put in the TRACE1 log statement, which you'll see only once per certificate and process lifetime when configuring "LogLevel ssl:trace1").

> I think:
>   apr_pool_cleanup_register(p, cid, OCSP_CERTID_free, apr_pool_cleanup_null);
> 
> or similar somewhere around the hash_set will fix this.

It's sort of moot because it would only be called when httpd exits. I don't have strong feelings, and hope that we get some further opinions from other httpd devs at this point (Rainer/Ruediger/Joe/Steve?).
Comment 35 Alex Bligh 2014-09-25 06:05:15 UTC
Kaspar,

> This doesn't happen, and is probably the reason you thought v7 would leak.
> The SSLModConfigRec ("mc") survives restarts, and the stapling_cert_info
> hash is not cleared. Put differently, we only add certinfo for a specific
> certificate once in the lifetime of the process - if apr_hash_set() for
> certificate X was called at startup, then it's skipped if certificate X is
> encountered again in any of the additional rounds (in fact, this also the
> reason I put in the TRACE1 log statement, which you'll see only once per
> certificate and process lifetime when configuring "LogLevel ssl:trace1").

OK, thanks, I didn't understand that. I will have to think of a more contrived example:

Imagine a server with 100 SSL Certificates, which are all changed and the SSL server reloaded once a minute. As the certs are changed, they have different SHA-1 sums. This means not only the OSCP_CERTID but also the certinfo structure leak, as nothing is ever removed from the hash.

Technically on server reload we should be freeing the hash and its contents.

I am fantastically unbothered about this.
Comment 36 Kaspar Brand 2014-10-01 14:52:47 UTC
Created attachment 32073 [details]
Patch v8 (based on work by Alex Bligh) - store stapling certinfo in a pconf-allocated hash

Here is another attempt, which tries to improve memory management wrt v7. Specifically, the certinfo stuff is now put into a static hash which is allocated out of the pconf pool (i.e., cleared at restarts), and a cleanup function is registered to take care of the OCSP_CERTID allocations (which actually leak with the current code in trunk and 2.4.x).

Would appreciate feedback/reviews from other devs (in particular about the memory management things).
Comment 37 Alex Bligh 2014-10-01 14:59:39 UTC
Comment on attachment 32073 [details]
Patch v8 (based on work by Alex Bligh) - store stapling certinfo in a pconf-allocated hash

Kaspar,

Can you legally pass ssl_stapling_certid_free as a parameter to apr_pool_cleanup_register? Might the ssl module not be unloaded / reloaded to a different address at that point (this was the cause of the original bug)? That's why I was trying to simply pass OCSP_CERTID_free. If that legally will take a NULL we should be OK.

Alex
Comment 38 Kaspar Brand 2014-10-01 17:51:54 UTC
(In reply to Alex Bligh from comment #37)
> Can you legally pass ssl_stapling_certid_free as a parameter to
> apr_pool_cleanup_register?

Yes, because this cleanup is called before the module is unloaded (unload_module in mod_so is implemented as a cleanup call registered with the pconf pool, and because ssl_stapling_init_cert registers its cleanup afterwards, it is called before mod_ssl is unloaded by mod_so).

> Might the ssl module not be unloaded / reloaded
> to a different address at that point (this was the cause of the original
> bug)?

Doesn't matter in this case. With the current X509_get_ex_new_index call, the problem is most likely the one pointed out by Rainer in the initial description: libcrypto doesn't get unloaded (in contrast to mod_ssl), and therefore keeps the pointer to the old free_func.
Comment 39 Yann Ylavic 2014-10-01 21:49:18 UTC
(In reply to Kaspar Brand from comment #36)
> Would appreciate feedback/reviews from other devs (in particular about the
> memory management things).

The patch looks good to me, +1.
Thank you and Alex for the good work.
Comment 40 Kaspar Brand 2014-10-04 11:00:53 UTC
I have committed v8 (with some very minor tweaks plus the APLOGNO adjustments) as r1629372 to trunk. Let's see if this triggers any additional feedback on the mailing list, otherwise I will propose for backport to 2.4.x soon.
Comment 41 Alex Bligh 2014-10-08 21:11:02 UTC
Any news on the 2.4 backport?
Comment 42 Kaspar Brand 2014-10-11 09:12:49 UTC
Proposed for backport to 2.4.x with r1631030.
Comment 43 Alex Bligh 2014-10-20 09:36:30 UTC
Would it help if I produced a backport patch for this, or is this already under way?
Comment 44 Kaspar Brand 2014-10-26 07:23:29 UTC
(In reply to Alex Bligh from comment #43)
> Would it help if I produced a backport patch for this, or is this already
> under way?

The patch is linked to in the STATUS file (together with the proposal for backport), see https://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/STATUS?r1=1631030&r2=1631029.
Comment 45 Jeff Trawick 2014-10-30 14:40:46 UTC
This has been approved and merged to the 2.4.x branch for upcoming 2.4.11.