Bug 63305

Summary: Segmentation fault in mod_ldap on graceful reload
Product: Apache httpd-2 Reporter: Martin Fúsek <mfusek>
Component: mod_ldap Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal CC: Pavel
Priority: P2 Keywords: FixedInTrunk
Version: 2.4.23   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Solve the issue

Description Martin Fúsek 2019-04-01 06:34:47 UTC
Created attachment 36505
Solve the issue

Our server uses our closed-source variant of mod_authnz_ldap (mod_agwldap), which in turn uses mod_ldap. The server is heavily loaded and is reloaded once per hour to pick up a new CRL (certificate revocation list) for TLS client certificate authentication. The LDAP directory contains more than one million users. mod_ldap is configured to use the shared cache. On reload there is an intermittent segmentation fault. The crash can be reproduced in a load test with 1600 different users and 20 threads when the server is reloaded every 3 seconds (for load-test purposes only).
Backtrace (sometimes the crash is in a different location):
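
The reload part of such a load test can be sketched in C as follows. This is only an illustration, not the reporter's actual test harness: the helper name send_graceful_reloads is hypothetical, and it assumes a Unix httpd where SIGUSR1 sent to the parent process triggers a graceful restart (the same signal that apachectl graceful sends).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of httpd): send `count` graceful-restart
 * signals to the httpd parent process, sleeping `interval_sec` seconds
 * between consecutive signals.
 * Returns the number of signals that kill() accepted. */
int send_graceful_reloads(pid_t httpd_parent, int count, unsigned interval_sec)
{
    int sent = 0;

    for (int i = 0; i < count; i++) {
        if (kill(httpd_parent, SIGUSR1) != 0) {
            break;                  /* process gone or no permission: stop */
        }
        sent++;
        if (i + 1 < count) {
            sleep(interval_sec);    /* e.g. 3 for the load test described above */
        }
    }
    return sent;
}
```

Calling send_graceful_reloads(pid, 100, 3) with the PID from httpd's PidFile would request 100 graceful reloads, three seconds apart, while a separate client generates the LDAP-authenticated traffic.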

#0 util_ldap_search_node_free (cache=0x7f3bb9e3f030, n=0x74696c6d79473132) at util_ldap_cache.c:202
#1 0x00007f3bbe245bc0 in util_ald_destroy_cache (cache=0x7f3bb9e3f030) at util_ldap_cache_mgr.c:412
#2 0x00007f3bbe244c3d in util_ldap_url_node_free (cache=0x7f3bb9e1e038, n=0x7f3bb9e57a80) at util_ldap_cache.c:73
#3 0x00007f3bbe245bc0 in util_ald_destroy_cache (cache=0x7f3bb9e1e038) at util_ldap_cache_mgr.c:412
#4 0x00007f3bbe244d6d in util_ldap_cache_module_kill (data=0x7f3bc4d99548) at util_ldap_cache.c:402
#5 0x00007f3bc42d403e in apr_pool_destroy () from /usr/lib64/libapr-1.so.0
#6 0x00007f3bc42d4295 in apr_pool_clear () from /usr/lib64/libapr-1.so.0
#7 0x0000558a6b72bd6d in main (argc=18, argv=0x7ffdd1e726b8) at main.c:713
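
A side note on frame #0: the value n=0x74696c6d79473132 consists entirely of printable ASCII bytes, which may indicate that the node pointer was overwritten with string data instead of pointing to a real cache node. The following is a minimal sketch for checking this, assuming a little-endian machine such as x86-64. The helper name pointer_bytes_as_text is hypothetical and not part of httpd or APR.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical debugging helper: decode a 64-bit pointer value into the
 * 8 bytes it occupies in memory on a little-endian machine.
 * Printable bytes are copied to `out`, others are replaced by '.'.
 * Returns 1 if all 8 bytes are printable ASCII, 0 otherwise. */
int pointer_bytes_as_text(uint64_t value, char out[9])
{
    int all_printable = 1;

    for (int i = 0; i < 8; i++) {
        /* Byte i in memory is bits [8*i, 8*i+7] of the value. */
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xff);
        if (b < 0x80 && isprint(b)) {
            out[i] = (char)b;
        } else {
            out[i] = '.';
            all_printable = 0;
        }
    }
    out[8] = '\0';
    return all_printable;
}
```

For 0x74696c6d79473132 the function returns 1 and fills the buffer with "21Gymlit", whereas a plausible heap address such as 0x7f3bb9e57a80 from frame #2 returns 0.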

void util_ldap_search_node_free(util_ald_cache_t *cache, void *n)
{
    int i = 0;
    util_search_node_t *node = n;
    int k = node->numvals;   /* line highlighted in the report */

    if (node->vals) {
        for (;k;k--,i++) {
            if (node->vals[i]) {
                util_ald_free(cache, node->vals[i]);
            }
        }
        util_ald_free(cache, node->vals);
    }
    util_ald_free(cache, node->username);
    util_ald_free(cache, node->dn);
    util_ald_free(cache, node->bindpw);
    util_ald_free(cache, node);
}

A patch which solves the issue (against the 2.4.x branch) is attached. We believe the error is in mod_ldap and not in our mod_agwldap. We tried the newest mod_ldap (trunk) without luck. With the patch applied, there are no crashes in the load test.
Comment 1 Christophe JAILLET 2019-07-20 08:46:43 UTC
This has been fixed in trunk in r1856735.

The patch is different from the one attached in this PR.
Could you please test and confirm that it solves the issue for you as well?
Comment 2 Martin Fúsek 2020-01-20 08:43:34 UTC
(In reply to Christophe JAILLET from comment #1)
> This has been fixed in trunk in r1856735.
> 
> The patch is different from the one attached in this PR.
> Could you please test and confirm that it solves the issue for you as well?

Yes, it works. But trunk revision 1831165 ("mod_ldap: log and abort locking errors.  related to PR60296 investigation  RMM corruption is really nasty, so abort on locking failures") also breaks it. Without revision 1831165 (commenting out if (rv != APR_SUCCESS) {) it works OK. The problem is probably that some requests arrive after the lock cleanup (after the reload), so the assertion fails (the new lock in the cache cleanup works OK).
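
The suspected sequence can be illustrated with a toy model in C. This is not mod_ldap or APR code: all toy_* names are hypothetical, and the model only assumes that acquiring a lock fails once the pool cleanup has destroyed it.

```c
/* Toy stand-in for the cache lock; not an APR or mod_ldap type. */
typedef struct {
    int alive;   /* set to 0 once the pool cleanup has destroyed the lock */
} toy_lock_t;

typedef enum {
    TOY_OK = 0,          /* lock taken, cache can be used */
    TOY_SKIPPED = 1,     /* locking failed, request continues without cache */
    TOY_WOULD_ABORT = 2  /* locking failed, process would abort */
} toy_result_t;

/* Returns 0 on success, nonzero if the lock no longer exists. */
static int toy_lock_acquire(const toy_lock_t *lock)
{
    return lock->alive ? 0 : -1;
}

/* abort_on_error != 0 models "abort on locking failures";
 * abort_on_error == 0 models ignoring the failure. */
toy_result_t toy_cache_lookup(const toy_lock_t *lock, int abort_on_error)
{
    if (toy_lock_acquire(lock) != 0) {
        return abort_on_error ? TOY_WOULD_ABORT : TOY_SKIPPED;
    }
    return TOY_OK;
}
```

In this model, a request that reaches toy_cache_lookup after the lock has been destroyed yields TOY_WOULD_ABORT when locking errors are fatal, and TOY_SKIPPED when the error check is removed, which matches the behavior described above.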

CoreDumpDirectory /tmp/

#LDAPLibraryDebug 7
LDAPLibraryDebug disabled
LDAPConnectionPoolTTL 15

LDAPSharedCacheSize 10485760
LDAPCacheEntries 10240
LDAPCacheTTL 600
LDAPOpCacheEntries 10240
LDAPOpCacheTTL 600
LDAPVerifyServerCert Off

<Location /ldap-status>
    SetHandler ldap-status
    Require host 127.0.0.1
</Location>

    <Location />
        AuthName "ISDS - DS"
        AuthLDAPBindDN "<redacted>"
        AuthLDAPBindPassword "<redacted>"
        AuthType basic
        AuthBasicProvider ldap
        AuthLDAPRemoteUserIsDN on
        <LimitExcept OPTIONS>
            Require valid-user
        </LimitExcept>

        AuthLDAPURL ldap://172.24.40.126/ou=test,o=test?cn,isdsRights?sub? TLS
</Location>