*** Overview ***

In mod_ldap (util_ldap.c), during util_ldap_cache_checkuserid, an LDAP connection may be rebound to the DN of the user being checked while the connection cache still records it as bound to its original DN, or as anonymous. Reusing such a connection may later lead to unexpected authentication failures. These problems are particularly annoying with an LDAP server that refuses anonymous connections, or on which users have no right to read other users' entries. This may also be a security problem, since the connection pool may contain bound connections marked as anonymous.

*** Symptoms ***

These problems are usually reported as authentication no longer succeeding after a previous authentication, whether that one failed or even succeeded. Usually the following or a similar error is logged:

auth_ldap authenticate: user <username> authentication failed; URI <URI> [User not found][No such object]

*** Test case ***

To trigger the problem easily, configure an LDAP server that gives access to user entries only over a non-anonymous bound connection. Configure users to have no access to other users' entries (you are now in the worst case). Use mod_auth_ldap with the following configuration on a given URI:

AuthType Basic
AuthName "LDAP authentication"
AuthLDAPEnabled on
AuthLDAPBindDN <dn of the only ldap account that has access to all user entries>
AuthLDAPBindPassword <password for this account>
AuthLDAPURL <appropriate ldap url>
AuthLDAPAuthoritative on

Then try to access this URI using good and wrong credentials. Access to this URI will quickly become erratic, and the message above will be reported for valid users with both good and bad passwords.

*** Explanation ***

Here are the util_ldap-related steps involved in a mod_auth_ldap authentication:

1) mod_auth_ldap_check_user_id() gets called to check user authentication.
2) mod_auth_ldap_check_user_id() retrieves a cached connection from util_ldap_connection_find().
3) util_ldap_connection_find() searches for a connection matching the host, port and binddn/bindpw requested by its caller, which has taken these from your httpd configuration.
4) If a matching connection is found, it is returned as is; otherwise a non-matching connection is marked as unbound and returned.
5) mod_auth_ldap_check_user_id() passes the retrieved connection to util_ldap_cache_checkuserid(), together with the username/password provided by the user and the filter provided in your configuration.
6) util_ldap_cache_checkuserid() checks the user cache for a previous successful authentication of this user. If one is found, the LDAP connection is not used at all.
7) If none is found, the LDAP connection is opened using util_ldap_connection_open(). This binds the connection with the binddn/bindpw previously stored by util_ldap_connection_find() and marks it as bound, but only if it is currently known to be unbound; otherwise it does nothing!
8) The LDAP server is searched for the DN of the user, based on the provided filter.
9) If one and only one record is returned, its DN is retrieved.
10) The connection is rebound to the user DN with the password the user provided, to find out whether that password is correct. This is done with a direct call to the LDAP API function ldap_simple_bind_s(). The "known to be bound" flag and the binddn in the util_ldap connection structure used for connection caching are not updated by this call, which leads to the problem.
11) Later, on return from util_ldap_cache_checkuserid, mod_auth_ldap_check_user_id releases the connection to the cache as is, using a misnamed function called util_ldap_connection_close.
12) If an error has been reported, and only if this error is LDAP_SERVER_DOWN, the connection will be unbound (and some retries will be done), which avoids the problem when no server has answered (funny, no?).

As you should have understood, in step 10 the connection is rebound to another user without the cache information being updated, and in step 11 this rebound connection is released to the cache.

*** Solution ***

To solve this issue, there are two options:

1) Synchronize the cache information of the provided connection with its new required binding, set it to unbound, and use util_ldap_connection_open() to rebind the connection properly, which ensures correct util_ldap cache usage.
2) Retrieve another connection from the cache for the user authentication, using util_ldap_connection_find().

Choosing between these options means choosing between keeping the 'search user' connection bound to the AuthLDAPBindDN and using only one connection for authentication. I have seen a patch that uses option two, but I am afraid that this patch does not properly release the first connection to the cache using util_ldap_connection_close.
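
The stale bookkeeping described in steps 3 to 11, and the effect of option one, can be illustrated with a small stand-alone simulation. This is only a sketch and not the mod_ldap code: conn_t, conn_find(), conn_open(), search_user() and authenticate() are hypothetical stand-ins, no LDAP library is called, and the user's password is assumed to be correct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached connection; the real util_ldap
 * structure holds more state (host, port, locks, ...). */
typedef struct {
    int bound;              /* cache bookkeeping: believed to be bound? */
    const char *binddn;     /* cache bookkeeping: DN believed to be bound */
    const char *server_dn;  /* identity the server really sees, NULL if none */
} conn_t;

/* Steps 3-4: reuse the connection; if the requested DN differs from the
 * recorded one, record the new DN and mark the connection unbound.
 * The connection must start with binddn set to "" (not NULL). */
static void conn_find(conn_t *c, const char *binddn)
{
    assert(c != NULL && binddn != NULL);
    if (strcmp(c->binddn, binddn) != 0) {
        c->binddn = binddn;
        c->bound = 0;
    }
}

/* Step 7: bind only when the bookkeeping says "unbound". */
static void conn_open(conn_t *c)
{
    if (!c->bound) {
        c->server_dn = c->binddn;
        c->bound = 1;
    }
}

/* Step 8: in the worst-case setup only the service DN may search. */
static int search_user(const conn_t *c, const char *service_dn)
{
    return c->server_dn != NULL && strcmp(c->server_dn, service_dn) == 0;
}

/* One authentication pass. Returns 1 on success, 0 on "User not found". */
static int authenticate(conn_t *c, const char *service_dn,
                        const char *user_dn, int option_one)
{
    conn_find(c, service_dn);
    conn_open(c);
    if (!search_user(c, service_dn))
        return 0;
    if (option_one) {
        /* Option one: update the bookkeeping, then rebind through open. */
        c->binddn = user_dn;
        c->bound = 0;
        conn_open(c);
    } else {
        /* Step 10 as described: a direct rebind that changes the server
         * side identity but leaves bound/binddn untouched. */
        c->server_dn = user_dn;
    }
    return 1;
}
```

Starting from a connection initialised as { 0, "", NULL }, two consecutive calls without option one return 1 and then 0, because the second search runs as the user DN while the cache still believes the service DN is bound. With option one both calls return 1, since conn_find() notices the DN mismatch and forces a rebind.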
Created attachment 10470 [details] Patch using option one in the solution explained above
*** Patch ***

For my part, I have chosen option one, which uses only one connection for both search and authentication and leaves a connection bound to the authenticated user in the cache. The attached unified patch was made against the latest public release, which at the time of this writing is version 2.0.48. I have also manually checked the head of development (version 2.1.x), and it should apply there too. I have no more time right now to test this patch thoroughly. It has been in production on our server since it was written, and I will keep this bug report informed of any further problems we may encounter.
There are other discussions of these issues in bug 17599 (which provides a probably wrong patch using option two described above) and bug 21787, which provides a solution similar to this one by always marking the LDAP connection unbound after authentication, even if it was bound to the authenticated user.
Bug 24683 may also be related.
We tried to use the patch in attachment 10470, and found that while it gave correct results, the number of open connections to the LDAP server increased linearly over time. We started hitting the LDAP server's limit on open connections. This may be a generic flaw in mod_ldap, in that there are no bounds I can see on the number of cached connections or on how long they may be held open. In this case, it doesn't appear that the open connections served much of a function.

We ended up using a patch suggested in comments to http://nagoya.apache.org/bugzilla/show_bug.cgi?id=17274

On Apache 2.0.49 this was:

$ diff mod_auth_ldap.c~ mod_auth_ldap.c
329c329
< util_ldap_connection_close(ldc);
---
> util_ldap_connection_destroy(ldc);

But this may defeat connection caching entirely. I don't claim to understand the code in detail.
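
The trade-off behind that one-line change can be sketched with a toy model. This is an illustration only: pooled_conn_t, pool_acquire(), pool_close(), pool_destroy() and run_requests() are hypothetical names, and the real util_ldap functions do considerably more (locking, unbinding, cache lookups).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pooled connection: tracks only whether the socket is up
 * and how many times it had to be (re)established. */
typedef struct {
    int socket_open;  /* 1 while the connection to the LDAP server is up */
    int connects;     /* number of connects performed so far */
} pooled_conn_t;

/* Take the connection for a request, connecting only if needed. */
static void pool_acquire(pooled_conn_t *c)
{
    assert(c != NULL);
    if (!c->socket_open) {
        c->socket_open = 1;
        c->connects++;
    }
}

/* "close" in the pool sense: hand the connection back, socket stays up. */
static void pool_close(pooled_conn_t *c)
{
    (void)c;
}

/* "destroy": tear the socket down so the next request must reconnect. */
static void pool_destroy(pooled_conn_t *c)
{
    c->socket_open = 0;
}

/* Serve n requests in a row and return the total number of connects. */
static int run_requests(pooled_conn_t *c, int n, int use_destroy)
{
    int i;
    for (i = 0; i < n; i++) {
        pool_acquire(c);
        if (use_destroy)
            pool_destroy(c);
        else
            pool_close(c);
    }
    return c->connects;
}
```

In this model, five requests released with pool_close() cost one connect and leave one socket open, while five requests released with pool_destroy() cost five connects and leave none open. That is the sense in which the change avoids lingering sockets but gives up the benefit of caching.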
Created attachment 11296 [details] Cumulative patch that fixes the issue of the increasing connections to the LDAP server described above
The previous patch provides a fix for excessive locking done while searching the connection cache. This correction is also available in version 1.23 of util_ldap.c in the CVS tree. The patch also contains the earlier patch for the connection rebind and applies to the latest stable release to date (2.0.49).
I did a series of tests using 2.0.49 with each of:

1) util_ldap.c from CVS version 1.24
2) util_ldap.c patched with 11296
3) util_ldap.c patched with 10470 and the change to util_ldap_connection_destroy

I did tests with one data set that stepped through 11 usernames in nearly serial order, and another that was more of a random walk across the same usernames. Both data sets included some pseudo-random failures.

The results can be summarized as follows:

1) The CVS code left 10 sockets in use at the end of the test until I HUPed the server. It reached nearly that level pretty quickly and then stayed there. The authentication results of the CVS code were entirely unreliable ("good" means a test case gave the expected result):
   For the serial test, 34 good, 69 bad.
   For the random test, 410 good, 394 bad.
   (Would you suggest combining CVS with another patch?)

Versions 2) and 3) both returned 100% good results.
Version (3) promptly closed all connections (as expected).
Version (2) left 9 sockets in use at the end of the serial test and 4 sockets in use at the end of the random test. Both test data sets showed some reuse of sockets.
I'm adding this note to document some further tests on patch 11296. E-mail correspondence with Denis Gervalle suggested I should test the effects of the number of processes. (All my tests are on Linux running the prefork model.)

I did the tests above with the default settings of:

StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0

For comparison, I set up a low process number test with:

StartServers 1
MinSpareServers 1
MaxSpareServers 1
MaxClients 150
MaxRequestsPerChild 0

and a high process number test with:

StartServers 10
MinSpareServers 10
MaxSpareServers 20
MaxClients 150
MaxRequestsPerChild 0

I ran the serial and random test data against these two new configurations under 2.0.49 with patch 11296. All the test results were correct; they differed only in socket usage.

The "low process" config left 1 socket open to the LDAP server at the end of both data sets.
The "high process" config left 15 sockets open at the end of the serial data set and 13 sockets open at the end of the random data set.

Combined with the test above, this seems to indicate that 11296 holds sockets between requests on the order of one per process. This rate of usage looks fairly stable over time. It goes up and down during tests, but there is no long-term upward trend as there had been with 10470.

In all my tests I'm getting the log message

[debug] util_ldap.c(1139): LDAP cache: Unable to init Shared Cache: no file

which I guess indicates there is no shared state among processes. I've tried to explicitly specify a cache file writable by the web server, but it does not seem to have any effect.
Please try the patch at http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27748 and tell me if it fixes this problem. This patch has been applied to v2.1.0-dev, and awaits backporting to v2.0.50-dev. This is specifically in reference to the auth failures described.
*** This bug has been marked as a duplicate of 27748 ***
I repeated my test setup with the roll-up patch 11618 from bug 27748. The results look good. I'm now getting no unexpected authentication results, and socket usage looks similar to that with Denis Gervalle's previous patch. I still have the warning "LDAP cache: Unable to init Shared Cache: no file", but I suppose that's a different issue.