Bug 44865 - mod_dav's lock database becomes consistently corrupt
Status: ASSIGNED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_dav
Version: 2.2.8
Hardware: PC Linux
Importance: P2 normal with 1 vote
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Reported: 2008-04-23 10:19 UTC by Jeremy Orem
Modified: 2012-09-17 09:39 UTC

Description Jeremy Orem 2008-04-23 10:19:27 UTC
I am seeing the following errors in my apache logs:

[Wed Apr 23 09:58:33 2008] [error] [client 10.2.80.101] Could not LOCK /user/27afdfe55a109f3c9bd486328c548e5c10f14ae2/ due to a failed precondition (e.g. other locks).  [500, #0]
[Wed Apr 23 09:58:33 2008] [error] [client 10.2.80.101] The locks could not be queried for verification against a possible "If:" header.  [500, #0]
[Wed Apr 23 09:58:33 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]

This typically happens about 5 - 20 min after deleting the lock database.

The server is "high volume": it peaks at about 24 DAV req/s and averages about 4.11 DAV req/s.

The lock database is sitting on an ext3 partition.
Comment 1 Joe Orton 2008-04-24 02:57:43 UTC
To be clear, you are manually (and deliberately) deleting the lock database?

This seems like expected behaviour, if so.

A LOCK request has been submitted which references a previously existing lock (e.g. on /user/).  mod_dav will attempt to open the lock database read-only to verify the status of that lock.  That fails because the lock database has gone.
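
For readers unfamiliar with the error text "An indirect lock's direct lock could not be found", the kind of lookup it describes can be modelled roughly as below. This is a toy sketch in Python, not mod_dav's code: the dictionary layout, the record fields (`kind`, `token`, `owner_path`) and the function name `resolve_lock` are all hypothetical, chosen only to show how a missing direct lock record turns into a corruption error.

```python
class LockDatabaseCorrupt(Exception):
    """Raised when an indirect lock points at a direct lock that is missing."""


def resolve_lock(db, path, token):
    """Return the direct lock record behind `token` on `path`, or None.

    `db` is a hypothetical in-memory stand-in for the lock database: it maps
    a resource path to a list of lock records. A direct record looks like
    {"kind": "direct", "token": ...}; an indirect record looks like
    {"kind": "indirect", "token": ..., "owner_path": ...}.
    """
    for record in db.get(path, []):
        if record["token"] != token:
            continue
        if record["kind"] == "direct":
            return record
        # Indirect lock: follow the reference to the resource that should
        # hold the direct lock and look for the same token there.
        for candidate in db.get(record["owner_path"], []):
            if candidate["kind"] == "direct" and candidate["token"] == token:
                return candidate
        raise LockDatabaseCorrupt(
            "indirect lock on %s refers to %s, which has no direct lock %s"
            % (path, record["owner_path"], token)
        )
    return None  # no lock with this token on the resource


db = {
    "/user/": [{"kind": "direct", "token": "t1"}],
    "/user/a/": [{"kind": "indirect", "token": "t1", "owner_path": "/user/"}],
}
resolve_lock(db, "/user/a/", "t1")  # finds the direct lock on /user/
del db["/user/"]                    # the direct record disappears
# resolve_lock(db, "/user/a/", "t1") would now raise LockDatabaseCorrupt
```

In this model the failure is tied to individual records rather than to the database as a whole, so only resources whose indirect records point at a missing direct record are affected.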
Comment 2 Jeremy Orem 2008-04-24 11:06:34 UTC
(In reply to comment #1)
> To be clear, you are manually (and deliberately) deleting the lock database?

Sorry I wasn't very clear there.  By deleting the lock database I meant:

* See corruption in logs
* Stop apache
* Delete lock database
* Start apache
Comment 3 Joe Orton 2008-04-25 03:08:33 UTC
OK, so what are the "corruption" errors which are being logged which lead you to delete the lock database in the first place?
Comment 4 Jeremy Orem 2008-04-25 09:16:19 UTC
The errors I pasted are the errors that cause me to delete the lock database. 

More examples: 

cat /var/log/httpd/error_log | grep corrupt  |tail -n 4
[Fri Apr 25 09:13:00 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]
[Fri Apr 25 09:13:41 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]
[Fri Apr 25 09:14:10 2008] [error] [client 10.2.80.101] The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]
[Fri Apr 25 09:14:11 2008] [error] [client 10.2.80.101] The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]


Note that I haven't deleted the lock database since around Apr 15.
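
The four lines pasted above differ in one detail: two carry an "(2)No such file or directory" prefix and two do not. A minimal Python sketch for tallying that split across a whole error log follows. It assumes the log lines have the same shape as the ones quoted in this report; the function name `summarize_corruption` is hypothetical.

```python
import re
from collections import Counter

CORRUPT = "The lock database was found to be corrupt"
# Optional "(errno)message: " prefix that some of the pasted lines carry.
ERRNO_PREFIX = re.compile(r"\((\d+)\)([^:]+): " + re.escape(CORRUPT))


def summarize_corruption(lines):
    """Count corruption errors, grouped by the errno message if present."""
    counts = Counter()
    for line in lines:
        if CORRUPT not in line:
            continue  # not a corruption error
        match = ERRNO_PREFIX.search(line)
        key = match.group(2) if match else "no errno prefix"
        counts[key] += 1
    return counts
```

It could be fed the log directly, for example `summarize_corruption(open("/var/log/httpd/error_log"))`, using the log path from the command above.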
Comment 5 Joe Orton 2008-04-25 09:33:52 UTC
Has the lock database been re-created on disk at the point these errors are logged?
Comment 6 Jeremy Orem 2008-04-25 10:25:00 UTC
It doesn't look like the lock database is being recreated.  I watched the lock directory for about a minute with inotify.  Results:

     43 lockdb.dir IN_CLOSE_WRITE
     43 lockdb.dir IN_OPEN
     46 lockdb.dir IN_ACCESS
     46 lockdb.pag IN_CLOSE_WRITE
     46 lockdb.pag IN_OPEN
    187 lockdb.pag IN_MODIFY
    243 lockdb.pag IN_ACCESS

I was able to trigger

[Fri Apr 25 10:24:16 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]

several times while watching.  

The strange thing is that it seems to become corrupt only for certain directories.  For example, even though locking /user/05c3d5e2202a7f92edddfb4753c5fa771758daeb/ causes 500 errors, I can lock a different directory with no problems.
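
To see which directories are affected, the failing paths can be pulled out of the "Could not LOCK" lines in the error log. The Python sketch below assumes those lines look like the one quoted in the description and that they accompany the corruption errors, as they do in that excerpt; `failing_lock_paths` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the path in lines such as:
#   Could not LOCK /user/<hash>/ due to a failed precondition (e.g. other locks).
LOCK_FAILURE = re.compile(r"Could not LOCK (\S+) due to a failed precondition")


def failing_lock_paths(lines):
    """Count failed LOCK requests per resource path."""
    counts = Counter()
    for line in lines:
        match = LOCK_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `failing_lock_paths(log_lines).most_common()` lists the paths with the most failed LOCK requests first, which makes it easier to check whether the errors cluster on a few directories.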
Comment 7 Jeremy Orem 2008-05-09 17:28:51 UTC
Anything else I can do to assist here?
Comment 8 Wim Lewis 2011-04-19 20:56:24 UTC
I am seeing identical or very similar corruption (2.2.15 on Mac OS X). In my case I think that bug #50773 might be the specific cause.
Comment 9 Werner Schalk 2012-09-17 09:39:19 UTC
I can confirm this bug exists in 2.2.22 and I am constantly hitting it. Any fixes in sight?