44865 – mod_dav's lock database becomes consistently corrupt

Summary: mod_dav's lock database becomes consistently corrupt

Status:	ASSIGNED

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	mod_dav
Version:	2.2.8
Hardware:	PC Linux

Importance:	P2 normal with 1 vote
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-04-23 10:19 UTC by Jeremy Orem
Modified:	2012-09-17 09:39 UTC
CC List:	2 users

Description Jeremy Orem 2008-04-23 10:19:27 UTC

I am seeing the following errors in my apache logs:

[Wed Apr 23 09:58:33 2008] [error] [client 10.2.80.101] Could not LOCK /user/27afdfe55a109f3c9bd486328c548e5c10f14ae2/ due to a failed precondition (e.g. other locks).  [500, #0]
[Wed Apr 23 09:58:33 2008] [error] [client 10.2.80.101] The locks could not be queried for verification against a possible "If:" header.  [500, #0]
[Wed Apr 23 09:58:33 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]

This typically happens about 5 - 20 min after deleting the lock database.

The server is "high volume": it peaks at about 24 DAV req/s and averages about 4.11 DAV req/s.

The lock database is sitting on an ext3 partition.

Comment 1 Joe Orton 2008-04-24 02:57:43 UTC

To be clear, you are manually (and deliberately) deleting the lock database?

This seems like expected behaviour, if so.

A request has been submitted with a LOCK which references a previously-existing lock (e.g. on /user/).  mod_dav will attempt to open the lock database read-only, to verify the status of that lock.  That fails since the lock database is gone.

Comment 2 Jeremy Orem 2008-04-24 11:06:34 UTC

(In reply to comment #1)
> To be clear, you are manually (and deliberately) deleting the lock database?

Sorry I wasn't very clear there.  By deleting the lock database I meant:

* See corruption in logs
* Stop apache
* Delete lock database
* Start apache
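
For anyone repeating the reset described in the list above, it can be sketched as a small shell function. This is only an illustration, not a fix: the LOCKDB prefix below is a hypothetical DavLockDB value (use the one from your own configuration), apachectl is assumed to be on the PATH, and the .dir/.pag pair matches the file names reported later in this thread. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical DavLockDB prefix; replace with the value from your httpd config.
LOCKDB="${LOCKDB:-/var/lib/dav/lockdb}"
# Set RUN=echo for a dry run that only prints the commands.
RUN="${RUN:-}"

reset_lockdb() {
    # Stop the server so nothing holds the lock database open.
    $RUN apachectl stop || return 1
    # The lock database is stored as a .dir/.pag file pair.
    $RUN rm -f "$LOCKDB.dir" "$LOCKDB.pag" || return 1
    # mod_dav recreates the database the next time it needs to write a lock.
    $RUN apachectl start
}
```

A dry run would be `RUN=echo; reset_lockdb`. Note that this discards every outstanding lock, so clients holding lock tokens will see failures afterwards.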

Comment 3 Joe Orton 2008-04-25 03:08:33 UTC

OK, so what are the "corruption" errors which are being logged which lead you to delete the lock database in the first place?

Comment 4 Jeremy Orem 2008-04-25 09:16:19 UTC

The errors I pasted are the errors that cause me to delete the lock database. 

More examples: 

cat /var/log/httpd/error_log | grep corrupt  |tail -n 4
[Fri Apr 25 09:13:00 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]
[Fri Apr 25 09:13:41 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]
[Fri Apr 25 09:14:10 2008] [error] [client 10.2.80.101] The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]
[Fri Apr 25 09:14:11 2008] [error] [client 10.2.80.101] The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]


Note that I haven't deleted the lock database since around Apr 15.

Comment 5 Joe Orton 2008-04-25 09:33:52 UTC

Has the lock database been re-created on disk at the point these errors are logged?

Comment 6 Jeremy Orem 2008-04-25 10:25:00 UTC

It doesn't look like the lock database is being recreated.  I watched the lock directory for about a minute with inotify.  Results:

     43 lockdb.dir IN_CLOSE_WRITE
     43 lockdb.dir IN_OPEN
     46 lockdb.dir IN_ACCESS
     46 lockdb.pag IN_CLOSE_WRITE
     46 lockdb.pag IN_OPEN
    187 lockdb.pag IN_MODIFY
    243 lockdb.pag IN_ACCESS

I was able to trigger

[Fri Apr 25 10:24:16 2008] [error] [client 10.2.80.101] (2)No such file or directory: The lock database was found to be corrupt. An indirect lock's direct lock could not be found.  [500, #402]

several times while watching.  
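
A per-file event tally like the one above can be gathered roughly as follows. This is a sketch that assumes inotifywait (from inotify-tools) and the coreutils timeout command are installed; it is not necessarily the tool used for the numbers in this comment, and inotifywait prints event names without the IN_ prefix. The function names are made up for illustration.

```shell
#!/bin/sh
# Tally identical "file event" lines read from stdin, least frequent first.
tally_events() {
    sort | uniq -c | sort -n
}

# Watch a directory for a fixed time and tally the events seen.
# $1 = directory to watch (the one holding lockdb.dir and lockdb.pag)
# $2 = seconds to watch (defaults to 60)
watch_lockdir() {
    timeout "${2:-60}" inotifywait -m -q --format '%f %e' "$1" | tally_events
}
```

If the database were being deleted and recreated during the watch, DELETE and CREATE events would be expected in the output alongside the opens and modifies.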

The strange thing is that it seems to only become corrupt for certain directories.  For example even though locking /user/05c3d5e2202a7f92edddfb4753c5fa771758daeb/ causes 500 errors I can lock a different directory with no problems.
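
To see which directories are affected, the failing paths can be pulled out of the error log and counted. This is a minimal sketch that assumes the "Could not LOCK <path> due to" wording shown in the description and paths without spaces; the function name is made up for illustration.

```shell
#!/bin/sh
# Print each path from "Could not LOCK <path> due to ..." lines with the
# number of times it appears, most frequent first.
# Arguments are log files; with no arguments, stdin is read.
failing_lock_paths() {
    sed -n 's/.*Could not LOCK \([^ ]*\) due to .*/\1/p' "$@" |
        sort | uniq -c | sort -rn
}
```

For example, `failing_lock_paths /var/log/httpd/error_log | head` lists the paths that fail most often, which would show whether the failures cluster under a small set of directories.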

Comment 7 Jeremy Orem 2008-05-09 17:28:51 UTC

Anything else I can do to assist here?

Comment 8 Wim Lewis 2011-04-19 20:56:24 UTC

I am seeing identical or very similar corruption (2.2.15 on MacOSX). In my case I think that bug #50773 might be the specific cause.

Comment 9 Werner Schalk 2012-09-17 09:39:19 UTC

I can confirm this bug exists in 2.2.22 and I am constantly hitting it. Any fixes in sight?