Bug 50237 - webdav LOCK access with multiple threads to same folder causes UNLOCK failures
Summary: webdav LOCK access with multiple threads to same folder causes UNLOCK failures
Status: NEEDINFO
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_dav (show other bugs)
Version: 2.5-HEAD
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-08 13:33 UTC by Mike Henry
Modified: 2010-11-18 18:16 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Henry 2010-11-08 13:33:15 UTC
If multiple processes or threads are writing to unique files in the same folder on a webdav server using LOCKS, intermittent webdav errors will occur during the UNLOCK request.

Server configuration: Centos 5.5. Tested with Apache/2.2.3, Apache/2.2.17 and Apache/2.3.8.

I have a custom application that uses libcurl to upload files to a webdav server. However, I have reproduced the same problem using cadaver, a simple script and a locally hosted Apache webdav server.

The error occurs during an UNLOCK. A 400 (bad request) is returned. Once this failure occurs, the file remains locked. I used several methods over several days to ensure the LOCK/PUT/UNLOCK sequences being issued are valid and contain the correct lock token (wireshark, added custom logging to dav, etc.).

I used a patched version of cadaver that adds scripting ability so I could automate the test.

Here is what the test does:

One shell script is repeatedly running this:

1. LOCK webdav/test/a.bin 2. PUT a.bin webdav/test/a.bin 3. UNLOCK webdav/test/a.bin

At the same time, a second shell script is repeatedly running this (same webdav folder, but accessing a different file):

1. LOCK webdav/test/b.bin 2. PUT b.bin webdav/test/b.bin 3. UNLOCK webdav/test/b.bin

The test repeats anywhere from a few times to several hundred times before the failure occurs. Below is the output from cadaver at the failure point:

Locking `webdav/test/a.bin': succeeded. Lock token <opaquelocktoken:7ca6e55d-1bf6-4ce0-a8a4-7d97117db8af>: Depth infinity on `http://localhost:81/webdav/test/a.bin' Scope: exclusive Type: write Timeout: infinite Owner: Uploading a.bin to `/webdav/test/a.bin': [.......... succeeded. Unlocking `webdav/test/a.bin': failed: 400 Bad Request

At the failure point, this message is in the Apache error_log:

(11)Resource temporarily unavailable: The locktoken specified in the "Lock-Token:" header is invalid because this resource has no outstanding locks. [400, #0]

I tried adding additional debug messages to dav, but have not been able to track down why the lock is not valid. I also tried to make a debug build of Apache so I could use gdb to help debug at least one of the scripts to better understand the flow of the lock data, but was unable to get debug symbols in the binary (I used the --enable-maintainer-mode configure option).
Comment 1 Stefan Fritsch 2010-11-10 15:53:21 UTC
I could not reproduce this (neither with 2.2.16 nor with trunk).

I suspect that there may be a lock contention when accessing the dbm file that stores the locks. Which dbm library are you using? (Try 'ldd /usr/lib/apr-util-1/apr_dbm_db.so' or 'ldd /usr/lib/libaprutil-1.so' depending on how you compiled it). I have Berkley DB 4.8.

To get more info, you could also try strace-ing httpd and see which system call returns 'Resource temporarily unavailable' (EAGAIN). If it is something related to your dav lockdb, that would give a hint. But it's also possible that it is not related to DAV at all but a bug in the logging code.
Comment 2 Jeff Trawick 2010-11-10 16:02:35 UTC
>To get more info, you could also try strace-ing httpd and see which system 
>call returns 'Resource temporarily unavailable' (EAGAIN).

Mike, when you tested at trunk did you see 'Resource temporarily available' logged?

Possibly the logging of EAGAIN is because the dav error APIs prior to httpd 2.3.4 did not take an apr_status_t parameter, and assumed errno was meaningful for the problem at hand (and that's a common error for network I/O).
Comment 3 Mike Henry 2010-11-18 18:16:47 UTC
I lost my PC that I was using the debug the issue. I will have to setup everything up on another system. I will update the bug after completed.