Bug 27654 - Deadlock in apr_thread_cond_wait on WIN32 platform
Summary: Deadlock in apr_thread_cond_wait on WIN32 platform
Status: RESOLVED FIXED
Alias: None
Product: APR
Classification: Unclassified
Component: APR (show other bugs)
Version: 0.9.4
Hardware: PC Windows 2000
: P3 normal with 4 votes (vote)
Target Milestone: ---
Assignee: Apache Portable Runtime bugs mailinglist
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2004-03-15 00:12 UTC by Klaus Keppler
Modified: 2005-08-29 15:05 UTC (History)
0 users



Attachments
This patch avoid race conditions. The previous patch does not (5.40 KB, patch)
2005-07-20 20:47 UTC, eholyat
Details | Diff
Patch tested with jxta-c project (3.76 KB, patch)
2005-07-20 21:14 UTC, Henry Jen
Details | Diff
new patch also fix broadcast problems pointed by E Holyat (5.30 KB, patch)
2005-07-21 01:39 UTC, Henry Jen
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Klaus Keppler 2004-03-15 00:12:14 UTC
There seems to be a bug in the condition variable implementation
on WIN32 platforms (see locks/win32/thread_cond.c).

An application using apr_thread_cond_wait will run very
well under linux (see httpd2's worker-mpm:
fdqueue.c -> ap_queue_pop uses condition variables).
But under WIN32 already with two threads using the same
condition variable a deadlock occurs after short time.

Debugging brought me to the source of apr_thread_cond_wait();
as you can see the function "apr_thread_mutex_unlock(mutex)"
is called every time the while(1)-loop is run.
I think this shouldn't be a problem in general (since an unlocked
mutex can't be unlocked once more), but at least on WIN32 it *did*
cause trouble. And I am no windows guru so I don't know
if/how windows bothers about multiple unlocking.

After changing the code to unlock the mutex only once
everything worked very fine. :-)

I tested the patch with Win2K.
Without patch, my application ran (with 2 worker threads) only up to 20-30
loops before ending in an deadlock; now I tested 200.000 loops without
problems.

Another approach would be to add a counter to every mutex and
ignore unlock calls when counter==0.

Klaus


--- locks/win32/thread_cond-old.c       Mon Feb 23 11:56:45 2004
+++ locks/win32/thread_cond.c   Mon Feb 23 12:01:46 2004
@@ -85,6 +85,8 @@
 {
     DWORD res;

+    int unlock_once = 1;
+
     while (1) {
         res = WaitForSingleObject(cond->mutex, INFINITE);
         if (res != WAIT_OBJECT_0) {
@@ -93,7 +95,10 @@
         cond->num_waiting++;
         ReleaseMutex(cond->mutex);

-        apr_thread_mutex_unlock(mutex);
+        if (unlock_once) {
+            unlock_once = 0;
+            apr_thread_mutex_unlock(mutex);
+        }
         res = WaitForSingleObject(cond->event, INFINITE);
         cond->num_waiting--;
         if (res != WAIT_OBJECT_0) {
@@ -125,6 +130,8 @@
     DWORD res;
     DWORD timeout_ms = (DWORD) apr_time_as_msec(timeout);

+    int unlock_once = 1;
+
     while (1) {
         res = WaitForSingleObject(cond->mutex, timeout_ms);
         if (res != WAIT_OBJECT_0) {
@@ -136,7 +143,10 @@
         cond->num_waiting++;
         ReleaseMutex(cond->mutex);

-        apr_thread_mutex_unlock(mutex);
+        if (unlock_once) {
+            unlock_once = 0;
+            apr_thread_mutex_unlock(mutex);
+        }
         res = WaitForSingleObject(cond->event, timeout_ms);
         cond->num_waiting--;
         if (res != WAIT_OBJECT_0) {
Comment 1 Jeff Trawick 2004-03-15 00:14:13 UTC
PatchAvailable is a special term for the Keywords field, but close enough ;)
Comment 2 Klaus Keppler 2004-08-28 16:58:08 UTC
There's a general problem when calling apr_mutex_unlock several times under 
WIN32; since this might call the WIN32 API function "LeaveCriticalSection".

Quoting MSDN docs on LeaveCriticalSection:
---snip---
[...]
If a thread calls LeaveCriticalSection when it does not have ownership of the 
specified critical section object, an error occurs that may cause another 
thread using EnterCriticalSection to wait indefinitely.
[...]
---snip---

However - this patch removes problems when using condvars under WIN32.
Another solution would be to add a "lock counter" to APR mutexes.
Comment 3 eholyat 2005-07-20 20:47:48 UTC
Created attachment 15721 [details]
This patch avoid race conditions.  The previous patch does not

The previous patch addressed only the unlock being called more than once.

This attachment avoids race conditions that the previous patch doesn't. This
patch also fixes the multiple calls to unlock. This patch also consolidates the
the duplicate efforts in apr_thread_cond_wait and apr_thread_cond_timedwait
Comment 4 Henry Jen 2005-07-20 21:14:05 UTC
Created attachment 15722 [details]
Patch tested with jxta-c project

another patch to address the same issue and which would also fix bug 34336 as
it totally removing the cond->mutex. I believe using the mutex passed in to
ensure mutual access should be sufficient.

Unless the different threads can use different mutex for the same cond,
but I don't see that as a valid usage and nor can I think of a use case
for that.
Comment 5 Henry Jen 2005-07-21 01:39:09 UTC
Created attachment 15725 [details]
new patch also fix broadcast problems pointed by E Holyat

This patch fix the broadcast issue pointed out by E Holyat
Comment 6 Henry Jen 2005-08-29 23:05:55 UTC
Patch had been applied and available in 1.2.1, also had been backported.