There seems to be a bug in the condition variable implementation on WIN32 platforms (see locks/win32/thread_cond.c). An application using apr_thread_cond_wait will run very well under linux (see httpd2's worker-mpm: fdqueue.c -> ap_queue_pop uses condition variables). But under WIN32 already with two threads using the same condition variable a deadlock occurs after short time. Debugging brought me to the source of apr_thread_cond_wait(); as you can see the function "apr_thread_mutex_unlock(mutex)" is called every time the while(1)-loop is run. I think this shouldn't be a problem in general (since an unlocked mutex can't be unlocked once more), but at least on WIN32 it *did* cause trouble. And I am no windows guru so I don't know if/how windows bothers about multiple unlocking. After changing the code to unlock the mutex only once everything worked very fine. :-) I tested the patch with Win2K. Without patch, my application ran (with 2 worker threads) only up to 20-30 loops before ending in an deadlock; now I tested 200.000 loops without problems. Another approach would be to add a counter to every mutex and ignore unlock calls when counter==0. Klaus --- locks/win32/thread_cond-old.c Mon Feb 23 11:56:45 2004 +++ locks/win32/thread_cond.c Mon Feb 23 12:01:46 2004 @@ -85,6 +85,8 @@ { DWORD res; + int unlock_once = 1; + while (1) { res = WaitForSingleObject(cond->mutex, INFINITE); if (res != WAIT_OBJECT_0) { @@ -93,7 +95,10 @@ cond->num_waiting++; ReleaseMutex(cond->mutex); - apr_thread_mutex_unlock(mutex); + if (unlock_once) { + unlock_once = 0; + apr_thread_mutex_unlock(mutex); + } res = WaitForSingleObject(cond->event, INFINITE); cond->num_waiting--; if (res != WAIT_OBJECT_0) { @@ -125,6 +130,8 @@ DWORD res; DWORD timeout_ms = (DWORD) apr_time_as_msec(timeout); + int unlock_once = 1; + while (1) { res = WaitForSingleObject(cond->mutex, timeout_ms); if (res != WAIT_OBJECT_0) { @@ -136,7 +143,10 @@ cond->num_waiting++; ReleaseMutex(cond->mutex); - apr_thread_mutex_unlock(mutex); + if (unlock_once) { + unlock_once = 0; + apr_thread_mutex_unlock(mutex); + } res = WaitForSingleObject(cond->event, timeout_ms); cond->num_waiting--; if (res != WAIT_OBJECT_0) {
PatchAvailable is a special term for the Keywords field, but close enough ;)
There's a general problem when calling apr_mutex_unlock several times under WIN32; since this might call the WIN32 API function "LeaveCriticalSection". Quoting MSDN docs on LeaveCriticalSection: ---snip--- [...] If a thread calls LeaveCriticalSection when it does not have ownership of the specified critical section object, an error occurs that may cause another thread using EnterCriticalSection to wait indefinitely. [...] ---snip--- However - this patch removes problems when using condvars under WIN32. Another solution would be to add a "lock counter" to APR mutexes.
Created attachment 15721 [details] This patch avoid race conditions. The previous patch does not The previous patch addressed only the unlock being called more than once. This attachment avoids race conditions that the previous patch doesn't. This patch also fixes the multiple calls to unlock. This patch also consolidates the the duplicate efforts in apr_thread_cond_wait and apr_thread_cond_timedwait
Created attachment 15722 [details] Patch tested with jxta-c project another patch to address the same issue and which would also fix bug 34336 as it totally removing the cond->mutex. I believe using the mutex passed in to ensure mutual access should be sufficient. Unless the different threads can use different mutex for the same cond, but I don't see that as a valid usage and nor can I think of a use case for that.
Created attachment 15725 [details] new patch also fix broadcast problems pointed by E Holyat This patch fix the broadcast issue pointed out by E Holyat
Patch had been applied and available in 1.2.1, also had been backported.