Even after fix bug 56642, mod_proxy_wstunnel with ProxyWebsocketAsync On and ProxyWebsocketIdleTimeout doesn't work well. I sent and receive small data many times, and sometimes went infinite loop. I got core and backtrace, and I saw one loop was inside of apr_skiplist_insert (event.c line 1464), and another is on while loop (event.c line 1767). I looked at event.c and found all timer_skiplist seemed to be well protected with g_timer_skiplist_mtx, so I don't have any idea why the infinity loop occurs.
I notice there is no checking of the return code for lock/unlock of those mutexes. They could be failing
(In reply to Eric Covener from comment #1) > I notice there is no checking of the return code for lock/unlock of those > mutexes. They could be failing OK, I try to put trace logs to log whether lock/unlock success or not.
Maybe the following path can help? Index: server/mpm/event/event.c =================================================================== --- server/mpm/event/event.c (revision 1603193) +++ server/mpm/event/event.c (working copy) @@ -1975,7 +1975,9 @@ static void * APR_THREAD_FUNC listener_thread(apr_ int i = 0; socket_callback_baton_t *baton = (socket_callback_baton_t *) pt->baton; if (baton->cancel_event) { + apr_thread_mutex_lock(g_timer_skiplist_mtx); baton->cancel_event->canceled = 1; + apr_thread_mutex_unlock(g_timer_skiplist_mtx); } /* We only signal once per N sockets with this baton */ -- There seem to be a race between the cancellation check and set.
(In reply to Yann Ylavic from comment #3) > There seem to be a race between the cancellation check and set. Hmm, these are in the same thread, sorry for the noise.
(In reply to Takashi Sato from comment #2) > (In reply to Eric Covener from comment #1) > > I notice there is no checking of the return code for lock/unlock of those > > mutexes. They could be failing > > OK, I try to put trace logs to log whether lock/unlock success or not. I tried. lock/unlock succeed... Then, I tried to migrate apr_skiplist to apr_array, then works fine (of cource performance is bad and may have memory leaks)! It seems apr_skiplist has some bug.
(In reply to Takashi Sato from comment #5) > (In reply to Takashi Sato from comment #2) > > (In reply to Eric Covener from comment #1) > > > I notice there is no checking of the return code for lock/unlock of those > > > mutexes. They could be failing > > > > OK, I try to put trace logs to log whether lock/unlock success or not. > > I tried. lock/unlock succeed... > > Then, I tried to migrate apr_skiplist to apr_array, then works fine (of > cource performance is bad and may have memory leaks)! > It seems apr_skiplist has some bug. You might try Yanns patch. I don't know anything about the skiplist impl, but I don't know what might get moved while other threads add/remove unrelated entries. In that case, his lock would unfortunately be required.
Created attachment 31733 [details] patch for debug (event MPM using apr_array for timer list)
I observed apr_skiplist corruption, so I have annalyzed apr_skiplist, and found the bug 56654. With 56654 fixed APR, I don't see any infinite loop. I don't think Yann's patch. baton->cancel_event->canceled does not affect any for timer_skiplist.