Bug 51794 - Race condition in NioEndpoint$Poller causes socket to not be read until selectorTimeout
Race condition in NioEndpoint$Poller causes socket to not be read until selec...
Status: RESOLVED FIXED
Product: Tomcat 6
Classification: Unclassified
Component: Catalina
6.0.33
PC Mac OS X 10.4
: P2 normal (vote)
: default
Assigned To: Tomcat Developers Mailing List
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2011-09-09 18:51 UTC by dlord
Modified: 2011-09-21 12:08 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description dlord 2011-09-09 18:51:35 UTC
I'm seeing an occasional race condition that happens between NioEndpoint$Poller#run and Http11NioProtocol finishing an event.

What happens is that there is a race condition upon waking up the Selector that can cause a SelectionKey to not be marked with InterestOps.READ as soon as it could.  The race is on Poller.wakeupCounter.  These steps can occur which causes my SelectionKey to not have its interest ops reset soon and the Selector to not have wakeup() called on it.

1. Poller0 - Starts processing its run loop and calls events().
2. Poller0 - Checks the wakeupCounter.get() > 0 which evaluates to false (wakeupCounter is 0).
3. Worker1 - Finishes an event and calls in to Poller#addEvent
4. Worker1 - Enqueues the PollerEvent that will reset the interest ops.
5. Worker1 - Calls wakeupCounter.incrementAndGet to increment and get wakeupCounter to 1.  This fails the check to call selector.wakeup();
6. Poller0 - Calls wakeupCounter.set(-1)
7. Poller0 - Calls selector.select(selectorTimeout) and blocks for either the full time out or until another unrelated SelectionKey is polled.
8. Poller0 - Ultimately gets around to calling events() again which now resets the interest ops for the SelectionKey that was processed by Worker1.

The simple fix for this is to change this code:
                            if (wakeupCounter.get() > 0) {
                                //if we are here, means we have other stuff to do
                                //do a non blocking select
                                keyCount = selector.selectNow();
                            } else {
                                wakeupCounter.set(-1);
                                keyCount = selector.select(selectorTimeout);
                            }

to this 
                            if (wakeupCounter.getAndSet(-1) > 0) {
                                //if we are here, means we have other stuff to do
                                //do a non blocking select
                                keyCount = selector.selectNow();
                            } else {
                                keyCount = selector.select(selectorTimeout);
                            }
Comment 1 Filip Hanik 2011-09-14 14:20:11 UTC
Thanks for the detailed info. We'll get this fixed.
Comment 2 Filip Hanik 2011-09-14 14:44:48 UTC
trunk - r1170647
7.0.x - r1170656
6.0.x recommended for fix in status file
Comment 3 Mark Thomas 2011-09-21 12:08:20 UTC
Fixed in 6.0.x and will be included in 6.0.34 onwards.