There is a long-lived bug, either in the JDK or in the Linux epoll implementation itself, that makes it possible for a select() call to return immediately with 0 SelectionKeys to process. If you then call select() again right away, you end up in a busy loop with 100% CPU usage.

A workaround has been implemented in Apache MINA, in Netty and in Grizzly, but I don't see such a workaround in Tomcat. The idea is to stop calling select() again once it has returned 0 a few times in a row. At that point a new Selector is created, all the channels registered with the old selector are registered with the new one, and the old selector is ditched.

You can have a look at the Grizzly code, line 501:
https://github.com/javaee/grizzly/blob/master/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/SelectorRunner.java

Or Apache MINA, line 609 and following:
https://github.com/apache/mina/blob/2.1.X/mina-core/src/main/java/org/apache/mina/core/polling/AbstractPollingIoProcessor.java

Or Netty, line 849 and following:
https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/nio/NioEventLoop.java

This workaround is critical for those three projects to work properly on Linux (the problem does not exist on Windows or Mac OSX, which is why Grizzly has added a flag to activate it or not).

FTR, I'm currently being hit by such a random 100% CPU peak on a project I'm working on. A thread dump shows that the thread consuming the CPU is the one doing the infinite select() loop:

    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:825)
    at java.lang.Thread.run(Thread.java:748)
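For illustration, the workaround described above could be sketched roughly as follows. This is a minimal sketch, not the code used by MINA, Netty, Grizzly or Tomcat: the class name SelectorRebuilder, the afterSelect() and rebuild() methods and the threshold parameter are all hypothetical, and it assumes that a single poller thread owns the selector and that the caller measures how long each select(timeout) call took. A real poller would also have to tell the spurious returns apart from legitimate wakeup() calls.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper: counts premature empty select() returns and
// replaces the Selector once a threshold is reached.
final class SelectorRebuilder {
    private final int threshold;   // assumed tuning knob, not a Tomcat setting
    private Selector selector;
    private int emptyReturns;

    SelectorRebuilder(Selector selector, int threshold) {
        this.selector = selector;
        this.threshold = threshold;
    }

    Selector selector() {
        return selector;
    }

    // Call after each select(timeout). Returns true if the selector was rebuilt.
    boolean afterSelect(int selected, long elapsedNanos, long timeoutNanos)
            throws IOException {
        // Keys were ready, or the call really waited for the timeout: normal case.
        if (selected > 0 || elapsedNanos >= timeoutNanos) {
            emptyReturns = 0;
            return false;
        }
        // select() came back early with nothing to process.
        if (++emptyReturns < threshold) {
            return false;
        }
        rebuild();
        emptyReturns = 0;
        return true;
    }

    // Moves every valid registration to a fresh Selector and closes the old one.
    void rebuild() throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : selector.keys()) {
            if (!key.isValid()) {
                continue; // already cancelled or channel closed
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            channel.register(fresh, ops, attachment);
            key.cancel();
        }
        selector.close();
        selector = fresh;
    }
}
```

The poller loop would then call something like rebuilder.afterSelect(n, elapsed, timeout) after each select and re-read rebuilder.selector() whenever it returns true. The threshold is a trade-off: too low and ordinary early returns trigger needless rebuilds, too high and the thread spins longer before recovering.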
I would recommend investigating and discussing this first on the user list.
Discussion started on the users mailing list (I would have assumed that it would rather be a dev mailing list discussion, but I followed your advice). Thanks!
Following the discussion on the mailing list, and given I could find only one mention of a possible issue overall ( https://github.com/netty/netty/issues/327 ), I will not add a workaround for now. I did not get feedback on the NIO2 resilience to this possible problem. Leaving this open for further research.
The associated JRE bug is https://bugs.openjdk.java.net/browse/JDK-8238279

I have confirmed that the reproducer provided with that bug (https://github.com/cedric780/EPollArrayWrapper-bug) still triggers the issue with the latest Java 8 from AdoptOpenJDK. I think this is enough evidence to implement a workaround in Tomcat.
Ok, so there's a reproducer for this now. It's supposedly fixed in Java 11. Personally, given the ugliness of the workaround, the rarity of the issue and the fact that there's a fix, I would rather not do anything.
I ran 10 tests with Java 11 and didn't see the issue. The developer of the reproducer also confirmed the issue is fixed in Java 11. I'm happy to implement a workaround, but I'd be equally happy with closing this as WONTFIX and pointing folks who are experiencing this issue to Java 11 and/or the Java 8 bug. Given your preference for WONTFIX, are there any objections to taking that approach?
Resolving as WONTFIX as per previous comments