Bug 66482 - Nio2 websocket timeout cause the response is no longer possible, always pending
Summary: Nio2 websocket timeout cause the response is no longer possible, always pending
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: WebSocket (show other bugs)
Version: 8.5.84
Hardware: PC All
: P2 major (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-20 04:41 UTC by zhougang
Modified: 2023-02-28 09:14 UTC (History)
0 users



Attachments
Exception stack (135.55 KB, image/png)
2023-02-20 04:54 UTC, zhougang
Details

Note You need to log in before you can comment on or make changes to this bug.
Description zhougang 2023-02-20 04:41:13 UTC
In an update of Tomcat (8.5.54 -- 8.5.84), we found in the test that the server could not respond to HTTP requests, some requests were in Pending state and some were normal. This situation persists and you can only restart Tomcat. After testing, we determined that the problem was caused by the version upgrade, and it only occurred in the server.xml configuration Nio2, and the configuration for Nio did not have this problem.

Nio2 configured to: server XML Connector set protocol to org. Apache. Coyote. Http11. Http11Nio2Protocol



A ThreadLocal variable called inlineCompletion in Nio2Endpoint.java has an exception to the run method between startInline and endInline, which results in endInline not executing. The variable state is true. In the readCompletionHandler completed, the Nio2Endpoint.isInline() result is always true, and the thread will no longer execute getEndpoint().processsocket.

Here is the stack that causes this problem:

java.lang.IllegalStateException: Writing not allowed due to timeout or cancellation

at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:350)

at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:416)

at org.apache.tomcat.util.net.Nio2Channel.write(Nio2Channel.java:184)

at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$Nio2OperationState.run(Nio2Endpoint.java:1098)

at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$Nio2OperationState.start(Nio2Endpoint.java:1040)

at org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1469)

at org.apache.tomcat.util.net.SocketWrapperBase.write(SocketWrapperBase.java:1395)

at org.apache.tomcat.util.net.SocketWrapperBase.write(SocketWrapperBase.java:1366)

at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:93)

at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:512)

at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:314)

at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:262)

at org.apache.tomcat.websocket.WsSession.sendCloseMessage(WsSession.java:824)

at org.apache.tomcat.websocket.WsSession.onClose(WsSession.java:747)

at org.apache.tomcat.websocket.WsFrameBase.processDataControl(WsFrameBase.java:367)

at org.apache.tomcat.websocket.WsFrameBase.processData(WsFrameBase.java:296)

at org.apache.tomcat.websocket.WsFrameBase.processInputBuffer(WsFrameBase.java:133)

at org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:85)

at org.apache.tomcat.websocket.server.WsFrameServer.doOnDataAvailable(WsFrameServer.java:183)

at org.apache.tomcat.websocket.server.WsFrameServer.notifyDataAvailable(WsFrameServer.java:162)

at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.upgradeDispatch(WsHttpUpgradeHandler.java:157)

at org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:60)

at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59)

at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:885)

at org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1725)

at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)

at org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:1184)

at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$2.completed(Nio2Endpoint.java:637)

at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$2.completed(Nio2Endpoint.java:615)

at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)

at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:281)

at sun.nio.ch.WindowsAsynchronousSocketChannelImpl$ReadTask.completed(WindowsAsynchronousSocketChannelImpl.java:579)

at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:397)

at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)



Problems in: at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$Nio2OperationState.run (Nio2Endpoint. Java: 1098)



And make sure the problem is since 8.5.76, commit id is f2cf25b270a84d0daeefc4f215aa2f56e1df99 01



The above problem is in server.xml, <Connector>. Did not appear after the tag's protocol was deleted or set to Nio.

We do not see a fix for this issue in the latest version 8.5.85.
Comment 1 zhougang 2023-02-20 04:54:13 UTC
Created attachment 38498 [details]
Exception stack
Comment 2 Remy Maucherat 2023-02-20 08:55:05 UTC
Ok, so an exception was not really supposed to happen here, so there was no try/finally in Nio2OperationState.start. Probably not a great idea.

However, I don't understand why the socket close backport would be related to this. The Nio2OperationState.start and the rest were the same before that.
Comment 3 Mark Thomas 2023-02-20 10:33:12 UTC
It might be because 01f2cf25b270a84d0daeefc4f215aa2f56e1df99 also changed the WebSocket implementation to use AsyncIO.
Comment 4 Remy Maucherat 2023-02-20 15:41:35 UTC
Ok, so that makes sense. I checked the code and both read and write can throw this undocumented exception (there are other exceptions but they don't occur, unlike this one apparently).
The easiest is probably going to be to do a try/catch for the whole Nio2OperationState.run and call the completion failure with the exception.
Comment 5 zhougang 2023-02-21 01:37:42 UTC
(In reply to Remy Maucherat from comment #4)
> Ok, so that makes sense. I checked the code and both read and write can
> throw this undocumented exception (there are other exceptions but they don't
> occur, unlike this one apparently).
> The easiest is probably going to be to do a try/catch for the whole
> Nio2OperationState.run and call the completion failure with the exception.

Yes, I did try/finally for the entire Nio2OperationState.run and then the problem did not occur in the test.  But are there any side effects to this modification?  Because we've seen elsewhere that there are isInline methods that do something based on this variable value.
Comment 6 Remy Maucherat 2023-02-27 12:33:21 UTC
I've never seen this exception before, so it is probably not very common. Restoring the state with a try/finally and propagating this very unexpected exception is likely good enough, especially since you were nice enough to test it. I credited you as zhougang but if you would like something else let me know.

The patch will be in 11M4, 10.1.7, 9.0.73 and 8.5.87.
Comment 7 zhougang 2023-02-28 09:14:31 UTC
(In reply to Remy Maucherat from comment #6)
> I've never seen this exception before, so it is probably not very common.
> Restoring the state with a try/finally and propagating this very unexpected
> exception is likely good enough, especially since you were nice enough to
> test it. I credited you as zhougang but if you would like something else let
> me know.
> 
> The patch will be in 11M4, 10.1.7, 9.0.73 and 8.5.87.

OK. Thanks for your response and quick fix. This exception is very unusual and only occurs when there are many websocket send timeout and frequent attempts to establish connections. And I have no more questions.