Bug 63931

Summary: The remote endpoint was in state [TEXT_FULL_WRITING] which is an invalid state
Product: Tomcat 9 Reporter: Saurav Singh <sauravksingh>
Component: WebSocketAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: major    
Priority: P2    
Version: unspecified   
Target Milestone: -----   
Hardware: PC   
OS: All   

Description Saurav Singh 2019-11-18 07:49:20 UTC
Given:
RabbitMQ as a incoming streaming message source consumer 
When:
WebScoket subscribe for incoming message to show on UI. It works fine for initial few moments.
Then: 
The moment incoming streaming message rate goes more than 800messages/sec after few min started receiving below Exception:

 synchronized (this) {
 if (session.isOpen()) {
                    try {
                        log.info("Sending message to client session ID : " + session.getId());
                        ((StandardWebSocketSession) session).getNativeSession().getBasicRemote().sendText(message);
                    } catch (Exception e) {
                        log.error("Encountered an error sending to client!", e);
                    }
                }
}

we also used StampedLock at method level to socket session management.

------------
2019-11-18T13:06:32,084 [tainer#0-1] co.ea.ex.no.tr.we.WebSocketServerHandler ERROR: o.tr.we.WebSocketServerHandler(dToSession:  520) Encountered an error sending to client!
java.lang.IllegalStateException: The remote endpoint was in state [TEXT_FULL_WRITING] which is an invalid state for called method
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.checkState(WsRemoteEndpointImplBase.java:1229)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.textStart(WsRemoteEndpointImplBase.java:1191)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendString(WsRemoteEndpointImplBase.java:190)
	at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:37)
	at com.eaton.export.notifications.transport.websocket.WebSocketServerHandler.sendToSession(WebSocketServerHandler.java:518)
Comment 1 Mark Thomas 2019-11-18 09:27:33 UTC
Looking at the Tomcat code, there are two ways this can happen.

1. A previous messages has an error which throws an exception. This essentially means the WebSocket connection is no longer usable.

2. Attempting to send messages concurrently.

From your description, this looks like a case of 1. Once you see the exception, do all subsequent attempts to write a message trigger that exception? Or is it intermittent?
Comment 2 Saurav Singh 2019-11-18 09:39:35 UTC
Hey Mark Thanks for response.

 1. Once you see the exception, do all subsequent attempts to write a message trigger that exception? Answer --> Yes It does try to write message and trigger that excpetion and then all threads hung forever.


Or is it intermittent? Answer --> It is not intermittent it happened over the time always reproducable.
Comment 3 Mark Thomas 2019-11-18 09:48:08 UTC
You should see a call to onClose() before this happens. You should also see an IOException in the thread writing the message.

One thing I think Tomcat could improve on, is preventing applications trying to write further messages once the close message has been written to the client. Although in this scenario, that probably means you just get a different series of exceptions.
Comment 4 Saurav Singh 2019-11-18 09:56:57 UTC
So I am working on Spring boot websocket which has method a afterConnectionClosed(final WebSocketSession session, final CloseStatus status) throws Exception that I have overriden on same handler class however I haven't seen any log information being printed anywhere in log. I am not sure whether I should be like that or any method i have override or configuration changes can help me to move ahead.
Comment 5 Saurav Singh 2019-11-18 10:00:28 UTC
You can see just now got same issue after raising the bug ticket:

2019-11-18T15:29:29,041 [tainer#0-1] co.ea.ex.no.tr.we.WebSocketServerHandler ERROR: o.tr.we.WebSocketServerHandler(dToSession:  525) Encountered an error sending session StandardWebSocketSession[id=406e750f-554d-a786-8f01-6f0a24408833, uri=wss://localhost:9004/websocket] open to client!
java.lang.IllegalStateException: The remote endpoint was in state [TEXT_FULL_WRITING] which is an invalid state for called method
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.checkState(WsRemoteEndpointImplBase.java:1229)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.textStart(WsRemoteEndpointImplBase.java:1191)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendString(WsRemoteEndpointImplBase.java:190)
	at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:37)
Comment 6 Saurav Singh 2019-11-19 06:21:39 UTC
Can I expect any help from apahe team to understand what's wrong going on?
Comment 7 Mark Thomas 2019-11-19 12:48:07 UTC
From the tone of comment #6 you appear to have forgotten that everyone here is a volunteer, providing you with help and assistance (at no cost) to use the software that you have been given (also at no cost). 

What makes this worse is that it has already been explained to you what is going wrong. An explanation that was provided within 2 hours of the issue being opened.

Let me try again with more words.

Something goes wrong. This is probably a timeout due to congestion at the client and/or network and/or Tomcat because one or more of those components can't handle the message volume.

At the point something goes wrong Tomcat will:
- throw an IOException that the application will see (based on the stack trace you provided)
- close the session, triggering an onClose() event for the Endpoint that the application should handle

If the application ignores the Exception and the onClose() event and continues to try and write messages you will see the IllegalStateException reported in the original report.

That the application ignores / swallows / fails to handle the Exception is an application issue, not a Tomcat issue. This is something that needs to be fixed in the application, not in Tomcat.

That the application ignores / swallows / fails to handle the onClose() event is an application issue, not a Tomcat issue. This is something that needs to be fixed in the application, not in Tomcat. You may wish to enquire with the Spring team what the application needs to do to ensure that it is notified when the onClose() event is triggered.

The only area where Tomcat may be able to do more (and why this issue is still open rather than being resolved as INVALID) is preventing applications writing messages after a close frame has been written to the client. Even if we are able to improve that, the only difference you are likely to see is a difference exception if the application tries to write a message after the close frame has been sent.
Comment 8 Saurav Singh 2019-11-19 13:00:37 UTC
Hey Mark! Sorry if my tone of comment #6 sounds something different which wasn't meant to be its just being impatient.

When i raised the bug with Apche i didn't think of cost involved; its apache foundation decision to go open source or go license i respect that.

Bug was raised in seek to understand what's goffy thing going on in production when incoming messages rate increases. and to be very honest no IOException being thrown any where.

Thanks for your help with free of cost. Have a good one.
Comment 9 Mark Thomas 2019-12-03 22:00:05 UTC
That you aren't seeing any exceptions before this one is not what I would expect.

We have a unit test (TestWebSocketFrameClient) that can be fairly easily tweaked to simulate this scenario - a server sending data faster than a client can handle. I've done some investigation with this and I don't see the expected exceptions either.

Digging into the code, I think something might not be quite right with the low level asyncIO API. You can try useAsyncIO="false" on the Connector in server.xml to see if that helps.

Meanwhile, the thread to follow on the dev@ list is this one:
https://tomcat.markmail.org/thread/kt7rcu7utlb77wk4
Comment 10 Mark Thomas 2019-12-04 18:31:57 UTC
We tracked down the issue with the application code not seeing a SocketTimeoutException. That is fixed in:
- master for 9.0.30 onwards
- 8.5.x for 8.5.50 onwards

I also confirmed that both onError() and onClose() were called both before and after this change.