Bug 56304

Summary: WebSocket send locks and timeout does not occur after 20 seconds
Product: Tomcat 7
Reporter: Rossen Stoyanchev <rstoyanchev>
Component: Catalina
Assignee: Tomcat Developers Mailing List <dev>
Severity: major    
Priority: P2    
Version: 7.0.52   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Extract from thread dump

Description Rossen Stoyanchev 2014-03-23 06:11:07 UTC
Created attachment 31429: Extract from thread dump

Start a WebSocket server on one computer.
Connect from another using a browser.
Start sending messages from server to client periodically.

Turn off wifi or unplug network cable on client side. Initially Tomcat appears 
to be sending messages but eventually one of the sends hangs indefinitely.

The 20-second timeout documented in the WebSocket FAQ does not occur, and an attempt to close the WebSocket session from another thread blocks that thread as well.

If you now turn the wifi back on or plug the network cable back in, the two stuck threads are finally released and the WebSocket handler gets a notification of the session closing.

In the attachment, the first two stack traces are of (1) the thread trying to send and (2) the thread that attempted to close. The 3rd stack trace is of another hung thread.

NOTE that this does not occur on Tomcat 8, where the timeout does occur and attempts to close a session succeed.

Also I am told that the timeout does occur on Windows but I haven't verified it myself. I am using Linux.
Comment 1 Mark Thomas 2014-03-23 08:48:22 UTC
Are you sure you used the same connector on Tomcat 7 and Tomcat 8? The default connector on Tomcat 7 is BIO whereas it is NIO on Tomcat 8.

When you refer to Windows or Linux is that client side, server side or both?
Comment 2 Rossen Stoyanchev 2014-03-23 14:34:24 UTC
The reference to Windows is for the server.

I am using default server configuration on both Tomcat 7 and 8 so I guess that probably accounts for the different behavior. I can try Tomcat 7 with NIO connector to confirm what the behavior is.

That said, is the BLOCKING_SEND_TIMEOUT property not supported with the default Tomcat 7 configuration? This is not reflected in the WebSocket FAQ.

Also should an attempt to close the session from a separate thread block as well? It would be difficult then (or rather not possible) for an application not to lose two threads to a client that disconnected abnormally.

If that is indeed the case, then the FAQ should probably recommend prominently switching to the NIO connector when using Tomcat 7 with WebSockets.
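
For reference, the BLOCKING_SEND_TIMEOUT property discussed in this comment is set through the user properties map attached to the WebSocket session. The sketch below assumes the fully qualified property name and the `Long` millisecond value described in the Tomcat WebSocket documentation; the helper name `setBlockingSendTimeout` is hypothetical, and a plain `Map` stands in for `session.getUserProperties()` so the example runs without a container. As this report shows, setting the property does not by itself guarantee that a blocked send is released on the BIO connector.

```java
import java.util.HashMap;
import java.util.Map;

final class SendTimeoutConfig {
    // Property name as given in the Tomcat WebSocket documentation.
    static final String BLOCKING_SEND_TIMEOUT =
            "org.apache.tomcat.websocket.BLOCKING_SEND_TIMEOUT";

    // Hypothetical helper: in a real endpoint, pass session.getUserProperties().
    // The value is a Long holding the timeout in milliseconds.
    static void setBlockingSendTimeout(Map<String, Object> userProperties, long millis) {
        userProperties.put(BLOCKING_SEND_TIMEOUT, Long.valueOf(millis));
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by session.getUserProperties().
        Map<String, Object> props = new HashMap<>();
        setBlockingSendTimeout(props, 5_000L);
        System.out.println(props);
    }
}
```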
Comment 3 Mark Thomas 2014-03-24 09:55:32 UTC
For the record, I can reproduce this with 8.0.x using:
- Ubuntu 12.04.04 LTS VM as the server (fully patched)
- Latest 8.0.x code
- Oracle 1.7.0_45 64-bit JVM
- Snake WebSocket example
- Firefox on OSX (fully patched) as the client

Simply turning off the WiFi on the OSX client leads to a read thread (http-bio-8080-exec-NN) and a write thread (SnakeTimer) blocking indefinitely. I'm looking into what can be done about this now.
Comment 4 Mark Thomas 2014-03-24 12:01:18 UTC
Adding info from the users list: the timeout does occur, after approximately 15 minutes.

This is - believe it or not - working as designed! See this thread from SO (it is off topic for SO but spot on for this problem)

It would appear that the only way to handle this, if you want to continue to use the BIO connector, is to reduce /proc/sys/net/ipv4/tcp_retries2.

I'll add some information about this to the WebSocket docs.
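
To see why the observed delay is about 15 minutes and how much a lower tcp_retries2 would shorten it, the following sketch reproduces the hypothetical-timeout model from the Linux kernel's ip-sysctl documentation: exponential backoff starting at TCP_RTO_MIN, capped at TCP_RTO_MAX, with the connection dropped at the (N+1)th retransmission timeout. The 200 ms and 120 s constants are the usual kernel defaults and are assumptions here; the class and method names are hypothetical. The result is only a lower bound, since the real retransmission timeout depends on the measured round-trip time.

```java
final class Retries2Estimate {
    // Assumed kernel constants TCP_RTO_MIN and TCP_RTO_MAX, in milliseconds.
    static final long RTO_MIN_MS = 200;
    static final long RTO_MAX_MS = 120_000;

    // Lower-bound time before Linux gives up on an unresponsive peer.
    static long hypotheticalTimeoutMs(int retries2) {
        long total = 0;
        long rto = RTO_MIN_MS;
        // N retransmissions, then the connection is killed at the (N+1)th RTO,
        // so N + 1 intervals are summed.
        for (int i = 0; i <= retries2; i++) {
            total += rto;
            rto = Math.min(rto * 2, RTO_MAX_MS); // double, capped at RTO_MAX
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 8, 15}) {
            System.out.printf("tcp_retries2=%d -> about %.1f s%n",
                    n, hypotheticalTimeoutMs(n) / 1000.0);
        }
    }
}
```

With the default value of 15 the model gives 924.6 seconds, which matches the roughly 15 minutes reported on the users list; a value of 5 gives 12.6 seconds.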
Comment 5 Rossen Stoyanchev 2014-03-24 12:40:21 UTC
Thanks, when you say it's expected to work this way, does it also mean the send timeout cannot be enforced with the BIO connector? 

Can anything meaningful be done if an application can detect a timeout situation? Currently trying to close the session from another thread locks up that thread too.
Comment 6 Mark Thomas 2014-03-24 12:50:20 UTC
There is no Java API available for setting a write timeout.

Closing the socket may or may not be possible - it depends on the locking implemented by the JRE. I've taken a look at the Java side and there is a lock on close but not on write. However, a lock could occur on the same object in the native code.

Given your experience, it looks like tweaking the kernel network parameters is the best (only?) option if you want to run WebSocket + HTTP BIO + Linux.

I have added some appropriate notes to the WebSocket docs for 8.0.x (8.0.5+) and 7.0.x (7.0.53+)
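
As a generic illustration of the point about blocking writes (this is not Tomcat's code), the sketch below runs a plain java.net.Socket write on a worker thread, waits for it with a deadline, and closes the socket if the deadline passes. The class and method names are hypothetical. It relies on the assumption that closing the socket from another thread releases the blocked writer; as noted in this comment, that depends on the JRE, and the close call may itself block. The demo uses a loopback peer that never reads, which fills the buffers but is not the same situation as a client that has dropped off the network.

```java
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class WriteWatchdog {
    // Hypothetical helper: returns true if the write finished in time,
    // false if it timed out and the socket was closed.
    static boolean writeWithTimeout(Socket socket, byte[] data, long timeoutMs,
                                    ExecutorService pool) throws Exception {
        Future<Void> write = pool.submit(() -> {
            OutputStream out = socket.getOutputStream();
            out.write(data); // blocking write; it takes no timeout argument
            out.flush();
            return null;
        });
        try {
            write.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Assumption: close() from this thread unblocks the writer.
            // On some JREs this call may block as well.
            socket.close();
            return false;
        }
    }

    // Local demo: the peer accepts the connection but never reads,
    // so the socket buffers fill and the write blocks.
    static boolean demo() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            return writeWithTimeout(client, new byte[32 * 1024 * 1024], 500, pool);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("write completed in time: " + demo());
    }
}
```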
Comment 7 Rossen Stoyanchev 2014-03-24 15:02:11 UTC
I guess that makes the combination of WebSocket and BIO pretty unusable. Good to know in any case. I have to double check the behavior with Servlet 3 async in the same scenario with NIO. 

BTW while you're in the FAQ, it says that the buffer size limit is for incoming messages. However, an exception is also raised when trying to send messages larger than 8K. I suspect it is an issue with the FAQ?
Comment 8 Mark Thomas 2014-03-24 16:01:54 UTC
If you have an issue sending more than 8k then that is a separate question for the users mailing list. Tomcat imposes no such limitation as it has no need to buffer the entire message.
Comment 9 Rossen Stoyanchev 2014-03-24 16:37:27 UTC
Okay thanks Mark.