Bug 68884 - Delayed HTTP Traffic Processing After Mass Websocket Disconnect/Reconnect
Summary: Delayed HTTP Traffic Processing After Mass Websocket Disconnect/Reconnect
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: WebSocket
Version: 9.0.75
Hardware: All
OS: Linux
Importance: P2 normal
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-11 20:58 UTC by Matt M
Modified: 2024-05-03 12:48 UTC
CC: 2 users



Attachments
Proposed patch (1.86 KB, text/plain)
2024-04-11 21:16 UTC, Matt M
Proposed patch (2.25 KB, patch)
2024-04-12 18:20 UTC, Matt M

Description Matt M 2024-04-11 20:58:00 UTC
Apache Tomcat Bug Report
Delayed HTTP Traffic Processing After Mass Websocket Disconnect/Reconnect

Description:

A significant delay of 10+ minutes occurs in resuming normal HTTP traffic processing after a mass websocket disconnect/reconnect event. This issue arises when a network interruption or stop-the-world garbage collection event exceeds the maxIdleTimeout (35 seconds), leading to numerous websocket session closures. 

With several thousand websocket sessions closing simultaneously, all available nio2 threads (maxThreads=50) become occupied with the closure process. These threads enter a continuous loop, repeatedly calling Thread.yield while waiting to acquire the WsRemoteEndpointImplBase messagePartInProgress semaphore. This behavior, introduced as part of the fix for BZ66508, allows closing threads to relinquish CPU time while waiting for the send semaphore (up to the default 20-second timeout).

java.base@11.0.21/java.lang.Thread.yield(Native Method)
org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.acquireMessagePartInProgressSemaphore(WsRemoteEndpointImplServer.java:130)
org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:292)
org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:256)
org.apache.tomcat.websocket.WsSession.sendCloseMessage(WsSession.java:801)
org.apache.tomcat.websocket.WsSession.onClose(WsSession.java:711)

Observations indicate that on Linux, Thread.yield places the thread at a lower priority in the CPU scheduling queue, resulting in a prolonged series of yield calls until the timeout is reached and a SocketTimeoutException is triggered. HTTP traffic processing remains stalled until all session closures are completed.
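The close path described above effectively spin-waits on the send semaphore. A simplified, self-contained sketch of that yield-based acquisition pattern follows; the method name and timeout constant are illustrative only, not Tomcat's actual implementation:

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Semaphore;

public class YieldAcquireSketch {

    // Illustrative constant mirroring the default 20-second send timeout.
    static final long DEFAULT_TIMEOUT_MS = 20_000;

    // Spin on tryAcquire(), yielding the CPU between attempts, until the
    // permit is obtained or the deadline passes. On Linux, Thread.yield()
    // can leave the thread deprioritized by the scheduler, so thousands of
    // iterations may elapse before the timeout finally fires.
    static void acquireWithYield(Semaphore semaphore, long timeoutMs)
            throws SocketTimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!semaphore.tryAcquire()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new SocketTimeoutException(
                        "Timed out acquiring send semaphore");
            }
            Thread.yield();
        }
    }

    public static void main(String[] args) {
        Semaphore sendSemaphore = new Semaphore(1);
        try {
            // Permit is free: acquires immediately.
            acquireWithYield(sendSemaphore, 200);
            // Permit is now held: a second caller spins until timeout.
            acquireWithYield(sendSemaphore, 200);
        } catch (SocketTimeoutException e) {
            System.out.println("second acquire timed out as expected");
        }
    }
}
```

With dozens of closing sessions and a small fixed pool, every pool thread can end up in this loop simultaneously, which is why HTTP traffic stalls until the closes drain.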

We have implemented a temporary solution by introducing a property to limit the time spent in the on-close yield loop. Reducing this value from the default significantly improves recovery time. Additionally, decreasing maxThreads appears to further extend the recovery time, although the exact relationship requires further investigation.

Reproducing the Issue:

The issue, initially identified in a scenario with 50 threads and 5000 maximum websocket connections, can also be reproduced at a smaller scale with varying thread and session counts.

1. Establish several thousand websocket connections that periodically send/receive data to simulate traffic.
2. Induce a JVM pause or network interruption lasting 40 seconds or more.
3. Restore client-side connectivity.
4. Start a timer and attempt to obtain a 200 response from the server.
5. Stop the timer once a successful response is received.
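Steps 4-5 can be scripted. A minimal sketch; the health-check URL and 1-second polling interval are placeholders, not part of the original report:

```shell
#!/bin/sh
# Measure recovery time: print the number of seconds until the given
# command first succeeds (exit status 0).
time_until_ok() {
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}
```

Run it at step 4, e.g. `time_until_ok curl -sf http://localhost:8080/healthz` (`curl -f` exits non-zero on HTTP errors, so the loop ends at the first 200); the printed value is the recovery time in seconds.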

Test Configurations and Results:

5 nio2 threads, 300 websocket connections:

Close Timeout	Recovery Times (seconds)
10s	        218, 300, 159, 168, 312
 5s	        60, 42, 102, 199, 160
 2s	        27, 30, 42, 19, 18
 1s	        13, 15, 15

15 nio2 threads, 300 websocket connections:

Close Timeout	Recovery Times (seconds)
2s	        11, 8, 7, 6, 7, 12

Observations:

The issue was initially observed with Tomcat 9.0.75 (embedded) and remains reproducible with versions up to 9.0.82 (embedded), even with the 9.0.86 fix for reentrant lock on close handling applied. While the 9.0.86 fix resolved a memory leak, it did not alleviate the extended recovery times.

Proposed Solution:

Introducing a separate property specifically for the on-close send timeout would allow for finer-grained control and optimization of session closure behavior, particularly for servers operating with fixed thread pool sizes.

Additional Notes:

While BZ66508 removed the fixed timeout for on-close acquisition, the potential for a 20-second wait during semaphore acquisition persists, leading to prolonged session closure times and increased overhead on the OS scheduler due to the repeated yield calls.

We are investigating the precise relationship between thread count and recovery time and will provide additional data as it becomes available.

We believe that implementing the proposed solution would significantly improve Tomcat's performance under these conditions and provide administrators with greater control over resource utilization during mass websocket disconnect events.
Comment 1 Matt M 2024-04-11 21:16:36 UTC
Created attachment 39658 [details]
Proposed patch
Comment 2 Mark Thomas 2024-04-12 13:37:29 UTC
Patches should be in diff -u form.

The Tomcat team avoids using system properties for configuration unless absolutely necessary (it isn't necessary here).

Given that this requires a network interruption or stop-the-world GC of > 35 seconds, I'd argue that if you are dealing with network drop-outs or GC pauses of that length you have much bigger problems than this issue. That's not to say we won't try to fix this issue, but it may affect how we address it.
Comment 3 Matt M 2024-04-12 18:20:32 UTC
Created attachment 39664 [details]
Proposed patch

Updated format
Diff vs 9.0.86
Comment 4 Matt M 2024-04-12 18:37:05 UTC
Thank you for the feedback; I completely agree that other factors are involved in creating the conditions for this issue to occur.

The intent behind the proposal is to make those scenarios more recoverable when they do occur, and to solicit feedback on whether there may be a better way to accomplish this.
Comment 5 Christopher Schultz 2024-04-15 15:56:23 UTC
I'm ignorant of the threading details of Websocket, so maybe this is a stupid question: Could a single thread be used to perform these kinds of cleanups, instead of an army of threads? That would let a single thread trigger many session-shutdowns and only yield once for each of them.
Comment 6 Matt M 2024-05-02 17:02:48 UTC
(In reply to Christopher Schultz from comment #5)
> I'm ignorant of the threading details of Websocket, so maybe this is a
> stupid question: Could a single thread be used to perform these kinds of
> cleanups, instead of an army of threads? That would let a single thread
> trigger many session-shutdowns and only yield once for each of them.

My understanding is that there is a single write semaphore per remote endpoint socket, so the threads trying to write their close messages contend for it and yield rather than blocking in tryAcquire, which was the original behavior prior to BZ66508.

The spin count in the yield loop can run into the thousands even for a relatively short expiry period, which puts additional pressure on the OS-level thread scheduler. More than one HTTP thread can be in this spin wait at a time.

Given that it's waiting for a write semaphore, I am not sure a dedicated close thread would help; I think a similar approach may have been used a while back in Jetty. That write semaphore is used for any write, not just closures.

Is this something that can be looked into for 9.0.89?
Comment 7 Mark Thomas 2024-05-03 12:48:39 UTC
Fixed by reducing the timeout for non-standard WebSocket closes.

Fixed in:
- 11.0.x for 11.0.0-M20 onwards
- 10.1.x for 10.1.24 onwards
-  9.0.x for  9.0.89 onwards