Bug 56026 - RemoteEndpoint.Async#sendText(String, SendHandler) not thread safe
Summary: RemoteEndpoint.Async#sendText(String, SendHandler) not thread safe
Status: RESOLVED INVALID
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: WebSocket (show other bugs)
Version: 8.0.x-trunk
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-17 08:41 UTC by Simone Bordet
Modified: 2017-11-08 17:03 UTC (History)
4 users (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Simone Bordet 2014-01-17 08:41:13 UTC
If an application calls RemoteEndpoint.Async#sendText(String, SendHandler) concurrently from 2 (or more) threads, it gets the stack trace below.

Applications should not worry about synchronizing or queuing concurrent calls to Async.sendText() (or similar methods).

A simple chat application with 3 users chatting will result in concurrent calls to Async.sendText() when A and B both send a message that must be delivered to C, so the use case is pretty common.

Having application worry about synchronizing or queuing concurrent calls to Async.sendText() will put a huge burden to applications that perform concurrent writes, and I believe this was not the intent of JSR 356 (although the specification does not say anything about concurrency guarantees).

Thanks !

~~~~~~~~

2014-01-17 09:28:30,732 pool-1-thread-12 [ WARN][WebSocketTransport] 
java.lang.IllegalStateException: The remote endpoint was in state [TEXT_FULL_WRITING] which is an invalid state for called method
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.checkState(WsRemoteEndpointImplBase.java:1049)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.textStart(WsRemoteEndpointImplBase.java:1012)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendStringByCompletion(WsRemoteEndpointImplBase.java:174)
	at org.apache.tomcat.websocket.WsRemoteEndpointAsync.sendText(WsRemoteEndpointAsync.java:47)
Comment 1 Mark Thomas 2014-01-17 12:48:02 UTC
I have some sympathy with this view. Given that the Javadoc for RemoteEndpoint.Basic explicitly states that concurrent attempts to send messages are not allowed and the Javadoc for RemoteEndpoint.Async doesn't, you'd be forgiven for thinking that meant that RemoteEndpoint.Async permitted concurrent attempts to send messages.

Unfortunately, the (not very well documented) intention of the WebSocket EG was not to allow concurrent attempts to send messages. [1]

I did take a quick look at adding an option to relax (i.e. ignoring) this but the change required to do this with the current Tomcat code would be far from trivial.

I'm marking this as INVALID since the intention of the EG was to not allow this but WONTFIX is almost as appropriate if this is viewed as an enhancement request.

As an aside, one reason not to implement this particular enhancement is that apps that depended on it would not be portable between containers.

[1] https://java.net/projects/websocket-spec/lists/jsr356-experts/archive/2013-02/message/1
Comment 2 Simone Bordet 2014-01-17 13:18:53 UTC
Mark, thanks for your comments.

I am frankly surprised that an expert group states that:

"[async writes] should not be permitted and an IllegalStateException be thrown.
My reasoning for this is that was allowed it would open up all sorts of
buffering requirements for the container that could get tricky to manage."

Now those sorts of tricky buffering requirements must be managed by the application, with almost 100% chance that average joe websocket get them wrong.
For a simple chat application, an application developer must manage the buffering and possibly stack overflows due to callback invocations.

Containers should solve those problems on behalf of the application writers, otherwise one could say that transaction management or ORM mapping are too tricky to manage and better left to JEE application writers.

I hope you can change your mind on this issue :)

I see this problem as a lack of detail in the specification rather than a real intent to put burden onto application developer's shoulders.

Thanks!
Comment 3 Simone Bordet 2014-01-17 13:28:41 UTC
Remy, can I have a comment from you too, since you re-closed this issue as invalid ?

Thanks!
Comment 4 Remy Maucherat 2014-01-17 13:58:58 UTC
Feedback and comments on the specification should be sent to the Websockets EG.
Comment 5 Konstantin Preißer 2014-01-17 14:08:44 UTC
Hi Simone,

(In reply to Simone Bordet from comment #2)
> Mark, thanks for your comments.
> 
> I am frankly surprised that an expert group states that:
> 
> "[async writes] should not be permitted and an IllegalStateException be
> thrown.
> My reasoning for this is that was allowed it would open up all sorts of
> buffering requirements for the container that could get tricky to manage."
> 
> Now those sorts of tricky buffering requirements must be managed by the
> application, with almost 100% chance that average joe websocket get them
> wrong.
> For a simple chat application, an application developer must manage the
> buffering and possibly stack overflows due to callback invocations.

I'm not a Tomcat developer, but I have written a WebSocket example for Tomcat ("Drawboard") which uses the RemoteEndpoint.Async to send messages asynchronously.

Mark said the Javadoc of Async does not explicitely state that concurrent calls to this method are not allowed, but from a user's point of view, I would expect methods like Async#sendText(String, SendHandler) not to be callable until the previous Async operation has completed (when the SendHandler is invoked).

StackOverflows due to Callbacks should not be possible as the SendHandler method is required to always be called from a different thread than the one that initiated the send - see bug 55715.

Can you go into details with what you mean with "tricky buffering requirements"? For example, at the drawboard, I used a very simple buffering mechanism to ensure that concurrent send operations are serialized to the client, by using a LinkedList that stores messages that have yet to be sent to the client (and a bit synchronization).


> 
> Containers should solve those problems on behalf of the application writers,
> otherwise one could say that transaction management or ORM mapping are too
> tricky to manage and better left to JEE application writers.
> 
> I hope you can change your mind on this issue :)
> 
> I see this problem as a lack of detail in the specification rather than a
> real intent to put burden onto application developer's shoulders.
> 
> Thanks!
Comment 6 Simone Bordet 2014-01-17 14:39:39 UTC
Konstantin, thanks for your comments.

The "tricky buffering requirements" is actually a quote from the expert group.

Given how simply you have implemented these "tricky buffering requirements", it is even more surprising that this burden has been put by the EG onto the shoulders of application developers, while it could have been on the container's.

That is why I was hoping that this choice in Tomcat could have been reconsidered.

After all, there is nothing in the specification that forbids an implementation to do this buffering, and this will - like Mark noted - make applications not portable (which is what happened to me - that's why I reported the bug).

I'd prefer the container do the buffering for application developers, rather than having the application hardcode the buffering because the container implementers can perform optimizations or configure different policies for this buffering, easing up the job for application developers.

Thanks!
Comment 7 Mark Thomas 2014-01-17 15:22:22 UTC
You are equating the complexity of implementing buffering for a single application with known constraints and the complexity of implementing a general buffering solution appropriate for all applications where the constraints are unknown. They are not equivalent.
Comment 8 Sebb 2014-01-17 20:08:05 UTC
(In reply to Mark Thomas from comment #1)
> I have some sympathy with this view. Given that the Javadoc for
> RemoteEndpoint.Basic explicitly states that concurrent attempts to send
> messages are not allowed and the Javadoc for RemoteEndpoint.Async doesn't,
> you'd be forgiven for thinking that meant that RemoteEndpoint.Async
> permitted concurrent attempts to send messages.
> 
> Unfortunately, the (not very well documented) intention of the WebSocket EG
> was not to allow concurrent attempts to send messages. [1]

Would it be worth updating the Javadoc to make this restriction explicit?
Comment 9 Simone Bordet 2014-01-17 20:44:23 UTC
(In reply to Sebb from comment #8)
> Would it be worth updating the Javadoc to make this restriction explicit?

Would not that require a maintenance review of the specification via the JCP, since Javadocs are considered so ?

Also, other implementations may not have the need to have this restriction.
Comment 10 Mark Thornton 2014-04-23 14:24:59 UTC
It may be worth noting that the Tyrus 1.5 implementation does appear to be thread safe. Even arbitrary mixtures of Basic and Async sends work as one might expect: all messages are delivered without error Jetty, on the other hand, behaves in the same way as Tomcat.
Comment 11 Mark Thornton 2014-04-23 14:38:03 UTC
Some emails amongst the expert group doesn't constitute information of which users could be expected to be aware. Is this really the only place that the intent is described?
Comment 12 balusc 2016-06-10 14:33:58 UTC
Just in case, I investigated this issue for JSF <f:websocket> and OmniFaces <o:socket> too. This threadsafety problem doesn't occur on Jetty, Undertow (WildFly) nor Tyrus (GlassFish/Payara). It occurs on Tomcat only.

Work around was to synchronize session:

    synchronized (session) {
        results.add(session.getAsyncRemote().sendText(text));
    }

Frankly, this kind of contradicts the javax.websocket.Session javadoc http://docs.oracle.com/javaee/7/api/javax/websocket/Session.html:

> Session objects may be called by multiple threads. Implementations must ensure the integrity of the mutable properties of the session under such circumstances.
Comment 13 Alex 2016-06-21 17:10:35 UTC
I've ran into this issue too and I must say that although tomcat's current behavior doesn't directly contradict the standard, it's very unexpected and goes against simple common sense. This is an async call, and Session must be thread safe according to the spec. I can accept that there could be some uncertainty if one mixes sync and async sends, especially in partial messages, but if all calls are async this *really* should never happen. 

What's worse, as others have mentioned, this goes against what other implementations are doing without as much as documenting this behavior. Neither "Session"'s nor "RemoteEndpoint.Async"'s Javadoc have anything to say about thread safety or specifically the lack thereof.

Does the race condition occur only during the call to sendText()/sendBinary()? i.e. Would synchronizing such calls on the session object (as balusc suggested) solve 100% of these cases?
Comment 14 Jonathan Leech 2017-06-20 16:02:52 UTC
synchronizing on session as suggested doesn't appear to be sufficient to prevent the IllegalStateException, although it may help reduce the frequency. How is an API that can't be successfully used not broken?
Comment 15 Mark Thomas 2017-06-21 21:56:38 UTC
You'll have to take up your concerns over the API with the WebSocket EG. Although given there has been no WebSocket activity at all for Java EE 8 that might be a slow process.

Tomcat's starting position is always going to be implementing the spec. Where there are ambiguities then the following priority order of sources is usually followed:
- what is in the spec document
- what is in the Javadoc
- what was discussed by the EG

No, this isn't ideal but it is the best we have.

This isn't the first instance of a Java EE spec doing something that the user community thinks is just plain wrong. The coercion rules for EL comes to mind as an obvious example.

What we have done in the past is to add on option to enable the desired behaviour. Depending on circumstances the option may be a system property (applies to all web apps) or per web app. The default behaviour also varies.

It is worth noting that in the thread that is no longer accessible from the java.net (and I can't find a new location) I also mentioned that some refactoring had made implementing this a lot easier in Tomcat. What hasn't changed is that a number of questions remain open such as how many messages to accept before starting to reject them?

If there is interest, I suggest re-opening this issue as an enhancement request to  optionally enable the desired behaviour. I'll note that enhancement requests with patches tend to get looked faster than those without.