Bug 64644 - wrong state of WsSession on network outage
Summary: wrong state of WsSession on network outage
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: WebSocket (show other bugs)
Version: 9.0.31
Hardware: PC Linux
: P2 major (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-04 10:47 UTC by Saksham Verma
Modified: 2020-09-07 10:05 UTC (History)
0 users



Attachments
a web project to start a ws server and index.html showing value of active websocket connections (12.73 KB, application/x-zip-compressed)
2020-08-04 10:47 UTC, Saksham Verma
Details
a web project to start a ws server and index.html showing value of active websocket connections (11.91 KB, application/x-zip-compressed)
2020-08-04 15:35 UTC, Saksham Verma
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Saksham Verma 2020-08-04 10:47:22 UTC
Created attachment 37382 [details]
a web project to start a ws server and index.html showing value of active websocket connections

This issue happens on any Linux OS, when we have a remote websocket connection stable and then, network drops.

If the application is writing on that websocket channel, then the lastActiveTime keeps on getting updated even if there is no network. 

The writes are successful for next ~15 minutes because the keep-alive interval for tcp is 15 minutes. 

Thus, from the time network was dropped to next 15 minutes the WsSession is shown as active.

And if the application is not writing anything on that channel, then after the maxIdleTimeout the WsSession.checkExpiration closes the connection. 

But when application is writing it shows wrong state of websocket session. 

Steps to reproduce:

1. Start the attached web project on a linux OS. It sets the maxIdleTimeout to 10 seconds and if any websocket session is created it sends a message every 2 seconds.

2. On any remote system connect a websocket session. I used the "smart websocket client" extension in chrome. 

3. Check that the /WebsocketActiveTimeIssue/index.html shows 1.

4. Drop the network connection between the client and server. I was using a VM so, I disconnected the network from Oracle virtual box.

5. For next 15 minutes that webpage will keep showing 1. 

Root cause:
Tomcat updates the lastActiveTime anytime application writes anything on that channel.

Thus, WsSession.checkExpiration() method never expires it unless the transport layer throws an error which happens only after 15 minutes.

Effect:
In my product, I need to show the devices which are connected to my IoT platform. The devices send the ping every T seconds (T is much less than maxIdleTimeout), thus, if there were any Idle Read events, then I could use that to close the connection and show the device as disconnected.

Suggestion:
Have two types of active times. viz. One for read and one for write:
Throw an IdleRead or IdleWrite event when any of the active times are passed the maxIdle timeout.


I have tried a fix on my system. But I don't know how to raise a PR with that. Though I am on the dev mailing list but got recently added.
Comment 1 Saksham Verma 2020-08-04 15:35:29 UTC
Created attachment 37383 [details]
a web project to start a ws server and index.html showing value of active websocket connections
Comment 2 Saksham Verma 2020-08-04 15:36:39 UTC
(In reply to Saksham Verma from comment #1)
> Created attachment 37383 [details]
> a web project to start a ws server and index.html showing value of active
> websocket connections

Use the latest attachment to reproduce the issue. Ignore the first attachment.
Comment 3 Saksham Verma 2020-08-04 17:55:53 UTC
PR: https://github.com/apache/tomcat/pull/330/files

Could any maintainer please review.
Comment 4 Mark Thomas 2020-08-17 09:09:36 UTC
This issue describes one particular drawback of the max idle timeout feature of WebSocket sessions in the Java WebSocket spec.

I've been thinking about various WebSocket usage patterns and whether there was a way to add a custom option that changed how this was treated. E.g. only applied to reads, only applied to writes, etc.). My conclusion is that to cover all possible use cases you need three idle timeouts:
- maxIdleTimeout (the current value)
- maxIdleReadTimeout
- maxIdleWriteTimeout

This is probably something that needs to be added to the Jakarta WebSocket 2.1 spec. I've created https://github.com/eclipse-ee4j/websocket-api/issues/356 to track this

To address this in current Tomcat versions, I am thinking of a couple of custom properties in the user properties Map associated with the session.
Comment 5 Saksham Verma 2020-08-30 07:40:50 UTC
As per your suggestion(In reply to Mark Thomas from comment #4)
> This issue describes one particular drawback of the max idle timeout feature
> of WebSocket sessions in the Java WebSocket spec.
> 
> I've been thinking about various WebSocket usage patterns and whether there
> was a way to add a custom option that changed how this was treated. E.g.
> only applied to reads, only applied to writes, etc.). My conclusion is that
> to cover all possible use cases you need three idle timeouts:
> - maxIdleTimeout (the current value)
> - maxIdleReadTimeout
> - maxIdleWriteTimeout
> 
> This is probably something that needs to be added to the Jakarta WebSocket
> 2.1 spec. I've created
> https://github.com/eclipse-ee4j/websocket-api/issues/356 to track this
> 
> To address this in current Tomcat versions, I am thinking of a couple of
> custom properties in the user properties Map associated with the session.



As per this suggestion I have raised another PR
https://github.com/apache/tomcat/pull/346
including the supporting tests. 

Could you please review it?

We need to deploy this fix for our product Thingworx as it is impacting many of our customers.
Comment 6 Mark Thomas 2020-09-01 13:06:01 UTC
Fixed in:
- master for 10.0.0-M8 onwards
- 9.0.x for 9.0.38 onwards
- 8.5.x for 8.5.58 onwards
- 7.0.x for 7.0.106 onwards

Thanks for the PR and for adapting it based on feedback.
Comment 7 Saksham Verma 2020-09-07 10:05:28 UTC
(In reply to Mark Thomas from comment #6)
> Fixed in:
> - master for 10.0.0-M8 onwards
> - 9.0.x for 9.0.38 onwards
> - 8.5.x for 8.5.58 onwards
> - 7.0.x for 7.0.106 onwards
> 
> Thanks for the PR and for adapting it based on feedback.

Could you please provide a tentative Date for 9.0.38 release?