Bug 63022 - Nio2Endpoint(SocketProcessorBase) for async request doesn't decrement LimitLatch while handling socket that is already closed
Summary: Nio2Endpoint(SocketProcessorBase) for async request doesn't decrement LimitLa...
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 8.5.34
Hardware: PC Mac OS X 10.1
: P2 normal (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2018-12-20 01:01 UTC by Pankaj
Modified: 2018-12-22 20:37 UTC (History)
0 users

Web application(SimpleWebApp) to recreate bug (510.33 KB, application/zip)
2018-12-20 01:10 UTC, Pankaj
NIO2 close patch (6.14 KB, patch)
2018-12-20 13:04 UTC, Remy Maucherat
Details | Diff
Web application(SimpleWebApp) to recreate bug (150.23 KB, application/zip)
2018-12-21 23:41 UTC, Pankaj
Web app to increment count (150.23 KB, application/zip)
2018-12-21 23:44 UTC, Pankaj

Note You need to log in before you can comment on or make changes to this bug.
Description Pankaj 2018-12-20 01:01:41 UTC
This bug has similar symptoms to Bug 61918 (Nio2Endpoint may miscount connectionLimitLatch while closing socket that is already closed) however other bug only addresses the case when request is served synchronously. In case of asynchronous request, connectionCount(LimitLatch) is still not getting decremented when socket is already closed. 

We are not sure what is causing socket to close independently in our network however issue can be artificially reproduced by forcefully closing the socket during "async" request(these steps are similar to steps to reproduce mentioned in other Bug 61918). We used jconsole to monitor connectionCount and Yourkit for looking into LimitLatch. In both avenues, count never got decremented.

Please see the attachment for:
1. Web application(SimpleWebApp)to recreate bug based on tomcat-embed-core-8.5.37. Example request(/async_long?time=2)

2. Jconsole screenshot for connectionCount.
3. Yourkit screenshot for LimitLatch count.
Comment 1 Pankaj 2018-12-20 01:10:50 UTC
Created attachment 36338 [details]
Web application(SimpleWebApp) to recreate bug
Comment 2 Remy Maucherat 2018-12-20 09:22:27 UTC
During async there's no IO operation going on, so no way to detect a socket disconnect. As a result, nothing will actually get closed magically. You need to perform an IO operation to find it out, otherwise you can remain in async mode forever.

So the test closes the socket, and then async completes. The problem is that the wrapper isClosed() now returns true, which will prevent the async complete processing (SocketProcessorBase.run calls isClosed()). This is not normal, but closing using introspection is not either. So maybe there's a "problem" here, but the test is not a representation of a bug which needs to be fixed. Also maybe SocketWrapper.isClosed() shouldn't check the socket state and this could solve this situation along with more legitimate ones, but this needs to be carefully reviewed.

Set need info to investigate and review some more, just in case.
Comment 3 Remy Maucherat 2018-12-20 13:04:59 UTC
Created attachment 36340 [details]
NIO2 close patch

Simplify the socket close for NIO2, which would probably fix bad side effects like seen here. This builds on previous close refactorings and looking at all the callers this should clearly should work fine now. Please review.
Comment 4 Pankaj 2018-12-20 21:55:32 UTC
Thanks for the quick reply. I have applied the patch on tomcat 9.0.14, it resolves the bug and connectionCount does get decremented. Although I was not able to apply the same patch against 8.5.34 since Nio2Endpoint class has changed between releases.

Changing bug state to new, please let me know if you need any other information.
Comment 5 Remy Maucherat 2018-12-21 13:01:56 UTC
I prefer needinfo. I understand your testcase but it is not legitimate as is.

NIO also checks the socket open state and I verified it does the same thing as NIO2 (you made it sound like it is a NIO2 issue, it is not), unlike APR (isClosed uses only a flag). Since APR only checks the internal wrapper closed flag, it should be ok to only do that for NIO and NIO2 as well.
Comment 6 Pankaj 2018-12-21 23:41:25 UTC
Attached is an updated test case (AsyncConnectionCount.zip) where async request first writes xml data to stream before closing the socket. Connection count is still not getting decremented. 

We are actually hitting an issue in production on couple of nodes where limit latch count value is equal to max connection count(10000) and it never gets decremented. It strated happening after tomcat upgrade to 8.5.34 and switching from BIO to NIO connector.
Comment 7 Pankaj 2018-12-21 23:41:58 UTC
Created attachment 36343 [details]
Web application(SimpleWebApp) to recreate bug
Comment 8 Pankaj 2018-12-21 23:44:35 UTC
Created attachment 36344 [details]
Web app to increment count
Comment 9 Remy Maucherat 2018-12-22 12:13:45 UTC
This has no reason to fail more than the previous test, and it works for me. I think you are trying too hard to find problems here, and you should investigate first on the Tomcat user mailing list.

The close has been modified and will be in Tomcat 9.0.15 and 8.5.38.
Comment 10 Pankaj 2018-12-22 18:18:50 UTC
I think you misunderstood my last comment. The test case was updated just to make it more relevant to our use case however patch provided earlier does FIX this issue.
Comment 11 Remy Maucherat 2018-12-22 20:37:47 UTC
Ok, so this explains what I saw. The statement was way too convoluted for me, so I indeed misunderstood it a bit.