Bug 57340

Summary: NioConnector caches get corrupted on concurrent comet close
Product: Tomcat 7 Reporter: Nikita Zyulyaev <zyulyaev>
Component: ConnectorsAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: major CC: cl, uvi
Priority: P2    
Version: 7.0.47   
Target Milestone: ---   
Hardware: PC   
OS: All   

Description Nikita Zyulyaev 2014-12-11 11:20:44 UTC
Configuration:
Tomcat 7.0.47 NioConnector, nginx 1.6.2, atmosphere 2.2.3.

It happens when nginx and atmosphere close the same comet connection concurrently.

In NioEndpoint.Poller thread(A) the SocketChannel becomes ready for read when nginx closes it. Poller unregisters the channel for read and forks another thread(B) to handle close event. (see NioEndpoint:1239)
Then atmosphere calls close on the connection in thread(C) and Tomcat receives internal action with code ActionCode.COMET_CLOSE and adds the channel to the Poller, which registers it for read again. (see Http11NioProcessor.java:462).
The SocketChannel is still readable in case thread(B) hasn’t invalidated the SelectionKey yet, so Poller in thread(A) initiates the closing process again and forks thread(D).
Thread(B) completes the closing process and puts NioChannel and AttachmentKey into the corresponding caches. 
Then Thread(D) tries to close the channel again and realizes that it has already been closed (see AbstractProtocol.java:564) and puts the same NioChannel and AttachmentKey into caches. 
Caches become corrupted because they contain 2 references to the same object. Then any 2 subsequent requests may get the same NioChannel and AttachmentKey and some crazy stuff may happen (mixed up responses, etc).
Comment 1 cl 2014-12-12 16:22:18 UTC
We can confirm this and were able to reproduce it as well.
A fix would be highly appreciated because we also run into cases where actual responses were mixed up.
We consider this to be very serious in certain environments.
Comment 2 Christopher Schultz 2014-12-12 17:14:11 UTC
Please make sure this is reproducible on Tomcat 7.0.57, the most recent release of the 7.0.x branch. What you describe probably hasn't changed since 7.0.47, but please double-check.
Comment 3 Nikita Zyulyaev 2014-12-12 17:31:05 UTC
(In reply to Christopher Schultz from comment #2)
> Please make sure this is reproducible on Tomcat 7.0.57, the most recent
> release of the 7.0.x branch. What you describe probably hasn't changed since
> 7.0.47, but please double-check.

Yes, the issue can be reproduced on either 7.0.47, 7.0.55 or 7.0.57 versions of Tomcat.
Comment 4 Nikita Zyulyaev 2014-12-12 17:35:43 UTC
(In reply to cl from comment #1)
> We can confirm this and were able to reproduce it as well.
> A fix would be highly appreciated because we also run into cases where
> actual responses were mixed up.
> We consider this to be very serious in certain environments.

If you are interested in a quick workaround, you can disable bufferPool and keyCache by adding socket.bufferPool="0" and socket.keyCache="0" to the Connector tag in your server.xml. This will fix the symptoms.
Comment 5 Mark Thomas 2014-12-17 18:20:33 UTC
I have committed a fix for this to trunk (Tomcat 9.0.x). Are you able to check out trunk from svn, build it and comfirm whether or not this fixes the issue for you?
Comment 6 Nikita Zyulyaev 2014-12-18 13:21:00 UTC
(In reply to Mark Thomas from comment #5)
> I have committed a fix for this to trunk (Tomcat 9.0.x). Are you able to
> check out trunk from svn, build it and comfirm whether or not this fixes the
> issue for you?

I'm afraid we are not able to run our application on Tomcat 9.0.x. Please port your changes to 7.0.x branch.
Comment 7 Mark Thomas 2014-12-18 13:22:25 UTC
Are you able to build 7.0.x from svn and test against that?
Comment 8 Nikita Zyulyaev 2014-12-18 13:24:39 UTC
(In reply to Mark Thomas from comment #7)
> Are you able to build 7.0.x from svn and test against that?

Of course.
Comment 9 Mark Thomas 2014-12-18 13:26:11 UTC
Great - not all users are able/willing to do that. I'll get the back-ports done now.
Comment 10 Mark Thomas 2014-12-18 14:33:08 UTC
Fix applied to 8.0.x for 8.0.16 onwards and to 7.0.x for 7.0.58 onwards.

You should be good to build 7.0.x from svn trunk and test now.
Comment 11 Nikita Zyulyaev 2014-12-18 16:04:31 UTC
(In reply to Mark Thomas from comment #10)
> Fix applied to 8.0.x for 8.0.16 onwards and to 7.0.x for 7.0.58 onwards.
> 
> You should be good to build 7.0.x from svn trunk and test now.

I faced a very odd problem running 7.0.58. org/apache/tomcat/util/http/ValuesEnumerator was not included in the tomcat-coyote.jar during ant build.
But I applied your changes to the 7.0.47 and they worked successfully.
Comment 12 Mark Thomas 2014-12-18 16:36:18 UTC
Not sure what was going on with ValuesEnumerator. Thanks for going the extra mile and working around it.

Great to hear that the proposed fix did indeed work. It will be in the next release.
Comment 13 Konstantin Kolinko 2014-12-18 18:44:40 UTC
(In reply to Nikita Zyulyaev from comment #11)
> (In reply to Mark Thomas from comment #10)
> > Fix applied to 8.0.x for 8.0.16 onwards and to 7.0.x for 7.0.58 onwards.
> > 
> > You should be good to build 7.0.x from svn trunk and test now.
> 
> I faced a very odd problem running 7.0.58.
> org/apache/tomcat/util/http/ValuesEnumerator was not included in the
> tomcat-coyote.jar during ant build.
> But I applied your changes to the 7.0.47 and they worked successfully.

It works for me. I did a clean build and that class was successfully compiled and packed into tomcat-coyote.jar.

I expected to find ValuesEnumerator.java, but that package-accessible class is actually defined in MimeHeaders.java. That code is old: the last change to MimeHeaders.java was 2,5 years ago.
Comment 14 Konstantin Kolinko 2015-10-25 18:40:12 UTC
(In reply to Mark Thomas from comment #10)
> Fix applied to 8.0.x for 8.0.16 onwards and to 7.0.x for 7.0.58 onwards.

Fixed in Tomcat 6 as part of bug 57943 and will be in 6.0.45 onwards. (r1710473)