Bug 57546 - Memory Leak in SecureNioChannel
Summary: Memory Leak in SecureNioChannel
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: WebSocket
Version: 8.0.33
Hardware: PC Linux
Importance: P1 major
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 57750
Depends on:
Blocks:
 
Reported: 2015-02-06 22:44 UTC by Paul Gaylie
Modified: 2016-04-03 18:37 UTC
CC List: 2 users



Attachments
Too many instances in hashmap (317.65 KB, image/jpeg)
2015-02-06 22:44 UTC, Paul Gaylie
Screen Shot from hprof (254.58 KB, image/jpeg)
2015-06-24 17:49 UTC, Paul Gaylie

Description Paul Gaylie 2015-02-06 22:44:04 UTC
Created attachment 32440 [details]
Too many instances in hashmap

We are connecting mobile devices via WebSocket to a Tomcat 8.0.18 server. The physical mobile connection can die at random times. When a connection goes down, the instance of org.apache.coyote.http11.upgrade.NioProcessor is not removed from the connections HashMap in org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler. There are 4,181 entries in the connections HashMap, but there are in fact only 1,400 real socket connections.

See DumpHprof1.jpg, which shows a drill-down into one of the HashMap entries. The SafeCommConnection instance is my ServerEndpoint POJO. When onClose is called on my ServerEndpoint I set a flag called isClosing to true; you can see on the right side that it is true, so my application has released the object, yet the instance remains in memory, held by the protocol handler's connections HashMap. It seems that the Http11NioProtocol.release() method is not being called.

The stderr log also contains hundreds of stack traces like the following:

06-Feb-2015 14:12:05.255 INFO [http-nio-443-exec-385] org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doClose Failed to close the ServletOutputStream connection cleanly
 java.io.IOException: An existing connection was forcibly closed by the remote host
	at sun.nio.ch.SocketDispatcher.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(Unknown Source)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
	at sun.nio.ch.IOUtil.write(Unknown Source)
	at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
	at org.apache.tomcat.util.net.SecureNioChannel.flush(SecureNioChannel.java:134)
	at org.apache.tomcat.util.net.SecureNioChannel.close(SecureNioChannel.java:370)
	at org.apache.tomcat.util.net.SecureNioChannel.close(SecureNioChannel.java:398)
	at org.apache.coyote.http11.upgrade.NioServletOutputStream.doClose(NioServletOutputStream.java:138)
	at org.apache.coyote.http11.upgrade.AbstractServletOutputStream.close(AbstractServletOutputStream.java:137)
	at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doClose(WsRemoteEndpointImplServer.java:143)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.close(WsRemoteEndpointImplBase.java:638)
	at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:118)
	at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:450)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:338)
	at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:270)
	at org.apache.tomcat.websocket.WsSession.sendCloseMessage(WsSession.java:563)
	at org.apache.tomcat.websocket.WsSession.onClose(WsSession.java:503)
	at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.close(WsHttpUpgradeHandler.java:183)
	at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.access$200(WsHttpUpgradeHandler.java:48)
	at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsReadListener.onDataAvailable(WsHttpUpgradeHandler.java:214)
	at org.apache.coyote.http11.upgrade.AbstractServletInputStream.onDataAvailable(AbstractServletInputStream.java:194)
	at org.apache.coyote.http11.upgrade.AbstractProcessor.upgradeDispatch(AbstractProcessor.java:96)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1558)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1515)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Unknown Source)
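
For reference, the kind of ServerEndpoint described above can be sketched as follows. The real SafeCommConnection class is not included in this report, so the endpoint path and method signatures below are illustrative assumptions based on the standard javax.websocket annotations; only the class name and the isClosing flag come from the description.

import javax.websocket.CloseReason;
import javax.websocket.OnClose;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

// Illustrative sketch only: the path and method bodies are assumptions.
@ServerEndpoint("/safecomm")
public class SafeCommConnection {

    private volatile boolean isClosing = false;

    @OnOpen
    public void onOpen(Session session) {
        // The application registers the session here.
    }

    @OnClose
    public void onClose(Session session, CloseReason reason) {
        // The application marks the connection as closed and drops its own
        // reference, but (per this report) the NioProcessor entry for the
        // dead socket stays in Http11ConnectionHandler's connections map.
        isClosing = true;
    }
}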
Comment 1 Mark Thomas 2015-02-10 00:20:02 UTC
release() isn't expected to be called for an HTTP upgrade connection, so the fact that it isn't being called is not part of the problem.

I've tried to reproduce this with 8.0.x trunk but I can't. I've been trying the various WebSocket example applications with and without https and killing the browser to simulate dropping the connection.

A simple test case that demonstrates this issue would be a big help.

I do wonder if some of the connector fixes made since 8.0.18 may have fixed this. Are you able to build the latest 8.0.x code from svn and test that? Alternatively I could provide you with a test build.
Comment 2 Paul Gaylie 2015-02-10 01:28:48 UTC
Mark,

Unfortunately, killing a browser is not a good simulation because the browser is smart enough to properly close the websocket when it gets killed.

Here is what to do: connect using a mobile browser on your phone (like Chrome on Android) and then shut off the phone's mobile data connection. You should then see the doClose() operation happening on the server side.

Thanks for looking into this.
Comment 3 Mark Thomas 2015-02-10 08:47:53 UTC
(In reply to Paul Gaylie from comment #2)
> Mark,
> 
> Unfortunately, killing a browser is not a good simulation because the
> browser is smart enough to properly close the websocket when it gets killed.

With kill -9 the browser never gets the chance. The close isn't clean; a bunch of IOExceptions are triggered.

> Here is what to do: connect using a mobile browser on your phone (like
> Chrome on Android) and then shutoff the mobile data connection on your
> phone. Then you should see the doClose() operation happening on the server
> side.

I'll try, but right now my money is on this already having been fixed by the post-8.0.18 changes.
Comment 4 Mark Thomas 2015-02-10 14:59:55 UTC
Interesting. I tried testing with a separate machine so I could pull out the network cable. That has triggered a problem that looks like the one described here. I'm looking into this at the moment.

Musing on why I didn't see this when I kill -9'd the browser, I suspect that the client OS was doing some clean-up since it was still connected.
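
For anyone trying to repeat this test, the client half can be as simple as the sketch below: open a batch of WebSocket connections from a second machine using the standard javax.websocket client API, then physically disconnect that machine. The endpoint URI and connection count are placeholders, and a wss:// endpoint may additionally need client-side trust configuration for the server certificate.

import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.websocket.ClientEndpoint;
import javax.websocket.ContainerProvider;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.WebSocketContainer;

// Sketch of a client-side harness for the cable-pull test; the URI and the
// connection count are placeholders, not taken from this report.
@ClientEndpoint
public class DropTestClient {

    @OnOpen
    public void onOpen(Session session) {
        System.out.println("Opened " + session.getId());
    }

    public static void main(String[] args) throws Exception {
        WebSocketContainer container = ContainerProvider.getWebSocketContainer();
        List<Session> sessions = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sessions.add(container.connectToServer(DropTestClient.class,
                    URI.create("wss://server.example.com/echo")));
        }
        // Keep the sessions open, then pull the network cable on this machine
        // (or disable its interface) and watch the server's connections map.
        Thread.sleep(Long.MAX_VALUE);
    }
}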
Comment 5 Mark Thomas 2015-02-10 15:54:42 UTC
Thanks for the report and the pointers to reproduce this.

I have fixed the problem in trunk, 8.0.x (for 8.0.19 onwards) and 7.0.x (for 7.0.60 onwards).
Comment 6 Konstantin Kolinko 2015-02-10 22:49:50 UTC
Only the first part of the fix was applied to Tomcat 7 (r1658738).

The second part was applied to Tomcat 8 (r1658794), but has not been ported to Tomcat 7 yet. Thus I am REOPENing this issue, so that it is not forgotten.


The issues raised regarding the first part:
- as noted by Remy (in "Re: r1658734"):

> Good find, but what happens if onDataAvailable or onWritePossible throw a
> runtime exception (like a NPE), since this is user code. It could also leak
> then ? Shouldn't the code catch everything, also call onError on the
> listener and close ?

- as noted by Chris (in "Re: r1658737"):

> Is it worth predicating the log call with if(getLog().isDebugEnabled)?
Comment 7 Mark Thomas 2015-02-10 23:05:04 UTC
This wasn't forgotten; it deliberately wasn't back-ported, because this isn't user code in Tomcat 7. It is a Tomcat-internal API only used by WebSockets and we know exactly which exceptions are thrown. Therefore the additional protection added in trunk and 8.0.x is not necessary in 7.0.x.

Regarding the debug logging suggestion, the look-up in the StringManager is likely to be much faster than the throwing of the Exception that triggers this look-up. Performance is pretty much a lost cause once an Exception has been thrown.
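
For illustration, the kind of protection Remy asked about in comment #6 looks roughly like the sketch below. This is not the actual Tomcat patch (see r1658734 and r1658790 for that); the class, method, and parameter names are invented.

import java.io.Closeable;
import java.io.IOException;
import javax.servlet.ReadListener;

// Illustrative sketch of the "catch everything, call onError, then close"
// pattern discussed above. Not the actual Tomcat code; all names are invented.
public class DefensiveDispatch {

    static void dispatchOnDataAvailable(ReadListener listener, Closeable connection) {
        try {
            // User-supplied callback: it may throw any RuntimeException (e.g. an NPE).
            listener.onDataAvailable();
        } catch (Throwable t) {
            try {
                // Let the listener observe the failure...
                listener.onError(t);
            } finally {
                // ...but always release the connection; otherwise the processor
                // entry stays registered and the connections map keeps growing.
                try {
                    connection.close();
                } catch (IOException ignored) {
                }
            }
        }
    }
}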
Comment 8 Paul Gaylie 2015-02-11 02:29:51 UTC
Hi Mark,

Great that you found the bug.

Exactly what code needed to be updated? Just curious where the proper fix needed to be applied; it was driving me nuts.
Comment 9 Mark Thomas 2015-02-11 09:23:05 UTC
See r1658734 and r1658790.
Comment 10 Mark Thomas 2015-03-24 22:27:32 UTC
*** Bug 57750 has been marked as a duplicate of this bug. ***
Comment 11 Paul Gaylie 2015-05-14 20:32:21 UTC
Mark,

It looks like you fixed this in version 8.0.19, but that version has not been released. Is there a time frame for when this fix will be available?

Thanks, Paul
Comment 12 Chuck Caldarale 2015-05-14 20:49:03 UTC
(In reply to Paul Gaylie from comment #11)
> It looks like you fixed this in version 8.0.19, but that version has not
> been released. Is there a time frame for when this fix will be available?

It's available now: download the current 8.0.22 version. Changes are cumulative, so the fix has been in all releases beyond 8.0.19.
Comment 13 Paul Gaylie 2015-06-24 17:49:24 UTC
Created attachment 32854 [details]
Screen Shot from hprof

Guys,

I installed Tomcat 8.0.22 on our production server. Unfortunately, the memory leak is still there. If you look at the attachment TomcatLeak2.jpg you will see 2,968 items in the connections HashMap. There should be 1,718. I am not sure what to suggest other than re-running the connection crash test. I can send you the hprof file but it is over 1 GB.
Comment 14 Mark Thomas 2015-06-24 18:37:03 UTC
(In reply to Paul Gaylie from comment #13)
> Created attachment 32854 [details]
> Screen Shot from hprof
> 
> Guys,

Please consider using a more gender neutral form of address when addressing the Tomcat committers.

> I installed Tomcat 8.0.22 on our production server. Unfortunately, the
> memory leak is still there. If you look at the attachment TomcatLeak2.jpg
> you will see 2,968 items in the connections HashMap. There should be 1,718.
> I am not sure what to suggest other than re-running the connection crash
> test. I can send you the hprof file but it is over 1 GB.

It looks like there are multiple issues. Please open a new bug. As with any bug, you'll need to provide the steps to reproduce the issue on a clean install of the latest stable release of the version (8.0.x in this case) where you see the issue.
Comment 15 Jan 2016-02-29 15:23:03 UTC
Hi,

I'm afraid we're facing the same problem with Tomcat 7.0.67 and 8.0.32. Paul, can you confirm that this bug is fixed now?
Comment 16 chinanzio 2016-04-03 15:53:00 UTC
Hello, I am facing a similar problem.
Tomcat 8.0.23 and 8.0.33
Linux

Using jconsole I can see memory grow until the virtual machine crashes; I have to restart Tomcat every 5 or 6 days.

I am using a tool called Memory Analyzer for Eclipse (http://www.eclipse.org/mat/) and, after a memory dump, it reports the following:

One instance of "org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler" loaded by "java.net.URLClassLoader @ 0xe5613ed8" occupies 101,526,264 (48.71%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Node[]" loaded by "<system class loader>".

Keywords
java.util.concurrent.ConcurrentHashMap$Node[]
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler
java.net.URLClassLoader @ 0xe5613ed8


In a few days it will occupy 99.99% of the memory.

Do you need more info? I don't know how to reproduce it.
I was using Tomcat 8.0.12 a few months ago; I don't remember whether it was also happening then. I am going to try it out at the next restart.

Cheers
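
One way to tell a genuine leak apart from normal load, without sending a 1 GB heap dump, is to sample the connector's connection count over time and compare it with the number of established sockets the OS reports, for example via JMX as in the sketch below. The MBean ObjectName and the connectionCount attribute are assumptions, not something confirmed in this report; check the exact names in the jconsole MBeans tab first. The probe must run inside the Tomcat JVM, or be adapted to use a remote JMX connection.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: periodically log the connector's connection count so its growth can
// be compared with the number of established sockets the OS reports.
// The ObjectName and attribute name below are assumptions; verify in jconsole.
public class ConnectionCountProbe {

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pool = new ObjectName("Catalina:type=ThreadPool,name=\"http-nio-443\"");
        while (true) {
            Object count = server.getAttribute(pool, "connectionCount");
            System.out.println(System.currentTimeMillis() + " connectionCount=" + count);
            Thread.sleep(60000);
        }
    }
}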
Comment 17 Mark Thomas 2016-04-03 18:37:23 UTC
(In reply to chinanzio from comment #16)

This bug has been fixed.

There is nothing in comment #16 that suggests that you are seeing the same bug. To repeat earlier advice:

Please open a new bug. As with any bug, you'll need to provide the steps to reproduce the issue on a clean install of the latest stable release of the version (8.0.x in this case) where you see the issue.