Bug 28727 - CLOSE_WAIT connections draw 100 % cpu
Status: RESOLVED INVALID
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina
Version: 5.0.19
Hardware: PC Solaris
Importance: P3 normal with 2 votes
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-05-01 18:49 UTC by Thomas Strasser
Modified: 2008-03-01 10:23 UTC (History)

Description Thomas Strasser 2004-05-01 18:49:25 UTC
We use Tomcat 5.0.18/5.0.19 in a mission-critical application. On every second or
third attempt, specific connections fail. Tomcat then seems unable to close
the connections, although the client has already terminated its session (and
shut down the computer). On the Tomcat side the connections are left
in the "CLOSE_WAIT" state (as shown by netstat -a).

As soon as one connection shows up as "CLOSE_WAIT", the connection thread
begins to loop (it seems as if select no longer blocks) and starts to eat up
CPU. Once two or three threads are in the "CLOSE_WAIT" state, Tomcat
slows down until the non-blocking "CLOSE_WAIT" threads are
consuming all available CPU.
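
One possible mechanism behind this symptom can be sketched in plain Java. This is not Tomcat's code, and the class and method names are hypothetical. It only shows that once the peer has closed its end, read() on the local socket returns -1 immediately instead of blocking, so a loop that keeps reading without checking for -1 would spin. The local socket also stays in CLOSE_WAIT until it is closed locally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; not part of Tomcat.
class HalfClosedReadDemo {

    // Opens a loopback connection, closes the client end, and returns what
    // read() reports on the accepted (server-side) socket afterwards.
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // The peer closes first: 'accepted' is now in CLOSE_WAIT and stays
            // there until it is closed locally (here by try-with-resources).
            client.close();
            // End of stream is reported as -1 without blocking. Code that
            // ignores this value and calls read() again will busy-loop.
            return accepted.getInputStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() after peer close: " + readAfterPeerClose());
    }
}
```

Whether anything like this happens inside the connector is an open question in this report; the sketch only illustrates the socket behaviour.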

We then have to shut down Tomcat.

What is happening here? Is there a workaround available? What about the
CLOSE_WAIT connections? (They all come from mobile clients over GPRS.)

We tried both Tomcat 5.0.19 and 5.0.18. Same situation.


Environment: 
Windows 2000/XP, service packs applied; Apache 2.0.47, mod_jk2, JDK 1.4.2_03
and JDK 1.4.2_04 (tried both, both from Sun)
Comment 1 Remy Maucherat 2004-05-07 10:54:42 UTC
I can't reproduce this, and I have yet to experience one of your "non-blocking
CLOSE_WAIT-threads". What exactly is "looping", and do you have any idea why?
Is it the Apache <-> Tomcat link which has a problem?
Given that we can't really look into it, you'll have to provide more details if
you want a fix.
Comment 2 Neal Ensor 2004-08-19 01:45:50 UTC
I believe I'm experiencing something similar with Tomcat 5.0.27 on an
application server that is proxied from a separate Apache web front-end. The
application runs fine for a time, but CLOSE_WAIT connections to the
Apache server quickly start building up and the TC5 server becomes completely
non-responsive. Running truss on the Tomcat process reveals some sort of looping activity:

/88:    lwp_mutex_lock(0x000EC290)      (sleeping...)
/111:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/65:    lwp_cond_wait(0x0078EE88, 0x0078EE70, 0x00000000) (sleeping...)
/84:    lwp_mutex_lock(0x000EC290)      (sleeping...)
/9:     lwp_cond_wait(0x0002D020, 0x0002D008, 0x00000000) (sleeping...)
/141:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/25:    lwp_mutex_lock(0x000EC290)      (sleeping...)
/118:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/60:    lwp_cond_wait(0x00D4DDF0, 0x00D4DDD8, 0x00000000) (sleeping...)
/42:    lwp_cond_wait(0x003FF4A8, 0x003FF490, 0xEE9FF7F0) (sleeping...)
/68:    lwp_cond_wait(0x00791258, 0x00791240, 0xECFFF7F0) (sleeping...)
/61:    lwp_cond_wait(0x00D4E4A0, 0x00D4E488, 0x00000000) (sleeping...)
/158:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/107:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/22:    lwp_mutex_lock(0x000EC290)      (sleeping...)
/112:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/71:    lwp_mutex_lock(0x000EC290)      (sleeping...)
/183:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/54:    lwp_cond_wait(0x0045D220, 0x0045D208, 0x00000000) (sleeping...)
/16:    lwp_mutex_lock(0x000EC290)      (sleeping...)
/144:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/114:   lwp_mutex_lock(0x000EC290)      (sleeping...)
/23:    lwp_mutex_lock(0x000EC290)      (sleeping...)

followed by many lines of:
/10:    poll(0xF98FFD58, 0, 50)                         = 0
/2:     lwp_cond_wait(0x0002CB80, 0x0002CB68, 0xFC77FD30) Err#62 ETIME

During this time, Tomcat is apparently sleeping, and completely non-responsive.
A restart results in the server responding again for a short time, then going
back into the same state.

Apache 1.3.27 on web server, using ProxyPass to the application server, Solaris
9 OS, JVM 1.4.2_04.  Tomcat version 5.0.27. Any pointers would be appreciated.
Comment 3 Yoav Shapira 2004-08-30 18:59:25 UTC
Does it happen with Tomcat-standalone, i.e. no Apache or connector in front?
Comment 4 Yoav Shapira 2004-09-15 18:18:21 UTC
Can you also please try Tomcat 5.0.28 and the latest connector binary (1.2.6 
for mod_jk)?

I'm downgrading this from Blocker severity to normal, as no one else has
complained, and we haven't been able to reproduce it.
Comment 5 Domenico Aquilino 2004-09-23 17:45:41 UTC
I can confirm this bug on jakarta-tomcat-5.0.19-embed on Solaris 2.8

Suddenly, the number of CLOSE_WAIT connections starts to grow and the whole
application which embeds Tomcat becomes unresponsive.
I understand it is hard to reproduce this bug, but this is a blocker for us.
Of course, I can help check new versions where this issue has been addressed.

Platform:
SunOS e4500 5.8 Generic_108528-16 sun4u sparc SUNW,Ultra-Enterprise
j2re1.4.2_02
Comment 6 Yoav Shapira 2004-09-23 17:47:10 UTC
Well, then please try newer versions than 5.0.19 ;)
Comment 7 Domenico Aquilino 2004-09-24 10:41:15 UTC
I'm going to switch to Tomcat 5.0.28 embedded.
The bug usually shows up within 2 to 3 days, depending on user activity.
Comment 8 Yoav Shapira 2004-09-30 19:12:04 UTC
Please keep us posted ;)  I don't want to leave the issue open forever without 
us being able to reproduce it.
Comment 9 Remy Maucherat 2004-09-30 20:31:42 UTC
Tomcat calls close on all the sockets, in a very obvious way. After that, if the
socket isn't properly closed, then it's a bug in either the VM or the network stack.
Comment 10 Domenico Aquilino 2004-10-06 16:02:05 UTC
I can confirm this bug on jakarta-tomcat-5.0.28-embed on Solaris 2.8


Here are my suggestions to reproduce it:

- start Tomcat on server S

- monitor the status of connections on server S
while(1)
netstat -a|grep CLOSE_WAIT|wc -l
sleep 5
end

- establish more than 500 concurrent HTTP connections from client C
(We used JMeter to run a work job with 500 threads, 5 requests each)

- DETACH THE NETWORK CABLE on client C
when you are sure the server is managing at least 500 connections from C

- ATTACH THE NETWORK CABLE on client C

Expected result:
Connections in CLOSE_WAIT status get closed gracefully on server S.

Actual Result:
Connections in CLOSE_WAIT status never get closed on server S.
If you connect to the Tomcat server, response time is very slow.
The number of connections in CLOSE_WAIT status keeps growing.
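
The monitoring step above can also be sketched in Java, for hosts where the shell loop is not convenient. This is a minimal sketch under assumptions: the class name CloseWaitCounter and its methods are hypothetical, and it assumes a netstat binary on the PATH that accepts -a and prints the state name CLOSE_WAIT on each matching line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper; mirrors `netstat -a | grep CLOSE_WAIT | wc -l`.
class CloseWaitCounter {

    // Counts the lines of netstat output that mention CLOSE_WAIT.
    static int countCloseWait(String netstatOutput) {
        int count = 0;
        for (String line : netstatOutput.split("\\r?\\n")) {
            if (line.contains("CLOSE_WAIT")) {
                count++;
            }
        }
        return count;
    }

    // Runs `netstat -a` once and returns everything it printed.
    static String runNetstat() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("netstat", "-a").redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    // Prints the CLOSE_WAIT count every 5 seconds until the process is stopped.
    public static void main(String[] args) throws IOException, InterruptedException {
        while (true) {
            System.out.println(countCloseWait(runNetstat()));
            Thread.sleep(5000);
        }
    }
}
```

A count that keeps rising after the cable is reattached would match the "Actual Result" described above.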


Thanks for your time and efforts


Comment 11 Remy Maucherat 2004-10-06 16:10:34 UTC
I am not really interested in observations on the "problem".
Please reopen this report only when you can actually prove that Tomcat is at
fault. Tomcat will call close on all these sockets. If they are not properly
closed for some reason, then there is nothing that can be done here.
Comment 12 Domenico Aquilino 2004-10-06 16:38:20 UTC
Sorry,
I just suggested how to reproduce this bug. Those are not "observations".

As far as "Tomcat fault" is concerned, let me know what evidence
you need, since a non-responsive Tomcat server draining 100% of CPU is not enough.

Comment 13 Remy Maucherat 2004-10-06 16:48:03 UTC
It's obviously not enough. If you cannot point out some flaw with socket
handling in Tomcat, or indicate where it would go into a loop, then there's
nothing I can fix.

Since you're using Solaris, please first make sure you have applied ALL service
packs and stuff (Sun fixes Java on Solaris this way).
Comment 14 Domenico Aquilino 2004-10-06 18:27:37 UTC
The Solaris server we use to run Tomcat is up to date.
However, the "OS" field of this bug is Windows XP,
so it doesn't seem to be a platform issue.

I checked and we do not use a particular SocketFactory, or set any socket params.

I believe the connection pool could lock itself up when too many
connections have to be closed.
Comment 15 Rainer Jung 2004-10-07 06:56:11 UTC
I suggest you first try to reproduce it with only 10 or 50 threads. Reproduction
would be successful if you still observe continuous CPU usage in this case,
maybe less than 100%, but still noticeable.

Then, in this state, take a thread dump (by sending the QUIT signal to the Tomcat
process). It will write the method stack for all threads inside Tomcat to
catalina.out. Wait a few seconds and take another thread dump, then wait again and
take a third one.

Then look at the stacks of your working threads. Anything special about the top
part of the individual stacks?
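
Where sending a QUIT signal is impractical, a comparable dump can be produced from inside the JVM. The following is only a sketch under assumptions: Thread.getAllStackTraces() exists from Java 5 onwards, so it does not apply to the 1.4.2 JVMs mentioned in this report, and the class name ThreadDumpSketch is hypothetical. It would have to run inside the same JVM as Tomcat (for example from a diagnostic servlet) to show Tomcat's threads.

```java
import java.util.Map;

// Hypothetical helper; requires Java 5 or later.
class ThreadDumpSketch {

    // Returns the name, state and stack of every live thread in this JVM.
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Takes three dumps a few seconds apart, as suggested above.
    public static void main(String[] args) throws InterruptedException {
        for (int i = 1; i <= 3; i++) {
            System.out.println("=== dump " + i + " ===");
            System.out.print(dumpAllThreads());
            if (i < 3) {
                Thread.sleep(3000);
            }
        }
    }
}
```

Threads whose top frames are identical across all three dumps and whose state is RUNNABLE are the first candidates for a busy loop.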
Comment 16 Remy Maucherat 2004-10-07 09:18:40 UTC
Right. Stack traces showing some kind of bad behavior with the Tomcat code would
be a start.
Comment 17 Remy Maucherat 2004-10-07 09:54:06 UTC
BTW, for that kind of problem, would it be possible to test using the new thread
pool from Tomcat 5.5?
Get Tomcat 5.5.3, and set strategy="ms" on the Connector element.
Comment 18 Peter Evans 2006-07-05 15:40:28 UTC
We are seeing exactly the same symptoms with Tomcat/5.0.28 (on Linux).  Yes, I
know, this is pretty old.  But it worked well enough until we changed to a
different architecture that causes more load.  As we already have the latest
mod_jk it would appear that tomcat is the problem.  Does anybody see this
problem any more with 5.5? 
Comment 19 Jay Qu 2006-07-21 16:48:32 UTC
I think it has been resolved. Please check this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6215050 
Java 1.4.2.12 and Java 1.5.0.07 should fix the problem.
Comment 20 Sudhir Pandey 2008-02-22 04:20:47 UTC
We face the same CLOSE_WAIT problem on Solaris 10, even after I reduced the
tcp_time_wait_interval kernel parameter to 10000 (ms).
Comment 21 bugmenot 2008-02-26 01:54:29 UTC
This also happens on Debian:

Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)
apache-tomcat-6.0.14
libapache-mod-jk 1.2.18-3etch1
Comment 22 bugmenot 2008-02-26 02:49:32 UTC
(In reply to comment #21)
> Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)
> apache-tomcat-6.0.14
> libapache-mod-jk 1.2.18-3etch1

My fault. Works for me!

I am using the Apache HttpClient class, which doesn't close sockets correctly. See
http://www.mail-archive.com/commons-httpclient-dev@jakarta.apache.org/msg04338.html 

Comment 23 Mark Thomas 2008-03-01 10:23:38 UTC
Re-closing as invalid based on last comment.