Bug 40418 - APR connector deadlock
Status: RESOLVED FIXED
Product: Tomcat 5
Classification: Unclassified
Component: Connector:HTTP
Version: 5.5.17
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assigned To: Tomcat Developers Mailing List
Duplicates: 40423
Depends on:
Blocks:
Reported: 2006-09-05 15:59 UTC by Vicen
Modified: 2006-09-06 06:40 UTC (History)



Description Vicen 2006-09-05 15:59:19 UTC
Under high SSL load (SPECweb2005 Banking application) the acceptor thread stops
accepting new connections. The acceptor blocks in getWorkerThread() because
no Worker threads are available and maxThreads has been reached. The
problem is that all Worker threads are blocked waiting for a socket to process,
but no socket is ever assigned to them because the Acceptor thread code leaks
Worker threads. I can attach the stack traces of the relevant threads if necessary.

Regards,
 
- Vicenç


This modification removes the leak of Worker threads.

Index: connectors/util/java/org/apache/tomcat/util/net/AprEndpoint.java
===================================================================
--- connectors/util/java/org/apache/tomcat/util/net/AprEndpoint.java    (revision 130)
+++ connectors/util/java/org/apache/tomcat/util/net/AprEndpoint.java    (working copy)
@@ -980,6 +980,7 @@
          */
         public void run() {

+            Worker workerThread = null;
             // Loop until we receive a shutdown command
             while (running) {

@@ -994,12 +995,15 @@

                 try {
                     // Allocate a new worker thread
-                    Worker workerThread = getWorkerThread();
+                    if (workerThread == null)
+                        workerThread = getWorkerThread();
+
                     // Accept the next incoming connection from the server socket
                     long socket = Socket.accept(serverSock);
                     // Hand this socket off to an appropriate processor
                     if (setSocketOptions(socket)) {
                         workerThread.assign(socket);
+                        if(socket != 0) workerThread = null;
                     } else {
                         // Close socket and pool right away
                         Socket.destroy(socket);
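
The leak described above, and the effect of keeping the worker across loop iterations as the patch does, can be illustrated with a small stand-alone model. This is only a sketch and not Tomcat code: the class WorkerLeakSketch, the method iterationsBeforeStall, and the "every second connection fails" pattern are hypothetical, a Semaphore stands in for the pool bounded by maxThreads, and a worker that does get a socket is assumed to return to the pool immediately.

```java
import java.util.concurrent.Semaphore;

public class WorkerLeakSketch {

    /**
     * Runs a simplified acceptor loop and returns how many iterations
     * complete before the acceptor would block waiting for a worker.
     * Returns 'iterations' if it never stalls.
     */
    static int iterationsBeforeStall(int maxThreads, int iterations, boolean reuseOnFailure) {
        // Each permit models one idle Worker thread (pool bounded by maxThreads).
        Semaphore idleWorkers = new Semaphore(maxThreads);
        boolean holdingWorker = false;

        for (int i = 0; i < iterations; i++) {
            if (!holdingWorker) {
                if (!idleWorkers.tryAcquire()) {
                    // The real acceptor would block here indefinitely.
                    return i;
                }
                holdingWorker = true;
            }

            // Hypothetical failure pattern: socket setup fails on odd iterations.
            boolean socketOptionsOk = (i % 2 == 0);

            if (socketOptionsOk) {
                // Worker is assigned the socket, processes it, and goes back
                // to the pool (modelled here as an immediate release).
                idleWorkers.release();
                holdingWorker = false;
            } else if (!reuseOnFailure) {
                // Leak: the reference is dropped, the worker is never
                // assigned a socket and never returns to the pool.
                holdingWorker = false;
            }
            // Otherwise keep holding the worker for the next iteration.
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("leaky loop stalls after:   " + iterationsBeforeStall(4, 100, false));
        System.out.println("patched loop completes:    " + iterationsBeforeStall(4, 100, true));
    }
}
```

In this model every failed socket setup costs one worker permanently when the reference is dropped, so the pool drains and the acceptor stalls; holding on to the unused worker avoids that.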
Comment 1 Remy Maucherat 2006-09-06 09:40:32 UTC
This is not making sense. If you think accept can return 0 when using SSL, then
there's a bug with accept (since it's most likely not supposed to return a null
pointer; I have not coded for that case, and it's a miracle it doesn't just
crash right away), it has nothing to do with a "deadlock".
Comment 2 Vicen 2006-09-06 10:40:36 UTC
(In reply to comment #1)
> This is not making sense. If you think accept can return 0 when using SSL, then
> there's a bug with accept (since it's most likely not supposed to return a null
> pointer; I have not coded for that case, and it's a miracle it doesn't just
> crash right away), it has nothing to do with a "deadlock".

If setSocketOptions() returns false, the connector leaks one Worker thread, or am I
missing something?

- Vicenç
Comment 3 Remy Maucherat 2006-09-06 11:08:07 UTC
You need to test again with the current Tomcat code.
Comment 4 Remy Maucherat 2006-09-06 11:14:43 UTC
*** Bug 40423 has been marked as a duplicate of this bug. ***
Comment 5 Vicen 2006-09-06 13:15:55 UTC
(In reply to comment #3)
> You need to test again with the current Tomcat code.

Now I see the commit http://svn.apache.org/viewvc?rev=429003&view=rev. Sorry for
the noise.

Thanks,

- Vicenç
Comment 6 Remy Maucherat 2006-09-06 13:40:34 UTC
There's a small glitch remaining with that commit: I put everything on one
line of code (unlike on the TC 6 branch), which is a mistake if accept throws an
exception. That is allowed, but most likely it is not going to happen. (After
verification, accept cannot return 0.) I fixed that in a new commit.
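
Why a one-line form can lose a worker when accept throws comes down to Java evaluation order: the receiver of a method call is evaluated before its arguments. The sketch below assumes the one-liner had the shape "take a worker, then call assign with the result of accept"; the actual line in the commit is not shown in this report, and the names EvaluationOrderSketch, takeWorker, accept, oneLine and twoSteps are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

public class EvaluationOrderSketch {

    // Records the order in which the steps actually run.
    static final List<String> events = new ArrayList<>();

    static class Worker {
        void assign(long socket) {
            events.add("assign");
        }
    }

    // Stand-in for taking a worker out of the pool.
    static Worker takeWorker() {
        events.add("takeWorker");
        return new Worker();
    }

    // Stand-in for accepting a connection; optionally fails.
    static long accept(boolean fail) {
        events.add("accept");
        if (fail) {
            throw new IllegalStateException("accept failed");
        }
        return 42L;
    }

    // One-line form: the worker is taken before accept is called.
    static List<String> oneLine(boolean fail) {
        events.clear();
        try {
            takeWorker().assign(accept(fail));
        } catch (IllegalStateException e) {
            events.add("exception");
        }
        return new ArrayList<>(events);
    }

    // Two-step form: accept first, take a worker only on success.
    static List<String> twoSteps(boolean fail) {
        events.clear();
        try {
            long socket = accept(fail);
            takeWorker().assign(socket);
        } catch (IllegalStateException e) {
            events.add("exception");
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(oneLine(true));
        System.out.println(twoSteps(true));
    }
}
```

With a failing accept, the one-line form has already taken a worker by the time the exception is thrown, while the two-step form never touches the pool.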