Bug 49730 - Race condition in StandardThreadExecutor : requests are sometimes enqueued instead of creating new threads
Race condition in StandardThreadExecutor : requests are sometimes enqueued in...
Status: RESOLVED FIXED
Product: Tomcat 6
Classification: Unclassified
Component: Catalina
6.0.29
Macintosh All
: P2 normal (vote)
: default
Assigned To: Tomcat Developers Mailing List
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2010-08-09 16:16 UTC by Sylvain Laurent
Modified: 2010-11-04 10:28 UTC (History)
0 users



Attachments
Patch for Tomcat 6 (3.64 KB, patch)
2010-08-09 16:38 UTC, Sylvain Laurent
Details | Diff
Patch for tomcat 7 (4.36 KB, patch)
2010-08-09 16:38 UTC, Sylvain Laurent
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sylvain Laurent 2010-08-09 16:16:04 UTC
In tomcat 6, I often configure an Executor with minSpareThreads=0 to work around memory leak issues upon redeployment.

Sometimes (especially in development), when I refresh a page of my webapp with Safari, Chrome or Firefox, some resources of the page take several seconds (>10s) to be served though they are static resources and should come in less than 50ms. For instance, over 15 requests for a page (1 for html, the others for resources like js, css, images...), I sometimes have 1 or 2 that take >10s. 

After analysis, I found that in org.apache.catalina.core.StandardThreadExecutor.TaskQueue.offer(Runnable) the statement 
if (parent.getActiveCount()<(parent.getPoolSize()))
is sometimes true unexpectedly. Here is the scenario :

- ThreadPoolExecutor is empty
- the user refreshes the page (or accesses it with an empty cache) in his web browser for a page that uses a more than 10-15 resources
- the browser establishes one TCP connection and a new Thread is created
- after the browser receives the response, it decides to load as many resources as possible in parallel. For this it establishes up to 6 TCP connections (in my tests)
- The Acceptor thread calls StandardThreadExecutor.execute to process each incoming connection.
- For each call, StandardThreadExecutor.TaskQueue.offer(Runnable) is being called
- if you study the sources of Java 6 ThreadPoolExecutor, you can see that there's a small delay between the time a new Thread is created (thus increasing poolSize) and the time it starts working on its first task (increasing the activeCount)
- Since in my case connections are established in a rapid burst, the calls to TaskQueue.offer() are sometimes faster than this small delay, so that we do have parent.getActiveCount()<parent.getPoolSize() and thus the task is enqueued instead of forcing the creation of a thread to serve it.
- Since Keep-Alive is enabled and tomcat 6 threads take care of only one TCP connection at a time, the requests in the queue must wait for the keep-alive timeout so that a Thread is returned to the pool to serve pending tasks.
- With 25s keepAliveTimeOut, it means that some requests take more than 25s to be served eventhough the pool was never full and the server quite idle!!!

Other facts about this issue :
- Although my test case starts with an empty pool, it can occur even on a loaded server. The thing that triggers the issue is the burst of new TCP connections.
- The problem is less severe with a lower keepAliveTimeout, or if keepalive is disabled. In any case, it also depends on the time taken to serve current requests.
- The issue also affects tomcat 7 but is less severe because tc7 threads are returned to the pool after each http request, even if the TCP connection is kept alive. The impact would the same as with tc6 with keepAlive disabled.
Comment 1 Sylvain Laurent 2010-08-09 16:38:11 UTC
Created attachment 25865 [details]
Patch for Tomcat 6
Comment 2 Sylvain Laurent 2010-08-09 16:38:47 UTC
Created attachment 25866 [details]
Patch for tomcat 7
Comment 3 Sylvain Laurent 2010-08-09 16:43:00 UTC
the proposed patch for tc6 tries to modify as few things as possible.
It would be cleaner to backport some stuff from tc7, but I don't know if we have to keep tc6 "interface" compatible with previous releases. Example : in tc7 TaskQueue is now a public class where it was an inner class of StandardThreadExecutor in tc6...
Comment 4 Mark Thomas 2010-09-30 14:05:30 UTC
Thanks for the analysis and the patch.

Patch applied to 7.0.x and will be in 7.0.3 onwards.

Patch proposed for 6.0.x

In terms of compatibility, we try and aim for binary compatibility between point releases although we do sometimes break that rule.
Comment 5 Konstantin Kolinko 2010-10-02 15:24:51 UTC
Regarding TC7:
1) Why StandardThreadExecutor in TC7 was not patched? Can it be removed, or should it be patched by applying TC6 patch to TC7?

2) (unrelated issue) Apparently there is a bug in catch (InterruptedException x) block in ThreadPoolExecutor.execute(..) in TC7:
      s/Thread.interrupted()/Thread.currentThread().interrupt()
For reference:
http://www.ibm.com/developerworks/java/library/j-jtp05236.html

Regarding TC6 and TC7 patches:
2) I would prefer using a finally block to decrement the counter on errors.
Comment 6 Sylvain Laurent 2010-10-02 17:36:49 UTC
(In reply to comment #5)
> Regarding TC7:
> 1) Why StandardThreadExecutor in TC7 was not patched? Can it be removed, or
> should it be patched by applying TC6 patch to TC7?

StandardThreadExecutor actually delegates to a org.apache.tomcat.util.threads.ThreadPoolExecutor which is the one that has been patched. 
But then your question leads to this next : why the catch (RejectedExecutionException rx) in StandardThreadExecutor.execute ? it's already handled in org.apache.tomcat.util.threads.ThreadPoolExecutor.execute

Some javadoc comments would also be useful to explain why we have 2 classes whereas there was only one in tc6 (I guess it's because of the Lifecycle refactoring and JMX things?)

> Regarding TC6 and TC7 patches:
> 2) I would prefer using a finally block to decrement the counter on errors.

agreed, but it should not be decremented if the task can be forced into the queue after a RejectedExecutionException.

Do you want a new patch to the current trunk ? for the interrupt() and the decrement in finally ?
Comment 7 Mark Thomas 2010-11-04 10:28:20 UTC
Fixed in 6.0.x for 6.0.30