Bug 64155 - Tomcat 7 Performance: acceptor thread bottleneck at getPoolSize() located at TaskQueue offer function
Summary: Tomcat 7 Performance: acceptor thread bottleneck at getPoolSize() located at ...
Status: NEEDINFO
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina (show other bugs)
Version: trunk
Hardware: All All
: P2 normal (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
: 64156 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-02-18 18:31 UTC by Torres Yang
Modified: 2020-02-25 16:04 UTC (History)
0 users



Attachments
Reduced getPoolSize (1.52 KB, patch)
2020-02-18 18:31 UTC, Torres Yang
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Torres Yang 2020-02-18 18:31:51 UTC
Created attachment 37022 [details]
Reduced getPoolSize

Tomcat 7 Performance:

During our performance testing, we found out that the acceptor thread bottleneck at the getPoolSize() located at TaskQueue offer function, which is an expensive call to AbstractQueuedSynchronizer acquire lock. 


Proposed fix is to store and use poolSize as local variable. This reduces 2-3 expensive calls down to 1 for each request.

    @Override
    public boolean offer(Runnable o) {
      //we can't do any checks
      if (parent == null) return super.offer(o);
      // getPoolSize() is expensive call to AbstractQueuedSynchronizer acquire lock
      final int poolSize = parent.getPoolSize();
      //we are maxed out on threads, simply queue the object
      if (poolSize == parent.getMaximumPoolSize()) return super.offer(o);
      //we have idle threads, just add it to the queue
      if (parent.getSubmittedCount() <= poolSize) return super.offer(o);
      //if we have less threads than maximum force creation of a new thread
      if (poolSize < parent.getMaximumPoolSize()) return false;
      //if we reached here, we need to add it to the queue
      return super.offer(o);
    }
Comment 1 Mark Thomas 2020-02-21 21:21:06 UTC
*** Bug 64156 has been marked as a duplicate of this bug. ***
Comment 2 Mark Thomas 2020-02-21 21:25:34 UTC
parent is an instance of o.a.t.u.threads.ThreadPoolExecutor

It inherits getPoolSize() from j.u.c.ThreadPoolExecutor

That method returns an int with no synchronization.

There is no use of AbstractQueuedSynchronizer in that call path.
Comment 3 Torres Yang 2020-02-21 21:35:53 UTC
Hi Mark,


Thanks for your prompt reply, o.a.t.u.threads.ThreadPoolExecutor indeed extends the j.u.c.ThreadPoolExecutor, and it inherits the getPoolSize() from j.u.c.ThreadPoolExecutor as well.

However, the j.u.c.ThreadPoolExecutor's getPoolSize() implementation has ReentrantLock in-placed.


Here is the code from Java 8:

    /**
     * Returns the current number of threads in the pool.
     *
     * @return the number of threads
     */
    public int getPoolSize() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            // Remove rare and surprising possibility of
            // isTerminated() && getPoolSize() > 0
            return runStateAtLeast(ctl.get(), TIDYING) ? 0
                : workers.size();
        } finally {
            mainLock.unlock();
        }
    }


It's very obvious that whenever go into this function, it must first obtain the lock before anything.


Hope this explain your question or concern.



Bests,


Torres
Comment 4 Mark Thomas 2020-02-21 21:52:21 UTC
Ah. I see appears to have been added at some point in Java 7.

Under what test conditions does this bottleneck appear?
Comment 5 Torres Yang 2020-02-21 22:10:58 UTC
For our test setup, our gateway have 2000 tomcat threads waiting in the pool. At the same time, we simulate more than 100 virtual users try to hit the service on gateway.

You can image that under heavy load, we are expect to see slowness because of this getPoolSize() lock.
Comment 6 Mark Thomas 2020-02-25 16:04:02 UTC
I've built various test cases, some load testing Tomcat, some testing ThreadPoolExecutor directly and I am unable to reproduce any results that show contention on getPoolSize().

Please provide the simplest possible test case (i.e. one that tests ThreadPoolExecutor directly) that demonstrates decreasing performance with increasing concurrency.