Bug 58151 - High CPU Usage randomly on ajp-apr-8009-Poller
High CPU Usage randomly on ajp-apr-8009-Poller
Status: RESOLVED FIXED
Product: Tomcat 8
Classification: Unclassified
Component: Catalina
8.0.23
PC Linux
: P2 normal with 5 votes (vote)
: ----
Assigned To: Tomcat Developers Mailing List
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2015-07-19 07:45 UTC by gregh
Modified: 2015-08-16 08:31 UTC (History)
1 user (show)



Attachments
4 dumps of the vm (87.13 KB, text/plain)
2015-07-19 07:45 UTC, gregh
Details
Dump for 8.0.24 (79.84 KB, text/plain)
2015-07-26 09:24 UTC, gregh
Details
Thread 3187 dump (89.03 KB, text/plain)
2015-08-02 08:44 UTC, gregh
Details

Note You need to log in before you can comment on or make changes to this bug.
Description gregh 2015-07-19 07:45:47 UTC
Created attachment 32910 [details]
4 dumps of the vm

Every now and then I get high cpu on Centos 5, 6 7 servers with Tomcat 8 switching back to 7 I do not get the problem.  It seems to be in the ajp-apr-8009-Poller.

I use apache front end mod_jk connector and tomcat-native-1.1.33 and jdk1.8.0_40 (oracle)

I have tried 8.0.21 and 8.0.23

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
2691 runner    20   0 3644m 565m 8324 S 144.0 14.7  20:36.12 java     

Dumps attached.

I have asked on the user list but with no success.

Cheers Greg
Comment 1 gregh 2015-07-22 08:48:35 UTC
Will give version 8.0.24 a test.  TBA.
Comment 2 gregh 2015-07-26 09:23:07 UTC
Well, same again

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
2707 runner    20   0 3728m 599m 7356 S 36.5 15.6  52:28.85 java
Comment 3 gregh 2015-07-26 09:24:21 UTC
Created attachment 32936 [details]
Dump for 8.0.24
Comment 4 Milo 2015-07-26 09:32:01 UTC
Hello,

I'm having trouble with the following setup:
- Gentoo / mariaDB
- Tomcat 8.0.23
- Tomcat native 1.1.33
- Apache http 2.4.12 with modproxy_ajp

server.xml:
...
<Connector port="8208" enableLookups="false" maxThreads="1000"
  protocol="org.apache.coyote.ajp.AjpAprProtocol"
  connectionTimeout="5000"
  maxConnections="8000"
/>
...

Everything works fina as long as I don' restart the apache server while the tomcat server is also running. If I do so the I get this in my catalina.out (after tweaking logging.properties):
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,023,232]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,023,232] from poller
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,023,232] for event(s) [1]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,078,656]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,078,656] from poller
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,078,656] for event(s) [1]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,023,232]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,023,232] from poller
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,023,232] for event(s) [1]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,078,656]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,078,656] from poller
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,078,656] for event(s) [1]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,023,232]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,023,232] from poller
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,023,232] for event(s) [1]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,078,656]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,078,656] from poller
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,078,656] for event(s) [1]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,023,232]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,023,232] from poller
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,023,232] for event(s) [1]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,918,626,078,656]
07-Jun-2015 11:47:48.217 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,918,626,078,656] from poller
07-Jun-2015 11:47:48.218 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,918,626,078,656] for event(s) [1]

As long as everything is running fine the output is like:
07-Jun-2015 20:44:01.106 FINE [http-apr-127.0.0.1-8508-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Processing socket [139,978,017,344,528] for event(s) [1]
07-Jun-2015 20:44:01.109 FINE [http-apr-127.0.0.1-8508-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,978,017,344,528] from poller
07-Jun-2015 20:44:01.929 FINE [http-apr-127.0.0.1-8508-Acceptor-0] org.apache.tomcat.util.net.AprEndpoint$Acceptor.run APR socket [139,977,950,259,008] opened with remote port [38,896]
07-Jun-2015 20:44:01.930 FINE [http-apr-127.0.0.1-8508-Acceptor-0] org.apache.tomcat.util.net.AprEndpoint.processSocketWithOptions socket [139,977,950,259,008]
07-Jun-2015 20:44:01.944 FINE [http-apr-127.0.0.1-8508-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,977,950,259,008]
07-Jun-2015 20:44:01.944 FINE [http-apr-127.0.0.1-8508-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,259,008] from poller
07-Jun-2015 20:44:06.471 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,977,950,307,072]
07-Jun-2015 20:44:06.471 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,307,072] from poller
07-Jun-2015 20:44:06.497 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,977,950,291,840]
07-Jun-2015 20:44:06.497 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,291,840] from poller
07-Jun-2015 20:44:06.501 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,977,950,322,336]
07-Jun-2015 20:44:06.501 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,322,336] from poller
07-Jun-2015 20:44:06.826 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Add to poller socket [139,977,950,271,008]
07-Jun-2015 20:44:06.826 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,271,008] from poller
07-Jun-2015 20:44:12.766 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.maintain Timing out [139,977,950,307,072]
07-Jun-2015 20:44:12.766 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,307,072] from poller
07-Jun-2015 20:44:12.766 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Removed [139,977,950,307,072] from poller
07-Jun-2015 20:44:12.767 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.maintain Timing out [139,977,950,271,008]
07-Jun-2015 20:44:12.767 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,271,008] from poller
07-Jun-2015 20:44:12.767 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Removed [139,977,950,271,008] from poller
07-Jun-2015 20:44:12.768 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.maintain Timing out [139,977,950,322,336]
07-Jun-2015 20:44:12.768 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,322,336] from poller
07-Jun-2015 20:44:12.768 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Removed [139,977,950,322,336] from poller
07-Jun-2015 20:44:12.769 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.maintain Timing out [139,977,950,291,840]
07-Jun-2015 20:44:12.769 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Attempting to remove [139,977,950,291,840] from poller
07-Jun-2015 20:44:12.769 FINE [ajp-apr-8208-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.removeFromPoller Removed [139,977,950,291,840] from poller

To me this looks like a bug that Tomcat is not able to detect that the apache socket has gone away. But could also be a bug in the native ajp implementation of the socket handlers. When disabling apr support from server.xml all seems to function as expected and restarting apache does not result in a flood of messages.

Any ideas on what could be happening? A bug?

Thanks for any help,
Milo van der Zee
Comment 5 gregh 2015-07-26 10:28:11 UTC
Does it work ok on tomcat 7?
Comment 6 Milo 2015-07-26 10:55:42 UTC
Works fine in Tomcat 7. In Tomcat 8 I see a massive amount of context switching and so I assume Tomcat tries to create a connection and fails but tries again immediately.
Comment 7 gregh 2015-07-27 06:31:19 UTC
Centos 7 running via jsvc

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
 1459 runner    20   0 4588960 578536   8312 S 263.0 15.4   8:00.51 jsvc  

:{
Comment 8 Mark Thomas 2015-07-30 18:39:58 UTC
I can't reproduce this on either Linux or Windows with the 8.0.24 code or the latest 8.0.x code and tc-native 1.1.33. I've been hammering the Tomcat home page (via httpd) using ab with the only impact of an httpd restart is that some requests are delayed a little.

Steps to reproduce this with a clean Tomcat install would help.

Is the CPU spike you observe permanent or temporary? If temporary, how long until it settles back down?

If this is permanent, figuring out what thread(s) are using the CPU [1] and then remote debugging [2] to figure out what code path is causing the problem would be extremely helpful.

[1] http://javadrama.blogspot.co.uk/2012/02/why-is-java-eating-my-cpu.html
[2] http://wiki.apache.org/tomcat/FAQ/Developing#Q1
Comment 9 gregh 2015-07-31 07:57:02 UTC
It happens randomly on all three of my Centos boxes 5/6/7.  Maybe its only centos/rh? Once it starts you have to stop tomcat and restart it.
from the  "Dump for 8.0.24" 

"ajp-apr-8009-Poller" #26 daemon prio=5 os_prio=0 tid=0x00007f1d48374800 nid=0xc92 runnable [0x00007f1ceb9fa000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.tomcat.jni.Poll.poll(Native Method)
	at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1810)
	at java.lang.Thread.run(Thread.java:745) 


rv = Poll.poll(pollers[i], nextPollerTime, desc, true);

Why only on my Centos, mystery.

I will have another go at debugging it when it happens again.
Comment 10 Milo 2015-07-31 08:17:33 UTC
Hello,

I'm using Gentoo so not only Centos is affected. I can trigger the bug by restarting apache while there are sessions active.

Load on the server does not seem to influence the occurance of the bug.

Sometimes the load goes down again after about 30 minutes till hours. I did not actually measure the time but I noticed that sometimes the load goes back to normal after a long time.

Because of the context switching (>400,000 cs/sec) that is causing the load I assume that the parkNanos call is in the neighborhood of where to look for fixes.

"ajp-apr-8200-exec-107" #3671 daemon prio=5 os_prio=0 tid=0x00007f62a007f800 nid=0x3adc waiting on condition [0x00007f624d4a6000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000ed8dfbe0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at org.apache.tomcat.util.threads.TaskQueue.poll(Unknown Source)
        at org.apache.tomcat.util.threads.TaskQueue.poll(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:745)

Thanks.

MAG,
Milo
Comment 11 Mark Thomas 2015-07-31 09:33:02 UTC
A couple of further questions.

What, exactly, do you mean by "sessions acive"? An actively used AJP connection to Tomcat?

Also, what do you mean by "restart"? Stop/start? Graceful restart?

It sounds like a connection is going in and out of the Poller very quickly putting the Poller into (effecively) a tight loop. What we really need to know is why hence my request for remote debugging to trace the code path. That should be enough to figure out what is going on.
Comment 12 gregh 2015-08-01 10:49:21 UTC
I have setup eclispe to debug tomcat running on the server, (not done much remote debugging, so bear with me) when I attach the debugger the with a breakpoint it stops as expected, if I then remove and add it again and refresh the application it hangs and does not go back to the breakpoint in eclipse?  I was expecting it to go back and allow me to step through the source.

Are there any more detail instruction how to do this?


Go to "Run->Debug Configurations...". Click on "Remote Java Applications", then click "New". Type in the title. Note that port 8000 from the Tomcat instructions. Save and run.
Eclipse will connect to the JVM that Tomcat is running under. Wow, that was easy! 

Thanks Greg
Comment 13 Milo 2015-08-01 11:08:27 UTC
What do you mean with "trace the code path". Isn't that the stacktrace? But that is already given so what exactly do you mean?
Comment 14 gregh 2015-08-02 08:43:16 UTC
Hello,

It started again and was on thread 3187 (see tomcatdump-8.0.24-2015-08-02.txt)

3187runner2003828m707m13mS16.518.56:38.29java	6:38.29 java		"ajp-apr-8009-Poller" #28 daemon prio=5 os_prio=0 tid=0x00007f0be4626000 nid=0xc73 runnable [0x00007f0b92745000]



"ajp-apr-8009-Poller" #28 daemon prio=5 os_prio=0 tid=0x00007f0be4626000 nid=0xc73 runnable [0x00007f0b92745000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.tomcat.jni.Poll.poll(Native Method)
	at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1810)
	at java.lang.Thread.run(Thread.java:745)


I managed to get the debugger attached and a breakpoint in AprEndpoint class line 1802 (below),  it seemed to be going in a loop which was consuming the cpu.


// Poll for the specified interval
for (int i = 0; i < pollers.length; i++) {
..
}

the variable pollers.length == 1

For some reason it stopped breaking and the CPU went back to normal, after I held it on the breakpoint for some time as I was looking at the source, so not sure?

Cheers Greg
Comment 15 gregh 2015-08-02 08:44:09 UTC
Created attachment 32951 [details]
Thread 3187 dump
Comment 16 gregh 2015-08-02 09:06:21 UTC
Checking the source from 7.0.61 which works ok with 8.0.24 there are some  differences in the run method, maybe these need checking???

removeFromPoller(info.socket); added
rv = mergeDescriptors(desc, rv); added

differences

// Process socket timeouts
if (getSoTimeout() > 0 && maintain++ > 1000 && pollerRunning) {
 // This works and uses only one timeout mechanism for everything, but the
 // non event poller might be a bit faster by using the old maintain.
 maintain = 0;
 maintain();
}

// Process socket timeouts
if (getSoTimeout() > 0 && pollerRunning) {
 // This works and uses only one timeout mechanism for everything, but the
 // non event poller might be a bit faster by using the old maintain.
 maintain();
}

Just an idea?

Cheers Greg
Comment 17 Mark Thomas 2015-08-03 20:34:17 UTC
The stack trace only shows where execution is at a single point in time. It doesn't show how execution reached that point or where execution will go afterwards.

I suspect that something is putting a socket in an odd/bad state. The code then enters the following loop:
- the socket is queued to be added to the poller
- the socket is added to the poller
- poll() finds an event for the socket
- the event is processed
- processing terminates early because of the odd/bad state

What we need to figure out is what is this odd/bad state so we can detect it and ensure the socket is correctly processed (probably closed).

It is possible that a timeout fired while you were looking at the code. I'd recommend increasing the socket timeout to 5-10 minutes while you are debugging this.
Comment 18 gregh 2015-08-04 07:57:55 UTC
Mark,

I do not think I will be able to debug the run() {..}  Trying to check whether the odd/bad state when I really have no idea what it is supposed to do.


if (((desc[n*2] & Poll.APR_POLLHUP) == Poll.APR_POLLHUP)|| ((desc[n*2] & Poll.APR_POLLERR) == Poll.APR_POLLERR) || ((desc[n*2] & Poll.APR_POLLNVAL) == Poll.APR_POLLNVAL)) {
  if (!processSocket(desc[n*2+1], SocketStatus.ERROR)) {
    // Close socket and clear pool
    closeSocket(desc[n*2+1]);
  }
}

Would the randomness of this be concurrency issues?  I am running on slower hardware (although it is happening on my new i5 box) had this before, adding and removing at the same time, causing deadlocks.

I don't mind making changes to the source ie backtracing the mods to see which is causing the run() to loop continuously.

Cheers Greg
Comment 19 gregh 2015-08-04 08:04:31 UTC
if you check the private void add{..}

what did the ok do?

boolean ok = false;
synchronized (this) {
    // Add socket to the list. Newly added sockets will wait
    // at most for pollTime before being polled. Don't add the
    // socket once the poller has stopped but destroy it straight
    // away
    if (pollerRunning && addList.add(socket, timeout, flags)) {
        ok = true;
        this.notify();
    }
}
if (!ok) {
    // Can't do anything: close the socket right away
    boolean comet = connections.get(
    Long.valueOf(socket)).isComet();
    if (!comet || (comet && !processSocket(
        socket, SocketStatus.ERROR))) {
        closeSocket(socket);
    }
}
Comment 20 Mark Thomas 2015-08-04 10:02:37 UTC
Don't worry too much about why the loop is happening. Once we know the route execution is taking we'll probably be able to figure out the why. As a starting point, would it be possible for you to find out the following when the loop happens:
1. Put a break point on AprEndpoint.Poller.add(long, int, int) and tell us the full stack trace for the code that adds the socket to the Poller

2. Put as break point on ApeEndpoint.processSocket(long, status) and (for the looping socket) tell us what the value of the status parameter is.

The above two bits of information should allow us to get a little further.
Comment 21 Milo 2015-08-04 10:19:36 UTC
Mark, is it helpfull if I try to setup desktop sharing while the bug is happening? Then you could debug whatever you like :)

But I'll also add the stacktraces as soon as I'm able to grap them.
Comment 22 Milo 2015-08-04 22:48:33 UTC
Breakpoint 1:
AprEndpoint$Poller.add(139982580808944, -1, 1) line: 1485	
AprEndpoint$Poller.access$500(AprEndpoint$Poller, long, int, int) line: 1251	
AprEndpoint$AprSocketWrapper.registerforEvent(int, boolean, boolean) line: 2516	
AjpAprProtocol$AjpConnectionHandler.release(SocketWrapper<Long>, Processor<Long>, boolean, boolean) line: 136	
AjpAprProtocol$AjpConnectionHandler(AbstractProtocol$AbstractConnectionHandler<S,P>).process(SocketWrapper<S>, SocketStatus) line: 722	
AprEndpoint$SocketProcessor.doRun() line: 2463	
AprEndpoint$SocketProcessor.run() line: 2452	
ThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1142	
ThreadPoolExecutor$Worker.run() line: 617	
TaskThread$WrappingRunnable.run() line: not available	
TaskThread(Thread).run() line: 745	

The socket number changes every call to this method.
==================
Breakpoint 2:
AprEndpoint.processSocket(long, OPEN_READ) line: 887	
AprEndpoint$Poller.run() line: 1884	
Thread.run() line: 745	

I set a conditional breakpoint (return !status.equals(SocketStatus.OPEN_READ);) and I've never seen anything else as OPEN_READ.

Hope this helps...
Comment 23 Milo 2015-08-05 12:15:27 UTC
I found that completely stopping apache triggers the bug instantly. The number of context switches rises from about 5000 cs/sec to 500000 cs/sec.

When I then start apache cs/sec goes down to about 15000 cs/sec.
Comment 24 Mark Thomas 2015-08-05 15:29:27 UTC
Thanks for all the pointers. I'm able to reproduce behavior that looks like the behavior described with high CPU on a single java thread. I can't be sure that I am seeing the same problem but I am seeing a problem which I should be able to fix. I'll update this ticket as I make progress. I might have some dev 8.0.x builds for people to test.
Comment 25 Mark Thomas 2015-08-05 19:44:00 UTC
This should be fixed now in 8.0.x/trunk.

The problem was that a return value of zero was treated as a non-blocking read that returned no data rather than as EOF. That meant that the socket was put straight back into the Poller only for the Poller to trigger a read event immediately for EOF and so the loop continued.
Comment 26 Milo 2015-08-05 19:57:53 UTC
Thankyou very much.
Comment 27 gregh 2015-08-06 07:22:25 UTC
Excellent, I will apply the change in AjpAprProcessor locally to see if its the same issue (I do not stop and restart tomcat for it to happen). Thanks!

Greg
Comment 28 Milo 2015-08-06 07:34:36 UTC
@Gregh: not restarting Tomcat but restarting Apache triggers the bug in my setup. Stopping Apache kills one end of the socket and so Tomcat starts the loop.
Comment 29 gregh 2015-08-16 08:31:36 UTC
Seems fixed with the AjpAprProcessor patch, been over a week without the fans starting!!!  Thanks @Milo (& Mark).

Greg :-)