Bug 60578

Summary: Server CPU maxed out (100% per core) randomly after a few hours
Product: Tomcat 7 Reporter: Ralf Hauser <hauser>
Component: Connectors Assignee: Tomcat Developers Mailing List <dev>
Severity: normal    
Priority: P2    
Version: 7.0.93   
Target Milestone: ---   
Hardware: PC   

Description Ralf Hauser 2017-01-12 12:17:17 UTC
On Debian stable, for the past few days Tomcat has been suddenly going to nearly 100% CPU.

Not much traffic is seen on the firewall.
Is this related to Bug 57544?

For the amount of cpu used, tomcat is astonishingly responsive.

Some have speculated that this is a subtle kind of DoS attack. With a

   kill -QUIT 

I get dozens of the following two kinds of threads, which I did not see before the CPU went high:

"http-nio-" #13579 daemon prio=5 os_prio=0 tid=0x00007fa68c040000 nid=0x2352 runnable [0x00007fa614461000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.coyote.http11.AbstractInputBuffer.nextRequest(AbstractInputBuffer.java:244)
	at org.apache.coyote.http11.AbstractNioInputBuffer.nextRequest(AbstractNioInputBuffer.java:151)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:658)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
	- locked <0x0000000095c30eb8> (a org.apache.tomcat.util.net.SecureNioChannel)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-" #112 daemon prio=5 os_prio=0 tid=0x00007fa6b4e84000 nid=0x45db runnable [0x00007fa61aff8000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x00000000866957c0> (a sun.nio.ch.Util$3)
	- locked <0x00000000866957b0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000086695688> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1050)
	at java.lang.Thread.run(Thread.java:745)

The same version of Tomcat was in use for weeks without this phenomenon.
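
With dozens of threads in a dump, counting them by their topmost frame makes the pattern easier to see. The following is only a minimal sketch, assuming the dump has the HotSpot text layout shown above; the function name top_frames_by_state and the shortened SAMPLE text are made up for illustration.

```python
import re
from collections import Counter

# A thread entry starts with its quoted name, e.g. "http-nio-" #13579 daemon ...
THREAD_HEADER = re.compile(r'^"[^"]*"')
STATE_LINE = re.compile(r'java\.lang\.Thread\.State: (?P<state>\w+)')
FRAME_LINE = re.compile(r'^\s*at (?P<frame>[\w.$<>]+)\(')


def top_frames_by_state(dump_text, state="RUNNABLE"):
    """Count the topmost stack frame of every thread in the given state."""
    counts = Counter()
    need_frame = False
    for line in dump_text.splitlines():
        if THREAD_HEADER.match(line):
            # New thread entry: forget whatever the previous thread was doing.
            need_frame = False
            continue
        state_match = STATE_LINE.search(line)
        if state_match:
            need_frame = state_match.group("state") == state
            continue
        frame_match = FRAME_LINE.match(line)
        if frame_match and need_frame:
            # Only the first "at ..." line after the state line is the top frame.
            counts[frame_match.group("frame")] += 1
            need_frame = False
    return counts


# Shortened, illustrative excerpt of the two thread kinds from this report.
SAMPLE = '''"http-nio-" #13579 daemon prio=5 os_prio=0 nid=0x2352 runnable
   java.lang.Thread.State: RUNNABLE
\tat org.apache.coyote.http11.AbstractInputBuffer.nextRequest(AbstractInputBuffer.java:244)
\tat org.apache.coyote.http11.AbstractNioInputBuffer.nextRequest(AbstractNioInputBuffer.java:151)

"http-nio-" #112 daemon prio=5 os_prio=0 nid=0x45db runnable
   java.lang.Thread.State: RUNNABLE
\tat sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
\tat sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
'''

for frame, count in top_frames_by_state(SAMPLE).most_common():
    print(count, frame)
```

A large count for AbstractInputBuffer.nextRequest compared with a dump taken while the CPU is idle would match the picture described here.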
Comment 1 Mark Thomas 2017-01-12 13:25:38 UTC
As far as I can tell from http://metadata.ftp-master.debian.org/changelogs/main/t/tomcat8/tomcat8_8.0.14-1+deb8u6_changelog the fix for BZ57544 was not back-ported.

You'll need to contact the Debian maintainers for more info.

*** This bug has been marked as a duplicate of bug 57544 ***
Comment 2 Emmanuel Bourg 2017-01-12 16:07:58 UTC
I'll take care of backporting the fix to Debian. In the meantime, you can install the latest version of the tomcat8 package from the jessie-backports repository. You'll get the latest 8.5.9 version with the fix included.
Comment 3 Ralf Hauser 2017-01-16 10:09:22 UTC
Thanks, the 8.5.9 backport appears to solve the problem (although the observation period has not been long).

One side effect was that Bug 60126 appeared:
  <<The code of method _jspService(HttpServletRequest, HttpServletResponse) is exceeding the 65535 bytes limit>>

Changing [Tomcat_Home]/conf/web.xml as per https://www.assetbank.co.uk/support/documentation/knowledge-base/byte-limit-exceeded-error/
solved that for me, but this will probably not work for everyone with the same problem.
Comment 4 Mark Thomas 2017-02-14 11:22:28 UTC
Note the root cause of this in Debian, Ubuntu etc. was back-porting the security fix for CVE-2016-6816 without back-porting the 57544 fix. This made it trivial to trigger the loop described in bug 57544.

Without the back-port of the CVE-2016-6816 fix, the loop described in bug 57544 was significantly harder to trigger. The root cause of 57544 has not been identified. It may have been user triggered but it may also have been triggered by an application bug.
Comment 5 Jerome Terry 2017-02-16 00:06:11 UTC
I have experienced what appears to be the same issue on Ubuntu 14.04 with Tomcat 7.0.52. Here's a link to tweets containing the diagnostics I performed. https://twitter.com/jeromeleoterry/status/831865811962908672

In my use case, a Nessus scan on ports 8080 and 8009 was triggering the CPU to get maxed out. I was able to reproduce this issue in a QA environment by triggering a Nessus scan with no load applied to Tomcat. A Nessus scan with only the HTTPS connector enabled didn't drive the CPU to 100%.

I ran strace and the bulk of the time was being spent in futex. I also ran Linux perf, and AbstractHttp11Processor.process was consuming 49.91% of CPU, while AbstractInputBuffer.nextRequest was consuming 50.06% of the CPU. 
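
To tie per-thread CPU figures back to a thread dump, note that HotSpot prints the native thread ID as hex (the nid= field), while tools such as top -H list threads by decimal ID. The following is only a minimal sketch of that conversion; the function name native_thread_ids is made up for illustration, and the two header lines are shortened from the dump in the description.

```python
import re

# Thread header line: quoted name first, native thread ID later as nid=0x<hex>.
NID_HEADER = re.compile(r'^"(?P<name>[^"]*)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def native_thread_ids(dump_text):
    """Return {decimal native thread ID: Java thread name} for a thread dump."""
    ids = {}
    for line in dump_text.splitlines():
        match = NID_HEADER.match(line)
        if match:
            ids[int(match.group("nid"), 16)] = match.group("name")
    return ids


# Header lines shortened from the two threads quoted in the description.
HEADERS = '''"http-nio-" #13579 daemon prio=5 os_prio=0 nid=0x2352 runnable
"http-nio-" #112 daemon prio=5 os_prio=0 nid=0x45db runnable
'''

for tid, name in sorted(native_thread_ids(HEADERS).items()):
    print(tid, name)
```

The decimal IDs can then be compared with the busiest thread IDs reported by the operating system tools.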

In Catalina.out, I saw the error messages "Invalid message received with signature" and "Error parsing HTTP request header". 
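
For anyone checking whether their own instance is affected, counting these two messages in the log is a quick first test. The following is only a minimal sketch; the function name count_messages is made up for illustration, and the log path in the usage note is an assumption that depends on the installation.

```python
# The two message fragments quoted above.
PATTERNS = (
    "Invalid message received with signature",
    "Error parsing HTTP request header",
)


def count_messages(lines, patterns=PATTERNS):
    """Count how many lines contain each of the given message fragments."""
    counts = {pattern: 0 for pattern in patterns}
    for line in lines:
        for pattern in patterns:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

Usage, assuming the log is at the hypothetical path /var/log/tomcat7/catalina.out: open the file with open(path, errors="replace") and pass the file object to count_messages, which iterates over it line by line.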

This is a nasty one. A security scan on port 8080 or 8009 can trigger all cores to max out, which is a simple way of doing a denial of service attack.