Bug 50839 - Closing connection on the client side (browser) during unfinished transmission result in 100% CPU slike for 30 sec during socket shutdownd in jk_connect.c
Summary: Closing connection on the client side (browser) during unfinished transmissio...
Alias: None
Product: Tomcat Connectors
Classification: Unclassified
Component: isapi (show other bugs)
Version: 1.2.31
Hardware: PC Windows 2000
: P2 regression with 5 votes (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
: 51232 (view as bug list)
Depends on:
Reported: 2011-02-27 12:47 UTC by Serhiy
Modified: 2011-05-21 22:48 UTC (History)
1 user (show)

Zip which contains the logfiles and source code of JSP (218.88 KB, application/zip)
2011-05-16 01:19 UTC, Konstantin Preißer
New isapi logfile (254.41 KB, application/zip)
2011-05-16 01:58 UTC, Konstantin Preißer

Note You need to log in before you can comment on or make changes to this bug.
Description Serhiy 2011-02-27 12:47:01 UTC
Windows 2000 server SP4
Tomcat 6.0.26
JK connector

All components are on the same server.

During research an issue why once in a while Tomcat on our servers starts rejecting connection on port for JK connector with error 61, we noticed that looks like it is starting with multiple 10054 errors recorded by JK connector during our regression testing. Last one could be easily reproduced on all our servers where it is installed. Error entry follows by spike in CPU utilization for 30 seconds during attempt to shutdown a socket (reporting large negative or positive number of lingering bytes read).

Here the snapshot from JK connector logs during these events:

[Sat Feb 26 19:22:19.979 2011] [2856:1508] [debug] isapi_write_client::jk_isapi_plugin.c (1204): Writing 2057 bytes of data to client
[Sat Feb 26 19:22:19.979 2011] [2856:1508] [error] isapi_write_client::jk_isapi_plugin.c (1210): WriteClient failed with 10054 (0x00002746)
[Sat Feb 26 19:22:19.979 2011] [2856:1508] [info] ajp_process_callback::jk_ajp_common.c (1885): Writing to client aborted or client network problems
[Sat Feb 26 19:22:19.979 2011] [2856:1508] [info] ajp_service::jk_ajp_common.c (2543): (worker1) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1)
[Sat Feb 26 19:22:19.979 2011] [2856:1508] [info] HttpExtensionProc::jk_isapi_plugin.c (2217): service() failed because client aborted connection
[Sat Feb 26 19:22:19.979 2011] [2856:1508] [debug] ajp_reset_endpoint::jk_ajp_common.c (757): (worker1) resetting endpoint with socket 2776 (socket shutdown)
[Sat Feb 26 19:22:19.979 2011] [2856:1508] [debug] jk_shutdown_socket::jk_connect.c (726): About to shutdown socket 2776 [ ->]
[Sat Feb 26 19:22:49.010 2011] [2856:1508] [debug] jk_shutdown_socket::jk_connect.c (808): Shutdown socket 2776 [ ->] and read 1402725467 lingering bytes in 30 sec.

Other things noted: 
- download speed between webserver and client browser is about 60-80 Kb/sec when JK is used. Content served by IIS or Tomcat directly delivered with about 2Mb/sec. 
- changing packet size (connector and tomcat) from default to 32K didn't make a difference.
- Configure JK2 connector instead of JK (same box, same everything) - no issues whatsoever (no CPU spikes, no download speed issues).

All values in server.xml and workers.properties are pretty much at default values, but below is our workers.properties, just in case:


and workers2.properties

# add this section to test the proposition that it resolved the upload bug


# Define the communication channel 
info=Ajp13 forwarding over socket

# Map the Tomcat examples webapp to the Web server uri space

Here the section from server.xml regarding AJP:
    <!-- Define an AJP 1.3 Connector on port 8109 -->
    <Connector port="8109" protocol="AJP/1.3" redirectPort="8443" />

Please let me know if more information is needed.

Comment 1 Jeffrey Tacy 2011-03-15 00:57:40 UTC
Confirmed that this issue also exists when using the mod_jk 1.2.31 binary on Windows Server 2003 with httpd 2.2.14. I was unable to reproduce this problem on mod_jk 1.2.30. With only a single CPU available, this virtually disables the web server for about 30 seconds. While this problem is naturally triggered by end user behavior or unreliable client network connections, it could also easily be exploited for denial of service.
Comment 2 Konstantin Preißer 2011-05-16 01:19:08 UTC
Created attachment 26999 [details]
Zip which contains the logfiles and source code of JSP
Comment 3 Konstantin Preißer 2011-05-16 01:19:28 UTC

I could also observe this issue using Windows Server 2008 SP2 (32 bit), IIS 7.0, ISAPI Redirector 1.2.31. It happened when a Servlet was producing lots of data, a client made a request to this servlet and canceled the download a few seconds later. This was leading to 100% CPU usage for about 30 seconds.

I reported this issue on the users list [1] where Tim Whittington pointed me to this bug and asked me to attach a TRACE level log file.

To do this, I made a jsp (download3.jsp in the attachment) which produces about 32 MB of random data and writes it to the client. I also made a Downloader (SlowDownloader.java) which uses a URLConnection to make a request to the servlet and download it slowly, and abort the connection after 2400 bytes are read. The log of this Downloader is in SlowDownloader.log, the log of the JSP is in JSPLog.log.

isapi_redirect.log contains the logfile of the ISAPI redirector with TRACE log level.

[1] http://markmail.org/message/g7iztucoucze2wrw
Comment 4 Konstantin Preißer 2011-05-16 01:57:24 UTC

I just discovered that writing a byte array of 32 bytes produces a bunch of lines in the isapi log each time. I changed the jsp to write an array of 32000 bytes instead of 32 each cycle. I hope the ISAPI log it is a bit better readable now (although the file is bigger).
Comment 5 Konstantin Preißer 2011-05-16 01:58:02 UTC
Created attachment 27000 [details]
New isapi logfile
Comment 6 Tim Whittington 2011-05-16 08:25:44 UTC
The trace log from Konstantin shows that the ISAPI Redirector is spending all it's time in jk_shutdown_socket, calling jk_is_input_event - basically draining all the AJP response packets (which will all be SEND_BODY_CHUNK packets).

The Java side of the AJP connector also blindly continues writing out an entire chunk (across multiple SEND_BODY_CHUNK packets), and possibly across multiple writes (I can't quite get my head around how an AJP response stream would be terminated on a client abort on the HTTP side).

I think this behaviour was introduced when the socket shutdown was tidied up to politely drain lingering bytes on the AJP connection before closing the socket - in extreme cases this seems to allow the Java side to continue writing data.

I'm thinking at this point the simplest fix is to cap the amount of lingering bytes the AJP reset will handle on the Tomcat Connector end, and simply close the AJP connection if that is exceeded.
Comment 7 Rainer Jung 2011-05-16 08:48:21 UTC
+1 to such a solution. Limit by number of bytes plus the time we do the lingering. Break after say 32KB read or 2 seconds linger. Waiting longer or reading more is just wasting precious resources.

Comment 8 Tim Whittington 2011-05-19 00:23:12 UTC
I've changed the socket shutdown to limit the amount of lingering bytes (which in an AJP scenario == the remaining response body) to 32k and force a socket shutdown once that limit is reached.
I've also shortened the initial poll timeout and reduced the total linger time to 2s to reduce use of web server resources during socket shutdown.

Changes will be in 1.2.32
Comment 9 Mark Thomas 2011-05-19 12:06:20 UTC
*** Bug 51232 has been marked as a duplicate of this bug. ***
Comment 10 Tim Whittington 2011-05-21 09:29:41 UTC
After a hunt through the revisions between 1.2.30 and 1.2.31, it appears that [1] introduced this bug.

The counter of bytes read on each attempt to read lingering bytes was not reset in the loop after this change, resulting in the loop reading lingering bytes not exiting normally if there were >= 512 lingering bytes to start with.
This also explains the total lingering byte count wrapping into negative numbers.

I had already fixed this at the same time as introducing the lingering byte limits.

[1] https://svn.apache.org/viewvc?view=revision&sortby=date&revision=998108
Comment 11 Serhiy 2011-05-21 18:53:12 UTC

Issue was in the 1.2.31. You stated you've fixed that. Now I'm a little confused... Which version of the conector has fix?

Comment 12 Tim Whittington 2011-05-21 22:48:42 UTC
The fix for this issue and the change to cap the lingering bytes/time, will both be in 1.2.32.