Bug 52856 - High CPU load when the underlying socket error in ssl_socket_send is EAGAIN
Summary: High CPU load when the underlying socket error in ssl_socket_send is EAGAIN
Status: RESOLVED FIXED
Alias: None
Product: Tomcat Native
Classification: Unclassified
Component: Library (show other bugs)
Version: 1.1.22
Hardware: PC Linux
: P2 major (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
: 54064 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-03-08 14:29 UTC by Dmitry Kukushkin
Modified: 2012-11-19 05:39 UTC (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Kukushkin 2012-03-08 14:29:04 UTC
How to reproduce:
Tomcat with the APR connector, using HTTPS scheme, when a client requests for some quite a big (~100K-200K) static file, and breaks the connection unexpectedly.
Shell script to reproduce:
#!/bin/bash
# A way to reproduce the bug in the APR connector.
# Must be run from a different machine, the Tomcat is running on.
# Author: Andreas Florath ( andreas.florath at external.telekom.de )


for port in 17890 17891 17892 17893 17894 17895 17896 17897 17898 17899; do
     curl --local-port $port --insecure "https://<host>:<port>/<some big file>" &
     sleep 0.1
     iptables -A OUTPUT -p tcp --sport $port -j REJECT
done


Connector enters a socket write loop, taking nearly 1 CPU core.
A snippet from the strace log:
...
[pid 32221] write(91, "\347\2\22\36\334\222\2525g\215\342\352A\324\246\340\7\265(\205n\362\215\34az\363>\37Up\213"..., 1789 <unfinished ...>
[pid 32222] <... futex resumed> )       = 0
[pid 32221] <... write resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 32222] write(94, "\276\275\263zN\n\20^\35\200\3.\5\n\35\300o\364\373\305\356\35\24{\302\200\242\346\22\241\256\326"..., 2812 <unfinished ...>
[pid 32221] write(91, "\347\2\22\36\334\222\2525g\215\342\352A\324\246\340\7\265(\205n\362\215\34az\363>\37Up\213"..., 1789 <unfinished ...>
[pid 32222] <... write resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 32221] <... write resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 32222] write(94, "\276\275\263zN\n\20^\35\200\3.\5\n\35\300o\364\373\305\356\35\24{\302\200\242\346\22\241\256\326"..., 2812 <unfinished ...>
[pid 32221] write(91, "\347\2\22\36\334\222\2525g\215\342\352A\324\246\340\7\265(\205n\362\215\34az\363>\37Up\213"..., 1789 <unfinished ...>
[pid 32222] <... write resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 32221] <... write resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
...

Possible patch:
Add polling for writing with the timeout before the next ssl socket write attempt in case of EAGAIN
--- tomcat-native-1.1.22-src/jni/native/src/sslnetwork.c
+++ tomcat-native-1.1.22-src/jni/native/src/sslnetwork.c
@@ -471,7 +471,7 @@
                 break;
                 case SSL_ERROR_WANT_READ:
                 case SSL_ERROR_WANT_WRITE:
-                    if (nb && i == SSL_ERROR_WANT_WRITE) {
+                    if (nb && i == SSL_ERROR_WANT_WRITE && !APR_STATUS_IS_EAGAIN(os) ) {
                         *len = 0;
                         return APR_SUCCESS;
                     }
Comment 1 Mladen Turk 2012-11-19 05:35:30 UTC
*** Bug 54064 has been marked as a duplicate of this bug. ***
Comment 2 Mladen Turk 2012-11-19 05:39:23 UTC
Fixed in the SVN (1.1.x branch)
Will be part of 1.1.25 release.

Note that I didn't use your patch. The entire SSL send loop was a bit bogus, so I did some serious cleanup.

However your test case helped a lot!
With 1.1.24 I see 98%CPU, with 1.1.25-dev it never goes beyond 62.5% on localhost for 80MB file.