Summary: | Tomcat randomly crashes with [libtcnative-1.so.1+0x152ca] Java_org_apache_tomcat_jni_Socket_sendbb+0x5a | ||
---|---|---|---|
Product: | Tomcat Native | Reporter: | Mike Jakubik <mike.jakubik> |
Component: | Library | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | critical | CC: | tanaka.shunsuke |
Priority: | P2 | ||
Version: | 1.1.22 | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | FreeBSD | ||
Attachments: | Patch against tcnative-1.1.x |
Description
Mike Jakubik
2011-09-14 20:25:06 UTC
This appears to be a problem with the new jpeg encoder code in OpenJDKs javax.imageio. We have changed our code to render a PNG instead, and the issue appears to be solved. There was a recent discussion on the users list where someone discovered that one of the JRE imaging APIs was keeping a reference to an OutputStream and randonly flushing it long after the request had completed. http://markmail.org/thread/pg6tcml5rvuiifv2 This wasn't causing crashes, though. I wonder if this might be a related problem: the output stream is flushed, but the response has already been sent and the thread/buffer put back into the worker queue. When another flush comes, it's in an invalid state? Mike, would you be willing to try to reproduce the problem and then wrap an OutputStream around the response's OutputStream, and pass THAT to the ImageIO? See here for an example of the kind of code I'm talking about: https://wiki.apache.org/tomcat/FAQ/KnownIssues#ImageIOIssues Hi Christopher, Thanks for the info, but it appears that simply switching to render a png instead of a jpeg has fixed our problem. Unfortunately i am not a programmer myself, and i don't have the resources to try your code. Thanks. We have another bite: http://markmail.org/message/f5idaje4a4vg7vkd Updated versions of everything, but the symptom is the same: ImageIO bug + APR + sendbb() + si_addr=0x0000000000000040 I think this ought to do it (though I'm sure there are other places in tcnative that could afford to have this same kind of checking): Index: native/src/network.c =================================================================== --- native/src/network.c (revision 1484904) +++ native/src/network.c (working copy) @@ -587,6 +587,10 @@ tcn_ThrowAPRException(e, APR_ENOTSOCK); return -(jint)APR_ENOTSOCK; } + if(!s->net) { + tcn_ThrowAPRException(e, APR_EINVALSOCK); + return -(jint)APR_EINVALSOCK; + } TCN_ASSERT(s->opaque != NULL); TCN_ASSERT(s->jsbbuff != NULL); #ifdef TCN_DO_STATISTICS (Note the above example patch is against tcnative/1.1.x branch) Created attachment 30402 [details]
Patch against tcnative-1.1.x
I'm not sure if I should be checking s->net or s->sock. They both seem to be set to NULL in the various recycle routines so it probably doesn't matter either way.
I'm also not sure if APR_EINVALSOCK is the right error code to use.
(In reply to Christopher Schultz from comment #7) > Created attachment 30402 [details] > Patch against tcnative-1.1.x > > I'm not sure if I should be checking s->net or s->sock. They both seem to be > set to NULL in the various recycle routines so it probably doesn't matter > either way. I'd go with the net member and add the check to and function of name send* and recv*, which already have the null pointer check for s (or its jlong equivalent sock). > I'm also not sure if APR_EINVALSOCK is the right error code to use. * APR_EINVALSOCK APR was given an invalid socket * APR_ENOSOCKET APR was not given a socket So APR_EINVALSOCK sounds good. I haven't though checked the code that calls the recv* and send* methods for special error handling. Regards, Rainer (In reply to Rainer Jung from comment #8) > (In reply to Christopher Schultz from comment #7) > > Created attachment 30402 [details] > > Patch against tcnative-1.1.x > > > > I'm not sure if I should be checking s->net or s->sock. They both seem to be > > set to NULL in the various recycle routines so it probably doesn't matter > > either way. > > I'd go with the net member and add the check to and function of name send* > and recv*, which already have the null pointer check for s (or its jlong > equivalent sock). I think that's pretty much what I've done. Do you think I missed anything? > > I'm also not sure if APR_EINVALSOCK is the right error code to use. > > * APR_EINVALSOCK APR was given an invalid socket > * APR_ENOSOCKET APR was not given a socket > > So APR_EINVALSOCK sounds good. Okay, I'll commit as-is, then. Here's another occurance: [junit] Running org.apache.tomcat.websocket.TestWebSocketFrameClient [junit] Aug 02, 2013 2:52:50 PM org.apache.catalina.core.AprLifecycleListener init [junit] INFO: Loaded APR based Apache Tomcat Native library 1.1.27 using APR version 1.4.8. [junit] Aug 02, 2013 2:52:50 PM org.apache.catalina.core.AprLifecycleListener init [junit] INFO: APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true]. [junit] Aug 02, 2013 2:52:50 PM org.apache.catalina.core.AprLifecycleListener initializeSSL [junit] INFO: OpenSSL successfully initialized (OpenSSL 1.0.1esp1 12 Feb 2013) [junit] Aug 02, 2013 2:52:52 PM org.apache.coyote.AbstractProtocol init [junit] INFO: Initializing ProtocolHandler ["http-apr-127.0.0.1-auto-1"] [junit] Aug 02, 2013 2:52:52 PM org.apache.catalina.core.StandardService startInternal [junit] INFO: Starting service Tomcat [junit] Aug 02, 2013 2:52:52 PM org.apache.catalina.core.StandardEngine startInternal [junit] INFO: Starting Servlet Engine: Apache Tomcat/8.0.0-RC1 [junit] Aug 02, 2013 2:52:53 PM org.apache.coyote.AbstractProtocol start [junit] INFO: Starting ProtocolHandler ["http-apr-127.0.0.1-auto-1-59353"] [junit] Aug 02, 2013 2:53:53 PM org.apache.coyote.AbstractProtocol pause [junit] INFO: Pausing ProtocolHandler ["http-apr-127.0.0.1-auto-1-59353"] [junit] Aug 02, 2013 2:53:54 PM org.apache.catalina.core.StandardService stopInternal [junit] INFO: Stopping service Tomcat [junit] Aug 02, 2013 2:53:54 PM org.apache.coyote.AbstractProtocol stop [junit] INFO: Stopping ProtocolHandler ["http-apr-127.0.0.1-auto-1-59353"] [junit] # [junit] # A fatal error has been detected by the Java Runtime Environment: [junit] # [junit] # SIGSEGV (0xb) at pc=0xb76dfe74, pid=11403, tid=26 [junit] # [junit] # JRE version: 7.0_25-b15 [junit] # Java VM: Java HotSpot(TM) Server VM (23.25-b01 mixed mode solaris-sparc ) [junit] # Problematic frame: [junit] # C [libtcnative-1.so.0.1.27+0xfe74]Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.655 sec [junit] Java_org_apache_tomcat_jni_Socket_recv+0x34 [junit] # [junit] # Core dump written. Default location: /shared/build/dev/tomcat/incoming/tc8.0.x/8.0.0-RC1/tmp-8.0.0-RC1/apache-tomcat-8.0.0-RC1-src-gz/core or core.11403 [junit] # [junit] # An error report file with more information is saved as: [junit] # /shared/build/dev/tomcat/incoming/tc8.0.x/8.0.0-RC1/tmp-8.0.0-RC1/apache-tomcat-8.0.0-RC1-src-gz/hs_err_pid11403.log [junit] ------------- [junit] 0 (0xb7af8558) [b6|00| 24] [junit] [ 0x00000000] [junit] [ 0xb794be80] [junit] [ 0x0c800001] [junit] ------------- [junit] 1 (0xb7af8568) [00|b7| 25] [junit] [ 0xb7af6f78] [junit] [ 0x00000000] [junit] [ 0x80800001] [junit] ------------- b76dfe74 Java_org_apache_tomcat_jni_Socket_recv (30b528, b30ff188, b30fd09c, b29500, b30ff214, 0) + 34 fbc0f780 * org/apache/tomcat/jni/Socket.recv(J[BII)I+32362 fbc0f72c * org/apache/tomcat/jni/Socket.recv(J[BII)I+0 fbc068ec * org/apache/coyote/http11/upgrade/AprServletInputStream.doRead(Z[BII)I+73 (line 101) fbc068ec * org/apache/coyote/http11/upgrade/AbstractServletInputStream.read([BII)I+20 (line 229) fbc068ec * org/apache/tomcat/websocket/server/WsFrameServer.onDataAvailable()V+46 (line 89) fbc06b84 * org/apache/tomcat/websocket/server/WsHttpUpgradeHandler$WsReadListener.onDataAvailable()V+4 (line 384) fbc0771c * org/apache/coyote/http11/upgrade/AbstractServletInputStream.onDataAvailable()V+11 (line 337) fbc06b84 * org/apache/coyote/http11/upgrade/AbstractProcessor.upgradeDispatch(Lorg/apache/tomcat/util/net/SocketStatus;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState;+11 (line 189) fbc073e0 * org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(Lorg/apache/tomcat/util/net/SocketWrapper;Lorg/apache/tomcat/util/net/SocketStatus;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState;+171 (line 1221) fbc06848 * org/apache/coyote/http11/Http11AprProtocol$Http11ConnectionHandler.process(Lorg/apache/tomcat/util/net/SocketWrapper;Lorg/apache/tomcat/util/net/SocketStatus;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState;+76 (line 546) fbc073e0 * org/apache/tomcat/util/net/AprEndpoint$SocketProcessor.doRun()V+26 (line 4492) fbc06b84 * org/apache/tomcat/util/net/AprEndpoint$SocketProcessor.run()V+55 (line 4468) ... GDB tells my: #7 0xb76dfe74 in Java_org_apache_tomcat_jni_Socket_recv (e=0x30b528, o=0xb30ff188, sock=<optimized out>, buf=0xb30ff214, offset=0, toread=8192) at src/network.c:733 733 if ((ss = (*s->net->recv)(s->opaque, sb, &nbytes)) == APR_SUCCESS) #7 0xb76dfe74 in Java_org_apache_tomcat_jni_Socket_recv (e=0x30b528, o=0xb30ff188, sock=<optimized out>, buf=0xb30ff214, offset=0, toread=8192) at src/network.c:733 sb = "\000\006\026\200\000\fÞ̳\017Ñ\000·\205{Í·\205{س\017Ñ\234³\017Ñ\030³\017Ш·\227(p»\204H\200ò\032\226x\000\000\000\023\000\000\000\000ûÁ\227hò\032\227\020³\017Ñ8³\017Ñ8ûÀhH\000\000\b¹³\017Ñh³\017ÑHûÀhH³\017ÑP\000\000\002°·\211+Ø\000\000\000u\000\000\000\000ò\032\227 ò\032\226\210ò\032\226\210\000\000\000u\000\000\000\000³\017ÑhûÀhH»\204Hh¼Ð¬h\000\000\001\032\000\000\000\000³\017Ñ\230·\225x'·\225xH³\017Ò$³\017Ñ ³\017Ñ8·\225}ø\000\060´\000\000\000\000\000\000\000\000¶\000\000\000\000ûÁÄÀ"... s = 0xb29500 nbytes = 8192 ss = <optimized out> (gdb) print *s $5 = {pool = 0xb294c0, child = 0x0, sock = 0x0, opaque = 0xb297d8, jsbbuff = 0x330398 "HTTP/1.1 101 Switching Protocols\r\nServer: Apache-Coyote/1.1\r\nUpgrade: websocket\r\nConnection: upgrade\r\nSec-WebSocket-Accept: tPEj68f24a5073zco+NHsEUCKpY=\r\nDate: Fri, 02 Aug 2013 12:52:53 GMT\r\n\r\n", jrbbuff = 0xbfbc78 "GET /firehose HTTP/1.1\r\nSec-WebSocket-Version: 13\r\nUpgrade: websocket\r\nHost: localhost:59353\r\nSec-WebSocket-Key: qzlX6/aQxs4TAKyKCb+XLg==\r\nConnection: upgrade\r\n\r\n", net = 0x0, pe = 0xb87958, last_active = 0, timeout = -2} The reason for the crash is likely: (gdb) print s->net $7 = (tcn_nlayer_t *) 0x0 So dereferencing it in (*s->net->recv) gives a segfault. Fixed in 1.1.x in r1518225, will be available in tcnative-1.1.28. Note that I did not add further checking for s->net->send and/or s->net->recv. Please re-open if it appears that will be necessary. *** Bug 57521 has been marked as a duplicate of this bug. *** |