Bug 48584 - Error or ACCESS_VIOLATION on shutdown
Error or ACCESS_VIOLATION on shutdown
Status: RESOLVED FIXED
Product: Tomcat Native
Classification: Unclassified
Component: Library
1.1.20
PC All
: P2 minor (vote)
: ---
Assigned To: Tomcat Developers Mailing List
:
: 48596 (view as bug list)
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2010-01-20 14:06 UTC by Konstantin Kolinko
Modified: 2010-09-15 10:22 UTC (History)
1 user (show)



Attachments
hs_err_pid5364.log (11.27 KB, text/plain)
2010-01-20 14:14 UTC, Konstantin Kolinko
Details
cycle.bat, Calls catalina start, catalina stop in a loop (117 bytes, text/plain)
2010-01-29 05:13 UTC, Konstantin Kolinko
Details
AprLifecycleListener_messages.patch, adds diagnostic messages to AprLifecycleListener#terminateAPR() (763 bytes, text/plain)
2010-01-29 05:16 UTC, Konstantin Kolinko
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Konstantin Kolinko 2010-01-20 14:06:01 UTC
Microsoft Windows XP Professional, 5.1.2600 Service Pack 3 Build 2600
Localized (Russian)

Steps to reproduce:
1. install 6.0.24, bundled with TC-Native 1.1.19
2. define JRE_HOME variable. I used JRE 6u17.
3. goto %CATALINA_HOME%/bin and run  startup.bat
4. wait a while and run  shutdown.bat
5. one of the following behaviours is observed:
a) a message is written to catalina.<date>.log after all other shutdown sequence messages:

20.01.2010 5:37:28 org.apache.coyote.http11.Http11AprProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-8080
20.01.2010 5:37:28 org.apache.coyote.ajp.AjpAprProtocol pause
INFO: Pausing Coyote AJP/1.3 on ajp-8009
20.01.2010 5:37:29 org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
20.01.2010 5:37:29 org.apache.coyote.http11.Http11AprProtocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-8080
20.01.2010 5:37:29 org.apache.coyote.ajp.AjpAprProtocol destroy
INFO: Stopping Coyote AJP/1.3 on ajp-8009
20.01.2010 5:37:29 org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SEVERE: Socket accept failed
org.apache.tomcat.jni.Error: ???????? ???????????? ???????? ??????? WSACancelBlockingCall.  
	at org.apache.tomcat.jni.Socket.accept(Native Method)
	at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:1156)
	at java.lang.Thread.run(Unknown Source)

Inability to display the localized message should be another issue. Disregard it for now.


b) no message is printed, but a  hs_err_pid<number>.log file is created in %CATALINA_HOME%/bin.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x100045c7, pid=4396, tid=480
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Client VM (14.3-b01 mixed mode windows-x86 )
# Problematic frame:
# C  [tcnative-1.dll+0x45c7]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x0b153c00):  JavaThread "ajp-8009-Acceptor-0" daemon [_thread_in_native, id=480, stack(0x0c780000,0x0c7d0000)]

siginfo: ExceptionCode=0xc0000005, reading address 0x0b549488

Registers:
EAX=0x0c156298, EBX=0x0b549480, ECX=0x0c7cfdd8, EDX=0x00000004
ESP=0x0c7cfc48, EBP=0x0c7cfc58, ESI=0x0c1562d8, EDI=0x0b153c00
EIP=0x100045c7, EFLAGS=0x00010202

Top of Stack: (sp=0x0c7cfc48)
0x0c7cfc48:   0709a7c8 0709a7d0 00000000 0c156298
0x0c7cfc58:   0c7cfc9c 00a09e37 0b153d10 0c7cfca4
0x0c7cfc68:   0b549480 00000000 002b0000 0c7cfa6c
0x0c7cfc78:   7c911028 0c7cfc7c 00000000 0c7cfcb0
0x0c7cfc88:   0709b7c0 00000000 0709a7d0 00000000
0x0c7cfc98:   0c7cfcac 0c7cfcd4 00a02e83 0709b760
0x0c7cfca8:   00a08269 0b549480 00000000 0c7cfcb4
0x0c7cfcb8:   072a2148 0c7cfce4 072a23f0 00000000 

Instructions: (pc=0x100045c7)
0x100045b7:   52 e8 93 d1 ff ff 83 c4 08 8b 45 fc 85 c0 74 0d
0x100045c7:   8b 4b 08 85 c9 74 06 50 e8 cc af 00 00 33 c0 5e 


Stack: [0x0c780000,0x0c7d0000],  sp=0x0c7cfc48,  free space=319k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [tcnative-1.dll+0x45c7]
j  org.apache.tomcat.jni.Socket.accept(J)J+0
j  org.apache.tomcat.util.net.AprEndpoint$Acceptor.run()V+40
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [jvm.dll+0xecf9c]
V  [jvm.dll+0x1741d1]
V  [jvm.dll+0xed167]
V  [jvm.dll+0xed1dd]
V  [jvm.dll+0x116290]
V  [jvm.dll+0x1d0414]
V  [jvm.dll+0x173e4c]
C  [msvcr71.dll+0x9565]
C  [kernel32.dll+0xb729]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.apache.tomcat.jni.Socket.accept(J)J+0
j  org.apache.tomcat.util.net.AprEndpoint$Acceptor.run()V+40
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub

(I am skipping the rest of the file...)
Comment 1 Konstantin Kolinko 2010-01-20 14:14:10 UTC
Created attachment 24870 [details]
hs_err_pid5364.log
Comment 2 Konstantin Kolinko 2010-01-21 19:25:54 UTC
*** Bug 48596 has been marked as a duplicate of this bug. ***
Comment 3 Konstantin Kolinko 2010-01-21 20:24:06 UTC
Re Bug 48596:
I tried version 1.1.18 of windows 32-bit dll, but it also crashes in exactly the same fashion.
Comment 4 Mark Eggers 2010-01-22 02:29:57 UTC
(In reply to comment #1)
> Created an attachment (id=24870) [details]
> hs_err_pid5364.log

Microsoft Windows XP Professional 5.01.2600 Service Pack 3 US English

java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) Client VM (build 16.0-b13, mixed mode, sharing)

I never see an hs_err_pid<number>.log file in $CATALINA_HOME\bin, but otherwise the behavior when running Tomcat 6.0.14 from catalina.bat is the same.

However when I run Tomcat as a service, I do not see this error. I set the logging to debug, and checked $CATALINA\logs and the event viewer. The service starts and stops without an error.
Comment 5 Rainer Jung 2010-01-24 01:41:27 UTC
I get the same result as Mark Eggers, i.e. the Exception in the log but no hotspot error file.

Microsoft Windows XP Professional
Version	5.1.2600 Service Pack 3 Build 2600
German

JRE 1.6.0_18

Interestingly in my case there is a localized message:

24.01.2010 10:38:45 org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SCHWERWIEGEND: Socket accept failed
org.apache.tomcat.jni.Error: Ein Blockierungsvorgang wurde durch einen Aufruf von WSACancelBlockingCall unterbrochen.
        at org.apache.tomcat.jni.Socket.accept(Native Method)
        at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:1156)
        at java.lang.Thread.run(Unknown Source)

So maybe it has to do with the Russian localization?

I also tried once with the security manager, same behavior.
Comment 6 Konstantin Kolinko 2010-01-29 05:13:15 UTC
Created attachment 24903 [details]
cycle.bat, Calls catalina start, catalina stop in a loop
Comment 7 Konstantin Kolinko 2010-01-29 05:16:41 UTC
Created attachment 24904 [details]
AprLifecycleListener_messages.patch, adds diagnostic messages to AprLifecycleListener#terminateAPR()
Comment 8 Konstantin Kolinko 2010-01-29 06:02:09 UTC
1. When the crash happens, no event is recorded in the Event Viewer. I think that the crash is intercepted by JVM (that writes that crash report file) so the system does not know about it. The crash report file is written to the current working directory of java process. In my case it was %CATALINA_HOME%/bin, but it can be elsewhere. Though it might be, indeed, that 6.0.14 does not experience the crash. Thank you for the report.


2. Attachment 24903 [details] - a batch file, that calls catalina start/stop in a cycle with some small delays.

Even if I cannot always reproduce the error with several tries, I can reproduce it with the batch.


3. The message, that is printed by org.apache.tomcat.jni.Error, in English will be
"A blocking operation was interrupted by a call to WSACancelBlockingCall"

It is the message for the WSAEINTR socket error code, and means that some function call was interrupted, see

http://support.microsoft.com/kb/819124

After playing a bit with the ".encoding" option of a FileHandler (in logging.properties) I was able to read it, but, indeed, TC-Native (or APR functions that it calls) does not respect system encoding. The message was processed with direct byte->char conversion, as if were in ISO-8859-1. By setting "1catalina.org.apache.juli.FileHandler.encoding = ISO-8859-1" and reading the log file with windows-1251 I was able to read it. It is wrong, but it is another issue.

So, it looks that this error message is expected. What is wrong is that it generates a crash.


4. I tested with TC 7(trunk), where I added diagnostic messages around the AprLifecycleListener.terminateAPR() method - see attachment 24904 [details].

In the logs I see four variations of how the events happen:

a)
29.01.2010 6:26:16 org.apache.coyote.ajp.AjpAprProtocol destroy
INFO: Stopping Coyote AJP/1.3 on ajp-8009
29.01.2010 6:26:16 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminating APR
29.01.2010 6:26:16 org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SEVERE: Socket accept failed
org.apache.tomcat.jni.Error: ???????? ???????????? ???????? ??????? WSACancelBlockingCall.  
	at org.apache.tomcat.jni.Socket.accept(Native Method)
	at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:777)
	at java.lang.Thread.run(Unknown Source)
29.01.2010 6:26:16 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminated APR

b)
29.01.2010 6:27:16 org.apache.coyote.ajp.AjpAprProtocol destroy
INFO: Stopping Coyote AJP/1.3 on ajp-8009
29.01.2010 6:27:17 org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SEVERE: Socket accept failed
org.apache.tomcat.jni.Error: ???????? ???????????? ???????? ??????? WSACancelBlockingCall.  
	at org.apache.tomcat.jni.Socket.accept(Native Method)
	at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:777)
	at java.lang.Thread.run(Unknown Source)
29.01.2010 6:27:17 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminating APR
29.01.2010 6:27:17 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminated APR

c)
29.01.2010 6:34:19 org.apache.coyote.ajp.AjpAprProtocol destroy
INFO: Stopping Coyote AJP/1.3 on ajp-8009
29.01.2010 6:34:19 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminating APR
29.01.2010 6:34:19 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminated APR

d)
29.01.2010 6:40:45 org.apache.coyote.ajp.AjpAprProtocol destroy
INFO: Stopping Coyote AJP/1.3 on ajp-8009
29.01.2010 6:40:45 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminating APR
29.01.2010 6:40:45 org.apache.catalina.core.AprLifecycleListener terminateAPR
INFO: Terminated APR
29.01.2010 6:40:45 org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SEVERE: Socket accept failed
org.apache.tomcat.jni.Error: ???????? ???????????? ???????? ??????? WSACancelBlockingCall.  
	at org.apache.tomcat.jni.Socket.accept(Native Method)
	at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:777)
	at java.lang.Thread.run(Unknown Source)


When c) happens, a crash report file is generated.

The AprLifecycleListener.terminateAPR() method calls o.a.t.jni.Library#terminate() that is implemented as a method in jnilib.c.

TCN_IMPLEMENT_CALL(void, Library, terminate)(TCN_STDARGS)
calls
        apr_pool_destroy(p);
        apr_terminate();

so if return from
TCN_IMPLEMENT_CALL(jlong, Socket, accept)(TCN_STDARGS, jlong sock)
of network.c happens after that call ends, as in d), it will face a destroyed pool and terminated APR. Thus it is no wonder that EXCEPTION_ACCESS_VIOLATION is reported.

5. So, the question is why 
org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:1156)
was still running when AprLifecycleListener.terminateAPR() was already called.
Comment 9 Mladen Turk 2010-02-07 23:42:23 UTC
Fixed in the SVN with r907567

I'm leaving the issue open until release so we can have a confirmation
Comment 10 Konstantin Kolinko 2010-02-08 04:46:37 UTC
I still wonder, what is causing this at the Java side:

(see 5. in Comment 8): what is closing those sockets and whether it is possible to wait for termination of those Acceptor threads before going on and calling Library.terminate()?


Due to asynchronous nature of this issue, there is still a time frame in r907567 between if(tcn_global_pool) check and apr_pool_destroy(p) call when apr termination can occur.  Though that is probably negligible. It is better to go with r907567 than without.

The TCN_THROW_IF_ERR can still be called with terminated APR, but that is probably safe -- I think that apr_strerror() call (inside tcn_ThrowAPRException() that is called from TCN_THROW_IF_ERR macro) should still work even after APR is terminated.
Comment 11 Mladen Turk 2010-02-08 04:55:03 UTC
It is caused by calling Library.terminate before all sockets are closed,
Checking for tcn_global_pool is safe because it is set to NULL before
it is actually destroyed.
Comment 12 Konstantin Kolinko 2010-03-31 20:09:19 UTC
The native side of this issue was fixed in r907567 and is included in TC-Native 1.1.20 onwards.