Issue 116126 - OnLogRotateThread::run crash during shutdown
Summary: OnLogRotateThread::run crash during shutdown
Status: CLOSED FIXED
Alias: None
Product: General
Classification: Code
Component: code (show other issues)
Version: DEV300m95
Hardware: All All
: P3 Trivial (vote)
Target Milestone: 3.4.0
Assignee: Stephan Bergmann
QA Contact: issues@framework
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-14 15:46 UTC by Stephan Bergmann
Modified: 2017-05-20 10:22 UTC (History)
1 user (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Stephan Bergmann 2010-12-14 15:46:34 UTC
On DEV300_m95 based CWS sb135, unxlngx6.pro, subsequenttests once crashed with

[...]
24: LOG>     disposing xMathDoc 
24: ***** State for sm.XMLSettingsImporter ******
24: Whole component: PASSED.OK
24: *********************************************
24: 
24: Failures that appeared during scenario execution:
24: 0 of 9 tests failed
24: Job run took: 28235ms  [00:00:28]
24: Job -sce sm.sce done
24: soffice.bin: tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio ==
-1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)'
failed.
24: E
24: Time: 36.184
24: There was 1 failure:
24: 1) test(org.openoffice.starmath.qa.unoapi.Test)
24: java.lang.AssertionError: expected:<0> but was:<134>
24: 	at org.junit.Assert.fail(Assert.java:91)
24: 	at org.junit.Assert.failNotEquals(Assert.java:645)
24: 	at org.junit.Assert.assertEquals(Assert.java:126)
24: 	at org.junit.Assert.assertEquals(Assert.java:470)
24: 	at org.junit.Assert.assertEquals(Assert.java:454)
24: 	at org.openoffice.test.OfficeConnection.tearDown(OfficeConnection.java:133)
24: 	at org.openoffice.starmath.qa.unoapi.Test.tearDown(Test.java:40)
24: 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
24: 	at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
24: 	at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
24: 	at java.lang.reflect.Method.invoke(Method.java:585)
24: 	at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
24: 	at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
24: 	at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
24: 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
24: 	at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
24: 	at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
24: 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
24: 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
24: 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
24: 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
24: 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
24: 	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
24: 	at org.junit.runners.Suite.runChild(Suite.java:128)
24: 	at org.junit.runners.Suite.runChild(Suite.java:24)
24: 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
24: 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
24: 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
24: 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
24: 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
24: 	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
24: 	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
24: 	at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
24: 	at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
24: 	at org.junit.runner.JUnitCore.runMain(JUnitCore.java:98)
24: 	at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:53)
24: 	at org.junit.runner.JUnitCore.main(JUnitCore.java:45)
24: 
24: FAILURES!!!
24: Tests run: 1,  Failures: 1
24: 
24: dmake:  Error code 1, while making 'javatest'

at

(gdb) where
#0  0x00007f63f310ca75 in *__GI_raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f63f31105c0 in *__GI_abort () at abort.c:92
#2  0x00007f63f3105941 in *__GI___assert_fail (assertion=0x7f63f2cc9f90
"new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <=
__sched_fifo_max_prio)", file=<value optimized out>, line=63,
function=0x7f63f2cca060 "__pthread_tpp_change_priority") at assert.c:81
#3  0x00007f63f2cc878e in __pthread_tpp_change_priority (previous_prio=-1,
new_prio=7304) at tpp.c:61
#4  0x00007f63f2cc0f6e in __pthread_mutex_lock_full (mutex=0x7f63deb540f0) at
pthread_mutex_lock.c:415
#5  0x00007f63f43f6700 in osl_acquireMutex () from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/ure-link/lib/libuno_sal.so.3
#6  0x00007f63de7bfc4c in (anonymous namespace)::OnLogRotateThread::run() ()
from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/liboooimprovementlx.so
#7  0x00007f63de7bfe9a in threadFunc () from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/liboooimprovementlx.so
#8  0x00007f63f43f7458 in osl_thread_start_Impl () from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/ure-link/lib/libuno_sal.so.3
#9  0x00007f63f2cbe9ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#10 0x00007f63f31bf70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()

while the main thread was already running exit handlers at

(gdb) where
#0  0x00007f63f440c2f3 in rtl_ustr_hashCode_WithLength () from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/ure-link/lib/libuno_sal.so.3
#1  0x00007f63f2443958 in __gnu_cxx::hashtable<std::pair<unsigned short const*
const, _typelib_TypeDescriptionReference*>, unsigned short const*, hashStr_Impl,
std::_Select1st<std::pair<unsigned short const* const,
_typelib_TypeDescriptionReference*> >, equalStr_Impl,
std::allocator<_typelib_TypeDescriptionReference*>
>::erase(__gnu_cxx::_Hashtable_iterator<std::pair<unsigned short const* const,
_typelib_TypeDescriptionReference*>, unsigned short const*, hashStr_Impl,
std::_Select1st<std::pair<unsigned short const* const,
_typelib_TypeDescriptionReference*> >, equalStr_Impl,
std::allocator<_typelib_TypeDescriptionReference*> > const&) () from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3
#2  0x00007f63f243e2e3 in typelib_typedescriptionreference_release () from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3
#3  0x00007f63f243f2b3 in TypeDescriptor_Init_Impl::~TypeDescriptor_Init_Impl()
() from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3
#4  0x00007f63f243f360 in __tcf_0 () from
/net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3
#5  0x00007f63f3112262 in __run_exit_handlers (status=0) at exit.c:78
#6  *__GI_exit (status=0) at exit.c:100
#7  0x00007f63f30f7c54 in __libc_start_main (main=<value optimized out>,
argc=<value optimized out>, ubp_av=<value optimized out>, init=<value optimized
out>, fini=<value optimized out>, rtld_fini=<value optimized out>,
stack_end=0x7fffc2375248) at libc-start.c:258
#8  0x0000000000400f39 in _start ()
Comment 1 bjoern.michaelsen 2010-12-14 22:07:51 UTC
Sweet irony: It looks like the mutex that is protecting the service factory from
being used while being destroyed and vice versa at:
http://svn.services.openoffice.org/opengrok/xref/Current%20%28trunk%29/extensions/source/oooimprovement/onlogrotate_job.cxx#130
is itself already unusable at this point in the shutdown.
Proposed fix: add another
  if(m_ServiceFactory.is())
before trying to setup the guard for the lock. There still will be a race
condition between this check and having the lock, but how should that be avoided
when locks and mutexes are already unavailable at this point?

Comment 2 Stephan Bergmann 2010-12-15 08:14:14 UTC
@b_michaelsen:  Without actually looking at the code, I would say there should
be no problems and races if whichever entity that caused the OnLogRotateThread
to spawn would also make sure to join it during application shutdown (well
before exit).
Comment 3 bjoern.michaelsen 2010-12-15 09:42:38 UTC
@sb: Yes. I figured that out too yesterday, but did not want to reply to myself.
The problem seems to be that the TerminateListener:
- calls disposing, which is mutex-protected (good) and then calls terminate
without mutexprotection (bad)
- it does not call join after calling terminate
- in addition, it seems Windows might need special care as per:
http://svn.services.openoffice.org/opengrok/xref/Current%20%28trunk%29/sal/qa/osl/process/osl_Thread.cxx#termAndJoinThread
Comment 4 bjoern.michaelsen 2010-12-17 15:15:26 UTC
started in cws oooimprovement7
Comment 5 bjoern.michaelsen 2010-12-21 12:59:44 UTC
setting target 3.4, fixed with changeset e78625a0bf41 in cws oooimprovement7
Comment 6 bjoern.michaelsen 2010-12-21 13:02:28 UTC
@sb: please verify
Comment 7 Stephan Bergmann 2010-12-21 13:26:13 UTC
verified by code review