Apache OpenOffice (AOO) Bugzilla – Issue 116126
OnLogRotateThread::run crash during shutdown
Last modified: 2017-05-20 10:22:30 UTC
On DEV300_m95 based CWS sb135, unxlngx6.pro, subsequenttests once crashed with [...] 24: LOG> disposing xMathDoc 24: ***** State for sm.XMLSettingsImporter ****** 24: Whole component: PASSED.OK 24: ********************************************* 24: 24: Failures that appeared during scenario execution: 24: 0 of 9 tests failed 24: Job run took: 28235ms [00:00:28] 24: Job -sce sm.sce done 24: soffice.bin: tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)' failed. 24: E 24: Time: 36.184 24: There was 1 failure: 24: 1) test(org.openoffice.starmath.qa.unoapi.Test) 24: java.lang.AssertionError: expected:<0> but was:<134> 24: at org.junit.Assert.fail(Assert.java:91) 24: at org.junit.Assert.failNotEquals(Assert.java:645) 24: at org.junit.Assert.assertEquals(Assert.java:126) 24: at org.junit.Assert.assertEquals(Assert.java:470) 24: at org.junit.Assert.assertEquals(Assert.java:454) 24: at org.openoffice.test.OfficeConnection.tearDown(OfficeConnection.java:133) 24: at org.openoffice.starmath.qa.unoapi.Test.tearDown(Test.java:40) 24: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 24: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 24: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 24: at java.lang.reflect.Method.invoke(Method.java:585) 24: at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) 24: at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) 24: at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) 24: at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37) 24: at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) 24: at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) 24: at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) 24: at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) 24: at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) 24: at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) 24: at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) 24: at org.junit.runners.ParentRunner.run(ParentRunner.java:236) 24: at org.junit.runners.Suite.runChild(Suite.java:128) 24: at org.junit.runners.Suite.runChild(Suite.java:24) 24: at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) 24: at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) 24: at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) 24: at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) 24: at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) 24: at org.junit.runners.ParentRunner.run(ParentRunner.java:236) 24: at org.junit.runner.JUnitCore.run(JUnitCore.java:157) 24: at org.junit.runner.JUnitCore.run(JUnitCore.java:136) 24: at org.junit.runner.JUnitCore.run(JUnitCore.java:117) 24: at org.junit.runner.JUnitCore.runMain(JUnitCore.java:98) 24: at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:53) 24: at org.junit.runner.JUnitCore.main(JUnitCore.java:45) 24: 24: FAILURES!!! 24: Tests run: 1, Failures: 1 24: 24: dmake: Error code 1, while making 'javatest' at (gdb) where #0 0x00007f63f310ca75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007f63f31105c0 in *__GI_abort () at abort.c:92 #2 0x00007f63f3105941 in *__GI___assert_fail (assertion=0x7f63f2cc9f90 "new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)", file=<value optimized out>, line=63, function=0x7f63f2cca060 "__pthread_tpp_change_priority") at assert.c:81 #3 0x00007f63f2cc878e in __pthread_tpp_change_priority (previous_prio=-1, new_prio=7304) at tpp.c:61 #4 0x00007f63f2cc0f6e in __pthread_mutex_lock_full (mutex=0x7f63deb540f0) at pthread_mutex_lock.c:415 #5 0x00007f63f43f6700 in osl_acquireMutex () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/ure-link/lib/libuno_sal.so.3 #6 0x00007f63de7bfc4c in (anonymous namespace)::OnLogRotateThread::run() () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/liboooimprovementlx.so #7 0x00007f63de7bfe9a in threadFunc () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/liboooimprovementlx.so #8 0x00007f63f43f7458 in osl_thread_start_Impl () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/ure-link/lib/libuno_sal.so.3 #9 0x00007f63f2cbe9ca in start_thread (arg=<value optimized out>) at pthread_create.c:300 #10 0x00007f63f31bf70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #11 0x0000000000000000 in ?? () while the main thread was already running exit handlers at (gdb) where #0 0x00007f63f440c2f3 in rtl_ustr_hashCode_WithLength () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/ure-link/lib/libuno_sal.so.3 #1 0x00007f63f2443958 in __gnu_cxx::hashtable<std::pair<unsigned short const* const, _typelib_TypeDescriptionReference*>, unsigned short const*, hashStr_Impl, std::_Select1st<std::pair<unsigned short const* const, _typelib_TypeDescriptionReference*> >, equalStr_Impl, std::allocator<_typelib_TypeDescriptionReference*> >::erase(__gnu_cxx::_Hashtable_iterator<std::pair<unsigned short const* const, _typelib_TypeDescriptionReference*>, unsigned short const*, hashStr_Impl, std::_Select1st<std::pair<unsigned short const* const, _typelib_TypeDescriptionReference*> >, equalStr_Impl, std::allocator<_typelib_TypeDescriptionReference*> > const&) () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3 #2 0x00007f63f243e2e3 in typelib_typedescriptionreference_release () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3 #3 0x00007f63f243f2b3 in TypeDescriptor_Init_Impl::~TypeDescriptor_Init_Impl() () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3 #4 0x00007f63f243f360 in __tcf_0 () from /net/so-cwsserv03/export/cws/sb135/DEV300/unxlngx6.pro/installation/opt/openoffice.org3/program/../basis-link/program/../ure-link/lib/libuno_cppu.so.3 #5 0x00007f63f3112262 in __run_exit_handlers (status=0) at exit.c:78 #6 *__GI_exit (status=0) at exit.c:100 #7 0x00007f63f30f7c54 in __libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fffc2375248) at libc-start.c:258 #8 0x0000000000400f39 in _start ()
Sweet irony: It looks like the mutex that is protecting the service factory from being used while being destroyed and vice versa at: http://svn.services.openoffice.org/opengrok/xref/Current%20%28trunk%29/extensions/source/oooimprovement/onlogrotate_job.cxx#130 is itself already unusable at this point in the shutdown. Proposed fix: add another if(m_ServiceFactory.is()) before trying to setup the guard for the lock. There still will be a race condition between this check and having the lock, but how should that be avoided when locks and mutexes are already unavailable at this point?
@b_michaelsen: Without actually looking at the code, I would say there should be no problems and races if whichever entity that caused the OnLogRotateThread to spawn would also make sure to join it during application shutdown (well before exit).
@sb: Yes. I figured that out too yesterday, but did not want to reply to myself. The problem seems to be that the TerminateListener: - calls disposing, which is mutex-protected (good) and then calls terminate without mutexprotection (bad) - it does not call join after calling terminate - in addition, it seems Windows might need special care as per: http://svn.services.openoffice.org/opengrok/xref/Current%20%28trunk%29/sal/qa/osl/process/osl_Thread.cxx#termAndJoinThread
started in cws oooimprovement7
setting target 3.4, fixed with changeset e78625a0bf41 in cws oooimprovement7
@sb: please verify
verified by code review