Apache OpenOffice (AOO) Bugzilla – Issue 93051
deadlock at office start
Last modified: 2017-05-20 10:47:55 UTC
> > main thread: > > > ntdll.dll!KiFastSystemCallRet() [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll] ntdll.dll!NtWaitForSingleObject() + 0xc bytes ntdll.dll!RtlEnterCriticalSection() + 0x46 bytes configmgr2.uno.dll!configmgr::UnoApiLock::acquire() Line 46 + 0xb bytes C++ > configmgr2.uno.dll!configmgr::OConfigurationProvider::queryInterface(const com::sun::star::uno::Type & rType=) Line 521 + 0x5 bytes C++ > updatefeed.uno.dll!component_getFactory() + 0xb0d bytes updatefeed.uno.dll!component_getFactory() + 0x15ac bytes updatefeed.uno.dll!component_getFactory() + 0x30df bytes updatefeed.uno.dll!component_getFactory() + 0x4893 bytes updatefeed.uno.dll!component_getFactory() + 0x5097 bytes updatefeed.uno.dll!component_getFactory() + 0x5779 bytes cppuhelper3MSC.dll!cppu::createSingleComponentFactory() + 0xb11 bytes cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0xb8d bytes cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0x145a bytes cppuhelper3MSC.dll!cppu::createSingleComponentFactory() + 0x19e1 bytes cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0xb8d bytes cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0x145a bytes bootstrap.uno.dll!component_getFactory() + 0x220e5 bytes deploymentmi.uno.dll!GetVersionInfo() + 0x2cc3 bytes deploymentmi.uno.dll!GetVersionInfo() + 0x2e15 bytes deploymentmi.uno.dll!GetVersionInfo() + 0x3403 bytes deploymentmi.uno.dll!GetVersionInfo() + 0x3ac2 bytes deploymentmi.uno.dll!GetVersionInfo() + 0x4396 bytes deploymentmi.uno.dll!GetVersionInfo() + 0x43df bytes comphelp4MSC.dll!comphelper::service_decl::ServiceDecl::getFactory() + 0xa5 bytes comphelp4MSC.dll!comphelper::service_decl::ServiceDecl::getFactory() + 0x157 bytes cppuhelper3MSC.dll!cppu::createSingleComponentFactory() + 0x19e1 bytes cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0xb8d bytes cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0x145a bytes bootstrap.uno.dll!component_getFactory() + 0x220e5 bytes msci_uno.dll!61861cab() msci_uno.dll!61861fbf() msci_uno.dll!618623da() purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x5a9 bytes cppu3.dll!uno_getCurrentEnvironment() + 0x2eb bytes cppu3.dll!uno_EnvDcp_getPurpose() + 0x67d bytes unsafe_uno_uno.dll!GetVersionInfo() + 0x28a bytes unsafe_uno_uno.dll!GetVersionInfo() + 0x1ad bytes purpenvhelper3MSC.dll!GetVersionInfo() + 0x4ea bytes purpenvhelper3MSC.dll!GetVersionInfo() + 0xce bytes cppu3.dll!uno_EnvDcp_getPurpose() + 0x625 bytes cppu3.dll!uno_EnvDcp_getPurpose() + 0x649 bytes cppu3.dll!uno_getCurrentEnvironment() + 0x30d bytes cppu3.dll!uno_Environment_invoke_v() + 0x17 bytes cppu3.dll!uno_Environment_invoke() + 0x19 bytes purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x779 bytes purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x941 bytes msci_uno.dll!61861472() sal3.dll!osl_acquireMutex(_oslMutexImpl * Mutex=0x014af188) Line 146 C > sal3.dll!osl_releaseMutex(_oslMutexImpl * Mutex=0x00000000) Line 194 + 0x3 bytes C > ffffffff() > configmgr-writer-service: > > > ntdll.dll!KiFastSystemCallRet() [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll] ntdll.dll!NtWaitForSingleObject() + 0xc bytes ntdll.dll!RtlEnterCriticalSection() + 0x46 bytes unsafe_uno_uno.dll!GetVersionInfo() + 0x2af bytes unsafe_uno_uno.dll!GetVersionInfo() + 0x19a bytes purpenvhelper3MSC.dll!GetVersionInfo() + 0x4d3 bytes purpenvhelper3MSC.dll!GetVersionInfo() + 0xbb bytes cppu3.dll!uno_EnvDcp_getPurpose() + 0x24a bytes cppu3.dll!uno_EnvDcp_getPurpose() + 0x5ea bytes cppu3.dll!uno_EnvDcp_getPurpose() + 0x60e bytes cppu3.dll!uno_getCurrentEnvironment() + 0x2d5 bytes cppu3.dll!uno_Environment_invoke_v() + 0x17 bytes cppu3.dll!uno_Environment_invoke() + 0x19 bytes purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x779 bytes purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x941 bytes msci_uno.dll!61861472() sal3.dll!osl_acquireMutex(_oslMutexImpl * Mutex=0x0423f7c0) Line 146 C > sal3.dll!osl_releaseMutex(_oslMutexImpl * Mutex=0x027c7d04) Line 194 + 0x3 bytes C > configmgr2.uno.dll!configmgr::xml::WriterService<com::sun::star::configuration::backend::XLayerHandler>::WriterService<com::sun::star::configuration::backend::XLayerHandler>(const com::sun::star::uno::Reference<com::sun::star::uno::XComponentContext> & _xContext=) Line 69 + 0x53 bytes C++ > 8b0c428d() > > systray worker thread: > > > ntdll.dll!KiFastSystemCallRet() [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll] ntdll.dll!NtWaitForSingleObject() + 0xc bytes ntdll.dll!RtlEnterCriticalSection() + 0x46 bytes vos3MSC.dll!vos::OMutex::acquire() + 0xe bytes vclmi.dll!SalYieldMutex::acquire() Line 119 C++ > sfxmi.dll!ShutdownIcon::GetResString(int id=652) Line 326 + 0x21 bytes C++ > sfxmi.dll!createSystrayMenu() Line 273 + 0x14 bytes C++ > sfxmi.dll!listenerWndProc(HWND__ * hWnd=0x003b01be, unsigned int uMsg=1, unsigned int wParam=0, long lParam=380238216) Line 382 + 0x5 bytes C++ > user32.dll!GetDC() + 0x6d bytes user32.dll!EnumDisplayMonitors() + 0xf8 bytes user32.dll!DefWindowProcW() + 0x184 bytes user32.dll!CallNextHookEx() + 0x1a3 bytes ntdll.dll!KiUserCallbackDispatcher() + 0x13 bytes user32.dll!CreateWindowExW() + 0x2a7 bytes user32.dll!CreateWindowExA() + 0x33 bytes sfxmi.dll!SystrayThread(void * __formal=0x00000000) Line 586 + 0x37 bytes C++ > kernel32.dll!GetModuleFileNameA() + 0x1b4 bytes
I could not reproduce the above stacks on mh's machine. What I *could* reproduce there is the following: If swriter, scalc, ... are started and during first start wizard personal data are transferred from mh's OOo 2 installation, then no document window appears after splash screen is gone, and soffice.bin idles in Application::Yield. If, on the other hand, either soffice is started (instead of swriter, scalc, ...) or during first start wizard personal data is not transferred, then everything is OK.
The problem described as <#desc2> (which for me was 100% reproducible on mh's machine, and I hence assume that it was *that* problem that gave mh the experience that his soffice regularly deadlocks at startup, and in turn was the motivation for target OOo 3.0) is reportedly (cd) now covered by a separate issue (for which I have no issue number available). I set *this* issue (alleged configmgr/updatefeed/unsafe_uno_uno deadlock) to target OOo 3.x.
The deadlock is as follows: The "main thread" obviously from ComponentContext::lookupMap (cppuhelper/source/component_context.cxx:1.33), with the UnsafeBridge::m_mutex (cppu/source/UnsafeBridge/UnsafeBridge.cxx) locked, calls createInstanceWithContext to instantiate the com.sun.star.deployment.PackageInformationProvider singleton, which in turn calls OConfigurationProvider::queryInterface (configmgr/source/api2/confprovider2.cxx:1.33) and tries to lock the configmgr's UnoApiLock. The "configmgr-writer-service" thread, obviously with the configmgr's UnoApiLock locked, from the WriterService ctor (configmgr/soruce/xml/writersvc.cxx:1.11) calls ComponentContext::getServiceManager (cppuhelper/source/component_context.cxx:1.33), which blocks trying to acquire the UnsafeBridge::m_mutex. @jb: Do you have any recollection what the UnoApiLock is good for? It causes problems like this deadlock, by making configmgr code call external code (e.g., XComponentContext::getServiceManager, service manager's createInstance) with the configmgr's UnoApiLock locked.
When I left configmgr it was very much threading-capable. The UnoApiLock was obviously introduced by mmeeks in the configrefactor01 CWS. FWIW: I would not have accepted any attempt do introduce such a global lock for everything, completely eliminating thread awareness in configmgr, had I still been the maintainer. While the original implementation may have had overengineered concurrency support, a lock that is even shared between independent ConfigurationProvider instances runs completely counter to the design of the component. Taking a lock just for queryInterface is obviously broken, but I suppose getting rid of that won't help much if the calling thread also wants to invoke actual API functionality. Having the separate cache write-back thread perform its (complex) work under a global lock that excludes any API activity is also utterly broken, but it may take more analysis to determine whether it is safe to get rid of that lock (I also haven't found the place where that thread takes the global lock).
issue changed target => removed from 3.0 blocker issue
@mmeeks: please take over
> jb wrote: > When I left configmgr it was very much threading-capable. The UnoApiLock was > obviously introduced by mmeeks in the configrefactor01 CWS. nonsense - it was riddled with lock takes & releases, certainly - but any threaded code actively using the configmgr deadlocked or crashed within a few seconds: cf. my regression test. This is now no longer the case - and yes, the threading model is -far- simpler now. Of course - that means it can sometimes deadlock - but at least we get a clear view of the interactions between pieces of code; and the existing lock hierarchy problems around the place. The UnoApiLock is a simple code lock; we count take / releases on it - and we drop this lock as we emit callbacks & call out of the code [ at least in theory of course ]. All that should be necessary is to create an UnoApiLockReleaser over the service factory instantiation; and isolate the interaction carefully to make sure it's safe.
So - a little more analysis; I imagine that the deadlock has always been there, but perhaps masked by the previous, more granular locking: you would have had to be writing the same piece of the tree that an activating component was wanting to read from previously to trigger it; now any write has the effect. IMHO the solution is not more granular locking to re-conceal the problem however. Perhaps I'm mis-understanding something, is it the case that the UnsafeBridge mutex is part of the new Kay Ramme threading framework ? if so, is it the case that almost whenever an UNO method is invoked on some code, that this lock is taken ? eg. if the configmgr wants to use the ServiceManager to instantiate a new service - then this code is deemed not-thread-safe, and thus the one-big-mutex is taken ? If so - we need to put configmgr inside that same appartment; since clearly it uses components outside it's scope and visibility to do (eg.) I/O - or even simple things like activating a service ;-) Of course, we can still leave the UnoApiLock construct against a time when the subsystems that configmgr relies on are thread-safe: the service manager, the explicit external services we use are: com.sun.star.io.DataOutputStream com.sun.star.xml.sax.Writer com.sun.star.script.Converter com.sun.star.comp.stoc.OServiceManagerWrapper none of which looks like it should be too horrific. Presumably things like the UCB and sax code are still viewed as thread-safe & not in the global apartment ? Ultimately - it's a simple lock ordering problem as Stefan points out. Code we call assumes that this UnsafeBridge mutex can be taken, but we have the UnoApiLock; while other code assumes that the UnsafeBridge mutex is already held and that we can then take the UnoApiLock. Ultimately IMHO, the best fix is to make these locks truly independent by killing all takes of the UnsafeBridge mutex in the (fairly basic surely) code that configmgr itself uses. Kay ?
Classically, no Uno object is allowed to keep a lock while calling out ... means, that we have two bugs here, - the component context keeps its (unsafe env. provided) lock, as well as - the configmgr keeping its lock. If only one kept a lock, there obviously wouldn't a problem ;-) Putting the component context into the "unsafe" environment was a starting point to roll out the threading stuff into the office, I mistakenly thought, that the component context actually does not reach out for other objects. Quick fix is to just "disarm" it by removing the "unsafe" declaration. Plan basically is (as stated somewhere in the wiki), to move all but some office objects into an "unsafe" environment. The ones not to be moved just ensuring some basic parallelism e.g. for I/O or configuration access. I am somehow sure that there are other Uno object implementations keeping locks while calling out, thus I suggest to fix both, the component context as well as the config manager. -> Michael, do you see a chance to release the config manager lock while calling out?
Hi Kay - thanks for the input: > Classically, no Uno object is allowed to keep a lock while calling out ... > means, that we have two bugs here, Seems reasonable. On the other hand, presumably there are some parts of the system so basic & fundamental that presumably we don't want to have this requirement. One might assume calling out to this would be one of these ? > If only one kept a lock, there obviously wouldn't a problem ;-) Sure. > Quick fix is to just "disarm" it by removing the "unsafe" declaration. Sounds good, where is that decl ? > Plan basically is (as stated somewhere in the wiki), to move all but some > office objects into an "unsafe" environment. The ones not to be moved > just ensuring some basic parallelism e.g. for I/O or configuration access. Sure, I'm just concerned that we keep the core safe. > -> Michael, do you see a chance to release the config manager lock > while calling out? Well that bit is at least possible - just add "UnoApiLockReleaser aReleaser;" type instances around the place. The more tricky piece is verifying that that is 100% correct in all cases, and of course - we don't want to be releasing the lock each time we write bytes to a stream eg. ;-) - do we really have to drop it on -every- call out ? or can we be fairly happy that simple I/O writes to a file will not be blocking on configuration access ?
Reset assigne to the default "issues@openoffice.apache.org".