Issue 93051

Summary: deadlock at office start
Product: General Reporter: Martin Hollmichel <nesshof>
Component: codeAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P2 CC: carsten.driesner, dirk.voelzke, issues, joerg.barfurth, kay.ramme, nesshof, stephan.bergmann.secondary, tj, uwe.luebbers
Version: OOO300m3   
Target Milestone: ---   
Hardware: Unknown   
OS: All   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---

Description Martin Hollmichel 2008-08-23 18:09:13 UTC
>
> main thread:
>
> > ntdll.dll!KiFastSystemCallRet() [Frames below may be incorrect and/or
missing, no symbols loaded for ntdll.dll] ntdll.dll!NtWaitForSingleObject() +
0xc bytes ntdll.dll!RtlEnterCriticalSection() + 0x46 bytes
configmgr2.uno.dll!configmgr::UnoApiLock::acquire() Line 46 + 0xb bytes C++
> configmgr2.uno.dll!configmgr::OConfigurationProvider::queryInterface(const
com::sun::star::uno::Type & rType=) Line 521 + 0x5 bytes C++
> updatefeed.uno.dll!component_getFactory() + 0xb0d bytes
updatefeed.uno.dll!component_getFactory() + 0x15ac bytes
updatefeed.uno.dll!component_getFactory() + 0x30df bytes
updatefeed.uno.dll!component_getFactory() + 0x4893 bytes
updatefeed.uno.dll!component_getFactory() + 0x5097 bytes
updatefeed.uno.dll!component_getFactory() + 0x5779 bytes
cppuhelper3MSC.dll!cppu::createSingleComponentFactory() + 0xb11 bytes
cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0xb8d bytes
cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0x145a bytes
cppuhelper3MSC.dll!cppu::createSingleComponentFactory() + 0x19e1 bytes
cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0xb8d bytes
cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0x145a bytes
bootstrap.uno.dll!component_getFactory() + 0x220e5 bytes
deploymentmi.uno.dll!GetVersionInfo() + 0x2cc3 bytes
deploymentmi.uno.dll!GetVersionInfo() + 0x2e15 bytes
deploymentmi.uno.dll!GetVersionInfo() + 0x3403 bytes
deploymentmi.uno.dll!GetVersionInfo() + 0x3ac2 bytes
deploymentmi.uno.dll!GetVersionInfo() + 0x4396 bytes
deploymentmi.uno.dll!GetVersionInfo() + 0x43df bytes
comphelp4MSC.dll!comphelper::service_decl::ServiceDecl::getFactory() + 0xa5
bytes comphelp4MSC.dll!comphelper::service_decl::ServiceDecl::getFactory() +
0x157 bytes cppuhelper3MSC.dll!cppu::createSingleComponentFactory() + 0x19e1
bytes cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0xb8d bytes
cppuhelper3MSC.dll!cppu::createStandardClassWithSequence() + 0x145a bytes
bootstrap.uno.dll!component_getFactory() + 0x220e5 bytes msci_uno.dll!61861cab()
msci_uno.dll!61861fbf() msci_uno.dll!618623da()
purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x5a9 bytes
cppu3.dll!uno_getCurrentEnvironment() + 0x2eb bytes
cppu3.dll!uno_EnvDcp_getPurpose() + 0x67d bytes
unsafe_uno_uno.dll!GetVersionInfo() + 0x28a bytes
unsafe_uno_uno.dll!GetVersionInfo() + 0x1ad bytes
purpenvhelper3MSC.dll!GetVersionInfo() + 0x4ea bytes
purpenvhelper3MSC.dll!GetVersionInfo() + 0xce bytes
cppu3.dll!uno_EnvDcp_getPurpose() + 0x625 bytes
cppu3.dll!uno_EnvDcp_getPurpose() + 0x649 bytes
cppu3.dll!uno_getCurrentEnvironment() + 0x30d bytes
cppu3.dll!uno_Environment_invoke_v() + 0x17 bytes
cppu3.dll!uno_Environment_invoke() + 0x19 bytes
purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x779 bytes
purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x941 bytes
msci_uno.dll!61861472() sal3.dll!osl_acquireMutex(_oslMutexImpl *
Mutex=0x014af188) Line 146 C
> sal3.dll!osl_releaseMutex(_oslMutexImpl * Mutex=0x00000000) Line 194 + 0x3 bytes C
> ffffffff()
> configmgr-writer-service:
>
> > ntdll.dll!KiFastSystemCallRet() [Frames below may be incorrect and/or
missing, no symbols loaded for ntdll.dll] ntdll.dll!NtWaitForSingleObject() +
0xc bytes ntdll.dll!RtlEnterCriticalSection() + 0x46 bytes
unsafe_uno_uno.dll!GetVersionInfo() + 0x2af bytes
unsafe_uno_uno.dll!GetVersionInfo() + 0x19a bytes
purpenvhelper3MSC.dll!GetVersionInfo() + 0x4d3 bytes
purpenvhelper3MSC.dll!GetVersionInfo() + 0xbb bytes
cppu3.dll!uno_EnvDcp_getPurpose() + 0x24a bytes
cppu3.dll!uno_EnvDcp_getPurpose() + 0x5ea bytes
cppu3.dll!uno_EnvDcp_getPurpose() + 0x60e bytes
cppu3.dll!uno_getCurrentEnvironment() + 0x2d5 bytes
cppu3.dll!uno_Environment_invoke_v() + 0x17 bytes
cppu3.dll!uno_Environment_invoke() + 0x19 bytes
purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x779 bytes
purpenvhelper3MSC.dll!cppu::helper::purpenv::createMapping() + 0x941 bytes
msci_uno.dll!61861472() sal3.dll!osl_acquireMutex(_oslMutexImpl *
Mutex=0x0423f7c0) Line 146 C
> sal3.dll!osl_releaseMutex(_oslMutexImpl * Mutex=0x027c7d04) Line 194 + 0x3 bytes C
>
configmgr2.uno.dll!configmgr::xml::WriterService<com::sun::star::configuration::backend::XLayerHandler>::WriterService<com::sun::star::configuration::backend::XLayerHandler>(const
com::sun::star::uno::Reference<com::sun::star::uno::XComponentContext> &
_xContext=) Line 69 + 0x53 bytes C++
> 8b0c428d()
>
> systray worker thread:
>
> > ntdll.dll!KiFastSystemCallRet() [Frames below may be incorrect and/or
missing, no symbols loaded for ntdll.dll] ntdll.dll!NtWaitForSingleObject() +
0xc bytes ntdll.dll!RtlEnterCriticalSection() + 0x46 bytes
vos3MSC.dll!vos::OMutex::acquire() + 0xe bytes
vclmi.dll!SalYieldMutex::acquire() Line 119 C++
> sfxmi.dll!ShutdownIcon::GetResString(int id=652) Line 326 + 0x21 bytes C++
> sfxmi.dll!createSystrayMenu() Line 273 + 0x14 bytes C++
> sfxmi.dll!listenerWndProc(HWND__ * hWnd=0x003b01be, unsigned int uMsg=1,
unsigned int wParam=0, long lParam=380238216) Line 382 + 0x5 bytes C++
> user32.dll!GetDC() + 0x6d bytes user32.dll!EnumDisplayMonitors() + 0xf8 bytes
user32.dll!DefWindowProcW() + 0x184 bytes user32.dll!CallNextHookEx() + 0x1a3
bytes ntdll.dll!KiUserCallbackDispatcher() + 0x13 bytes
user32.dll!CreateWindowExW() + 0x2a7 bytes user32.dll!CreateWindowExA() + 0x33
bytes sfxmi.dll!SystrayThread(void * __formal=0x00000000) Line 586 + 0x37 bytes C++
> kernel32.dll!GetModuleFileNameA() + 0x1b4 bytes
Comment 1 Stephan Bergmann 2008-08-25 10:53:34 UTC
I could not reproduce the above stacks on mh's machine.  What I *could*
reproduce there is the following:  If swriter, scalc, ... are started and during
first start wizard personal data are transferred from mh's OOo 2 installation,
then no document window appears after splash screen is gone, and soffice.bin
idles in Application::Yield.  If, on the other hand, either soffice is started
(instead of swriter, scalc, ...) or during first start wizard personal data is
not transferred, then everything is OK.
Comment 2 Stephan Bergmann 2008-08-26 12:52:17 UTC
The problem described as <#desc2> (which for me was 100% reproducible on mh's
machine, and I hence assume that it was *that* problem that gave mh the
experience that his soffice regularly deadlocks at startup, and in turn was the
motivation for target OOo 3.0) is reportedly (cd) now covered by a separate
issue (for which I have no issue number available).  I set *this* issue (alleged
configmgr/updatefeed/unsafe_uno_uno deadlock) to target OOo 3.x.
Comment 3 Stephan Bergmann 2008-08-27 10:30:57 UTC
The deadlock is as follows:  The "main thread" obviously from
ComponentContext::lookupMap (cppuhelper/source/component_context.cxx:1.33), with
the UnsafeBridge::m_mutex (cppu/source/UnsafeBridge/UnsafeBridge.cxx) locked,
calls createInstanceWithContext to instantiate the
com.sun.star.deployment.PackageInformationProvider singleton, which in turn
calls OConfigurationProvider::queryInterface
(configmgr/source/api2/confprovider2.cxx:1.33) and tries to lock the configmgr's
UnoApiLock.  The "configmgr-writer-service" thread, obviously with the
configmgr's UnoApiLock locked, from the WriterService ctor
(configmgr/soruce/xml/writersvc.cxx:1.11) calls
ComponentContext::getServiceManager
(cppuhelper/source/component_context.cxx:1.33), which blocks trying to acquire
the UnsafeBridge::m_mutex.

@jb:  Do you have any recollection what the UnoApiLock is good for?  It causes
problems like this deadlock, by making configmgr code call external code (e.g.,
XComponentContext::getServiceManager, service manager's createInstance) with the
configmgr's UnoApiLock locked.
Comment 4 joerg.barfurth 2008-08-27 11:09:41 UTC
When I left configmgr it was very much threading-capable. The UnoApiLock was
obviously introduced by mmeeks in the configrefactor01 CWS.

FWIW: I would not have accepted any attempt do introduce such a global lock for
everything, completely eliminating thread awareness in configmgr, had I still
been the maintainer. While the original implementation may have had
overengineered concurrency support, a lock that is even shared between
independent ConfigurationProvider instances runs completely counter to the
design of the component.

Taking a lock just for queryInterface is obviously broken, but I suppose getting
rid of that won't help much if the calling thread also wants to invoke actual
API functionality. Having the separate cache write-back thread perform its
(complex) work under a global lock that excludes any API activity is also
utterly broken, but it may take more analysis to determine whether it is safe to
get rid of that lock (I also haven't found the place where that thread takes the
global lock).


 
Comment 5 uwe.luebbers 2008-08-27 11:41:07 UTC
issue changed target => removed from 3.0 blocker issue
Comment 6 Stephan Bergmann 2008-08-27 11:45:51 UTC
@mmeeks: please take over
Comment 7 mmeeks 2008-08-27 13:03:10 UTC
> jb wrote:
> When I left configmgr it was very much threading-capable. The UnoApiLock was
> obviously introduced by mmeeks in the configrefactor01 CWS.

nonsense - it was riddled with lock takes & releases, certainly - but any
threaded code actively using the configmgr deadlocked or crashed within a few
seconds: cf. my regression test. This is now no longer the case - and yes, the
threading model is -far- simpler now.

Of course - that means it can sometimes deadlock - but at least we get a clear
view of the interactions between pieces of code; and the existing lock hierarchy
problems around the place.

The UnoApiLock is a simple code lock; we count take / releases on it - and we
drop this lock as we emit callbacks & call out of the code [ at least in theory
of course ].

All that should be necessary is to create an UnoApiLockReleaser over the service
factory instantiation; and isolate the interaction carefully to make sure it's safe.
Comment 8 mmeeks 2008-08-28 11:54:22 UTC
So - a little more analysis; I imagine that the deadlock has always been there,
but perhaps masked by the previous, more granular locking: you would have had to
be writing the same piece of the tree that an activating component was wanting
to read from previously to trigger it; now any write has the effect. IMHO the
solution is not more granular locking to re-conceal the problem however.

Perhaps I'm mis-understanding something, is it the case that the UnsafeBridge
mutex is part of the new Kay Ramme threading framework ? if so, is it the case
that almost whenever an UNO method is invoked on some code, that this lock is
taken ?

eg. if the configmgr wants to use the ServiceManager to instantiate a new
service - then this code is deemed not-thread-safe, and thus the one-big-mutex
is taken ?

If so - we need to put configmgr inside that same appartment; since clearly it
uses components outside it's scope and visibility to do (eg.) I/O - or even
simple things like activating a service ;-)

Of course, we can still leave the UnoApiLock construct against a time when the
subsystems that configmgr relies on are thread-safe: the service manager, the
explicit external services we use are:

com.sun.star.io.DataOutputStream
com.sun.star.xml.sax.Writer
com.sun.star.script.Converter
com.sun.star.comp.stoc.OServiceManagerWrapper

none of which looks like it should be too horrific. Presumably things like the
UCB and sax code are still viewed as thread-safe & not in the global apartment ?

Ultimately - it's a simple lock ordering problem as Stefan points out. Code we
call assumes that this UnsafeBridge mutex can be taken, but we have the
UnoApiLock; while other code assumes that the UnsafeBridge mutex is already held
and that we can then take the UnoApiLock. Ultimately IMHO, the best fix is to
make these locks truly independent by killing all takes of the UnsafeBridge
mutex in the (fairly basic surely) code that configmgr itself uses.

Kay ?
Comment 9 kay.ramme 2008-08-29 08:37:05 UTC
Classically, no Uno object is allowed to keep a lock while calling out ...
means, that we have two bugs here,
- the component context keeps its (unsafe env. provided) lock, as well as
- the configmgr keeping its lock.

If only one kept a lock, there obviously wouldn't a problem ;-)

Putting the component context into the "unsafe" environment was a starting point
to roll out the threading stuff into the office, I mistakenly thought, that the
component context actually does not reach out for other objects. Quick fix is to
just "disarm" it by removing the "unsafe" declaration.

Plan basically is (as stated somewhere in the wiki), to move all but some office
objects into an "unsafe" environment. The ones not to be moved just ensuring
some basic parallelism e.g. for I/O or configuration access.

I am somehow sure that there are other Uno object implementations keeping locks
while calling out, thus I suggest to fix both, the component context as well as
the config manager.

-> Michael, do you see a chance to release the config manager lock while calling
out?
Comment 10 mmeeks 2008-08-29 09:52:23 UTC
Hi Kay - thanks for the input:

> Classically, no Uno object is allowed to keep a lock while calling out ...
> means, that we have two bugs here,

    Seems reasonable. On the other hand, presumably there are some parts of the
system so basic & fundamental that presumably we don't want to have this
requirement. One might assume calling out to this would be one of these ?

> If only one kept a lock, there obviously wouldn't a problem ;-)

    Sure.

> Quick fix is to just "disarm" it by removing the "unsafe" declaration.

    Sounds good, where is that decl ?

> Plan basically is (as stated somewhere in the wiki), to move all but some 
> office objects into an "unsafe" environment. The ones not to be moved
> just ensuring some basic parallelism e.g. for I/O or configuration access.

    Sure, I'm just concerned that we keep the core safe.

> -> Michael, do you see a chance to release the config manager lock
> while calling out?

    Well that bit is at least possible - just add "UnoApiLockReleaser
aReleaser;" type instances around the place. The more tricky piece is verifying
that that is 100% correct in all cases, and of course - we don't want to be
releasing the lock each time we write bytes to a stream eg. ;-) - do we really
have to drop it on -every- call out ? or can we be fairly happy that simple I/O
writes to a file will not be blocking on configuration access ?
Comment 11 Marcus 2017-05-20 10:47:55 UTC
Reset assigne to the default "issues@openoffice.apache.org".