Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||[sw] crash during shutdown: no SolarMutex|
|Component:||code||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P4||CC:||caolanm, dtardon, issues, kay.ramme, Mathias_Bauer, peter.jentsch, philipp.lohmann, stephan.bergmann.secondary|
|Issue Type:||DEFECT||Latest Confirmation in:||---|
Description mst.ooo 2010-06-22 16:03:51 UTC
just captured this most peculiar critter: alDisplay::~SalDisplay() SalDisplay::DeInitRandR() GtkXLib::~GtkXLib() ==26317== Thread 7: ==26317== Invalid read of size 4 ==26317== at 0xA6B2081: Application::GetSolarMutex() (svapp.cxx:528) ==26317== by 0x1DE39D19: sw::UnoImplPtr<SwXTextCursor::Impl>::~UnoImplPtr() (unobaseclass.hxx:126) ==26317== by 0x1DE34AF1: SwXTextCursor::~SwXTextCursor() (unoobj.cxx:919) ==26317== by 0x74F7C0D: cppu::OWeakObject::release() (in inst/OO_m80_pooltest_li/OOo_DEV300m80_Linux_x86_install-arc_en-US/openoffice.org/ure/lib/libuno_cppuhelpergcc3.so.3) ==26317== by 0x1DE39A38: cppu::WeakImplHelper12<com::sun::star::lang::XServiceInfo, com::sun::star::beans::XPropertySet, com::sun::star::beans::XPropertyState, com::sun::star::beans::XMultiPropertyStates, com::sun::star::container::XEnumerationAccess, com::sun::star::container::XContentEnumerationAccess, com::sun::star::util::XSortable, com::sun::star::document::XDocumentInsertable, com::sun::star::text::XSentenceCursor, com::sun::star::text::XWordCursor, com::sun::star::text::XParagraphCursor, com::sun::star::text::XRedline>::release() (implbase12.hxx:120) ==26317== by 0x1DE2C714: SwXTextCursor::release() (unoobj.cxx:3199) ==26317== by 0x101932F7: ??? (in inst/OO_m80_pooltest_li/OOo_DEV300m80_Linux_x86_install-arc_en-US/openoffice.org/ure/lib/libgcc3_uno.so) ==26317== by 0x75ABAA9: ??? (in inst/OO_m80_pooltest_li/OOo_DEV300m80_Linux_x86_install-arc_en-US/openoffice.org/ure/lib/libuno_cppu.so.3) ==26317== by 0x75B4108: ??? (in inst/OO_m80_pooltest_li/OOo_DEV300m80_Linux_x86_install-arc_en-US/openoffice.org/ure/lib/libuno_cppu.so.3) ==26317== by 0x75B4181: uno_Environment_invoke_v (in inst/OO_m80_pooltest_li/OOo_DEV300m80_Linux_x86_install-arc_en-US/openoffice.org/ure/lib/libuno_cppu.so.3) ==26317== by 0x75B41B4: uno_Environment_invoke (in inst/OO_m80_pooltest_li/OOo_DEV300m80_Linux_x86_install-arc_en-US/openoffice.org/ure/lib/libuno_cppu.so.3) ==26317== by 0x75A73A9: ??? (in inst/OO_m80_pooltest_li/OOo_DEV300m80_Linux_x86_install-arc_en-US/openoffice.org/ure/lib/libuno_cppu.so.3) ==26317== Address 0x4 is not stack'd, malloc'd or (recently) free'd ==26317== ==26317== ==26317== Process terminating with default action of signal 11 (SIGSEGV) ==26317== Access not within mapped region at address 0x4 ==26317== at 0xA6B2081: Application::GetSolarMutex() (svapp.cxx:528) ==26317== by 0x1DE39D19: sw::UnoImplPtr<SwXTextCursor::Impl>::~UnoImplPtr() (unobaseclass.hxx:126) ==26317== by 0x1DE34AF1: SwXTextCursor::~SwXTextCursor() (unoobj.cxx:919) so it seems that during shutdown the SolarMutex may actually be 0... but Application::GetSolarMutex() returns a reference, not a pointer... [ but a crash during shutdown won't impact the user much and isn't so important... ]
Comment 1 philipp.lohmann 2010-06-23 07:58:51 UTC
Reference or no; any call to any vcl class or even object after DeInitVCL has been called is an error and likely to crash. Your problem is more that there is a reference of SwXTextCursor flying around after the application (including the framework) is already gone.
Comment 2 Mathias_Bauer 2010-06-23 08:43:32 UTC
It can't be avoided that such references exist - if they are held in a Java program it's the garbage collection that decides when they are released. The real problem is that we never implemented a life time control of UNO objects. VCL shouldn't be shut down as long as any external reference exist. COM has such a mechanism and many years ago I suggested something similar to Markus Meyer for UNO objects and services. Unfortunately that wasn't seen as important enough to judge the effort of its implementation. Alternatively the life time of the SolarMutex could be controlled outside of VCL. But AFAIK and as you probably know even better, that most probably also isn't easy.
Comment 3 philipp.lohmann 2010-06-23 09:03:17 UTC
easy would not be the problem. Just that would be no solution. So you have a SolarMutex outside vcl's lifetime and then what ? The mutex is just for protecting the underlying vcl (or vcl derived) object, which you would probably want to access -> crash in object instead of solar mutex. No gain.
Comment 4 Mathias_Bauer 2010-06-23 10:00:29 UTC
The SolarMutex protects all code in VCL and all other code where the developers where too lazy or not able to create their own mutex. You are right if the object in question has some dependency on VCL code. But if not (and that might be something easier to achieve as we are only talking about the code in a dtor), a SolarMutex outside of VCL could help. Besides that, I still would prefer the other solution: if a refcounted object requires something from VCL (e.g. SolarMutex) there must be a way to avoid VCL shutdown as long as the object exists. A possible way could be to add a member to all UNO objects using the SolarMutex that "refcounts" the VCL based application and prevents that it is shut down. We also could fix that in UNO - every bridge can hold such a reference so that the application can't go down as long as an external bridge exist. That would be something as the COM solution I mentioned. In the olden times we counted an reference on our Application class for every COM connection plus one reference if the application also was started by a user action. Even if the user called "File-Exit", the application did not terminate until all external connections had been released.
Comment 5 philipp.lohmann 2010-06-23 10:10:15 UTC
So any number of code gone wrong (hanging server, forgotten refcount) can then prevent the application from ever quitting again. Just saying. However preventing vcl "shutdown" is simple, just don't call DeInitVCL until you mean it. That is prevent framework from calling DeInitVCL before all such references are cleared.
Comment 6 Mathias_Bauer 2010-06-23 10:25:42 UTC
> So any number of code gone wrong (hanging server, forgotten refcount) can then > prevent the application from ever quitting again. Just saying. Sure. So as all code in office that loops prevents the office from being shut down. > That is prevent framework from calling DeInitVCL before all such references are > cleared. I would even prevent OOo from leaving the message loop as then it still can receive user requests. That's exactly as we did it when the only external connections we had were COM connections.
Comment 7 philipp.lohmann 2010-06-23 10:43:51 UTC
Just that looping code (aka bugs) is still possible should not prevent us from being at least moderately safe from forgotten references; no need to make the situation worse IMHO. And WRT "exactly as we did", why should external connections preventing the user from exiting the application a good thing ? He'll invariably think that OOo hangs. If this "exit prevention" need be I'd at least want to have some kind of message box querying "Connections are still open, do you really want to quit" or some such.
Comment 8 Mathias_Bauer 2010-06-23 10:54:07 UTC
The user will still be able to close the application (means: all its windows) - but it will stay in memory (and in its message loop) until all external connections are closed. In the meantime the user also will be able to open new documents from his desktop, thus again establishing the "internal" refcount on the application. So a hanging external connection in the worst case would just prevent OOo from being removed from memory - it would act as a UI less quickstarter. ;-)
Comment 9 philipp.lohmann 2010-06-23 12:31:05 UTC
... with the added benefit that you cannot logout or shutdown anymore because of a still running application. Great :-(
Comment 10 caolanm 2010-06-24 09:25:37 UTC
My own 2 cents is that I'd like to see a much longer stack to see some more details of this particular example, i.e. --num-callers=50 or something. We get quite a few crash-on-shutdown traces in fedora as our crash-reporting infrastructure is global over all apps and over their entire lifetime. And they fall into two main categories. I've seen ones like this one with a11y shutdown where the a11y shutdown and release of uno references is called effectively from "spi_atk_bridge_exit_func" which nightmarishly is called from atexit from the accessibility libs which makes avoiding this tricky. The other family is due to the current cppu shutdown being crippled to attempt to "leak" resources to avoid accessing dead ones, but this has the side effect of not calling the dtors which cleanly bring down the remotebridge reader/writer threads leaving them generally blocked on mutexes and then after those global mutex gets destroyed on main processes exit the threads unblock and trample all over the corrupt corpse of OOo. The common case is unopkg crashes which are no fault of unopkg. See issue 107490 for some thoughts on fixing that up.
Comment 11 kay.ramme 2010-06-24 09:35:16 UTC
... somehow I have been added to this issue. Regarding remote references to (Uno) objects in the office process, it is either correct to * programmatically shutdown any remote bridges (including in process runtimes such as Java), or * to ensure that reachable objects are valid. We need to define the meaning of "file/exit" accordingly, as already suggested. The object life cycle in OOo is (partly) ref-count based, objects reachable are alive, objects becoming unreachable are (mostly synchronously) destroyed. Ideally VCL would follow this approach, e.g. create a "Display" with some controls on it, the controls hold references to the "Display", ensuring that it is kept alive at least wrt memory. Philipp is right about VCL's SolarMutex, the SolarMutex as well as VCL must not be in use after DeInitVCL (by they way, wasn't there any debugging aid to ensure this?). This bears the question what the sense of Tools SolarMutex is. Either use the VCL SolarMutex or the OSL GlobalMutex.
Comment 12 Mathias_Bauer 2010-06-24 10:01:01 UTC
The reason for tools::SolarMutex is that you can use it without linking against vcl. This was necessary to remove vcl build dependencies from some libraries.
Comment 13 kay.ramme 2010-06-24 10:09:05 UTC
Matthias, thanks for clarifying. That's what I expected. What about doing it "right" and utilizing the OSL GlobalMutex for VCL and friends? The minimum would be to move the implementation of the VCL SolarMutex to the Tools, AFAIR VCL depends on Tools anyway.
Comment 14 Mathias_Bauer 2010-06-24 10:32:01 UTC
That would indeed be a very welcomed change. AFAIR the implementation if SolarMutex is special and can't be changed that way. pl should know that better than anyone else. :-)
Comment 15 kay.ramme 2010-06-24 10:44:25 UTC
I remember, you are right. The SolarMutex is platform native implemented, ...
Comment 16 mst.ooo 2010-06-24 10:53:47 UTC
just for the record: the sw::UnoImplPtr that takes the SolarMutex is new in DEV300m71. i've introduced it to solve issue 105557, which led to crashes due to race conditions. the entire UNO API implementation of the writer uses the SolarMutex for locking. and this crash doesn't seem to happen "often", i think i've seen it only once. hmmm... i wonder if the mythical UNO threading framework would solve this problem :)
Comment 17 kay.ramme 2010-06-24 12:22:08 UTC
It could, the steps are: -1- consolidate / reduce the SalYieldMutex implementations in VCL: There is still some craft in there from a mythical Java client. -2- Create a Uno Purpose Bridge for the SolarMutex: http://wiki.services.openoffice.org/wiki/Uno/Cpp/Spec/Purpose_Bridge_Implementat ion_Helper (seems that some of may pages have been deleted by WikiBot). -3- Adapt the belonging components to be loaded into the "solarmutex" environment. -4- Remove any left overs, including all the guards etc. Step -1- is not trivial and error prone.
Comment 18 kay.ramme 2010-06-24 12:26:55 UTC
SolarMutex implementations can be found in: vcl/os2/source/app/salinst.cxx vcl/unx/headless/svpinst.cxx vcl/unx/source/app/salinst.cxx vcl/unx/gtk/app/gtkinst.cxx vcl/win/source/app/salinst.cxx vcl/aqua/source/app/salinst.cxx