Issue 126586 - automation deadlock: osl_closeSocket() doesn't wake up thread stuck in accept()
Status: CLOSED FIXED
Alias: None
Product: General
Classification: Code
Component: code
Version: 4.1.1
Hardware: All FreeBSD
Importance: P5 (lowest) Normal
Target Milestone: 4.1.2
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-13 18:14 UTC by damjan
Modified: 2016-08-30 21:34 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: 4.1.2-dev
Developer Difficulty: ---
pescetti: 4.1.2_release_blocker+


Description damjan 2015-10-13 18:14:25 UTC
On FreeBSD (and I presume other *BSDs), running the graphical tests (i.e. running "ant" in the aoo/tests directory) produces many test errors, and after the tests finish several instances of AOO are still running in the background and need to be killed.

I ran AOO exactly the way the tests run it, on both Linux and FreeBSD, and compared the behavior:

"/AOO/main/instsetoo_native/unxfbsdx/Apache_OpenOffice/installed/install/en-US/openoffice4/program/soffice" "-automationport=12479" "-enableautomation" "-86fe1abe16f242ccb73ee2bad8cbcffa" "-nofirststartwizard" "-norestore" "-quickstart=no"

On Linux, clicking the "X" in the top-right corner of the window makes AOO exit immediately and returns control to the command prompt. On FreeBSD that doesn't happen: control never returns to the command prompt, because AOO is **DEADLOCKED**.

Running "thread apply all bt" in gdb shows why: in the main/automation module, one thread closes a socket and then tries to join another thread, but that other thread is still stuck in accept() on the closed socket, hopelessly waiting for clients to connect. Digging deeper, I found the cause: when one thread closes a socket while another thread is blocked in accept() on it, the blocked thread does not return from accept(). The socket code in main/sal/osl/unx has a workaround for this, but it is guarded by "#if defined(LINUX)", so it is not compiled on FreeBSD.
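
For illustration, here is a minimal sketch of one common way to wake a thread blocked in accept(): set a shutdown flag, then open a throwaway connection to the listening socket so that accept() returns. This is not the code from main/sal/osl/unx; the names struct listener, accept_loop, stop_listener and wake_accept_demo are hypothetical, and the sketch assumes an IPv4 TCP listener.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical listener state; the names are illustrative only. */
struct listener {
    int fd;                    /* listening socket */
    atomic_int shutting_down;  /* set before the wake-up connection is made */
};

/* Worker thread: blocks in accept() until a connection arrives. */
static void *accept_loop(void *arg)
{
    struct listener *l = arg;
    for (;;) {
        int client = accept(l->fd, NULL, NULL);
        if (client >= 0)
            close(client);     /* a real server would serve the client here */
        if (client < 0 || atomic_load(&l->shutting_down))
            return NULL;
    }
}

/* Wake the worker with a throwaway connection, join it, then close the
 * listening socket. Returns 0 on success, -1 if the wake-up failed (in
 * which case the worker is not joined). */
static int stop_listener(struct listener *l, pthread_t worker)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int rc = -1;

    atomic_store(&l->shutting_down, 1);

    if (getsockname(l->fd, (struct sockaddr *)&addr, &len) == 0) {
        /* A wildcard-bound listener is reachable through loopback. */
        if (addr.sin_addr.s_addr == htonl(INADDR_ANY))
            addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        int wake = socket(AF_INET, SOCK_STREAM, 0);
        if (wake >= 0) {
            if (connect(wake, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                rc = 0;
            close(wake);
        }
    }

    if (rc == 0)
        pthread_join(worker, NULL);  /* accept() has a connection to return */
    close(l->fd);
    return rc;
}

/* Start a listener on an ephemeral loopback port and shut it down again. */
int wake_accept_demo(void)
{
    struct listener l;
    struct sockaddr_in addr;
    pthread_t worker;

    atomic_init(&l.shutting_down, 0);
    l.fd = socket(AF_INET, SOCK_STREAM, 0);
    if (l.fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;               /* let the kernel pick a free port */

    if (bind(l.fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(l.fd, 1) != 0 ||
        pthread_create(&worker, NULL, accept_loop, &l) != 0) {
        close(l.fd);
        return -1;
    }
    return stop_listener(&l, worker);
}
```

Calling wake_accept_demo() from a main() function and checking for a return value of 0 exercises the whole sequence. The flag is set before the wake-up connection is made, so the worker can tell a shutdown from a real client, and the join happens before close() so the worker is never left blocked on a closed descriptor.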
Comment 1 damjan 2015-10-13 18:31:26 UTC
I've committed a patch that generalizes the #if defined(LINUX) to FreeBSD and (by extrapolation only) NetBSD, and not only gets AOO to exit when run with automation enabled as described, but also gets all tests to run, and even gets as many tests to pass on FreeBSD as pass on Linux (though different ones fail and for different reasons lol).

I am thus proposing this fix as a 4.1.2 release blocker.
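
As a rough sketch of what widening such a platform guard looks like, the fragment below enables a workaround flag on Linux, FreeBSD and NetBSD. It uses the compiler-predefined macros __linux__, __FreeBSD__ and __NetBSD__ rather than the build-system macros used in the AOO sources, and ACCEPT_NEEDS_WAKEUP and accept_needs_wakeup are hypothetical names, so this is not the committed patch.

```c
/* Compile-time switch for an accept() wake-up workaround.
 * ACCEPT_NEEDS_WAKEUP is a hypothetical name used only in this sketch. */
#if defined(__linux__) || defined(__FreeBSD__) || defined(__NetBSD__)
#define ACCEPT_NEEDS_WAKEUP 1
#else
#define ACCEPT_NEEDS_WAKEUP 0
#endif

/* Report whether the workaround was compiled in for this platform. */
int accept_needs_wakeup(void)
{
    return ACCEPT_NEEDS_WAKEUP;
}
```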
Comment 2 Andrea Pescetti 2015-10-15 07:04:59 UTC
@damjan: Do you have the revision number? SVN robot is apparently broken these days. And, reading your description, the scope of this fix is limited to BSD, right? Meaning that nothing changes for other platforms.

This is for properly evaluating the impact of this bug and deciding whether to include the fix if we go for an RC3. Thanks.
Comment 3 damjan 2015-10-15 07:38:55 UTC
Hi Andrea, it's revisions 1708477 and 1708483, and yes only FreeBSD and NetBSD.
Comment 4 Andrea Pescetti 2015-10-19 12:13:15 UTC
This is merged to AOO410 for OpenOffice 4.1.2-RC3.
Comment 5 Andrea Pescetti 2015-10-19 12:16:25 UTC
For reference, trunk revision 1708477 and trunk revision 1708483 were merged as revision 1709400 on AOO410.
Comment 6 Kay 2016-08-30 21:34:53 UTC
Closing.