Issue 65788 - bus error in udkapi
Summary: bus error in udkapi
Status: CONFIRMED
Alias: None
Product: porting
Classification: Code
Component: code
Version: OOo 2.0.4
Hardware: Sun Linux, all
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-26 10:36 UTC by sparcmoz
Modified: 2013-02-07 21:55 UTC
CC List: 4 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
some debug info (13.08 KB, text/plain)
2006-05-26 11:33 UTC, sparcmoz
output from typesconfig (7.96 KB, text/plain)
2006-05-26 13:37 UTC, sparcmoz
to build m170 - proof of concept only - not a solution (770 bytes, patch)
2006-05-30 11:56 UTC, sparcmoz
files moved to avoid bus error (1.73 KB, patch)
2006-05-31 08:45 UTC, sparcmoz
Bus Error build log (1.58 KB, text/plain)
2006-06-12 04:26 UTC, sparcmoz
patches for tracing dump and produce build.log (1.97 KB, patch)
2006-06-12 04:27 UTC, sparcmoz

Description sparcmoz 2006-05-26 10:36:10 UTC
m165 builds OK, but this bus error occurs when building m166 on GNU/Linux SPARC
with gcc 4.1.1:
=============
Building project udkapi
=============
/home/jim/m165/udkapi/com/sun/star/uno
mkout -- version: 1.6
/home/jim/m165/udkapi/com/sun/star/unodmake: Executing shell macro: +echo
$(IDLPACKAGE) | $(SED) 's/\\/\//g'
idlc @/tmp/mkXMx5Jz
idlc: compile 'Exception.idl' ...
idlc: compile 'NamingService.idl' ...
idlc: compile 'RuntimeException.idl' ...
idlc: compile 'SecurityException.idl' ...
idlc: compile 'DeploymentException.idl' ...
idlc: compile 'TypeClass.idl' ...
Bus error
dmake:  Error code 138, while making '../../../../unxlngs.pro/misc/urd_cssuno.don'
'---* tg_merge.mk *---'

ERROR: Error 65280 occurred while making /home/jim/m165/udkapi/com/sun/star/uno
jim@sun:~/m165/udkapi$

If the problem file is skipped, the same error occurs again later with other
files. If I revert only module sal to m165, the problem is fixed. I suppose this
relates to cws_src680_mhu12.
Comment 1 sparcmoz 2006-05-26 11:33:36 UTC
Created attachment 36728 [details]
some debug info
Comment 2 sparcmoz 2006-05-26 13:37:24 UTC
Created attachment 36734 [details]
output from typesconfig
Comment 3 sparcmoz 2006-05-26 23:14:37 UTC
A bus error on SPARC is a hint of memory alignment problems. Browsing the changes
reveals alignment code in alloc_impl.h, which led to this:
jim@sun:~/m165/sal$ grep -r SAL_TYPES_ALIGNMENT8 *
inc/sal/types.h:  #define SAL_TYPES_ALIGNMENT8          1
rtl/source/alloc_impl.h:#if SAL_TYPES_ALIGNMENT8 > 1
rtl/source/alloc_impl.h:#define RTL_MEMORY_ALIGNMENT_8 SAL_TYPES_ALIGNMENT8
rtl/source/alloc_impl.h:#endif /* SAL_TYPES_ALIGNMENT8 */
unxlngs.pro/inc/sal/typesizes.h:#define SAL_TYPES_ALIGNMENT8    4

This last one looks odd. If I change 4 to 8, touch rtl/source/alloc_impl.h,
and build again, then udkapi builds without errors. So it looks like the
function GetAlignment or Description_Ctor in sal/typesconfig/typesconfig.c does
not return a correct value?
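
As a rough illustration of the kind of check discussed here, the following C
sketch reports the alignment the compiler assigns to a few 8-byte-or-larger
types and compares it with a configured value such as SAL_TYPES_ALIGNMENT8.
It is not the typesconfig code: the struct and function names are hypothetical,
and the compiler's ABI alignment may differ from what typesconfig determines
at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe structs: the offset of 'value' after a single char
 * equals the alignment the compiler requires for that type. */
struct probe_double      { char pad; double value; };
struct probe_long_long   { char pad; long long value; };
struct probe_long_double { char pad; long double value; };

size_t alignment_of_double(void)      { return offsetof(struct probe_double, value); }
size_t alignment_of_long_long(void)   { return offsetof(struct probe_long_long, value); }
size_t alignment_of_long_double(void) { return offsetof(struct probe_long_double, value); }

/* Largest alignment among the probed types. */
size_t max_probed_alignment(void)
{
    size_t max = alignment_of_double();
    if (alignment_of_long_long() > max)
        max = alignment_of_long_long();
    if (alignment_of_long_double() > max)
        max = alignment_of_long_double();
    assert(max >= 1); /* an alignment is never smaller than one byte */
    return max;
}

/* Nonzero if a configured alignment (for example the value written to
 * typesizes.h) is at least as strict as every probed type needs. */
int alignment_is_sufficient(size_t configured)
{
    return configured >= max_probed_alignment();
}
```

On a platform where any of these types needs 8-byte alignment,
alignment_is_sufficient(4) would return 0, which is the situation the bus
error hints at.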
Comment 4 sparcmoz 2006-05-26 23:19:59 UTC
mhu: please comment
Comment 5 matthias.huetsch 2006-05-29 11:52:13 UTC
mhu->sparcmoz: According to your description, it looks like some 8 byte (or
larger) type cannot cope with a 4 byte alignment (resulting in SIGBUS). As the
'typesconfig' program does test with 'double', which can be 4 byte aligned (same
as on Solaris Sparc 32bit), it would be interesting to find out what type
exactly is causing the SIGBUS (and possibly add such a test to the 'typesconfig'
program).

And yes, SAL_TYPES_ALIGNMENT8 = 4 works on Solaris Sparc 32bit; only in 64bit
mode do we have SAL_TYPES_ALIGNMENT8 = 8 there.

So, can you please try to find out which type exactly is causing the SIGBUS here?

Thanks,
Matthias
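
To make the 4-byte versus 8-byte distinction concrete, here is a minimal C
sketch of the arithmetic an allocator typically uses to round sizes and check
addresses. The helper names align_up and is_aligned are assumptions for
illustration only and are not taken from alloc_impl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round 'size' up to the next multiple of 'alignment'.
 * 'alignment' must be a power of two. */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Nonzero if 'address' is a multiple of 'alignment' (a power of two). */
int is_aligned(uintptr_t address, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address & (alignment - 1)) == 0;
}
```

For example, a block that starts at offset 12 satisfies a 4-byte alignment but
not an 8-byte one, so an allocator that rounds to multiples of 4 can hand out
memory that an 8-byte-aligned type cannot safely use.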
Comment 6 sparcmoz 2006-05-30 11:45:55 UTC
sparcmoz-->mhu: I tried deleting various types from
com/sun/star/uno/TypeClass.idl, but it is not that easy. There seems to be some
interaction between the different types, so that some combination of types is
involved. That makes for a large number of trials, so I need a better strategy...

I am attaching a patch, which I used to build and run m170, but this is NOT
suggested as a fix, it is just filed here for ease of finding later...
Comment 7 sparcmoz 2006-05-30 11:56:08 UTC
Created attachment 36794 [details]
to build m170 - proof of concept only - not a solution
Comment 8 sparcmoz 2006-05-31 08:44:32 UTC
sparcmoz->mhu: some more random bits of information. 

In case of com/sun/star/uno/TypeClass.idl I can remove the bus error by
compiling that file before the others in com/sun/star/uno, by simply changing
the sequence in makefile.mk

Something similar can be seen in com/sun/star/lang, but when some files are
moved to compile sooner, then the bus error comes at a different file which had
previously not had a bus error.

I attach a diff that shows the files that have built OK after moving up in the
compile sequence.

Within TypeClass.idl the bus error may be overcome by deleting for example all
types after ARRAY, but I cannot narrow it down to any single type or group in
that file. 

If each of the types within TypeClass.idl is placed alone in a file, then there
is no bus error.

If I run idlc directly from the command line on any file that had a bus error,
I get idlc: returned successful

I have now completely built and run m170 with SAL_TYPES_ALIGNMENT8 = 8. As I am
using gcc4.1.1 I have to use cws_src680_warnings01 for bridges.
Comment 9 sparcmoz 2006-05-31 08:45:52 UTC
Created attachment 36829 [details]
files moved to avoid bus error
Comment 10 sparcmoz 2006-06-12 04:23:58 UTC
I did some tracing in idlc and have the following observations
(a) Bus Error occurs sometimes, but only when trying to execute this command in
idlc/source/idlcproduce.cxx 
        // produce registry file
        if ( !idlc()->getRoot()->dump(rootKey) )
Further testing shows that idlc()->getRoot() is OK but an error occurs in
dump(rootKey).

(b) dump(rootKey) is implemented in idlc/source/astdeclaration.cxx at line 172,
in the function
sal_Bool AstDeclaration::dump(RegistryKey& rKey)

(c) This function is recursive by including the following statement in a while loop:
bRet = pDecl->dump(rKey);

(d) Trying to understand what dump does, it appears the rKey has a variable
number of members and the getRoot checks if the last member is included in a
list of known types such as NT_module.

(e) With normal operation the function is re-entered 5 times, the first 4 times
finding type NT_module and the 5th time finding a different type. After the 5th
re-entry then the while loop is completed and the dump function returns 5 times.

(f) In failure operation the function enters 5 times and identifies the 5th type
but never returns, as the Bus Error occurs at that point.

(g) The first Bus Error is noted with type NT_enum, and if that is bypassed as
described in earlier comments, then the error occurs next with NT_struct.

I will attach a log of running udkapi and the patches to idlc that print out
that log. 

From reading about recursion, it appears a useful test would be to implement
dump with some kind of loop so that it is not recursive, but I have no idea yet
how to do that.
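
One common way to turn such a recursive walk into a loop is to keep an explicit
stack of pending nodes. The C sketch below shows the idea on a hypothetical
tree; struct node, MAX_DEPTH and dump_iterative are illustrative names and do
not correspond to the AstDeclaration classes in idlc.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64 /* hypothetical bound on pending nodes */

/* Hypothetical declaration node: a type tag plus child/sibling links. */
struct node {
    int type;
    const struct node *first_child;
    const struct node *next_sibling;
};

/* Visit every node depth-first without recursion. The type of each visited
 * node is stored in 'order' (up to 'capacity' entries). Returns the number
 * of nodes visited, or -1 if the explicit stack would overflow. */
int dump_iterative(const struct node *root, int *order, size_t capacity)
{
    const struct node *stack[MAX_DEPTH];
    size_t top = 0;
    int visited = 0;

    assert(order != NULL || capacity == 0);
    if (root == NULL)
        return 0;

    stack[top++] = root;
    while (top > 0) {
        const struct node *current = stack[--top];

        if ((size_t)visited < capacity)
            order[visited] = current->type;
        visited++;

        /* Push the sibling first so that the child is popped, and therefore
         * visited, before it: this mimics the recursive order. */
        if (current->next_sibling != NULL) {
            if (top == MAX_DEPTH)
                return -1;
            stack[top++] = current->next_sibling;
        }
        if (current->first_child != NULL) {
            if (top == MAX_DEPTH)
                return -1;
            stack[top++] = current->first_child;
        }
    }
    return visited;
}
```

Whether this changes the bus error would depend on where the misaligned access
actually happens; the sketch only shows how the recursion itself could be
removed for such a test.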

I have not figured out how to run this with gdb yet; I can see that idlccpp is
called by execv from sal.

This is very slow work for me, and as it works with alignment 8, I wonder if it
is worth any more work at all?
Comment 11 sparcmoz 2006-06-12 04:26:15 UTC
Created attachment 37060 [details]
Bus Error build log
Comment 12 sparcmoz 2006-06-12 04:27:51 UTC
Created attachment 37061 [details]
patches for tracing dump and produce build.log
Comment 13 sparcmoz 2006-10-28 13:04:58 UTC
This error is hidden if module sal is rebuilt with the environment set to
ALLOC=SYS_ALLOC, for example by configure --with-alloc=system. In that case it
does not matter for udkapi whether ALLOC is set or not. So I guess the problem
might be found in the sal code that is used when ALLOC is not set. I would like
to try to build everything without-system, but I am not sure whether that
should be done in this case.
Comment 14 sparcmoz 2008-07-04 06:54:03 UTC
todo: investigate typesconfig patch from IA64 porting issue 84999
Comment 15 sparcmoz 2008-07-04 07:06:56 UTC
first step is testing - possible fix already integrated from issue 86955.