Bug 44639 - Segmentation fault 11 errors right after server started up
Summary: Segmentation fault 11 errors right after server started up
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: All
Version: 2.2.3
Hardware: Sun Solaris
Importance: P2 critical
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: ErrorMessage
Depends on:
Blocks:
 
Reported: 2008-03-19 11:42 UTC by Dave
Modified: 2008-05-19 14:57 UTC
Description Dave 2008-03-19 11:42:50 UTC
Description:
I am running Apache 2.2.3 on Solaris 9 (SunOS 5.9) with the prefork MPM. Child processes die with Segmentation fault (11) right after Apache starts up, and the errors keep repeating indefinitely.

The only workaround is to start Apache as a non-root user on a port other than 80, say 8080.

Note that my Solaris box uses LDAP for system-level user/group authentication (not Apache's mod_ldap or mod_auth_ldap modules for HTTP basic auth). Everything worked fine with LDAP under the previous Apache 1.3, so I wonder whether this issue has anything to do with support for LDAP as system-level authentication in Apache 2.2.3.

Observation:
The parent process stays up while its 5 child processes exit with segfault 11; 5 new child processes are then created and crash in the same way, and the cycle repeats.

----httpd-errors log
[Wed Mar 05 14:56:24 2008] [notice] Apache/2.2.3 (Unix) mod_ssl/2.2.3 OpenSSL/0.9.8d configured -- resuming normal operations
[Wed Mar 05 14:56:24 2008] [info] Server built: Oct 31 2007 06:21:59
[Wed Mar 05 14:56:24 2008] [debug] prefork.c(991): AcceptMutex: fcntl (default: fcntl)
[Wed Mar 05 14:56:25 2008] [notice] child pid 19514 exit signal Segmentation fault (11), possible coredump in /opt/Apache2
[Wed Mar 05 14:56:25 2008] [notice] child pid 19513 exit signal Segmentation fault (11), possible coredump in /opt/Apache2
[Wed Mar 05 14:56:25 2008] [notice] child pid 19516 exit signal Segmentation fault (11)
:

I added some debugging output to the Apache sources and found that it fails while getting the system group privileges (native call nss_search()). My guess is that each child process tries to verify the user's group privileges by calling the native nss_search(), and something goes wrong inside nss_search() that makes the child segfault.
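
One way to test this guess outside of httpd is to run a group lookup in a throwaway child process and report which signal, if any, killed it. The following is a minimal sketch in Python and not something used in this report: `run_isolated` and `LOOKUP_SNIPPET` are hypothetical names, and `grp.getgrall()` is not the same call as `initgroups()`, although it does read the same "group:" sources from /etc/nsswitch.conf. The default user name is the one given for the `User` directive later in this thread.

```python
import signal
import subprocess
import sys

# Snippet executed in a child interpreter: list the groups that name the
# given user as a member. grp.getgrall() walks the "group:" sources
# configured in /etc/nsswitch.conf (an approximation of initgroups()).
LOOKUP_SNIPPET = (
    "import grp, sys\n"
    "user = sys.argv[1]\n"
    "print([g.gr_gid for g in grp.getgrall() if user in g.gr_mem])\n"
)


def run_isolated(snippet, *args):
    """Run snippet in a child interpreter.

    Returns (signal_name_or_None, stdout). A negative return code from
    subprocess means the child was terminated by that signal number.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet, *args],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name, proc.stdout
    return None, proc.stdout


if __name__ == "__main__":
    user = sys.argv[1] if len(sys.argv) > 1 else "ehealth"
    killed_by, output = run_isolated(LOOKUP_SNIPPET, user)
    print("killed by:", killed_by, "groups:", output.strip())
```

If the child is reported as killed by SIGSEGV, the crash can be reproduced without Apache, which would point at the name-service backend rather than at httpd itself.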

My questions are:
1. Is there a known issue in Apache 2.x with LDAP as OS- or system-level authentication? Is it supported at all? (Again, it worked fine with the previous Apache 1.3.)
2. Would it help to rebuild Apache 2 with a configure option like "--with-ldap" or "--enable-ldap"? My understanding is that mod_ldap and mod_auth_ldap are for application-level HTTP basic auth in Apache, but I am curious whether --with-ldap has any effect on system-level authentication.

I would appreciate any pointers related to this issue.
I might be completely wrong about LDAP being the root cause, but I am curious whether anyone else has encountered the same error while also using LDAP.

thanks,
dave

--- /etc/nsswitch.conf ---------
:
:
passwd:  files  ldap
group:   files  ldap
:
:
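
To list which name-service databases consult LDAP, the file can also be read programmatically. This is a minimal sketch in Python and not part of the original setup: the function name `ldap_backed_databases` is hypothetical, and the `hosts` line in the sample is an assumption added only to show a database that does not use LDAP.

```python
def ldap_backed_databases(nsswitch_text):
    """Return the databases whose source list includes 'ldap'."""
    result = []
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        database = database.strip()
        if not database:
            continue  # skip lines that are only ":" (elided content)
        # Ignore action items such as [NOTFOUND=return]
        names = [s for s in sources.split() if not s.startswith("[")]
        if "ldap" in names:
            result.append(database)
    return result


# passwd and group are taken from the excerpt above; hosts is assumed.
SAMPLE = """\
passwd:  files  ldap
group:   files  ldap
hosts:   files  dns
"""

print(ldap_backed_databases(SAMPLE))
```

With `group` in the result, every group lookup made on behalf of the configured user may end up in the LDAP name-service backend once the local files have been checked.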

---Apache Configure options used ---------------------------------
./configure \
    --prefix=$TOP/sys/toolkits/http/apache2/installation \
    --with-ssl=/sys/toolkits/http/apache2/openssl/sol \
    --enable-usertrack --enable-unique-id \
    --enable-most --enable-so --enable-vhost-alias \
    --enable-info --enable-access --enable-expires --enable-headers \
    --enable-imagemap --enable-rewrite --enable-speling --enable-cern-meta \
    --disable-proxy --disable-log-forensic --enable-mime-magic --enable-ssl
Comment 1 Ruediger Pluem 2008-03-19 14:01:49 UTC
Please provide a gdb backtrace of the crashed process (http://httpd.apache.org/dev/debugging.html)
Comment 2 Dave 2008-03-19 14:53:42 UTC
Here you go. Thanks, dave


  main(0x3, 0xffbff07c, 0xffbff08c, 0x3a6c00, 0x0, 0x0)
   ap_mpm_run(0x3ddaf0, 0x41fbf8, 0x3e4298, 0x3e4298, 0x0, 0x0)
   perform_idle_server_maintenance(0x3ddaf0, 0xffbfef44, 0xffbfef30, 0x3ddaf0, 0x3e4298, 0x377990)
   make_child(0x3e4298, 0x0, 0xffbfef48, 0x1, 0x3ddaf0, 0x348864)
   child_main(0x0, 0x2367b8, 0x1, 0x0, 0xffbfee50, 0xffbfee40)
   unixd_setup_child(0x0, 0x520440, 0x567090, 0x0, 0x0, 0x0)
   set_group_privs(0x0, 0x567090, 0x520440, 0x0, 0x0, 0x0)
   _initgroups(0x4efbb0, 0x65, 0x0, 0x0, 0x0, 0x0)
   _getgroupsbymember(0x4efbb0, 0x560658, 0x10, 0x1, 0x0, 0x5606a1)
   _nss_search(0x2, 0xfeeb614c, 0x56b090, 0xffbfeb44, 0x4e455449, 0xff03f084)
   0xfeeb6178(0x55fee8, 0xffbfeb44, 0x30, 0x56b0e8, 0x72000000, 0x72000000)
   0xfeeb2e50(0x2, 0xff000000, 0x0, 0x56b144, 0x0, 0x0)
   0xfee89d90(0x0, 0x0, 0x0, 0xfeebbea8, 0xfeebbeb8, 0xfeece478)
   0xfee89768(0x4debd8, 0x566b98, 0x4dd5f8, 0x0, 0x0, 0xffbfe520)
   0xfee888ac(0xffffffff, 0x566b98, 0x1, 0xffbfe524, 0x0, 0x40)
   0xfee88170(0x0, 0x3, 0xfeea1f78, 0xfeea2304, 0xad8, 0x1)
   0x0(0x0, 0x800, 0x0, 0x0, 0x800, 0x0)

Comment 3 Ruediger Pluem 2008-03-19 15:00:51 UTC
Please do *not* change the assignment of bugs. Doing so removes the bug from other developers' scope, as they no longer get notifications about it.
Comment 4 Ruediger Pluem 2008-03-19 15:10:40 UTC
(In reply to comment #2)

Could you please provide a gdb backtrace? It should give us more information.
Comment 5 Rainer Jung 2008-03-19 15:30:47 UTC
Your description reminds me of a similar problem. Does your ldap client configuration contain two ldap servers? If so, does the problem go away if you use only one ldap server in the ldap client configuration?
Comment 6 Eric Covener 2008-03-19 18:25:52 UTC
Is the LDAP in /etc/nsswitch.conf ultimately loading some other version of openssl?  Can you link httpd with the system SSL instead of the alternate one?
Comment 7 Dave 2008-03-20 11:06:40 UTC
(In reply to comment #4)
> Could you please provide a gdb backtrace? It should give us more information.
> 

This issue occurred on a customer production box that does not have a debugging tool like gdb.
Also, our Apache is built with Sun's cc compiler, not g++. Doesn't gdb only understand debug symbols generated by g++? The stack trace I provided above was taken from a core dump using Sun's WorkShop debugger (dbx). I can email you the core file if you need it.
thanks,
dave


Comment 8 Dave 2008-03-20 11:10:25 UTC
(In reply to comment #5)
> Your description reminds me of a similar problem. Does your ldap client
> configuration contain two ldap servers? If so, does the problem go away, if you
> use only one ldap server in the ldap client configuration?
> 
It is configured to use only one ldap server. Again, if I switch back to Apache 1.3 on the same machine, it works fine. It seems the underlying OS authentication is not transparent to the Apache application layer.
thanks,
dave
Comment 9 Dave 2008-03-20 11:14:38 UTC
(In reply to comment #6)
> Is the LDAP in /etc/nsswitch.conf ultimately loading some other version of
> openssl?  Can you link httpd with the system SSL instead of the alternate one?
> 
I didn't see anything in nsswitch.conf that loads SSL. Although Apache was built with mod_ssl statically linked in, httpd.conf is not configured to use SSL.
thanks,
dave 
Comment 10 Dave 2008-03-20 12:14:47 UTC
(In reply to comment #7)
> This issue occurred on a Customer production box that does not have debug tool
> like gdb.
> Also, the Apache we built is using Sun's cc compiler, not the g++. Doesn't gdb
> only debug symbols that are generated by g++? The stack traces I provided above
> was from a core dump using Sun's workshop (dbx). I can email you the core file
> if you need.
> thanks,
> dave
> 

I took the core and loaded it into gdb on my local box. It gives the following (which doesn't seem to be much help, though):
   (gdb) core-file core
   Core was generated by `/opt/Apache/httpd/bin/nhihttpd -f /opt/Apache/httpd/conf/http'.
   Program terminated with signal 11, Segmentation fault.
   #0  0x00000000 in ?? ()
Comment 11 Ruediger Pluem 2008-03-20 13:40:37 UTC
(In reply to comment #10)
> I took the core and ran it with the gdb on my local box, it gives the
> following: (doesn't seem to much help though.)
>    (gdb) core-file core
>    Core was generated by `/opt/Apache/httpd/bin/nhihttpd -f /opt/
> Apache/httpd/conf/http'.
>    Program terminated with signal 11, Segmentation fault.
>    #0  0x00000000 in ?? ()


You cannot do this on your local box. This only works on the box that generated the core unless your box is a perfect clone of the box where the core dump happened :-).
Nevertheless, googling around I found similar reports of crashes in Samba and other applications that all fail when using initgroups on Solaris 9, so this could be a Solaris 9 bug.
I am no dbx expert, but it would be nice to know what string is stored at 0x4efbb0.
Furthermore please provide the settings of the User and Group directives in your httpd.conf and the name of the group with the id 101.
Comment 12 Rainer Jung 2008-03-20 14:06:13 UTC
Two more ideas:

One from sunsolve: does the ldap client configuration contain NS_LDAP_SERVER_PREF? Sun Bug ID seems to indicate a problem with that.

The other from my own experience: Sun changed the interaction between libsldap and libldap in an incompatible way between Solaris 8 and 9. I assume you compile httpd on Solaris 9? If you run pmap against the core, which ldap libraries get listed (libldap, libsldap), and especially which versions of them? If you can see libldap.so.4, then that might be the reason for the crash. It should be libldap.so.5 (and of course no ldap libs from non-system paths).
Comment 13 Dave 2008-03-20 15:14:39 UTC
(In reply to comment #11)
> You cannot do this on your local box. This only works on the box that generated
> the core unless your box is a perfect clone of the box where the core dump
> happened :-).
> Nevertheless, googling around I found similar reports of crashes in Samba and
> other applications that all fail when using initgroups on Solaris 9, so this
> could be a Solaris 9 bug.
> I am no dbx expert, but it would be nice to know what string is stored at
> 0x4efbb0.
> Furthermore please provide the settings of the User and Group directives in
> your httpd.conf and the name of the group with the id 101.
> 

Yes, my box is not quite the same, although both run Solaris 9. Mine also uses NIS rather than LDAP.

Here are the User and Group settings in httpd.conf:
User       ehealth
Group      ehealth

The group with id 101 (0x65) is ehealth, so that matches the one in httpd.conf.

If this is a Solaris 9 bug, why does Apache 1.3 work fine on the same server? That is why I suspected that OS-level authentication is not transparent to Apache.


Comment 14 Dave 2008-03-20 15:20:10 UTC
(In reply to comment #12)
> Two more ideas:
> 
> One from sunsolve: does the ldap client configuration contain
> NS_LDAP_SERVER_PREF? Sun Bug ID seems to indicate a problem with that.
> 
> The other from my own experience: Sun changed the interaction between libsldap
> and libldap in an incompatible way between Solaris 8 and 9. I assume you
> compile the httpd on Solaris 9? If you run pmap against the core, which ldap
> libraries get listed (libldap, libsldap), especially which versions of them? If
> you can see libldap.so.4, then that might be the reason for the crash. It
> should be libldap.so.5 (and of course no ldap libs from non-system paths).
> 

Yes, I built httpd on Solaris 9. I will need to run pmap against the core file on the customer's box to find out which ldap libs it tries to use.
Thanks for the info; I will try to get the pmap output from the customer's box.

However, if this is an ldap shared library issue, why does Apache 1.3 work?

thanks,
dave
Comment 15 Rainer Jung 2008-03-21 03:39:07 UTC
Concerning differences from httpd 1.3: 2.2 uses the apr helper libraries, which encapsulate platform-specific implementation details of several general-purpose APIs. One of those libs is libaprutil-1.so, which has dependencies on ldap. Could it be that your httpd 2.2 got compiled against a pre-existing ldap-enabled libaprutil-1.so?

You will see from the pmap, where the apr libs came from, and if something loaded a wrong version 4 ldap lib. If so, you can check via ldd, if the dependency comes from httpd (unlikely because of your configure flag) or from some other of the libs included in the pmap.
Comment 16 Dave 2008-03-21 13:46:39 UTC
(In reply to comment #15)
> Concerning differences to httpd 1.3: 2.2 uses apr helper libraries, which
> encapsulate platform specific implementation details of several general purpose
> APIs. One of those libs is libaprutil-1.so, which has dependencies against ldap.
> It could be, that your httpd 2.2 got compiled against a pre existing ldap
> enabled libaprutil-1.so?
> 
> You will see from the pmap, where the apr libs came from, and if something
> loaded a wrong version 4 ldap lib. If so, you can check via ldd, if the
> dependency comes from httpd (unlikely because of your configure flag) or from
> some other of the libs included in the pmap.
> 

Hi,
 We compile libaprutil-1.so as part of building httpd, so a pre-existing ldap-enabled libaprutil-1.so is probably not the cause.

Anyway, I ran a couple of tests and here are the results:

Test 1: I rebuilt httpd with "--with-ldap" and "--enable-ldap" in configure (although the issue does not come from mod_ldap or mod_auth_ldap, and there are no ldap directives in httpd.conf).

  Result: It dies at startup. The symbol ldapssl_client_init, referenced by libaprutil-1.so.0, is not found.

----- from httpd-errors file------------
[Fri Mar 21 10:55:10 2008] [info] Init: Seeding PRNG with 0 bytes of entropy
[Fri Mar 21 10:55:10 2008] [info] Init: Generating temporary RSA private keys (512/1024 bits)
[Fri Mar 21 10:55:10 2008] [info] Init: Generating temporary DH parameters (512/1024 bits)
[Fri Mar 21 10:55:10 2008] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
[Fri Mar 21 10:55:10 2008] [info] Init: Initializing (virtual) servers for SSL
[Fri Mar 21 10:55:10 2008] [info] Server: Apache/2.2.3, Interface: mod_ssl/2.2.3, Library: OpenSSL/0.9.8d
[Fri Mar 21 10:55:10 2008] [info] Init: Seeding PRNG with 0 bytes of entropy
[Fri Mar 21 10:55:10 2008] [info] Init: Generating temporary RSA private keys (512/1024 bits)
[Fri Mar 21 10:55:10 2008] [info] Init: Generating temporary DH parameters (512/1024 bits)
[Fri Mar 21 10:55:10 2008] [info] Init: Initializing (virtual) servers for SSL
[Fri Mar 21 10:55:10 2008] [info] Server: Apache/2.2.3, Interface: mod_ssl/2.2.3, Library: OpenSSL/0.9.8d
[Fri Mar 21 10:55:10 2008] [info] APR LDAP: Built with Sun Microsystems Inc. LDAP SDK
ld.so.1: httpd: fatal: relocation error: file /opt/Apache/web/httpd/bin/libaprutil-1.so.0: symbol ldapssl_client_init: referenced symbol not found


Test 2 - pmap the core file
   The results indicate the following two ldap related shared libs.   
/app/local/padl/lib/libldap.so.2.0.124
/app/local/padl/lib/nss_ldap.so.1

-----------pmap result
bash-2.05$ pmap core
core 'core.1' of 19809: /opt/Apache/web/httpd/bin/httpd -f /opt/Apache/web/httpd/

00010000    3600K r-x--  /opt/Apache/web/httpd/bin/httpd
003A2000     168K rwx--  /opt/Apache/web/httpd/bin/httpd
003CC000    1688K rwx--    [ heap ]
FEAE0000      32K r-x--  /app/local/lib/libgcc_s.so.1
FEC34000       8K rwx--  /app/local/ssl/lib/libcrypto.so.0.9.7
FEC72000      16K rwx--  /app/local/padl/lib/libsasl2.so.2.0.18
FEC80000      24K r-x--  /usr/lib/libgen.so.1
FECEE000      16K rwx--  /usr/lib/libresolv.so.2
FED00000     768K r-x--  /app/local/padl/lib/libdb-4.2.so
FEE22000       8K rwx--  /app/local/ssl/lib/libssl.so.0.9.7
FEE30000       8K rwx--
FEE5A000       8K rwx--  /app/local/padl/lib/liblber.so.2.0.124
FEE60000     192K r-x--  /app/local/padl/lib/libldap.so.2.0.124
FEE9E000      24K rwx--  /app/local/padl/lib/libldap.so.2.0.124
FEEB0000      56K r-x--  /app/local/padl/lib/nss_ldap.so.1
FEECC000       8K rwx--  /app/local/padl/lib/nss_ldap.so.1
FEECE000      40K rwx--  /app/local/padl/lib/nss_ldap.so.1
FEEE0000      24K r-x--  /usr/lib/nss_files.so.1
FEEF6000       8K rwx--  /usr/lib/nss_files.so.1
FEF00000      16K rw---
:
FF1A6000       8K rwx--  /usr/lib/libpthread.so.1
FF1B0000      40K r-x--  /usr/lib/libsocket.so.1
FF1E6000       8K rwx--  /usr/lib/librt.so.1
FF202000       8K rwx--  /usr/lib/libsendfile.so.1
FF210000     288K r-x--  /opt/Apache/web/httpd/bin/libapr-1.so.0
FF2BE000      16K rwx--  /opt/Apache/web/httpd/bin/libexpat.so.0
FF2FA000      16K rwx--  /opt/Apache/web/httpd/bin/libapriconv.so.0
FF310000     176K r-x--  /opt/Apache/web/httpd/bin/libaprutil-1.so.0

FF396000       8K rwx--  /usr/lib/libm.so.1
FF3A0000       8K r-x--  /usr/platform/sun4u-us3/lib/libc_psr.so.1

FF3F0000       8K rwx--  /usr/lib/ld.so.1
FF3FA000       8K rwx--  /usr/lib/libdl.so.1
FFBF8000      32K rwx--    [ stack ]
 total     11616K
--------------------------------------

I still don't have much of a clue about this. Could Apache 1.3 and Apache 2.x be linking against different ldap- or nss-related shared libs?

thanks,
dave


Comment 17 Rainer Jung 2008-03-21 14:14:51 UTC
So we see, that you are loading "foreign" ldap libs. It's likely that they are not compatible with the Solaris nsswitch.
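
The pmap listing can also be used to attribute the unnamed frames at the bottom of the dbx stack trace from comment 2. The sketch below is a rough illustration in Python, under one loud assumption: that the libraries were loaded at the same addresses in the process that produced the stack trace as in the core that pmap was run against, which this thread does not confirm. The function names `parse_pmap` and `owner` are hypothetical; the mapping lines and frame addresses are copied from comments 16 and 2.

```python
# Mapping lines copied from the pmap output in comment 16.
PMAP_LINES = """\
FEE5A000       8K rwx--  /app/local/padl/lib/liblber.so.2.0.124
FEE60000     192K r-x--  /app/local/padl/lib/libldap.so.2.0.124
FEE9E000      24K rwx--  /app/local/padl/lib/libldap.so.2.0.124
FEEB0000      56K r-x--  /app/local/padl/lib/nss_ldap.so.1
FEECC000       8K rwx--  /app/local/padl/lib/nss_ldap.so.1
FEEE0000      24K r-x--  /usr/lib/nss_files.so.1
"""

# Unnamed frames below _nss_search in the stack trace from comment 2.
FRAMES = [0xFEEB6178, 0xFEEB2E50, 0xFEE89D90, 0xFEE89768, 0xFEE888AC, 0xFEE88170]


def parse_pmap(text):
    """Return (start, end, name) for every mapping line that has a name."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[1].endswith("K"):
            continue  # header, total, or anonymous mapping
        start = int(fields[0], 16)
        size = int(fields[1][:-1]) * 1024
        segments.append((start, start + size, " ".join(fields[3:])))
    return segments


def owner(address, segments):
    """Return the name of the mapping that contains address, or None."""
    for start, end, name in segments:
        if start <= address < end:
            return name
    return None


segments = parse_pmap(PMAP_LINES)
for frame in FRAMES:
    print(hex(frame), owner(frame, segments))
```

Under that assumption, the first two unnamed frames fall inside the text segment of nss_ldap.so.1 and the remaining four inside libldap.so.2.0.124, both from /app/local/padl/lib.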

If you want to understand how to avoid it (and why this doesn't happen for httpd 1.3) you might need to find out, why the libs from /app/local/padl/lib get loaded. You could e.g. use ldd or better "ldd -v" against the various libs in pmap to find out, which one loads those unwanted libs.

If you think you don't need them and you don't want them, make sure that you don't have /app/local/padl/lib in your LD_LIBRARY_PATH (not only during runtime, but also during compilation). Then those libs should not have a chance to get included.

The only exception would be if one of the three other non-system libs that you use during the build

/app/local/lib/libgcc_s.so.1
/app/local/ssl/lib/libcrypto.so.0.9.7
/app/local/ssl/lib/libssl.so.0.9.7

were previously compiled with a dependency against any lib in /app/local/padl/lib. But those usually don't have a foreign dependency and also I assume you use the same ones for building your httpd 1.3.
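
Picking the non-system libraries out of a long pmap listing by eye is error-prone, so a small filter can help. This is a minimal sketch in Python, assuming that /usr/lib and /usr/platform are the only system locations of interest and that everything under the httpd installation directory is expected; `foreign_libraries` and `SYSTEM_PREFIXES` are hypothetical names, and the sample is an excerpt of the pmap output from comment 16.

```python
import posixpath

# Assumed system locations; adjust for the machine being checked.
SYSTEM_PREFIXES = ("/usr/lib/", "/usr/platform/")

# Excerpt of the pmap output from comment 16.
PMAP_SAMPLE = """\
00010000    3600K r-x--  /opt/Apache/web/httpd/bin/httpd
FEAE0000      32K r-x--  /app/local/lib/libgcc_s.so.1
FEC34000       8K rwx--  /app/local/ssl/lib/libcrypto.so.0.9.7
FEC80000      24K r-x--  /usr/lib/libgen.so.1
FEE60000     192K r-x--  /app/local/padl/lib/libldap.so.2.0.124
FEEB0000      56K r-x--  /app/local/padl/lib/nss_ldap.so.1
FEEE0000      24K r-x--  /usr/lib/nss_files.so.1
FF310000     176K r-x--  /opt/Apache/web/httpd/bin/libaprutil-1.so.0
"""


def foreign_libraries(pmap_text, install_prefix):
    """Group mapped files outside the trusted prefixes by directory."""
    trusted = SYSTEM_PREFIXES + (install_prefix.rstrip("/") + "/",)
    found = {}
    for line in pmap_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[3].startswith("/"):
            continue  # header, total, heap, stack, or anonymous mapping
        path = fields[3]
        if path.startswith(trusted):
            continue
        directory = posixpath.dirname(path)
        found.setdefault(directory, set()).add(posixpath.basename(path))
    return found


result = foreign_libraries(PMAP_SAMPLE, "/opt/Apache/web/httpd")
for directory, names in sorted(result.items()):
    print(directory, sorted(names))
```

Each directory reported this way is a candidate to check with "ldd -v" and to look for in LD_LIBRARY_PATH.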
Comment 18 Dave 2008-03-24 12:29:54 UTC
(In reply to comment #17)
> So we see, that you are loading "foreign" ldap libs. It's likely that they are
> not compatible with the Solaris nsswitch.
> 
> If you want to understand how to avoid it (and why this doesn't happen for
> httpd 1.3) you might need to find out, why the libs from /app/local/padl/lib
> get loaded. You could e.g. use ldd or better "ldd -v" against the various libs
> in pmap to find out, which one loads those unwanted libs.
> 
> If you think you don't need them and you don't want them, make sure that you
> don't have /app/local/padl/lib in your LD_LIBRARY_PATH (not only during
> runtime, but also during compilation). Then those libs should not have a chance
> to get included.
> 
> The only exception would be if one of the three other non-system libs, that
> you use during build
> 
> /app/local/lib/libgcc_s.so.1
> /app/local/ssl/lib/libcrypto.so.0.9.7
> /app/local/ssl/lib/libssl.so.0.9.7
> 
> were previously compiled with a dependency against any lib in
> /app/local/padl/lib. But those usually don't have a foreign dependency and also
> I assume you use the same ones for building your httpd 1.3.
> 

Thanks for the hints. Yes, those libs in /app/local/padl/lib do look suspicious.
The Apache I built links against the Solaris 9 /usr/lib/libldap.so.5. /app/local/padl/lib is neither in LD_LIBRARY_PATH nor used during compilation. I wonder whether paths other than LD_LIBRARY_PATH get set when Apache is started by root but run as the user specified in the "User" directive in httpd.conf.

As I mentioned, the current workaround is to start Apache as a non-root user on a port other than 80. So I think either the non-root user has a different LD_LIBRARY_PATH than root, or Apache does not even (need to) contact the ldap server for group privilege validation in that case.
The Apache 1.3 case is still unclear to me, though. Maybe Apache 1.3 is loading the Solaris system libldap.so or nss_ldap from /usr/lib.

To clarify this, I am asking the customer to run pmap against a child process id in the following working scenarios:
1. Run as non-root user on port 8080
2. Run Apache 1.3
Hopefully the pmap results will show that a different libldap.so or nss_ldap is used, which would explain the cause of this issue.
thanks,
dave
Comment 19 Nick Kew 2008-05-11 09:33:49 UTC
Have you run ldd on the relevant binaries (httpd and libaprutil and indeed mod_ldap) to check what ldap libs are being loaded?

And indeed whether you get the same outcome for the case that crashes and the normal-user-on-8080 case that works, bearing in mind that the two might have different paths, library paths, or even mounts.
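
Comparing the two cases is easier if the ldd output from each is reduced to a name-to-path map first. The following is a minimal sketch in Python: `parse_ldd` and `differing` are hypothetical helper names, and the two sample outputs are invented purely to show the comparison (libexample.so.1 and /opt/other/lib do not come from this report).

```python
def parse_ldd(output):
    """Map each library name to the path ldd resolved it to (None if unresolved)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue
        name, target = (part.strip() for part in line.split("=>", 1))
        # Some ldd implementations append a load address in parentheses.
        target = target.split(" (", 1)[0].strip()
        if not target or "not found" in target:
            resolved[name] = None
        else:
            resolved[name] = target
    return resolved


def differing(first_output, second_output):
    """Return {name: (path_in_first, path_in_second)} for libraries that differ."""
    first, second = parse_ldd(first_output), parse_ldd(second_output)
    names = sorted(set(first) | set(second))
    return {n: (first.get(n), second.get(n)) for n in names if first.get(n) != second.get(n)}


# Hypothetical outputs, e.g. one captured as root and one as the normal user.
CASE_A = "\tlibexample.so.1 =>\t /opt/other/lib/libexample.so.1\n\tlibc.so.1 =>\t /usr/lib/libc.so.1\n"
CASE_B = "\tlibexample.so.1 =>\t /usr/lib/libexample.so.1\n\tlibc.so.1 =>\t /usr/lib/libc.so.1\n"

print(differing(CASE_A, CASE_B))
```

Note that ldd only reports link-time dependencies, so a library that is loaded at run time (as name-service backends typically are) would show up in pmap but not here.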
Comment 20 Dave 2008-05-19 14:57:18 UTC
(In reply to comment #19)
> Have you run ldd on the relevant binaries (httpd and libaprutil and indeed
> mod_ldap) to check what ldap libs are being loaded?
> 
> And indeed whether you get the same outcome for the case that crashes and the
> normal-user-on-8080 case that works, bearing in mind that the two might have
> different paths, library paths, or even mounts.
> 

The issue turned out to be that an older version of the ldap libs (libldap.so.2.0.124 and nss_ldap.so.1) from PADL was replacing the newer ones that come with Solaris 9. Those ldap libs were built against an older libc.so.
So the cause is a mismatched shared library that makes Apache fail when it calls the library routine nss_search(), which in turn calls into nss_ldap.so.1 to contact the LDAP server.
  The non-root user case worked, I guess, because Apache does not need to look up group privileges for that user, so it never invokes the ldap shared libs.

Thanks for all the responses.
dave