Bug 52865 - Crash by segmentation fault in mod_authn_core in Apache-2.4.1
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_authn_core
Version: 2.4.43
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-09 10:08 UTC by Tianyin Xu
Modified: 2020-08-31 16:01 UTC

Description Tianyin Xu 2012-03-09 10:08:37 UTC
To reproduce it, set the following configuration entries in httpd.conf:

LoadModule authn_core_module modules/mod_authn_core.so
<AuthnProviderAlias file file1>
AuthName "dfdf"
</AuthnProviderAlias>

Start the server and you will see the segmentation fault.

I don't quite understand the problem.

I put some printf() calls in the invoke_cmd() function. The segfault seems to occur while the AuthName directive is being executed: the code reaches “return cmd->AP_TAKE1(parms, mconfig, w);” but never reaches set_authname(), the handler function of the AuthName directive.

Please check it.

Thanks a lot!!!
Comment 1 Tianyin Xu 2012-03-09 10:41:42 UTC
Oh, sorry, I left something out of the previous comment.

To reproduce it, use the following configuration (both modules must be loaded):

LoadModule authn_core_module modules/mod_authn_core.so
LoadModule auth_digest_module modules/mod_auth_digest.so
<AuthnProviderAlias file file1>
AuthName "dfdf"
</AuthnProviderAlias>

It seems the two modules conflict?



Comment 2 Stefan Fritsch 2012-03-12 01:06:14 UTC
It seems AuthnProviderAlias breaks some assumption in create_digest_dir_config(). The crash does not happen if I remove these lines:

--- a/modules/aaa/mod_auth_digest.c
+++ b/modules/aaa/mod_auth_digest.c
@@ -454,10 +454,6 @@ static void *create_digest_dir_config(apr_pool_t *p, char *dir)
 {
     digest_config_rec *conf;
 
-    if (dir == NULL) {
-        return NULL;
-    }
-
     conf = (digest_config_rec *) apr_pcalloc(p, sizeof(digest_config_rec));
     if (conf) {
         conf->qop_list       = apr_palloc(p, sizeof(char*));


I haven't tested if this makes AuthnProviderAlias actually work, though. Can you try it?
Comment 3 Tianyin Xu 2012-03-12 09:04:58 UTC
(In reply to comment #2)
Yes, I tried it. The segfault is gone.
However, directives like AuthName and AuthType have no effect inside the <AuthnProviderAlias> block.
Comment 4 Tianyin Xu 2012-03-12 10:02:03 UTC
(In reply to comment #2)
By the way, could you also explain the problem a little?
Thanks a lot!
Comment 5 Stefan Fritsch 2012-03-12 19:44:30 UTC
mod_auth_digest tries to avoid allocating memory for its own config struct in global server context because AuthDigestShmemSize, which is its only directive allowed in that context, doesn't need the struct. This optimization breaks with AuthnProviderAlias.

I don't know yet if the correct fix is to make AuthnProviderAlias simulate per-directory context, or if mod_auth_digest should be changed to either not make that optimization, or to detect global server context in a different way.

Also, I am not familiar enough with AuthnProviderAlias to say if it should support AuthName and AuthType. If yes, then this is probably a different bug than the segfault. If no, AuthnProviderAlias should log an error if these directives are used. Maybe someone more familiar with AuthnProviderAlias could comment?
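
The failure mode described above can be sketched in isolation. This is a simplified stand-alone model, not the actual httpd source: demo_config, demo_create_dir_config() and demo_set_realm() are hypothetical names, and plain calloc() stands in for the APR pool allocator. It shows a per-directory config constructor that returns NULL when dir is NULL, and a directive handler that would dereference that NULL unless it checks first.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a module's per-directory config record. */
typedef struct {
    const char *realm;
} demo_config;

/* Mimics the optimization: no config is allocated when dir is NULL
 * (assumed here to mean "global server context"). */
static demo_config *demo_create_dir_config(const char *dir)
{
    if (dir == NULL) {
        return NULL;
    }
    return calloc(1, sizeof(demo_config));
}

/* Directive handler with a defensive check. Returns NULL on success or an
 * error string, following the convention of httpd directive handlers. */
static const char *demo_set_realm(demo_config *conf, const char *realm)
{
    if (conf == NULL) {
        /* Without this check, the assignment below would write through a
         * NULL pointer, which is the kind of crash reported in this bug. */
        return "directive not allowed here: no per-directory config";
    }
    conf->realm = realm;
    return NULL;
}
```

Calling demo_set_realm(demo_create_dir_config(NULL), "dfdf") returns the error string instead of crashing. Whether a real fix belongs in the handler, in the constructor, or in AuthnProviderAlias is the open question raised above; the guard only illustrates one of those options.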
Comment 6 Tianyin Xu 2012-03-13 00:28:50 UTC
(In reply to comment #5)
> mod_auth_digest tries to avoid allocating memory for its own config struct in
> global server context because AuthDigestShmemSize, which is its only directive
> allowed in that context, doesn't need the struct. This optimization breaks with
> AuthnProviderAlias.
> 
> I don't know yet if the correct fix is to make AuthnProviderAlias simulate
> per-directory context, or if mod_auth_digest should be changed to either not
> make that optimization, or to detect global server context in a different way.
> 

Many thanks, Stefan!

I will take a look at this issue. Your information is helpful.

> Also, I am not familiar enough with AuthnProviderAlias to say if it should
> support AuthName and AuthType. If yes, then this is probably a different bug
> than the segfault. If no, AuthnProviderAlias should log an error if these
> directives are used. Maybe someone more familiar with AuthnProviderAlias could
> comment?

Hmmm... this should not be a big thing. There are already too many silent behaviors in current Apache :P
Comment 7 Cedric 2019-03-07 13:14:53 UTC
I seem to have reproduced the crash with the following configuration.

<AuthnProviderAlias ldap world_company >
        AuthName "LDAP_world_company"
        AuthLDAPBindDN "CN=xxx xxx,OU=yyy,OU=zzz,OU=People,DC=company,DC=world"
        AuthLDAPBindPassword "c0ma!"
        AuthLDAPURL ldap://*****:389/****?sAMAccountName
        Require valid-user
</AuthnProviderAlias>


With some difficulty, I managed to get the following stack trace:

#0  0x00007ffff7c00a30 in set_realm (cmd=<optimized out>, config=0x0, realm=0x7ffff43485b8 "LDAP_world_company") at mod_auth_digest.c:493
#1  0x00005555555ae3e2 in invoke_cmd (cmd=0x7ffff7c07a00 <digest_cmds>, parms=parms@entry=0x7fffffffd030, mconfig=0x0, args=<optimized out>) at config.c:928
#2  0x00005555555b0a69 in ap_walk_config_sub (section_vector=0x7ffff4348410, parms=0x7fffffffd030, current=0x7ffff436b398) at config.c:1339
#3  ap_walk_config (current=0x7ffff436b398, parms=parms@entry=0x7fffffffd030, section_vector=section_vector@entry=0x7ffff4348410) at config.c:1372
#4  0x00007ffff7bf876f in authaliassection (cmd=0x7fffffffd030, mconfig=<optimized out>, arg=0x7ffff436b380 "ldap world_company >") at mod_authn_core.c:257
#5  0x00005555555ae2af in invoke_cmd (cmd=0x7ffff7bfac90 <authn_cmds+80>, parms=parms@entry=0x7fffffffd030, mconfig=0x7ffff7bfd448, args=<optimized out>)
    at config.c:895
#6  0x00005555555b0a69 in ap_walk_config_sub (section_vector=0x7ffff7c25540, parms=0x7fffffffd030, current=0x7ffff436b338) at config.c:1339
#7  ap_walk_config (current=0x7ffff436b338, parms=parms@entry=0x7fffffffd030, section_vector=0x7ffff7c25540) at config.c:1372
#8  0x00005555555b1ec5 in ap_process_config_tree (s=<optimized out>, conftree=<optimized out>, p=0x7ffff7fc6028, ptemp=<optimized out>) at config.c:2156
#9  0x000055555558abfa in main (argc=<optimized out>, argv=<optimized out>) at main.c:686


Variables at frame #3:

(gdb) info args
current = 0x7ffff436b340
parms = 0x7fffffffd030
section_vector = 0x7ffff4348400
(gdb) print *current
$9 = {
  directive = 0x7ffff7bf9090 "AuthName", 
  args = 0x7ffff436b388 "\"LDAP_world_company\"", 
  next = 0x7ffff436b398, 
  first_child = 0x0, 
  parent = 0x7ffff436b2e0, 
  data = 0x0, 
  filename = 0x7ffff436b058 "/etc/apache2/sites-enabled/world-company-site.conf", 
  line_num = 25, 
  last = 0x0
}
(gdb) print *parms
$10 = {
  info = 0x0, 
  override = 72, 
  override_opts = 239, 
  override_list = 0x0, 
  limited = -1, 
  limited_xmethods = 0x0, 
  xlimited = 0x0, 
  config_file = 0x0, 
  directive = 0x7ffff436b340, 
  pool = 0x7ffff7fc6028, 
  temp_pool = 0x7ffff7c26028, 
  server = 0x7ffff7c28ac0, 
  path = 0x0, 
  cmd = 0x7ffff7c07a00 <digest_cmds>, 
  context = 0x7ffff4348400, 
  err_directive = 0x0
}

Server version: Apache/2.4.34 (Ubuntu)
Server built:   2018-10-03T13:57:22
Comment 8 janani 2020-08-31 16:01:24 UTC
The Apache crash was first noticed in 2.4.41. The error stack looked similar to the bug that was fixed in 2.4.43, but the issue still occurred in 2.4.43. We changed system-level memory settings, but the issue kept happening even after that.

# vi rc.local
   # (Add the lines below to the end. Replace eth0, eth1 with the actual names)
   #          sysctl -w net.core.rmem_max=16777216
   #          sysctl -w net.core.wmem_max=16777216
   #          sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
   #          sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
   #          sysctl -w net.ipv4.tcp_fin_timeout=10
   #          ifconfig eth0 txqueuelen 10000
   #          ifconfig eth2 txqueuelen 10000

   # reboot
14) Configure new ulimit settings
   # pbrun su-root
   # cd /etc/security
   # vi limits.conf
   # (Add these lines to the end)
   #          root             hard    nofile          16384
   #          root             soft    nofile          16384
   #          apache           hard    nofile          16384
   #          apache           soft    nofile          16384
   # :wq
   # reboot
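
Whether the new nofile limit is actually in effect can be checked from inside a process. The sketch below is only an illustration: read_nofile_limit() is a hypothetical helper name, and it relies on the POSIX getrlimit() call. Limits from limits.conf are applied by pam_limits at login, so a daemon started another way may not see them, which is why checking in the context of the running service can be useful.

```c
#include <sys/resource.h>

/* Reads the open-file limits of the calling process.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int read_nofile_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    *soft = lim.rlim_cur;  /* limit currently enforced */
    *hard = lim.rlim_max;  /* ceiling the soft limit can be raised to */
    return 0;
}
```

Run as the apache user, both values would be expected to be 16384 if the limits.conf entries above took effect.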

We suspected that the load on the server had grown, possibly because of an increase in the number of application users. Since we use the worker MPM, we increased the Apache MPM values to handle the higher connection load.

/apps/hrs/tt/bin/ ./apachectl -V |grep MPM
Server MPM:     worker

ServerLimit was set to 300; we are increasing it to 600.

StartServers goes from 200 to 500 and MaxClients to 15000.
MinSpareThreads and MaxSpareThreads are changed to the values recommended in the Apache documentation (75 and 250).




    ServerLimit          600
    StartServers         500
    MaxClients           15000
    MinSpareThreads      75 (recommended value)
    MaxSpareThreads      250 (recommended value)
    ThreadsPerChild      25
    MaxRequestsPerChild   0


But after implementing the change, the issue still occurred, with the exception below.

While researching this issue again, I found the Apache source code. Here is the link:

people.apache.org/~igalic/checks/httpd/2012-09-14-1/report-69Y1He.html

After the exception below, the server hangs and cannot serve any requests.

[Fri Aug 07 16:12:12.684173 2020] [mpm_event:debug] [pid 19580:tid 139740527363840] event.c(1810): Too many open connections (25), not accepting new conns in this process

The following has been logged since we enabled debug logging:

[Fri Aug 07 16:12:12.889710 2020] [mpm_event:debug] [pid 19609:tid 139740820498176] event.c(2314): AH02471: start_threads: Using epoll (wakeable)
[Fri Aug 07 16:12:12.889934 2020] [mpm_event:debug] [pid 19610:tid 139740820498176] event.c(2314): AH02471: start_threads: Using epoll (wakeable)

Lines in the source code that produce the message:

1805                 else if (connections_above_limit()) {
1806                     disable_listensocks();
1807                     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf,
1808                                  "Too many open connections (%u), "
1809                                  "not accepting new conns in this process",
1810                                  apr_atomic_read32(&connection_count));
1811                     ap_log_error(APLOG_MARK, APLOG_TRACE1, 0, ap_server_conf,
1812                                  "Idle workers: %u",

The %u in the first message is filled from connection_count, and the logged value (25) matches our ThreadsPerChild setting, which was the default recommended value. It looks like the request load on the server is higher than this setting allows. We increased ThreadsPerChild to 50 and adjusted the other parameters to match, but the issue still occurred.

From

    ServerLimit          600
    StartServers         500
    MaxClients           15000
    MinSpareThreads      75
    MaxSpareThreads      250
    ThreadsPerChild      25

New values:

    ServerLimit          300
    StartServers         200
    MaxClients           15000
    MinSpareThreads      75
    MaxSpareThreads      250
    ThreadsPerChild      50
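
For a threaded MPM, the number of worker threads is bounded by ServerLimit multiplied by ThreadsPerChild, so both sets of values above allow the same 15000 clients. A small sketch of that arithmetic follows; the function names are hypothetical and only illustrate the calculation.

```c
/* Upper bound on simultaneous worker threads for a threaded MPM:
 * ServerLimit child processes, each running ThreadsPerChild threads. */
static unsigned int max_worker_threads(unsigned int server_limit,
                                       unsigned int threads_per_child)
{
    return server_limit * threads_per_child;
}

/* Number of child processes needed to reach a given MaxClients value,
 * rounded up. Returns 0 if threads_per_child is 0. */
static unsigned int children_needed(unsigned int max_clients,
                                    unsigned int threads_per_child)
{
    if (threads_per_child == 0) {
        return 0;
    }
    return (max_clients + threads_per_child - 1) / threads_per_child;
}
```

With these helpers, max_worker_threads(600, 25) and max_worker_threads(300, 50) both give 15000, so moving from the first set of values to the second changes how the threads are spread across processes but not the overall ceiling.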
                
After these changes we got a different message, shown below:

AH00486: server seems busy, (you may need to increase StartServers, ThreadsPerChild or Min/MaxSpareThreads), spawning 8 children, there are around 54 idle threads, 6 active children, and 6 children that are shutting down.

We set ThreadsPerChild to 40 and MinSpareThreads and MaxSpareThreads to 25 and 75 respectively, along with semaphore changes at the system level, but the issue still occurs.

Changes made at the server level:

   # 17) Increase semaphore limits
   # pbrun su-root
   # vi /etc/sysctl.conf
   # Add this line:
   #    # Add additional semaphores for Channel Secure and mod_rewrite
   #    kernel.sem = 4096 512000 1600 9000
   # :wq
   # sysctl -p /etc/sysctl.conf
   
   
What I notice is that the web agent always fails to initialise and the server loses connections: either no connections arrive from the F5, or the server does not take up the connections. netstat shows a very low connection count (about 25), and then the Apache URL goes down for that individual Apache server even though the Apache process is still up.