Bug 63687

Summary: High Memory usage after upgrade to 2.4.41
Product: Apache httpd-2 Reporter: nitop <it>
Component: Core Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: major CC: curtis.wilson, gabrielh.pineda, it, renato.alves.nogueira
Priority: P2 Keywords: FixedInTrunk
Version: 2.4.41   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: High Memory usage
apache2.conf
Apache2.4.41 Memory Usage without APR- and pcre-update
CPU usage drop after Apache Downgrade
Response time drop after Apache Downgrade
gdb_dump_all_pools
gdb_dump_all_pools_latestinit
Dump of thread
Data with mod_rbld disabled
Vhosts
Fresh child dump
Pmaps of fresh child and main
ssl stapling leak fix
simplified patch

Description nitop 2019-08-23 06:56:28 UTC
Hello,

after we upgraded 7 of our systems to apr-1.7.0 (from 1.6.5), pcre-8.43 (from 8.42) and apache 2.4.41 (from 2.4.39), we have a huge memory problem, as shown in the attached screenshot. The upgrade was performed on 20 Aug.

We did not change anything in the config files.
After rolling back to the old versions, everything is fine.

Any advice?
Comment 1 nitop 2019-08-23 06:57:09 UTC
Created attachment 36732 [details]
High Memory usage

High Memory usage after upgrading
Comment 2 nitop 2019-08-23 07:00:20 UTC
Additional details:
All systems are running with Debian and Kernel "4.4.186".
Comment 3 nitop 2019-08-23 14:29:00 UTC
Created attachment 36733 [details]
apache2.conf

httpd/apache2 config
Comment 4 Joe Orton 2019-08-28 12:22:38 UTC
You've made three changes at the same time, which makes this harder to diagnose.  Can you try httpd 2.4.41 on the old APR/PCRE versions, and see if that also has the same memory problem?
Comment 5 Curtis Wilson 2019-08-29 03:42:08 UTC
I have run into the same issue on my servers; they are all CentOS 6 and are running cPanel. After the updates I was getting constant issues with Apache memory use under the worker MPM: most times when I got to them, all httpd processes were using 8-12% memory and causing the boxes to overcommit.

They updated to 
Apache 2.4.41 (8/22/2019)
APR 1.7 (07/03/2019)
pcre is at 7.8-7

Apache is being obtained from the cPanel easyapache repository.
Comment 6 Curtis Wilson 2019-08-29 03:43:17 UTC
Also to note, until I can figure out the cause as to why this is happening, I have had to downgrade them all to Apache 2.4.39.
Comment 7 Ruediger Pluem 2019-08-29 08:00:18 UTC
(In reply to Curtis Wilson from comment #5)
> I have run into the same issue on my servers, they are all Centos 6 and are
> running cPanel. After the updates I found that I was getting constant issues
> with memory use from apache in the worker mpm, most times when I would get
> to them all httpd processes were using 8-12% memory, and causing the boxes
> to overcommit. 
> 
> They updated to 
> Apache 2.4.41 (8/22/2019)
> APR 1.7 (07/03/2019)
> pcre is at 7.8-7

So only httpd and APR were updated, correct? pcre remained unchanged?

Like with the other reporter, can you just update Apache to isolate the component that causes this?
Comment 8 Ruediger Pluem 2019-08-29 08:07:00 UTC
(In reply to nitop from comment #3)
> Created attachment 36733 [details]
> apache2.conf
> 
> httpd/apache2 config

The given configuration does not show me which MPM you are using. It may be configured in
/etc/apache2/mods-enabled/*.load
/etc/apache2/mods-enabled/*.conf
Which MPM do you use?
Comment 9 nitop 2019-08-29 08:41:04 UTC
Hello,

"Which MPM do you use?"

-> We use worker.

I've now just updated apache2 to 2.4.41 - so far no problems with Memory BUT, I saw this:

# apache2 -V
Server version: Apache/2.4.41 (Unix)
Server loaded:  APR 1.7.0, APR-UTIL 1.6.1
Compiled using: APR 1.6.5, APR-UTIL 1.6.1

Why is this different?
We've compiled APR 1.6.5 and NOT 1.7.0.

Thank you!
Comment 10 Ruediger Pluem 2019-08-29 09:40:40 UTC
(In reply to nitop from comment #9)
> Hello,
> 
> "Which MPM do you use?"
> 
> -> We use worker.
> 
> I've now just updated apache2 to 2.4.41 - so far no problems with Memory
> BUT, I saw this:
> 
> # apache2 -V
> Server version: Apache/2.4.41 (Unix)
> Server loaded:  APR 1.7.0, APR-UTIL 1.6.1
> Compiled using: APR 1.6.5, APR-UTIL 1.6.1
> 
> Why is this different?
> We've compiled APR 1.6.5 and NOT 1.7.0.

This is because you compiled it against 1.6.5, but at startup the httpd process finds the 1.7.0 library first and hence loads it.
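For illustration only: `httpd -V` reports "Compiled using" from the APR headers seen at build time and "Server loaded" from the APR library actually found at runtime (in httpd's source these appear to come from `APR_VERSION_STRING` and `apr_version_string()` respectively). On Linux with glibc, one way to see which shared-object file a process really mapped is `dl_iterate_phdr()`. The helper `count_loaded_objects()` below is hypothetical, not part of httpd or APR; it is a minimal sketch that prints and counts loaded objects whose path contains a given substring. For an already running httpd, `lsof -p <pid>` or `/proc/<pid>/maps` answers the same question from outside.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* State passed through dl_iterate_phdr() to the callback. */
struct match_ctx {
    const char *needle; /* substring to look for, e.g. "libapr-1" */
    int matches;        /* loaded objects whose path contains needle */
};

/* Called once per loaded object (main program, shared libraries, vDSO). */
static int match_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct match_ctx *ctx = data;
    const char *name = info->dlpi_name;
    (void)size;

    /* The main program is reported with an empty name; skip it. */
    if (name == NULL || name[0] == '\0')
        return 0;
    if (strstr(name, ctx->needle) != NULL) {
        printf("loaded: %s\n", name);
        ctx->matches++;
    }
    return 0; /* 0 = keep iterating over the remaining objects */
}

/* Hypothetical helper: count loaded objects whose path contains needle. */
static int count_loaded_objects(const char *needle)
{
    assert(needle != NULL);
    struct match_ctx ctx = { needle, 0 };
    dl_iterate_phdr(match_cb, &ctx);
    return ctx.matches;
}
```

Called with "libapr-1" from code running inside httpd (for example a throwaway debug module, which is an assumption here), this would print the path of the libapr actually in use; if it is not the one under the `--with-apr` prefix, the loader picked up a different copy, as happened in this case.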
Comment 11 nitop 2019-08-29 10:14:07 UTC
It's definitely Apache 2.4.41. I updated only Apache2 this morning at ~9:00am, without APR or PCRE:
Please see attached Memory-Graph (mem_usage_ONLY_Apache2_4_41.PNG)
Comment 12 nitop 2019-08-29 10:15:26 UTC
Created attachment 36743 [details]
Apache2.4.41 Memory Usage without APR- and pcre-update
Comment 13 Ruediger Pluem 2019-08-29 15:41:15 UTC
Have you compiled your Apache with debugging symbols?
Are you able to attach to such a memory consuming process with gdb?
If this is the case it would be helpful if you could use the following .gdbinit for your gdb session http://svn.apache.org/viewvc/httpd/httpd/trunk/.gdbinit?revision=1866078&view=co
Once you have attached to such a memory-consuming process with gdb using the above .gdbinit, please execute the following command in gdb and report back the output:
dump_all_pools
Comment 14 nitop 2019-09-17 06:20:00 UTC
@RuedigerPluem
"Have you compiled your Apache with debugging symbols?"

No. Is this necessary to go on with gdb?

"Are you able to attach to such a memory consuming process with gdb?"

I've not tried it yet. How should I do this?
I can not use your .gdbinit, it outputs some errors - we have an old debian here and gdb 7.0.1.
Comment 15 Ruediger Pluem 2019-09-17 06:42:20 UTC
(In reply to nitop from comment #14)
> @RuedigerPluem
> "Have you compiled your Apache with debugging symbols?"
> 
> No. Is this necessary to go on with gdb?

Yes, you need the debugging symbols to extract the proper information from the process.

> 
> "Are you able to attach to such a memory consuming process with gdb?"
> 
> I've not tried it yet. How should I do this?

http://httpd.apache.org/dev/debugging.html#backtrace

> I can not use your .gdbinit, it outputs some errors - we have an old debian
> here and gdb 7.0.1.

This is bad. The lowest version I tested the .gdbinit with was 7.2. What are the error messages?
Comment 16 nitop 2019-09-17 07:22:28 UTC
@RuedigerPluem

Here are the errors with your ".gdbinit":
gdb apache2 16606
Reading symbols from /usr/sbin/apache2...done.
Attaching to program: /usr/sbin/apache2, process 16606

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc77fa000
0x00007fc035f1b303 in ?? ()
Traceback (most recent call last):
  File "<string>", line 78, in <module>
  File "<string>", line 8, in __init__
AttributeError: 'module' object has no attribute 'COMMAND_USER'
/usr/local/src/mni/.gdbinit:548: Error in sourced command file:
Error while executing Python code.

"Yes, you need the debugging symbols to extract the proper information from the process."

-> Did you mean the flag "--enable-maintainer-mode"?
Comment 17 Ruediger Pluem 2019-09-17 08:02:53 UTC
(In reply to nitop from comment #16)
> @RuedigerPluem
> 
> Here are the errors with your ".gdbinit":
> gdb apache2 16606
> Reading symbols from /usr/sbin/apache2...done.
> Attaching to program: /usr/sbin/apache2, process 16606
> 
> warning: no loadable sections found in added symbol-file system-supplied DSO
> at 0x7fffc77fa000
> 0x00007fc035f1b303 in ?? ()
> Traceback (most recent call last):
>   File "<string>", line 78, in <module>
>   File "<string>", line 8, in __init__
> AttributeError: 'module' object has no attribute 'COMMAND_USER'
> /usr/local/src/mni/.gdbinit:548: Error in sourced command file:
> Error while executing Python code.

I cannot fix this. You would need to use a higher version of gdb in this case. Unfortunately the Python code in .gdbinit is essential for debugging your issue.

> 
> "Yes, you need the debugging symbols to extract the proper information from
> the process."
> 
> -> Did you mean the flag "--enable-maintainer-mode"?

That would be one way. Another less strict one is to

export CFLAGS="-Wall -O2 -g"

before you run the configure script.
Comment 18 nitop 2019-09-17 08:52:03 UTC
I am now running gdb 7.2 and tried it again:

/usr/local/bin/gdb apache2 29505 
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/apache2...done.
Attaching to program: /usr/sbin/apache2, process 29505
Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libpcre.so.3
Reading symbols from /usr/lib/libaprutil-1.so.0...done.
Loaded symbols for /usr/lib/libaprutil-1.so.0
Reading symbols from /usr/lib/libexpat.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libexpat.so.1
Reading symbols from /usr/lib/libapr-1.so.0...done.
Loaded symbols for /usr/lib/libapr-1.so.0
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
....
....
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffd92a92000
0x00007f7bcee85303 in select () from /lib/libc.so.6
.gdbinit:548: Error in sourced command file:
Python scripting is not supported in this copy of GDB.
(gdb) dump_all_pools
Undefined command: "dump_pool_and_children".  Try "help".
Comment 19 nitop 2019-09-17 09:19:36 UTC
We can not continue with debugging because of larger dependencies (gdb, python, ...)

Can someone else debug that with the same problem?
Comment 20 Ruediger Pluem 2019-09-17 10:37:40 UTC
(In reply to nitop from comment #18)
> I am now running gdb 7.2 and tried it again:

> warning: no loadable sections found in added symbol-file system-supplied DSO
> at 0x7ffd92a92000
> 0x00007f7bcee85303 in select () from /lib/libc.so.6
> .gdbinit:548: Error in sourced command file:
> Python scripting is not supported in this copy of GDB.
> (gdb) dump_all_pools
> Undefined command: "dump_pool_and_children".  Try "help".

This gdb does not have Python scripting enabled.
How did you get this gdb 7.2? Did you compile it on your own or did you download it from somewhere? If you compile it on your own, you need to pass --with-python to gdb's configure script. Of course this requires Python to be available on your system. I am not sure how these Python packages are named in Debian.
Comment 21 nitop 2019-09-17 11:04:33 UTC
Someone else should debug this.
We have some huge dependencies here and cannot get gdb >= 7.2 with python2.7 to work.
Comment 22 Ruediger Pluem 2019-09-17 12:42:04 UTC
(In reply to nitop from comment #21)
> Someone else should debug this.
> We've some hugh dependencies here and can not get >= gdb7.2 with python2.7
> to work.

Just for the sake of completeness: My GDB 7.2 uses Python 2.6 as this is what my Centos 6 delivers as OS package.
Comment 23 Jim Jagielski 2019-09-24 15:48:30 UTC
I am wondering if the changes to PCRE_DOTALL are causing this. If you use regexes a lot, try:

    RegexDefaultOptions -DOTALL
Comment 24 Jim Jagielski 2019-09-24 15:52:42 UTC
Are you using HTTP2 and/or mod_md at all? This will allow us to pare down the diffs
Comment 25 nitop 2019-09-25 11:53:32 UTC
@Jim Jagielski
We do not use http2 or mod_md.
Comment 26 Jim Jagielski 2019-09-26 12:21:47 UTC
Thanks... that helps to narrow down things quite a bit.
Comment 27 Renato Nogueira 2019-12-05 23:19:22 UTC
(In reply to Ruediger Pluem from comment #7)
> (In reply to Curtis Wilson from comment #5)
> > I have run into the same issue on my servers, they are all Centos 6 and are
> > running cPanel. After the updates I found that I was getting constant issues
> > with memory use from apache in the worker mpm, most times when I would get
> > to them all httpd processes were using 8-12% memory, and causing the boxes
> > to overcommit. 
> > 
> > They updated to 
> > Apache 2.4.41 (8/22/2019)
> > APR 1.7 (07/03/2019)
> > pcre is at 7.8-7
> 
> So only httpd and APR where updated, correct? pcre remained unchanged?
> 
> Like with the other reporter, can you just update Apache to isolate the
> component that causes this?

Hi Ruediger,

I had the same problem here and found this topic describing exactly that. The problem is apparently solved. I would like to explain the situation in the hope that it helps you identify other causes like this.

Here is the background.
My server runs Amazon Linux. The problematic httpd 2.4.41 was compiled from source because a security audit required it. At the time I could not install it through yum, because the Amazon Linux repositories only had an older Apache version (I don't remember exactly, but I think 2.4.34), while the security reports recommended 2.4.41.

I was about to follow what you suggested in comments #5 and #7, but first I wanted to test whether my earlier httpd version would work normally, so I did that.

I renamed my current Apache directory, undid the custom settings and installed the "previous" Apache version through yum. To my great surprise, yum told me it would install version 2.4.41. I was a little confused, but if yum gave me 2.4.41, that's OK.

I forgot to mention that the MPM in use is event.

Immediately after installing, I checked the version with httpd -V (after a few corrections to make sure the running httpd binary was the one installed by yum), and the result was as follows:

My server with the memory leak:

Server version: Apache/2.4.41 (Unix)
Server built:   Nov  6 2019 00:42:00
Server's Module Magic Number: 20120211:88
Server loaded:  APR 1.7.0, APR-UTIL 1.6.1
Compiled using: APR 1.7.0, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)

The same server with the memory leak fixed after installing httpd 2.4.41 through yum (actually not the same server: the server above is a clone in our staging (homologation) environment):

Server version: Apache/2.4.41 ()
Server built:   Oct 22 2019 22:59:04
Server's Module Magic Number: 20120211:88
Server loaded:  APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)

Another thing that may help: these two servers were replicated from another server's image, also running Apache. This image has been used to create about 10 servers and none of them have this problem. Here is the httpd -V result from the source server:

Server version: Apache/2.4.34 ()
Server built:   Aug 17 2018 22:14:33
Server's Module Magic Number: 20120211:79
Server loaded:  APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)

As far as I can see, the problematic build was compiled against APR 1.7.0 and all the others against APR 1.6.3.

I did not run any of your suggested tests. So far my problem appears resolved: one day has passed with no memory leak. In fact, memory consumption has not even gone up.

My server has 2 GB of RAM and runs just one application, with one connection for one user; with the memory leak, the server froze after about 5 hours of use.

The server without the problem runs 5 httpd children, each consuming about 27 MB. The problematic one starts with 3 children and memory grows until it hits the ceiling.
Comment 28 nitop 2019-12-09 13:11:47 UTC
It still doesn't look good here.
I've compiled "APR 1.6.3" and "APR-UTIL 1.6.1" with "Apache/2.4.41".

APR 1.6.3:
./configure --prefix=/usr/local/apr/
make
make install

APR-UTIL 1.6.1:
./configure --prefix=/usr/local/apr/ --with-apr=/usr/local/apr/
make
make install

Apache2:
--with-apr=/usr/local/apr/bin/apr-1-config --with-apr-util=/usr/local/apr/bin/apu-1-config

--> High Memory usage again.
Comment 29 nitop 2019-12-09 13:12:46 UTC
apache2ctl -V
Server version: Apache/2.4.41 (Unix)
Server built:   Dec  9 2019 08:44:12
Server's Module Magic Number: 20120211:88
Server loaded:  APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT=""
 -D SUEXEC_BIN="/usr/lib/apache2/suexec"
 -D DEFAULT_PIDLOG="/var/run/apache2/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="/etc/apache2/mime.types"
 -D SERVER_CONFIG_FILE="/etc/apache2/apache2.conf"
Comment 30 Curtis Wilson 2020-02-24 17:01:46 UTC
Created attachment 37038 [details]
CPU usage drop after Apache Downgrade
Comment 31 Curtis Wilson 2020-02-24 17:02:14 UTC
Created attachment 37039 [details]
Response time drop after Apache Downgrade
Comment 32 Curtis Wilson 2020-02-24 17:03:59 UTC
We are still seeing this issue actively causing performance problems on servers running Apache 2.4.41. Is there any other information needed to investigate this further? We would like to see this resolved and are happy to help gather whatever is needed.

Attached is an example of the data we collect from our servers to get an overall view. On Feb 20 we were having performance issues with a server using Apache 2.4.41 from EasyApache on cPanel; we downgraded it back to Apache 2.4.39, and there is a noticeable sharp decrease in CPU usage and in actual site response times for the test sites on this server.
Comment 33 Giovanni Bechis 2020-02-24 21:56:14 UTC
Have you tried disabling PCRE_DOTALL as suggested in comment #23 ?
Just add "RegexDefaultOptions -DOTALL" to your httpd.conf and restart httpd(8).
Comment 34 Curtis Wilson 2020-02-26 15:41:35 UTC
It does not look like adding "RegexDefaultOptions -DOTALL" to the httpd.conf is working. We added this on a few test servers 2 days ago and this morning all of them are starting to overcommit again with the problem being the memory consumption of Apache. 

# httpd -V
Server version: Apache/2.4.41 (cPanel)
Server built:   Nov 19 2019 16:06:59
Server's Module Magic Number: 20120211:88
Server loaded:  APR 1.7.0, APR-UTIL 1.6.1
Compiled using: APR 1.7.0, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)


# grep DOTALL /etc/httpd/conf/httpd.conf
RegexDefaultOptions -DOTALL

We are using HTTP2, and we are not using mod_md. Any information that will help bring this to a resolution we would like to help and provide if we can.
Comment 35 nitop 2020-02-27 13:19:19 UTC
Hello,
I've also tried it again:
Setting "RegexDefaultOptions -DOTALL" does not help us.

The servers start overcommitting after a few hours - so we have to go back again to 2.4.39.
It does not matter in which version APR is compiled.

# apache2ctl -V
Server version: Apache/2.4.41 (Unix)
Server built:   Feb 27 2020 07:54:36
Server's Module Magic Number: 20120211:88
Server loaded:  APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
	
# grep RegexDefaultOption /etc/apache2/apache2.conf 
RegexDefaultOptions -DOTALL

We are not using HTTP2 or mod_md.

Compiled with:

APR 1.6.3:
./configure --prefix=/usr/local/apr/
make
make install

APR-UTIL 1.6.1:
./configure --prefix=/usr/local/apr/ --with-apr=/usr/local/apr/
make
make install

Apache2:
./configure --enable-layout=Debian --enable-so --with-program-name=apache2 --with-suexec-caller=www-data --with-mpm=worker \
                      --with-suexec-bin=/usr/lib/apache2/suexec --with-suexec-docroot=/var/www --with-suexec-userdir=public_html \
                      --with-suexec-logfile=/var/log/apache2/suexec.log --with-suexec-uidmin=100 --enable-suexec=shared \
                      --enable-log-config=static --enable-logio=static --enable-version=static \
                      --with-apr=/usr/local/apr/bin/apr-1-config --with-apr-util=/usr/local/apr/bin/apu-1-config \
                      --with-pcre=/usr/local/pcre --enable-pie --with-ssl=/usr/lib/ssl --enable-ssl=shared \
                      --enable-vhost-alias=shared --enable-module=shared --enable-authn-alias=shared \
                      --enable-disk-cache=shared --enable-cache=shared \
                      --enable-mem-cache=shared --enable-file-cache=shared \
                      --enable-cern-meta=shared --enable-dumpio=shared --enable-ext-filter=shared \
                      --enable-charset-lite=shared --enable-cgi=shared \
                      --enable-dav-lock=shared --enable-log-forensic=shared \
                      --enable-proxy=shared \
                      --enable-proxy-connect=shared --enable-proxy-ftp=shared \
                      --enable-proxy-http=shared --enable-proxy-ajp=shared \
                      --enable-proxy-scgi=shared \
                      --enable-proxy-balancer=shared \
                      --enable-authn-dbm=shared --enable-authn-anon=shared \
                      --enable-authn-dbd=shared --enable-authn-file=shared \
                      --enable-authn-default=shared --enable-authz-host=shared \
                      --enable-authz-groupfile=shared --enable-authz-user=shared \
                      --enable-authz-dbm=shared --enable-authz-owner=shared \
                      --enable-authz-default=shared \
                      --enable-auth-basic=shared --enable-auth-digest=shared \
                      --enable-dbd=shared --enable-deflate=shared \
                      --enable-include=shared --enable-filter=shared \
                      --enable-env=shared --enable-mime-magic=shared \
                      --enable-expires=shared --enable-headers=shared \
                      --enable-ident=shared --enable-usertrack=shared \
                      --enable-unique-id=shared --enable-setenvif=shared \
                      --enable-status=shared \
                      --enable-autoindex=shared --enable-asis=shared \
                      --enable-info=shared --enable-cgid=shared \
                      --enable-dav=shared --enable-dav-fs=shared \
                      --enable-vhost-alias=shared --enable-negotiation=shared \
                      --enable-dir=shared --enable-imagemap=shared \
                      --enable-actions=shared --enable-speling=shared \
                      --enable-userdir=shared --enable-alias=shared \
                      --enable-rewrite=shared --enable-mime=shared \
                      --enable-substitute=shared --enable-reqtimeout=shared
Comment 36 Yann Ylavic 2020-02-27 13:39:29 UTC
(In reply to nitop from comment #35)
> It does not matter in which version APR is compiled.

Could you please run httpd with LD_LIBRARY_PATH including your compiled apr/lib directory or alternatively configure like:
    LDFLAGS="-Wl,-rpath,/usr/local/apr/lib" ./configure ...

Compiling with an APR version doesn't mean httpd will link to it at runtime, unless one of the above is used. Then we can really figure out whether it's due to APR-1.7 or not.
Comment 37 Curtis Wilson 2020-02-27 19:26:14 UTC
I will be adding "SetEnv LD_LIBRARY_PATH $path" on 5 test boxes tonight, pointing to the location of the APR 1.7.0 that cPanel provides. However, I do want to point out that APR 1.7.0 has caused no issues, at least none that we are seeing, with Apache 2.4.39. It looks like we received APR 1.7.0 in May 2019 and it was already running on our servers with Apache 2.4.39 before Apache 2.4.41 was released; the issues seem to have started only after Apache 2.4.41 was released and distributed, whether via cPanel or the normal repositories. Once I have an update on the test boxes I will post it.
Comment 38 Curtis Wilson 2020-02-27 22:29:13 UTC
Because our Apache is provided by cPanel with EasyApache4, we will not be able to custom compile different APR or Apache versions to test. Setting LD_LIBRARY_PATH can be done in /etc/sysconfig/http . Older versions of Apache and APR could only be obtained as RPMs from cPanel, and those older RPMs no longer exist. What we have noticed is that, without specifying a path, Apache opens the right APR, as verified with lsof on the Apache PIDs.
Comment 39 Yann Ylavic 2020-02-28 00:24:32 UTC
(In reply to Curtis Wilson from comment #38)
Can you apply Ruediger's debugging steps from comment #13 on your system?
When the memory is high enough, that would be a good way to gather information on what happens in httpd-2.4.41 (at least) with apr-1.7, the combination that seems to matter.
Comment 40 Curtis Wilson 2020-02-28 04:24:19 UTC
I don't believe that .gdbinit is complete. When you use dump_all_pools, it tries to call dump_pool_and_children, which looks like it is done via the python portion but is not actually defined and does not exist. 

(gdb) dump_all_pools
Undefined command: "dump_pool_and_children".  Try "help".

This is not actively happening, but I did have to install debug packages and restart httpd in order to be able to provide this info when it is.
Comment 41 Ruediger Pluem 2020-02-28 10:37:34 UTC
dump_pool_and_children is contained in .gdbinit (at least the one from here: http://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/.gdbinit?revision=1866656&view=markup). It might be possible that your gdb does not have Python support. Can you try the following gdb command once you started gdb and post back the results?
python print(True)
Comment 42 Ruediger Pluem 2020-02-28 10:38:37 UTC
What does
gdb --version
deliver?
Comment 43 Curtis Wilson 2020-02-28 16:34:25 UTC
(gdb) python print(True)
True

# gdb --version
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Comment 44 Giovanni Bechis 2020-02-28 18:05:54 UTC
To be able to debug properly this issue, you should install ea-apache24-debuginfo rpm package.
This will provide you a httpd.debug binary; it's the same binary than httpd(8) but built by cPanel with debug symbols.
Comment 45 Curtis Wilson 2020-02-28 18:12:56 UTC
It is already installed, the only issue we have is finding a debug package for 

lua-5.1.4-4.1.el6.x86_64

The issue is that when I run gdb -p $pid it loads everything in the .gdbinit, but when I execute dump_all_pools it then tries to execute dump_pool_and_children, which gdb does not find as a defined command.

If I put the define of dump_pool_and_children above the Python block, it does try to work. However, I run into this:

(gdb) dump_all_pools
Traceback (most recent call last):
  File "<string>", line 78, in <module>
  File "<string>", line 8, in __init__
AttributeError: 'module' object has no attribute 'COMMAND_USER'
Error while executing Python code.
(gdb)

Without that change, all that is reported is:

(gdb) dump_all_pools
Undefined command: "dump_pool_and_children".  Try "help".
Comment 46 Curtis Wilson 2020-02-28 18:41:21 UTC
Okay, I think I found the problem. It was an error I had missed: Python is available, but it is not playing nice. What version of Python does this expect? It is failing to load the Python portion of this .gdbinit. The systems in question all have Python 2.6.6.
Comment 47 Curtis Wilson 2020-02-28 18:57:14 UTC
The problem we are having is related to the gdb version on the systems. They all run with gdb-7.2-92.el6
Comment 48 Curtis Wilson 2020-02-28 19:10:58 UTC
This is working now: I enabled the right gdb, so when this starts happening again over the weekend I will grab the data.
Comment 49 Curtis Wilson 2020-03-02 17:28:58 UTC
Created attachment 37054 [details]
gdb_dump_all_pools

Today one of our test servers was using around 4.3% per thread and causing issues where the server was overcommitting. I was able to grab the information using gdb and the provided .gdbinit , executing dump_all_pools .
Comment 50 Ruediger Pluem 2020-03-03 07:26:40 UTC
(In reply to Curtis Wilson from comment #49)
> Created attachment 37054 [details]
> gdb_dump_all_pools
> 
> Today one of our test servers was using around 4.3% per thread and causing
> issues where the server was overcommitting. I was able to grab the
> information using gdb and the provided .gdbinit , executing dump_all_pools .

Thanks. Looks like a high number of subpools of the pconf pool are created. Now we need to figure out why :-).
Can you please provide a complete list of all modules that you load with your configuration?
Running

<Path to your httpd binary>/httpd -t -D DUMP_MODULES -f <path to your httpd.conf>

should provide this.
Comment 51 Ruediger Pluem 2020-03-03 14:19:17 UTC
Next time you come across the overcommit situation, can you please use http://svn.apache.org/viewvc/httpd/httpd/trunk/.gdbinit?revision=1874723&view=markup as .gdbinit when you do a dump_all_pools?
This should get you rid of the
Python Exception <type 'exceptions.RuntimeError'> maximum recursion depth exceeded:
messages and give us the full output.
Comment 52 Curtis Wilson 2020-03-03 16:04:06 UTC
The loaded modules are as follows: 

Loaded Modules:
 core_module (static)
 so_module (static)
 http_module (static)
 mpm_worker_module (shared)
 cgid_module (shared)
 access_compat_module (shared)
 actions_module (shared)
 alias_module (shared)
 asis_module (shared)
 auth_basic_module (shared)
 auth_digest_module (shared)
 authn_core_module (shared)
 authn_dbd_module (shared)
 authn_dbm_module (shared)
 authn_file_module (shared)
 authz_core_module (shared)
 authz_dbm_module (shared)
 authz_groupfile_module (shared)
 authz_host_module (shared)
 authz_user_module (shared)
 autoindex_module (shared)
 cache_module (shared)
 cache_disk_module (shared)
 dav_module (shared)
 dav_fs_module (shared)
 dav_lock_module (shared)
 dbd_module (shared)
 deflate_module (shared)
 dir_module (shared)
 env_module (shared)
 expires_module (shared)
 file_cache_module (shared)
 filter_module (shared)
 headers_module (shared)
 include_module (shared)
 log_config_module (shared)
 log_forensic_module (shared)
 logio_module (shared)
 mime_module (shared)
 mime_magic_module (shared)
 negotiation_module (shared)
 proxy_module (shared)
 lbmethod_byrequests_module (shared)
 lbmethod_bytraffic_module (shared)
 proxy_ajp_module (shared)
 proxy_balancer_module (shared)
 proxy_connect_module (shared)
 proxy_fcgi_module (shared)
 proxy_ftp_module (shared)
 proxy_http_module (shared)
 proxy_scgi_module (shared)
 proxy_wstunnel_module (shared)
 remoteip_module (shared)
 reqtimeout_module (shared)
 rewrite_module (shared)
 setenvif_module (shared)
 slotmem_shm_module (shared)
 socache_dbm_module (shared)
 socache_shmcb_module (shared)
 socache_redis_module (shared)
 speling_module (shared)
 status_module (shared)
 substitute_module (shared)
 suexec_module (shared)
 unique_id_module (shared)
 unixd_module (shared)
 userdir_module (shared)
 version_module (shared)
 bwlimited_module (shared)
 ssl_module (shared)
 fcgid_module (shared)
 http2_module (shared)
 security2_module (shared)
 suphp_module (shared)
 passenger_module (shared)
 rbld_module (shared)

Next time this overcommits I will grab the full output for you.
Comment 53 Curtis Wilson 2020-03-03 16:35:16 UTC
I would also like to note that we are open to some form of video conference while this is happening so that we can gather more information as needed.
Comment 54 Christophe JAILLET 2020-03-03 18:19:26 UTC
Hi,

I had a look at rbld_module.

If the module is the following one on GitHub:
https://github.com/bluehost/mod_rbld/blob/master/mod_rbld.c#L370

then the pool allocated just above that line leaks in many paths, including what look to me like normal use cases.
This could explain some of the memory leak, but the corresponding parent pool should not be 'pconf', as it seems to be in the dump.


Do you need bwlimited_module and the last 4 modules in the list?
These at least look like third-party modules.
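Why the parent pool matters can be illustrated with a toy model. This is not the APR API: `Pool`, `handle_request` and the 8 KiB allocation are hypothetical. It assumes a module creates a scratch subpool per request and does not destroy it on some exit paths. If the parent is a long-lived pool such as pconf, every leaked subpool stays attached until the parent itself is destroyed, so memory grows with the request count:

```python
class Pool:
    """Toy stand-in for an APR pool (hypothetical, not the APR API).

    It only tracks the parent/child tree and how many bytes were handed out.
    """

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.allocated = 0
        if parent is not None:
            parent.children.append(self)

    def alloc(self, nbytes):
        self.allocated += nbytes

    def destroy(self):
        # As with apr_pool_destroy(), destroying a pool also destroys its subpools.
        for child in list(self.children):
            child.destroy()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        self.allocated = 0

    def total(self):
        """Bytes held by this pool and all of its descendants."""
        return self.allocated + sum(child.total() for child in self.children)


def handle_request(parent, leak):
    """Hypothetical module code that creates a scratch subpool per request."""
    scratch = Pool("scratch", parent=parent)
    scratch.alloc(8 * 1024)
    if not leak:
        scratch.destroy()  # the fix: destroy on every exit path


pconf = Pool("pconf")
for _ in range(1000):
    handle_request(pconf, leak=True)
print(len(pconf.children), pconf.total())  # -> 1000 8192000
```

With `leak=False` the parent ends up with no children, and destroying the long-lived parent reclaims everything either way.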
Comment 55 Curtis Wilson 2020-03-04 13:54:33 UTC
Created attachment 37060 [details]
gdb_dump_all_pools_latestinit

Attached is a dump_all_pools output without the tracebacks at the end; this time the output completes fully.
Comment 56 Curtis Wilson 2020-03-04 14:00:00 UTC
The last 4 modules on the list are not modules we would be able to disable, as some of them provide basic functionality and security for us.

security2_module (shared) - ModSecurity
suphp_module (shared) - allows user PHP scripts to be executed and served through httpd
passenger_module (shared) - used for Ruby
bwlimited_module - used to display over-bandwidth pages when a user has reached their allocated bandwidth limit in cPanel
Comment 57 Ruediger Pluem 2020-03-04 14:36:51 UTC
(In reply to Curtis Wilson from comment #55)
> Created attachment 37060 [details]
> gdb_dump_all_pools_latestinit
> 
> Attached now is a dump_all_pools output without the tracebacks at the end
> where this fully completes the output.

Thanks. Quick question. Which process did you dump? Was it the main process (the one that is the parent to all other httpd processes)? Or was it a child process?
The reason I am asking is that I see no transaction pool in this process, which is a bit unusual.
Comment 58 Curtis Wilson 2020-03-04 14:51:58 UTC
On each of these I dumped the main process that everything is forked from.
Comment 59 Yann Ylavic 2020-03-04 16:19:48 UTC
(In reply to Curtis Wilson from comment #58)
> On each of these I dumped the main process that everything is forked from.

Is it the main process which is leaking memory?
Comment 60 Curtis Wilson 2020-03-04 16:21:54 UTC
The issue is that it affects every process under the main one. I can grab a few dumps when the server starts showing signs again; once it reaches the point where traffic is affected, we have to fully restart httpd to prevent downtime.
Comment 61 Ruediger Pluem 2020-03-05 09:25:04 UTC
Next time you experience the issue, can you please provide:

1. ps -e -o pid,comm,vsz,rsz | grep httpd
2. A dump_all_pools from the process that is memory leaking. It should be a child
   process. If not then from the main process.
3. The pid of this process
4. The output of pmap <pid> for this process
5. If the leaking process is the main process, the output of
   thread apply all bt full
   from your gdb session.
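Steps 1 to 3 can be partly automated. A small Python sketch, assuming the four-column output of the `ps` command in step 1 (procps reports VSZ and RSZ in KiB) and a process name of `httpd`; the function names are hypothetical. It picks the httpd process with the largest resident set as the one to dump:

```python
def parse_ps(output, name="httpd"):
    """Rows (pid, vsz_kib, rss_kib) for `name` from `ps -e -o pid,comm,vsz,rsz`.

    Works with or without the header line, so `| grep httpd` output is fine too.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, other commands, and comm values containing spaces.
        if len(fields) != 4 or not fields[0].isdigit() or fields[1] != name:
            continue
        pid, _comm, vsz, rss = fields
        rows.append((int(pid), int(vsz), int(rss)))
    return rows


def largest_resident(rows):
    """The process with the biggest resident set: the one to attach gdb to."""
    return max(rows, key=lambda row: row[2])
```

The text can come from a file or from `subprocess.run(["ps", "-e", "-o", "pid,comm,vsz,rsz"], capture_output=True, text=True).stdout`; the returned pid is what steps 2 to 4 need.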
Comment 62 Joe Orton 2020-03-05 10:01:16 UTC
A technique I've found helpful for catching leaks is:

gdb <attach to actively leaking child>
break sbrk
cont
...
bt full

to get a backtrace at the point sbrk is called to expand the heap.  Not sure how well this works with a threaded MPM.
Comment 63 Curtis Wilson 2020-03-06 14:02:08 UTC
Created attachment 37066 [details]
Dump of thread

Attached is the dump of a thread. We also believe we may have solved this, at least in our case, and are testing it.
Comment 65 Ruediger Pluem 2020-03-09 12:06:59 UTC
(In reply to Curtis Wilson from comment #63)
> Created attachment 37066 [details]
> Dump of thread
> 
> Attahced is the dump of a thread, we also beleive we may have solved this,
> at least in our case and are testing this also.

Did you solve it and if yes how?
Comment 66 Curtis Wilson 2020-03-09 12:41:34 UTC
We tried disabling mod_rbld on our test servers and it was working well; however, over the weekend it looks like the same issue started to appear again. I will attach a new set of data with mod_rbld disabled.
Comment 67 Curtis Wilson 2020-03-09 12:42:03 UTC
Created attachment 37077 [details]
Data with mod_rbld disabled
Comment 68 Ruediger Pluem 2020-03-10 07:20:44 UTC
(In reply to Curtis Wilson from comment #67)
> Created attachment 37077 [details]
> Data with mod_rbld disabled

Thanks. More questions:

1. Have you set MaxMemFree in your configuration and if yes to what value?
2. Can you deliver all ProxyPass[Match] directives in your configuration?
3. Can you deliver all <Proxy > blocks in your configuration?
Comment 69 Ruediger Pluem 2020-03-10 07:38:56 UTC
What are your settings for

ServerLimit
StartServers
MaxRequestWorkers
MinSpareThreads
MaxSpareThreads
ThreadsPerChild

?
Comment 70 Ruediger Pluem 2020-03-10 07:59:59 UTC
How many cores / vCPUs does the system you run httpd on have?
Comment 71 Curtis Wilson 2020-03-10 15:28:40 UTC
Created attachment 37089 [details]
Vhosts

1. Have you set MaxMemFree in your configuration and if yes to what value?

Default

2. Can you deliver all ProxyPass[Match] directives in your configuration?

There would be too many to provide every one of them, but I can provide an example with the domain removed.
# grep -c ProxyPass /etc/apache2/conf/httpd.conf
10846
# grep "<Proxymatch" /etc/apache2/conf/httpd.conf  -c
4408

3. Can you deliver all <Proxy > blocks in your configuration?

    <Proxy "*">
        <IfModule security2_module>
            SecRuleEngine Off
        </IfModule>
    </Proxy>



4. How many cores / vCPUs does the system you run httpd on have?

20 vCPUs, 60 GB of memory

5. What are your settings for

ServerLimit
StartServers
MaxRequestWorkers
MinSpareThreads
MaxSpareThreads
ThreadsPerChild

?


MaxRequestsPerChild 5000
ThreadLimit 120
Threadsperchild 64
MinSpareThreads 75
MaxSpareThreads 395
ServerLimit 32
MaxRequestWorkers 2048

Attached I have added an example of the vhost used for every site. I have also added the vhost for the various proxy subdomains used.
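A quick sanity check of these worker MPM numbers. The worker MPM requires ServerLimit to be at least MaxRequestWorkers / ThreadsPerChild, so 2048 / 64 = 32 child processes can exist at full load, exactly the configured ServerLimit of 32. The sketch also multiplies that by a per-child RSS; the 600 MB and 2.8 GB figures come from Ruediger's comment 77, and because RSS counts pages shared copy-on-write with the parent once per child, the sum is an upper bound, not a measurement. Function names are hypothetical:

```python
def children_needed(server_limit, threads_per_child, max_request_workers):
    """Child processes needed for MaxRequestWorkers under the worker MPM.

    Raises if ServerLimit is too small, i.e. if
    MaxRequestWorkers > ServerLimit * ThreadsPerChild.
    """
    ceiling = server_limit * threads_per_child
    if max_request_workers > ceiling:
        raise ValueError(
            f"MaxRequestWorkers {max_request_workers} > "
            f"ServerLimit * ThreadsPerChild = {ceiling}"
        )
    return -(-max_request_workers // threads_per_child)  # ceiling division


def summed_rss_gib(children, rss_per_child_mib):
    """Upper bound: RSS counts pages shared with the parent once per child."""
    return children * rss_per_child_mib / 1024


children = children_needed(server_limit=32, threads_per_child=64, max_request_workers=2048)
print(children, summed_rss_gib(children, 600), summed_rss_gib(children, 2800))  # -> 32 18.75 87.5
```

On a 60 GB machine the bound for 2.8 GB children exceeds physical memory, which is consistent with the overcommit reports, although the shared pages make the real total lower.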
Comment 72 Ruediger Pluem 2020-03-11 09:36:20 UTC
Thanks. Can I also have a 'dump_all_pools' from a freshly started child process that does not consume an unexpected amount of memory?
Comment 73 Curtis Wilson 2020-03-11 14:11:00 UTC
Created attachment 37093 [details]
Fresh child dump

Attached is a dump of a fresh child.
Comment 74 Ruediger Pluem 2020-03-11 20:26:57 UTC
In terms of the pool dumps there is no big difference between a fresh process and one of the memory-eating ones (only 1103 blocks, which are about 8 MiB). There is also not much difference in the free memory held by the allocators: about 750 KiB in the fresh child and about 10 MiB in the memory-consuming one.
Can you please provide a

ps -e -o pid,comm,vsz,rsz
pmap <pid>

for a fresh child and the main process?
Comment 75 Curtis Wilson 2020-03-12 15:07:27 UTC
Created attachment 37096 [details]
Pmaps of fressh child and main

Attached are pmaps of a fresh child and the main process.
Comment 76 Curtis Wilson 2020-03-31 15:34:06 UTC
Is there anything else that is needed at this time?
Comment 77 Ruediger Pluem 2020-04-01 12:36:23 UTC
(In reply to Curtis Wilson from comment #76)
> Is there anything else that is needed at this time?

Not now. I am honestly a bit lost at the moment. It looks like your processes consume a lot of memory (about 600 MB) right from the start due to their configuration, but from a pool usage perspective not much changes between the freshly started process and the one that consumes the huge amount of memory (roughly 2.8 GB). So the question is where this memory is lost and why the behavior changed with the httpd version.

It could be that a third-party module consumes memory outside the pools and, due to side effects introduced by the newer httpd version, no longer frees it. Or it could be caused by the underlying memory management of the C library, but then it should show up the same way with both httpd versions, so that is the rather unlikely option. Maybe we are left with Joe's proposal:

Let a fresh process handle some requests to "warm up" a little memory-wise, then apply https://bz.apache.org/bugzilla/show_bug.cgi?id=63687#c62 to that process and see where it stops over and over again.
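Putting the numbers from comments 74 and 77 side by side shows why the pools cannot account for the growth. A back-of-the-envelope sketch, assuming each of the 1103 extra blocks has APR's default minimum block size of 8 KiB (consistent with "about 8 MiB") and reading 600 MB and 2.8 GB as 600 and 2800 MiB:

```python
KIB = 1024
MIB = 1024 * KIB


def pool_delta_bytes(extra_blocks, block_size=8 * KIB):
    # Assumption: every extra block has APR's default minimum size of 8 KiB.
    return extra_blocks * block_size


def explained_share(baseline_mib, leaking_mib, explained_bytes):
    """Fraction of the RSS growth that the pool/allocator numbers account for."""
    growth = (leaking_mib - baseline_mib) * MIB
    return explained_bytes / growth


pools = pool_delta_bytes(1103)              # extra blocks in the pool dump
allocator_free = 10 * MIB - 750 * KIB       # allocator free memory: 10 MiB vs 750 KiB
share = explained_share(600, 2800, pools + allocator_free)
print(f"pools + allocator explain {share:.1%} of the growth")
```

Roughly 18 MiB out of about 2200 MiB of growth, i.e. under 1%, so the rest must be allocated outside the pools or held by the C library.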
Comment 78 Ruediger Pluem 2020-04-01 12:43:53 UTC
Maybe the following untested gdb commands help you do this in an unattended way:

gdb <attach to actively leaking child>
break sbrk
commands
silent
bt full
cont
end
set pagination off
set logging file <file to log output to>
set logging redirect on
set logging on
cont
Comment 79 Curtis Wilson 2020-04-02 17:22:25 UTC
I am trying to gather the requested data with these steps to see if I can get that to work. However, we have noticed that the main process appears to start bloating, and when new children are spawned they start in that bloated state and continue to grow. We built our own 2.4.43 RPMs for EasyApache yesterday and put them on one of the boxes, and the same behavior shows up there too.
Comment 80 Eric Covener 2020-04-02 17:27:49 UTC
(In reply to Curtis Wilson from comment #79)
> I am trying to gather the set of data requested with the steps to see if I
> can get that to work. However something we have noticed is the Main proc
> looks like it starts bloating, and when new children are born, they start in
> that bloated state and continue to grow.. We built our own 2.4.43 RPM's for
> easy apache yesterday and put it on one of the boxes and the behavior is
> being noticed in it also.

In that case it may be better to augment Ruediger's instructions to try to catch the parent allocating memory significantly after startup.
Comment 81 Curtis Wilson 2020-04-06 16:01:27 UTC
After determining what triggers this (reloads), we were able to track it down to a specific commit. We built httpd without this commit and installed it on our test servers, and without it the memory issues we had been seeing no longer occur.



$ git bisect good
734313ca6e758f94ae3f923f801f34da02251b9b is the first bad commit
commit 734313ca6e758f94ae3f923f801f34da02251b9b
Author: Stefan Eissing <icing@apache.org>
Date:   Tue Jul 30 11:23:52 2019 +0000

    Merged /httpd/httpd/trunk:r1851621,1852128,1862075
    
      *) mod_ssl/mod_md: reversing dependency by letting mod_ssl offer hooks for
         adding certificates and keys to a virtual host. An additional hook allows
         answering special TLS connections as used in ACME challenges.
         Adding 2 new hooks for init/get of OCSP stapling status information when
         other modules want to provide those. Falls back to own implementation with
         same behaviour as before.
    
    
    
    git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/branches/2.4.x@1863988 13f79535-47bb-0310-9956-ffa450edef68

 CHANGES                         |   8 +++
 modules/ssl/mod_ssl.h           |  29 ++++++++++
 modules/ssl/mod_ssl_openssl.h   |  40 ++++++++++++++
 modules/ssl/ssl_engine_init.c   | 116 +++++++++++++++++-----------------------
 modules/ssl/ssl_engine_kernel.c |  74 +++++++++++++++++--------
 modules/ssl/ssl_util_stapling.c |  93 +++++++++++++++++++++++++-------
 6 files changed, 253 insertions(+), 107 deletions(-)
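The search above is a binary search over the commit range: each step builds one commit, and the trigger (here a reload followed by watching memory) decides good or bad. A model of the idea in Python, not git's implementation; it assumes the first commit is known good, the last is known bad, and that once a commit is bad every later one is too. For N commits it needs about log2(N) builds:

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of is_bad() calls).

    Invariant: commits[lo] is good and commits[hi] is bad.
    """
    lo, hi = 0, len(commits) - 1
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):  # e.g. build, install, reload, watch memory
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


print(first_bad(list(range(1000)), lambda c: c >= 734))  # -> (734, 10)
```

Ten builds for a thousand commits is why a reproducible trigger such as "reload and watch RSS" makes bisecting practical.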
Comment 82 Stefan Eissing 2020-04-09 12:57:49 UTC
Created attachment 37158 [details]
ssl stapling leak fix

A fix for a leak introduced by the new stapling interaction with mod_md; the leak is still present in 2.4.43.
Comment 83 Stefan Eissing 2020-04-09 12:59:05 UTC
Would you mind trying the SSL stapling leak fix I attached here? It applies on top of 2.4.43. I would be interested to know if this fixes your memory problems.

Thanks, Stefan
Comment 84 Curtis Wilson 2020-04-09 13:37:43 UTC
Hey Stefan, 

I will talk with my team today to see if we can get this built and pushed to our test servers.
Comment 85 Curtis Wilson 2020-04-13 14:51:44 UTC
Hey Stefan, 

We built this shortly after you posted it on Thursday and installed it on one of our test boxes, and it has been running throughout the weekend without any issues. Memory consumption no longer appears to grow exponentially on reloads, and we have not had a memory problem over the weekend.
Comment 86 Giovanni Bechis 2020-04-14 15:25:03 UTC
Created attachment 37171 [details]
simplified patch

Judging from the OpenSSL man pages, X509_free() is a no-op if the parameter is NULL, so the goto dance could be avoided.
I think the patch is correct either way.
Comment 87 Yann Ylavic 2020-04-16 10:48:20 UTC
Fixed in trunk (r1876548).
Comment 88 nitop 2020-04-16 11:43:59 UTC
Hi,

I've also rebuilt Apache 2.4.43 with Stefan's fix (ssl stapling leak fix).
It looks very good and it seems the bug has been fixed.

Thank you all!

Best regards,
Michael