Hello, after we upgraded 7 of our systems to apr-1.7.0 (from 1.6.5), pcre-8.43 (from 8.42) and apache 2.4.41 (from 2.4.39), we have a huge memory problem - you can see this in the attached screenshot. The upgrade was performed on 20 Aug. We did not change anything in the config files. After rolling back to the old versions, everything is fine. Any advice?
Created attachment 36732 [details] High Memory usage High Memory usage after upgrading
Additional details: All systems are running with Debian and Kernel "4.4.186".
Created attachment 36733 [details] apache2.conf httpd/apache2 config
You've made three changes at the same time, which increases the difficulty of diagnosing this. Can you try httpd 2.4.41 on the old APR/PCRE versions and see if that also has the same memory problem?
I have run into the same issue on my servers; they are all CentOS 6 and running cPanel. After the updates I found that I was getting constant issues with memory use from Apache in the worker MPM; most times when I got to them, all httpd processes were using 8-12% memory, causing the boxes to overcommit.

They updated to:
Apache 2.4.41 (8/22/2019)
APR 1.7 (07/03/2019)
pcre is at 7.8-7

Apache is being obtained from the cPanel EasyApache repository.
Also to note: until I can figure out the cause of this, I have had to downgrade them all to Apache 2.4.39.
(In reply to Curtis Wilson from comment #5)
> I have run into the same issue on my servers, they are all Centos 6 and are
> running cPanel. After the updates I found that I was getting constant issues
> with memory use from apache in the worker mpm, most times when I would get
> to them all httpd processes were using 8-12% memory, and causing the boxes
> to overcommit.
>
> They updated to
> Apache 2.4.41 (8/22/2019)
> APR 1.7 (07/03/2019)
> pcre is at 7.8-7

So only httpd and APR were updated, correct? pcre remained unchanged?

Like with the other reporter, can you update just Apache to isolate the component that causes this?
(In reply to nitop from comment #3)
> Created attachment 36733 [details]
> apache2.conf
>
> httpd/apache2 config

The given configuration does not show me which MPM you are using. It may be configured in

/etc/apache2/mods-enabled/*.load
/etc/apache2/mods-enabled/*.conf

Which MPM do you use?
Hello,

"Which MPM do you use?" -> We use worker.

I've now just updated apache2 to 2.4.41 - so far no problems with memory. BUT, I saw this:

# apache2 -V
Server version: Apache/2.4.41 (Unix)
Server loaded: APR 1.7.0, APR-UTIL 1.6.1
Compiled using: APR 1.6.5, APR-UTIL 1.6.1

Why is this different? We compiled APR 1.6.5 and NOT 1.7.0. Thank you!
(In reply to nitop from comment #9)
> # apache2 -V
> Server version: Apache/2.4.41 (Unix)
> Server loaded: APR 1.7.0, APR-UTIL 1.6.1
> Compiled using: APR 1.6.5, APR-UTIL 1.6.1
>
> Why is this different?
> We've compiled APR 1.6.5 and NOT 1.7.0.

This is because you compiled it against 1.6.5, but when started, the httpd process finds 1.7.0 first and hence loads it.
It's definitely Apache 2.4.41. I've updated only Apache2 this morning at ~9:00am - without APR or PCRE. Please see the attached memory graph (mem_usage_ONLY_Apache2_4_41.PNG).
Created attachment 36743 [details] Apache2.4.41 Memory Usage without APR- and pcre-update
Have you compiled your Apache with debugging symbols? Are you able to attach to such a memory consuming process with gdb? If this is the case, it would be helpful if you could use the following .gdbinit for your gdb session:

http://svn.apache.org/viewvc/httpd/httpd/trunk/.gdbinit?revision=1866078&view=co

Once you have attached to such a memory consuming process with gdb using the above .gdbinit, please execute the following command on the gdb side and report back the output:

dump_all_pools
@RuedigerPluem

"Have you compiled your Apache with debugging symbols?"

No. Is this necessary to go on with gdb?

"Are you able to attach to such a memory consuming process with gdb?"

I have not tried it yet. How should I do this? I cannot use your .gdbinit, it outputs some errors - we have an old Debian here and gdb 7.0.1.
(In reply to nitop from comment #14)
> @RuedigerPluem
> "Have you compiled your Apache with debugging symbols?"
>
> No. Is this necessary to go on with gdb?

Yes, you need the debugging symbols to extract the proper information from the process.

> "Are you able to attach to such a memory consuming process with gdb?"
>
> I've not tried it yet. How should I do this?

http://httpd.apache.org/dev/debugging.html#backtrace

> I can not use your .gdbinit, it outputs some errors - we have an old debian
> here and gdb 7.0.1.

This is bad. The lowest version I tested the .gdbinit with was 7.2. What are the error messages?
@RuedigerPluem Here are the errors with your ".gdbinit": gdb apache2 16606 Reading symbols from /usr/sbin/apache2...done. Attaching to program: /usr/sbin/apache2, process 16606 warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc77fa000 0x00007fc035f1b303 in ?? () Traceback (most recent call last): File "<string>", line 78, in <module> File "<string>", line 8, in __init__ AttributeError: 'module' object has no attribute 'COMMAND_USER' /usr/local/src/mni/.gdbinit:548: Error in sourced command file: Error while executing Python code. "Yes, you need the debugging symbols to extract the proper information from the process." -> Did you mean the flag "--enable-maintainer-mode"?
(In reply to nitop from comment #16) > @RuedigerPluem > > Here are the errors with your ".gdbinit": > gdb apache2 16606 > Reading symbols from /usr/sbin/apache2...done. > Attaching to program: /usr/sbin/apache2, process 16606 > > warning: no loadable sections found in added symbol-file system-supplied DSO > at 0x7fffc77fa000 > 0x00007fc035f1b303 in ?? () > Traceback (most recent call last): > File "<string>", line 78, in <module> > File "<string>", line 8, in __init__ > AttributeError: 'module' object has no attribute 'COMMAND_USER' > /usr/local/src/mni/.gdbinit:548: Error in sourced command file: > Error while executing Python code. I cannot fix this. You would need to use a higher version of gdb in this case. Unfortunately the Python code in .gdbinit is essential for debugging your issue. > > "Yes, you need the debugging symbols to extract the proper information from > the process." > > -> Did you mean the flag "--enable-maintainer-mode"? That would be one way. Another less strict one is to export CFLAGS="-Wall -O2 -g" before you run the configure script.
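A minimal sketch of that CFLAGS approach (the configure invocation shown in the comment is abbreviated and illustrative, not the reporter's full build):

```shell
# Export debug-friendly CFLAGS before configuring httpd: -g keeps the
# debugging symbols gdb needs, -O2 stays close to a production build.
export CFLAGS="-Wall -O2 -g"
echo "building with CFLAGS=$CFLAGS"
# Then, inside the httpd source tree (path and flags are placeholders):
# ./configure --with-mpm=worker ... && make && make install
```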
I am now running gdb 7.2 and tried it again: /usr/local/bin/gdb apache2 29505 GNU gdb (GDB) 7.2 Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-unknown-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/sbin/apache2...done. Attaching to program: /usr/sbin/apache2, process 29505 Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done. Loaded symbols for /lib/libpcre.so.3 Reading symbols from /usr/lib/libaprutil-1.so.0...done. Loaded symbols for /usr/lib/libaprutil-1.so.0 Reading symbols from /usr/lib/libexpat.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libexpat.so.1 Reading symbols from /usr/lib/libapr-1.so.0...done. Loaded symbols for /usr/lib/libapr-1.so.0 Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/librt.so.1 Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Loaded symbols for /lib/libpthread.so.0 Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done. 
Loaded symbols for /lib/libnss_compat.so.2 Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libnss_nis.so.2 Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libnss_files.so.2 .... .... warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffd92a92000 0x00007f7bcee85303 in select () from /lib/libc.so.6 .gdbinit:548: Error in sourced command file: Python scripting is not supported in this copy of GDB. (gdb) dump_all_pools Undefined command: "dump_pool_and_children". Try "help".
We cannot continue with debugging because of the large dependency chain (gdb, Python, ...). Can someone else with the same problem debug this?
(In reply to nitop from comment #18)
> I am now running gdb 7.2 and tried it again:
> warning: no loadable sections found in added symbol-file system-supplied DSO
> at 0x7ffd92a92000
> 0x00007f7bcee85303 in select () from /lib/libc.so.6
> .gdbinit:548: Error in sourced command file:
> Python scripting is not supported in this copy of GDB.
> (gdb) dump_all_pools
> Undefined command: "dump_pool_and_children". Try "help".

This gdb does not have Python scripting enabled. How did you get this gdb 7.2? Did you compile it on your own or did you download it from somewhere? If you compiled it on your own, you need to specify --with-python to the configure script of gdb. Of course this requires Python to be available on your system. I am not sure how these Python packages are named in Debian.
Someone else should debug this. We have some huge dependency issues here and cannot get gdb >= 7.2 with Python 2.7 to work.
(In reply to nitop from comment #21)
> Someone else should debug this.
> We've some hugh dependencies here and can not get >= gdb7.2 with python2.7
> to work.

Just for the sake of completeness: my GDB 7.2 uses Python 2.6, as this is what my CentOS 6 delivers as an OS package.
I am wondering if the changes to PCRE_DOTALL are causing this. If you are using regexes a lot, try:

RegexDefaultOptions -DOTALL
Are you using HTTP2 and/or mod_md at all? This will allow us to pare down the diffs
@Jim Jagielski We do not use http2 or mod_md.
Thanks... that helps to narrow things down quite a bit.
(In reply to Ruediger Pluem from comment #7)
> So only httpd and APR where updated, correct? pcre remained unchanged?
>
> Like with the other reporter, can you just update Apache to isolate the
> component that causes this?

Hi Ruediger,

I was having the same problem here and found this topic talking about exactly that. The problem is apparently solved; I would like to explain the situation in the hope that it helps you identify other causes like this. Here comes the history.

My server is an Amazon Linux. The problematic httpd 2.4.41 had been compiled from source because a security audit demanded it. At that time I could not install it through yum because the Amazon Linux repositories did not yet have the version the audit wanted - I don't remember exactly, but I think the repository had 2.4.34 and the security reports suggested 2.4.41. OK, that was the history.

So after your comments #5 and #7 I was about to follow your suggestions, but first I wanted to test whether my earlier httpd version would work normally. I renamed my current Apache directory, undid the custom settings and installed the "previous" Apache version through yum. To my great surprise, yum told me it would install version 2.4.41. I was a little confused, but if yum gave me 2.4.41, that's OK. Ah, I forgot to mention that the MPM chosen was event.
Immediately after the install I checked the version with httpd -V (after a few corrections to make sure the running httpd binary was the one installed by yum) and the result was as follows.

My server with the memory leak:

Server version: Apache/2.4.41 (Unix)
Server built: Nov 6 2019 00:42:00
Server's Module Magic Number: 20120211:88
Server loaded: APR 1.7.0, APR-UTIL 1.6.1
Compiled using: APR 1.7.0, APR-UTIL 1.6.1
Architecture: 64-bit
Server MPM: event
threaded: yes (fixed thread count)
forked: yes (variable process count)

The same server with the memory leak fixed, after installing httpd 2.4.41 through yum (actually not the same server; the server above is a clone in a staging environment):

Server version: Apache/2.4.41 ()
Server built: Oct 22 2019 22:59:04
Server's Module Magic Number: 20120211:88
Server loaded: APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture: 64-bit
Server MPM: event
threaded: yes (fixed thread count)
forked: yes (variable process count)

Another thing that may help you: these two servers were replicated from another server's image, also running Apache. This image has been used to create about 10 servers and none of them have this problem. Here is the httpd -V result from the root server:

Server version: Apache/2.4.34 ()
Server built: Aug 17 2018 22:14:33
Server's Module Magic Number: 20120211:79
Server loaded: APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture: 64-bit
Server MPM: event
threaded: yes (fixed thread count)
forked: yes (variable process count)

OK, as I see it, the problematic one was compiled using APR 1.7.0 and all of the others using APR 1.6.3. I did not do any of your suggested tests. For now my problem appears resolved: 1 day has passed and no memory leaks. In fact, memory consumption didn't even go up.
My server has 2 GB RAM; it runs just one application, with just one connection for just one user, and with the memory leak the server was freezing after about 5 hours of use. The server with no problem runs 5 httpd children, each consuming about 27 MB. The problematic one starts with 3 children and memory grows until it hits the ceiling.
It still doesn't look good here. I've compiled APR 1.6.3 and APR-UTIL 1.6.1 with Apache 2.4.41.

APR 1.6.3:
./configure --prefix=/usr/local/apr/
make
make install

APR-UTIL 1.6.1:
./configure --prefix=/usr/local/apr/ --with-apr=/usr/local/apr/
make
make install

Apache2:
--with-apr=/usr/local/apr/bin/apr-1-config --with-apr-util=/usr/local/apr/bin/apu-1-config

--> High memory usage again.
apache2ctl -V
Server version: Apache/2.4.41 (Unix)
Server built: Dec 9 2019 08:44:12
Server's Module Magic Number: 20120211:88
Server loaded: APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture: 64-bit
Server MPM: worker
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=256
-D HTTPD_ROOT=""
-D SUEXEC_BIN="/usr/lib/apache2/suexec"
-D DEFAULT_PIDLOG="/var/run/apache2/httpd.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="/etc/apache2/mime.types"
-D SERVER_CONFIG_FILE="/etc/apache2/apache2.conf"
Created attachment 37038 [details] CPU usage drop after Apache Downgrade
Created attachment 37039 [details] Response time drop after Apache Downgrade
We are still seeing this issue actively posing a problem and causing performance issues for servers running Apache 2.4.41. Is there any other information needed to investigate this matter further? We would like to see this issue resolved and are happy to help gather anything we can. Attached I have provided an example of data we collected from our servers that gives an overall look at things. On Feb 20 we were having performance issues with a server using Apache 2.4.41 from EasyApache on cPanel; we downgraded it back to Apache 2.4.39 and there is a noticeable sharp decrease in CPU usage and actual site response times from the test sites on this server.
Have you tried disabling PCRE_DOTALL as suggested in comment #23 ? Just add "RegexDefaultOptions -DOTALL" to your httpd.conf and restart httpd(8).
It does not look like adding "RegexDefaultOptions -DOTALL" to the httpd.conf is working. We added this on a few test servers 2 days ago and this morning all of them are starting to overcommit again, with the problem being the memory consumption of Apache.

# httpd -V
Server version: Apache/2.4.41 (cPanel)
Server built: Nov 19 2019 16:06:59
Server's Module Magic Number: 20120211:88
Server loaded: APR 1.7.0, APR-UTIL 1.6.1
Compiled using: APR 1.7.0, APR-UTIL 1.6.1
Architecture: 64-bit
Server MPM: worker
threaded: yes (fixed thread count)
forked: yes (variable process count)

# grep DOTALL /etc/httpd/conf/httpd.conf
RegexDefaultOptions -DOTALL

We are using HTTP2 and we are not using mod_md. We would like to help and provide any information that will bring this to a resolution.
Hello, I've also tried it again: setting "RegexDefaultOptions -DOTALL" does not help us. The servers start overcommitting after a few hours, so we have to go back to 2.4.39 again. It does not matter against which APR version it is compiled.

# apache2ctl -V
Server version: Apache/2.4.41 (Unix)
Server built: Feb 27 2020 07:54:36
Server's Module Magic Number: 20120211:88
Server loaded: APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture: 64-bit
Server MPM: worker
threaded: yes (fixed thread count)
forked: yes (variable process count)

# grep RegexDefaultOption /etc/apache2/apache2.conf
RegexDefaultOptions -DOTALL

We are not using HTTP2 or mod_md.

Compiled with:

APR 1.6.3:
./configure --prefix=/usr/local/apr/
make
make install

APR-UTIL 1.6.1:
./configure --prefix=/usr/local/apr/ --with-apr=/usr/local/apr/
make
make install

Apache2:
./configure --enable-layout=Debian --enable-so --with-program-name=apache2 \
  --with-suexec-caller=www-data --with-mpm=worker \
  --with-suexec-bin=/usr/lib/apache2/suexec --with-suexec-docroot=/var/www \
  --with-suexec-userdir=public_html --with-suexec-logfile=/var/log/apache2/suexec.log \
  --with-suexec-uidmin=100 --enable-suexec=shared --enable-log-config=static \
  --enable-logio=static --enable-version=static \
  --with-apr=/usr/local/apr/bin/apr-1-config --with-apr-util=/usr/local/apr/bin/apu-1-config \
  --with-pcre=/usr/local/pcre --enable-pie --with-ssl=/usr/lib/ssl --enable-ssl=shared \
  --enable-vhost-alias=shared --enable-module=shared --enable-authn-alias=shared \
  --enable-disk-cache=shared --enable-cache=shared \
  --enable-mem-cache=shared --enable-file-cache=shared \
  --enable-cern-meta=shared --enable-dumpio=shared --enable-ext-filter=shared \
  --enable-charset-lite=shared --enable-cgi=shared \
  --enable-dav-lock=shared --enable-log-forensic=shared \
  --enable-proxy=shared \
  --enable-proxy-connect=shared --enable-proxy-ftp=shared \
  --enable-proxy-http=shared --enable-proxy-ajp=shared \
  --enable-proxy-scgi=shared \
  --enable-proxy-balancer=shared \
  --enable-authn-dbm=shared --enable-authn-anon=shared \
  --enable-authn-dbd=shared --enable-authn-file=shared \
  --enable-authn-default=shared --enable-authz-host=shared \
  --enable-authz-groupfile=shared --enable-authz-user=shared \
  --enable-authz-dbm=shared --enable-authz-owner=shared \
  --enable-authz-default=shared \
  --enable-auth-basic=shared --enable-auth-digest=shared \
  --enable-dbd=shared --enable-deflate=shared \
  --enable-include=shared --enable-filter=shared \
  --enable-env=shared --enable-mime-magic=shared \
  --enable-expires=shared --enable-headers=shared \
  --enable-ident=shared --enable-usertrack=shared \
  --enable-unique-id=shared --enable-setenvif=shared \
  --enable-status=shared \
  --enable-autoindex=shared --enable-asis=shared \
  --enable-info=shared --enable-cgid=shared \
  --enable-dav=shared --enable-dav-fs=shared \
  --enable-vhost-alias=shared --enable-negotiation=shared \
  --enable-dir=shared --enable-imagemap=shared \
  --enable-actions=shared --enable-speling=shared \
  --enable-userdir=shared --enable-alias=shared \
  --enable-rewrite=shared --enable-mime=shared \
  --enable-substitute=shared --enable-reqtimeout=shared
(In reply to nitop from comment #35) > It does not matter in which version APR is compiled. Could you please run httpd with LD_LIBRARY_PATH including your compiled apr/lib directory or alternatively configure like: LDFLAGS="-Wl,-rpath,/usr/local/apr/lib" ./configure ... Compiling with an APR version doesn't mean httpd will link to it at runtime, unless one of the above is used. Then we can really figure out whether it's due to APR-1.7 or not.
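As a sketch, one way to check which libapr the loader will actually bind (the binary path below is an assumption; adjust it to your installation):

```shell
# ldd resolves shared libraries the same way the runtime loader does,
# honoring LD_LIBRARY_PATH and any rpath baked into the binary.
BIN="${BIN:-/usr/sbin/apache2}"   # assumption: your httpd/apache2 binary
if [ -x "$BIN" ]; then
  APR_LINE=$(ldd "$BIN" | grep libapr-1 || echo "no libapr-1 found")
else
  APR_LINE="binary not found: $BIN"
fi
echo "$APR_LINE"
```

Re-running the check with LD_LIBRARY_PATH=/usr/local/apr/lib prepended shows whether the loader would switch to the locally compiled APR.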
I will be adding "SetEnv LD_LIBRARY_PATH $path" on 5 test boxes tonight, pointing at the location of the 1.7.0 APR that cPanel provides. However, I do want to point out that APR 1.7.0 has had zero issues - or at least we are not seeing issues - with Apache 2.4.39. It looks like we received APR 1.7.0 in May 2019 and it was already running on our servers with Apache 2.4.39 before the release of Apache 2.4.41; it seems that after Apache 2.4.41 was released and distributed, whether via cPanel or normal repositories, is when the issues started occurring. Once I have an update on the test boxes I will report back.
Due to our Apache being provided by cPanel with EasyApache4, we will not be able to custom compile different APR or Apache versions to test. Setting LD_LIBRARY_PATH can be done in /etc/sysconfig/httpd. Older versions of Apache and APR would only be obtainable via RPM from cPanel, and those older RPMs do not exist any longer. What we have noticed is that, without specifying which path to use, Apache is opening the right APR, verified by using lsof on the Apache PIDs.
(In reply to Curtis Wilson from comment #38) Can you apply Ruediger's debugging steps from comment #13 on your system? When the memory is high enough, that would be a good way to gather information on what happens in httpd-2.4.41 (at least) with apr-1.7, the combination that seems to matter.
I don't believe that .gdbinit is complete. When you use dump_all_pools, it tries to call dump_pool_and_children, which looks like it is done via the python portion but is not actually defined and does not exist. (gdb) dump_all_pools Undefined command: "dump_pool_and_children". Try "help". This is not actively happening, but I did have to install debug packages and restart httpd in order to be able to provide this info when it is.
dump_pool_and_children is contained in .gdbinit (at least the one from here: http://svn.apache.org/viewvc/httpd/httpd/branches/2.4.x/.gdbinit?revision=1866656&view=markup). It might be possible that your gdb does not have Python support. Can you try the following gdb command once you started gdb and post back the results? python print(True)
What does gdb --version deliver?
(gdb) python print(True) True # gdb --version GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>.
To be able to debug properly this issue, you should install ea-apache24-debuginfo rpm package. This will provide you a httpd.debug binary; it's the same binary than httpd(8) but built by cPanel with debug symbols.
It is already installed, the only issue we have is finding a debug package for lua-5.1.4-4.1.el6.x86_64 The issue is that when I execute gdb -p $pid it loads everything in the .gdbinit and when I execute dump_all_pools it goes through and tries to then execute dump_pool_and_children which gdb shell is not finding as a defined command to execute. If I put define dump_pool_and_children above the execution of python it does try to work. However I run into this. (gdb) dump_all_pools Traceback (most recent call last): File "<string>", line 78, in <module> File "<string>", line 8, in __init__ AttributeError: 'module' object has no attribute 'COMMAND_USER' Error while executing Python code. (gdb) However without that being added all that is reported is. (gdb) dump_all_pools Undefined command: "dump_pool_and_children". Try "help".
Okay, I think I found the problem. It was an error I was missing: python is available, however it is not playing nice. What version of python does this expect, as it is failing to load the python portion of this .gdbinit? The systems in question all have python 2.6.6.
The problem we are having is related to the gdb version on the systems. They all run with gdb-7.2-92.el6
This is working now - I enabled the right gdb - so when this starts happening again over the weekend I will grab data from it.
Created attachment 37054 [details] gdb_dump_all_pools Today one of our test servers was using around 4.3% per thread and causing issues where the server was overcommitting. I was able to grab the information using gdb and the provided .gdbinit , executing dump_all_pools .
(In reply to Curtis Wilson from comment #49) > Created attachment 37054 [details] > gdb_dump_all_pools > > Today one of our test servers was using around 4.3% per thread and causing > issues where the server was overcommitting. I was able to grab the > information using gdb and the provided .gdbinit , executing dump_all_pools . Thanks. Looks like a high number of subpools of the pconf pool are created. Now we need to figure out why :-). Can you please provide a complete list of all modules that you load with your configuration? A <Path to your httpd binary>/httpd -t -D DUMP_MODULES -f <path to your httpd.conf> should provide this.
Next time you come across the overcommit situation, can you please use http://svn.apache.org/viewvc/httpd/httpd/trunk/.gdbinit?revision=1874723&view=markup as .gdbinit when you do a dump_all_pools? This should get rid of the "Python Exception <type 'exceptions.RuntimeError'> maximum recursion depth exceeded" messages and give us the full output.
The loaded modules are as follows: Loaded Modules: core_module (static) so_module (static) http_module (static) mpm_worker_module (shared) cgid_module (shared) access_compat_module (shared) actions_module (shared) alias_module (shared) asis_module (shared) auth_basic_module (shared) auth_digest_module (shared) authn_core_module (shared) authn_dbd_module (shared) authn_dbm_module (shared) authn_file_module (shared) authz_core_module (shared) authz_dbm_module (shared) authz_groupfile_module (shared) authz_host_module (shared) authz_user_module (shared) autoindex_module (shared) cache_module (shared) cache_disk_module (shared) dav_module (shared) dav_fs_module (shared) dav_lock_module (shared) dbd_module (shared) deflate_module (shared) dir_module (shared) env_module (shared) expires_module (shared) file_cache_module (shared) filter_module (shared) headers_module (shared) include_module (shared) log_config_module (shared) log_forensic_module (shared) logio_module (shared) mime_module (shared) mime_magic_module (shared) negotiation_module (shared) proxy_module (shared) lbmethod_byrequests_module (shared) lbmethod_bytraffic_module (shared) proxy_ajp_module (shared) proxy_balancer_module (shared) proxy_connect_module (shared) proxy_fcgi_module (shared) proxy_ftp_module (shared) proxy_http_module (shared) proxy_scgi_module (shared) proxy_wstunnel_module (shared) remoteip_module (shared) reqtimeout_module (shared) rewrite_module (shared) setenvif_module (shared) slotmem_shm_module (shared) socache_dbm_module (shared) socache_shmcb_module (shared) socache_redis_module (shared) speling_module (shared) status_module (shared) substitute_module (shared) suexec_module (shared) unique_id_module (shared) unixd_module (shared) userdir_module (shared) version_module (shared) bwlimited_module (shared) ssl_module (shared) fcgid_module (shared) http2_module (shared) security2_module (shared) suphp_module (shared) passenger_module (shared) rbld_module (shared)

Next time this overcommits I will grab the full output for you.
I would also like to note that we are open to some form of video conference while this is happening so that we can gather more information as needed.
Hi, I took a look at rbld_module. If the module is the following one on GitHub:

https://github.com/bluehost/mod_rbld/blob/master/mod_rbld.c#L370

the pool allocated there is leaked on many paths, including what look to me like normal use cases. This could explain some memory leak, but the corresponding parent pool should not be 'pconf' as it seems to be. Do you need bwlimited_module and the last 4 modules in the list? These at least look like 3rd-party modules.
Created attachment 37060 [details] gdb_dump_all_pools_latestinit Attached now is a dump_all_pools output without the tracebacks at the end where this fully completes the output.
The last 4 modules on the list are not modules that we would be able to disable, as some of them provide basic functionality and security for us.

security2_module (shared) - mod_security
suphp_module (shared) - what allows user PHP scripts to be executed and served through httpd
passenger_module (shared) - used for Ruby
bwlimited_module - this module is used to display over-bandwidth pages when a user has reached their allocated bandwidth limit in cPanel
(In reply to Curtis Wilson from comment #55)
> Created attachment 37060 [details]
> gdb_dump_all_pools_latestinit
>
> Attached now is a dump_all_pools output without the tracebacks at the end
> where this fully completes the output.

Thanks. Quick question: which process did you dump? Was it the main process (the one that is the parent of all other httpd processes)? Or was it a child process? The reason I am asking is that I see no transaction pool on this process, which is a bit unusual.
On each of these I dumped the main process that everything is forked from.
(In reply to Curtis Wilson from comment #58) > On each of these I dumped the main process that everything is forked from. Is it the main process which is leaking memory?
So the issue is that it is every process under the main one. I can grab a few dumps when the server starts showing signs again; to prevent downtime, once it reaches the point where it affects traffic we have to fully restart httpd.
Next time you experience the issue, can you please provide:

1. ps -e -o pid,comm,vsz,rsz | grep httpd
2. A dump_all_pools from the process that is leaking memory. It should be a child process; if not, then from the main process.
3. The pid of this process.
4. A pmap <pid> of this pid.
5. If the leaking process is the main process, please provide a "thread apply all bt full" from your gdb session.
A technique I've found helpful for catching leaks is:

gdb <attach to actively leaking child>
break sbrk
cont
...
bt full

to get a backtrace at the point sbrk is called to expand the heap. Not sure how well this works with a threaded MPM.
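The same technique could be put in a small gdb command file so it is repeatable (a sketch; the file name is hypothetical and, as noted above, behavior under a threaded MPM is untested):

```
# sbrk-trace.gdb - print a full backtrace each time the heap grows
break sbrk
commands
  bt full
  continue
end
continue
```

Loaded against a leaking child with: gdb -p <pid> -x sbrk-trace.gdb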
Created attachment 37066 [details] Dump of thread Attached is the dump of a thread. We also believe we may have solved this, at least in our case, and are testing this also.
(In reply to Curtis Wilson from comment #63)
> Created attachment 37066 [details]
> Dump of thread
>
> Attached is the dump of a thread. We also believe we may have solved this,
> at least in our case, and are testing this also.

Did you solve it, and if yes, how?
We attempted disabling mod_rbld on our test servers and it was working well; however, over the weekend it looks like this started to lead to the same issue. I will attach a new set of data with mod_rbld disabled.
Created attachment 37077 [details] Data with mod_rbld disabled
(In reply to Curtis Wilson from comment #67)
> Created attachment 37077 [details]
> Data with mod_rbld disabled

Thanks. More questions:

1. Have you set MaxMemFree in your configuration, and if yes, to what value?
2. Can you deliver all ProxyPass[Match] directives in your configuration?
3. Can you deliver all <Proxy > blocks in your configuration?
What are your settings for ServerLimit, StartServers, MaxRequestWorkers, MinSpareThreads, MaxSpareThreads, and ThreadsPerChild?
How many cores / vCPUs does the system you run your httpd on have?
Created attachment 37089 [details] Vhosts

1. Have you set MaxMemFree in your configuration and if yes to what value?

Default.

2. Can you deliver all ProxyPass[Match] directives in your configuration?

There would be too many to provide every one of them. I can provide an example, with the domain removed.

# grep -c ProxyPass /etc/apache2/conf/httpd.conf
10846
# grep "<Proxymatch" /etc/apache2/conf/httpd.conf -c
4408

3. Can you deliver all <Proxy > blocks in your configuration?

<Proxy "*">
    <IfModule security2_module>
        SecRuleEngine Off
    </IfModule>
</Proxy>

4. How many cores / VCPUs does the system have you run your httpd on?

20 vCPUs, 60 GB of memory.

5. What are your settings for ServerLimit, StartServers, MaxRequestWorkers, MinSpareThreads, MaxSpareThreads, ThreadsPerChild?

MaxRequestsPerChild 5000
ThreadLimit 120
ThreadsPerChild 64
MinSpareThreads 75
MaxSpareThreads 395
ServerLimit 32
MaxRequestWorkers 2048

Attached I have added an example of the vhost used for every site. I have also added the vhost for the various proxy subdomains used.
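As a quick sanity check on the worker MPM settings above (a sketch added for illustration, not part of the original report): the maximum number of worker threads is ServerLimit × ThreadsPerChild, which with these values lands exactly on MaxRequestWorkers, so the configuration is internally consistent.

```python
# Worker MPM capacity check using the settings reported above.
ServerLimit       = 32
ThreadsPerChild   = 64
ThreadLimit       = 120
MaxRequestWorkers = 2048

# Total worker threads possible across all child processes.
max_threads = ServerLimit * ThreadsPerChild
print(max_threads)                        # → 2048
print(max_threads == MaxRequestWorkers)   # → True
print(ThreadsPerChild <= ThreadLimit)     # → True
```

With roughly 600 MB per child at startup (as noted later in this thread), 32 children would already account for close to 20 GB of the 60 GB on the box before any leak.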
Thanks. Can I also have a 'dump_all_pools' from a freshly started child process that does not consume an unexpected amount of memory?
Created attachment 37093 [details] Fresh child dump Attached is a dump of a fresh child.
Pool dump wise there is no big difference between a fresh process and one of the memory-eating ones (only 1103 blocks, which are about 8 MiB). And there is also not much difference in the free memory in the allocators, which is about 750 KiB in the fresh child case and about 10 MiB in the memory-consuming case.

Can you please provide

ps -e -o pid,comm,vsz,rsz
pmap <pid>

for a fresh child and the main process?
Created attachment 37096 [details] Pmaps of fresh child and main Attached are pmaps of a fresh child and the main process.
Is there anything else that is needed at this time?
(In reply to Curtis Wilson from comment #76)
> Is there anything else that is needed at this time?

Not now. I am honestly a bit lost now. It looks like your processes consume a lot of memory (about 600 MB) from the start due to their configuration, but from a pool usage perspective not much changes between the freshly started process and the one which consumes the huge amount of memory (roughly 2.8 GB). So the question is: where is this memory lost, and why did the behavior change with the httpd version?

It could be that a 3rd party module consumes memory outside the pools, but due to side effects introduced by the newer httpd version no longer frees up that memory. Or it is caused by the underlying memory management of the C library, but then it should show up the same way with both httpd versions, so this is the rather unlikely option.

Maybe we are left with the proposal from Joe: have a fresh process handle some requests to "warm up" a little bit memory-wise, then do https://bz.apache.org/bugzilla/show_bug.cgi?id=63687#c62 with that process and see where it stops over and over again.
Maybe the following untested stuff helps you to do this in an unattended way:

gdb <attach to actively leaking child>
break sbrk
commands
silent
bt full
cont
end
set logging file <file to log output to>
set logging redirect on
set logging on
cont
I am trying to gather the set of data requested with those steps to see if I can get that to work. However, something we have noticed is that the main process looks like it starts bloating, and when new children are forked, they start in that bloated state and continue to grow. We built our own 2.4.43 RPMs for EasyApache yesterday and put them on one of the boxes, and the same behavior is being noticed there also.
(In reply to Curtis Wilson from comment #79)
> However something we have noticed is the main process
> looks like it starts bloating, and when new children are forked, they start
> in that bloated state and continue to grow.

In that case it may be better to augment Ruediger's instructions to try to catch the parent allocating memory significantly after startup.
So after determining the trigger for causing this to happen (reloads), we were able to track this down to the specific commit that was causing the issue. After making a build without this commit and installing it on test servers, we found that the memory issues that have been happening no longer occur.

$ git bisect good
734313ca6e758f94ae3f923f801f34da02251b9b is the first bad commit
commit 734313ca6e758f94ae3f923f801f34da02251b9b
Author: Stefan Eissing <icing@apache.org>
Date:   Tue Jul 30 11:23:52 2019 +0000

    Merged /httpd/httpd/trunk:r1851621,1852128,1862075

    *) mod_ssl/mod_md: reversing dependency by letting mod_ssl offer hooks
       for adding certificates and keys to a virtual host. An additional hook
       allows answering special TLS connections as used in ACME challenges.
       Adding 2 new hooks for init/get of OCSP stapling status information
       when other modules want to provide those. Falls back to own
       implementation with same behaviour as before.

    git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/branches/2.4.x@1863988 13f79535-47bb-0310-9956-ffa450edef68

 CHANGES                         |   8 +++
 modules/ssl/mod_ssl.h           |  29 ++++++++++
 modules/ssl/mod_ssl_openssl.h   |  40 ++++++++++++++
 modules/ssl/ssl_engine_init.c   | 116 +++++++++++++++++-----------------------
 modules/ssl/ssl_engine_kernel.c |  74 +++++++++++++++++--------
 modules/ssl/ssl_util_stapling.c |  93 +++++++++++++++++++++++++-------
 6 files changed, 253 insertions(+), 107 deletions(-)
Created attachment 37158 [details] ssl stapling leak fix A fix for a leak in 2.4.43 that was introduced by the new stapling interaction with mod_md.
Would you mind trying the ssl stapling leak fix I attached here? It applies on top of 2.4.43. I would be interested to know if it fixes your memory problems. Thanks, Stefan
Hey Stefan, I will talk with my team today to see if we can get this built and pushed to our test servers.
Hey Stefan, We built this shortly after you posted it on Thursday and installed it on one of our test boxes, and it has been working throughout the weekend without any issues. It does not appear that memory consumption is growing on reloads, and we have not had a memory problem over the weekend with this.
Created attachment 37171 [details] simplified patch Judging from the OpenSSL man pages, X509_free() is a no-op if the parameter is NULL, so the goto dance could be avoided. I think the patch is correct either way.
Fixed in trunk (r1876548).
Hi, I've also rebuilt Apache 2.4.43 with Stefan's fix (ssl stapling leak fix). It looks very good and it seems the bug has been fixed. Thank you all! Best regards, Michael
This has been backported to 2.4.x in r1879034. It will be part of 2.4.44.