Bug 62308 - Apache crashes after graceful restart with AH02599: slotmem (failed size check)
Summary: Apache crashes after graceful restart with AH02599: slotmem (failed size check)
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_balancer
Version: 2.4.33
Hardware: PC All
Importance: P2 regression
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-17 10:21 UTC by dnie
Modified: 2018-07-02 21:22 UTC
CC List: 1 user



Attachments
logfile with configuration change example (69.04 KB, text/plain)
2018-04-17 10:21 UTC, dnie
Relax slotmem reuse checks (4.36 KB, patch)
2018-04-18 16:21 UTC, Yann Ylavic
Relax slotmem reuse checks (v2) (4.36 KB, patch)
2018-04-18 16:24 UTC, Yann Ylavic
Logfile after adding BalancerMember (35.38 KB, text/plain)
2018-04-22 19:41 UTC, dnie
Change SHM filenames when not reusable (4.94 KB, patch)
2018-04-29 21:13 UTC, Yann Ylavic
Logfile with random termination (56.74 KB, text/plain)
2018-04-30 13:43 UTC, dnie
Change SHM filenames when not reusable (v4) (7.18 KB, patch)
2018-04-30 21:41 UTC, Yann Ylavic
Change SHM filenames when not reusable (v5) (14.30 KB, patch)
2018-05-03 00:49 UTC, Yann Ylavic
Logfile after changing the port for a BalancerMember (19.74 KB, text/plain)
2018-05-16 08:46 UTC, dnie
One SHM (filename) per generation (v6) (18.22 KB, patch)
2018-05-18 17:38 UTC, Yann Ylavic
Test script (5.71 KB, application/zip)
2018-05-22 09:27 UTC, dnie

Description dnie 2018-04-17 10:21:09 UTC
Created attachment 35878 [details]
logfile with configuration change example

After updating from 2.4.27 to 2.4.33, we get a crash when doing a graceful restart after modifying the mod_proxy/mod_proxy_balancer configuration in the filesystem.
We modify the configuration files dynamically when our infrastructure changes. After this, we do a graceful restart using the following Windows command: httpd.exe -k restart
This worked fine with 2.4.27 and below.
With 2.4.33 we get the following message:
AH02599: existing shared memory for C:/Apache24/temp/slotmem-shm-p17ffdef3.shm could not be used (failed size check)

I've attached an Apache log file with an example of a configuration change that causes this issue.
Comment 1 mark 2018-04-17 16:43:40 UTC
I see this too, under Linux, in the move from 2.4.29 to 2.4.32. It is possibly associated with the recent refactoring of the shm code triggered by one of my earlier bug reports.
Comment 2 Yann Ylavic 2018-04-18 10:57:05 UTC
Do you mean that AH02599 is the issue or the httpd process is really crashing (i.e. segmentation fault)?

I tried adding <Proxy balancer:...> sections and BalancerMembers within them but could not reproduce a crash (slotmems get reset where they shouldn't given the growth margin, which is an issue, but not a crash...). What is your scenario more precisely?
Comment 3 Yann Ylavic 2018-04-18 16:21:07 UTC
Created attachment 35881 [details]
Relax slotmem reuse checks

Since slotmems now survive restarts, we can do minimal checks (item size only) when asked to reuse them (at creation time).
The overall size (which also depends on the number of items) may have changed, but we don't know at this point whether the current/reused size will fit, and it is the purpose of the (b)growth parameters to allow for extension.
If it ultimately doesn't fit, the error should happen in slotmem_grab(), which aligns with the previous code in 2.4.29.
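
For readers who want the idea in miniature, here is a rough Python sketch of the relaxed check described above. It is a toy model, not the mod_slotmem_shm code: the SlotMem class and the create_or_reuse() helper are hypothetical names, and only the rule "compare item size at creation, report a capacity problem at grab time" comes from this comment.

```python
class SlotMem:
    """Toy stand-in for a shared-memory slot table (not the real module)."""

    def __init__(self, item_size, item_num):
        self.item_size = item_size
        self.item_num = item_num
        self.in_use = [False] * item_num

    def grab(self):
        """Reserve the first free slot; capacity problems surface here."""
        for idx, used in enumerate(self.in_use):
            if not used:
                self.in_use[idx] = True
                return idx
        raise MemoryError("slotmem grab failed: no free slot")


def create_or_reuse(existing, item_size, item_num):
    """Reuse `existing` when the item size matches, otherwise create anew."""
    if existing is not None and existing.item_size == item_size:
        # Relaxed check: the requested item_num may differ from the stored
        # one; whether it fits is only discovered later, in grab().
        return existing
    return SlotMem(item_size, item_num)
```

With this shape, a restart that asks for more items than the reused segment holds does not fail at creation; the failure is deferred until a grab finds no free slot.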

Does this patch work for you?
Comment 4 Yann Ylavic 2018-04-18 16:24:50 UTC
Created attachment 35882 [details]
Relax slotmem reuse checks (v2)

Argh sorry, forgot to return the created slotmem in *new, here is v2.
Comment 5 dnie 2018-04-19 09:03:52 UTC
(In reply to Yann Ylavic from comment #4)

> Do you mean that AH02599 is the issue or the httpd process is really crashing (i.e. segmentation fault)?

The httpd process does not exist anymore. That is what I mean. It's not really a crash.

> What is your scenario more precisely?

My httpd is used as a ReverseProxy only. My mod_proxy configuration does not use any additional configuration other than "keepalive". I also do not use any directives like "BalancerGrowth". Everything is standard.

> Does this patch work for you?
I have to prepare an environment under Windows to compile this. This may take a while (Visual Studio). Is there any other way to test this?
In comment #1 from Mark, I understand this issue is related to Linux too.
Comment 6 Yann Ylavic 2018-04-19 17:01:19 UTC
(In reply to dnie from comment #5)
> I have to prepare an environment under Windows to compile this. This may
> take a while (Visual Studio). Is there any other way to test this?
Maybe Steffen (ApacheLounge) can provide a Windows package with this fix, I will ask him if you don't beat me to it.

> In comment #1 from Mark, I understand this issue is related to Linux too.
I suppose the failure is Windows-only; the issue on Linux is (possibly) just the spurious error log.
Comment 7 Yann Ylavic 2018-04-20 10:16:47 UTC
(In reply to Yann Ylavic from comment #6)
> Maybe Steffen (ApacheLounge) can provide a Windows package with this fix,
> I will ask him if you don't beat me to it.

He kindly did (many thanks!): http://people.apache.org/~steffenal/VC15/Patches/VC15-patch-mod_slotmem_shm.rar (assuming VC15 for runtime).

Hope this helps.
Comment 8 dnie 2018-04-22 19:37:05 UTC
I tested Apache VC15 64Bit with the VC15-patch-mod_slotmem_shm.rar

This patch fixed the issue with adding or removing <Proxy balancer://...>.
But when adding a BalancerMember to this Proxy, httpd terminates again:
AH02293: slotmem ... grab failed

I attached the current log
Comment 9 dnie 2018-04-22 19:41:18 UTC
Created attachment 35888 [details]
Logfile after adding BalancerMember
Comment 10 Yann Ylavic 2018-04-29 21:13:15 UTC
Created attachment 35899 [details]
Change SHM filenames when not reusable

This new patch should restore 2.4.29 behaviour on Windows by creating SHMs as needed when any size changes on restart (a generation number is used in this case to allow for the previous generation and the new one to run simultaneously).

The change is not Windows-specific though, since some mechanisms on Unix may also prevent the creation of an SHM whose filename/inode has been unlinked but is still attached by some process.
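
As an illustration only, the naming idea can be sketched in Python as follows. The shm_filename() helper is hypothetical, and the ".<generation>" suffix format is an assumption based on file names such as slotmem-shm-p17ffdef3.shm.2 reported in this bug; the patch itself is the authority on the real rules.

```python
def shm_filename(base, generation, reusable):
    """Return the SHM file name to use for this (re)start.

    Hypothetical helper: keep `base` while the existing segment can be
    reused, otherwise append the generation number so that the previous
    generation and the new one each have their own file.
    """
    if reusable:
        return base
    return f"{base}.{generation}"
```

For example, shm_filename("slotmem-shm-p17ffdef3.shm", 2, False) yields "slotmem-shm-p17ffdef3.shm.2", leaving the file of the still-running previous generation untouched.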

I did some testing on Linux to verify that this patch works as expected, but I guess more testing is needed on Windows, could you please try it with your configuration?
Comment 11 Yann Ylavic 2018-04-30 10:35:15 UTC
Thanks again to Steffen, here is a binary version of "mod_slotmem_shm.so" with the patch applied:
http://people.apache.org/~steffenal/VC15/Patches/VC15-patch-2-mod_slotmem_shm.rar
Comment 12 dnie 2018-04-30 13:43:32 UTC
Created attachment 35900 [details]
Logfile with random termination

I tested VC15-patch-2-mod_slotmem_shm.rar

Adding and removing BalancerMember seems to work now.

But after some more testing, I got some random terminations in different scenarios.
I attached one log file with the following scenario:

1. No Proxy "a"
2. Two BalancerMember for Proxy "a"
3. No Proxy "a"
4. One BalancerMember for Proxy "a"
5. Process terminated

I have an automated script to reproduce these steps. The script mostly fails in step 4 but sometimes fails in later steps.
Comment 13 Yann Ylavic 2018-04-30 21:41:32 UTC
Created attachment 35902 [details]
Change SHM filenames when not reusable (v4)

The previous version missed the attachment part using the generation number, something specific to Windows and how its child process reuses slotmems.

Could you please try it? Also, do you need the binary version provided by Steffen or do you build your own?
Comment 14 Yann Ylavic 2018-05-01 07:52:37 UTC
Well, Steffen beat the call anyway: http://people.apache.org/~steffenal/VC15/Patches/VC15-patch-3-mod_slotmem_shm.rar

Thanks!
Comment 15 dnie 2018-05-02 11:50:33 UTC
Thanks Yann!
Thanks Steffen!

My httpd does not terminate anymore.

But how about the slotmem files?
There are 41 slotmem files left over after 3 minutes of continuous graceful restarts. Even after completely stopping and then cleanly starting httpd, they are not removed.
Comment 16 Yann Ylavic 2018-05-02 12:28:21 UTC
Thanks for testing.

Do the 41 files left over correspond to the number of restarts and/or balancer configuration changes (or rather a random number lower than this)?
Also, do you issue "immediate" restarts or is there some traffic/sleep in between?
Comment 17 Yann Ylavic 2018-05-03 00:49:21 UTC
Created attachment 35906 [details]
Change SHM filenames when not reusable (v5)

Same patch as v4, plus some rework on the cleanup handling (actually a simplification) so that Windows child processes also try to destroy/remove SHMs (and files) when exiting.

Do you still see leaks with this new patch?
Comment 18 dnie 2018-05-03 05:52:55 UTC
(In reply to Yann Ylavic from comment #16)
 
> Do the 41 files left over correspond to the number of restarts and/or
> balancer configuration changes (or rather a random number lower than this)?
> Also, do you issue "immediate" restarts or is there some traffic/sleep in
> between?

Right now, I have absolutely no traffic when testing this with my test script. (I will run such a test in the next few days.)

Here are the filenames after some restarts:
(tested with VC15-patch-3-mod_slotmem_shm.rar)

1. Start with 3 Proxy elements 
slotmem-shm-p17ffdef3.shm
slotmem-shm-p17ffdef3_config.shm
slotmem-shm-p17ffdef3_home.shm
slotmem-shm-p17ffdef3_res.shm

2. Adding Proxy "a" with 2 BalancerMembers
httpd -k restart
slotmem-shm-p17ffdef3.shm
slotmem-shm-p17ffdef3.shm.2
slotmem-shm-p17ffdef3_a.shm
slotmem-shm-p17ffdef3_config.shm
slotmem-shm-p17ffdef3_home.shm
slotmem-shm-p17ffdef3_res.shm

3. Removing this Proxy "a"
httpd -k restart
slotmem-shm-p17ffdef3.shm
slotmem-shm-p17ffdef3.shm.2
slotmem-shm-p17ffdef3.shm.3
slotmem-shm-p17ffdef3_a.shm
slotmem-shm-p17ffdef3_config.shm
slotmem-shm-p17ffdef3_home.shm
slotmem-shm-p17ffdef3_res.shm

4. Adding Proxy "a" again with 1 BalancerMember
httpd -k restart
slotmem-shm-p17ffdef3.shm
slotmem-shm-p17ffdef3.shm.2
slotmem-shm-p17ffdef3.shm.3
slotmem-shm-p17ffdef3.shm.4
slotmem-shm-p17ffdef3_a.shm.4
slotmem-shm-p17ffdef3_config.shm
slotmem-shm-p17ffdef3_home.shm
slotmem-shm-p17ffdef3_res.shm

5. Adding another BalancerMember to Proxy "a"
httpd -k restart
slotmem-shm-p17ffdef3.shm
slotmem-shm-p17ffdef3.shm.2
slotmem-shm-p17ffdef3.shm.3
slotmem-shm-p17ffdef3.shm.4
slotmem-shm-p17ffdef3_a.shm.4
slotmem-shm-p17ffdef3_a.shm.5
slotmem-shm-p17ffdef3_config.shm
slotmem-shm-p17ffdef3_home.shm
slotmem-shm-p17ffdef3_res.shm
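
A small Python sketch for summarising listings like the ones above. It is an illustration, not part of the original test script: the group_by_base() helper is hypothetical, and it assumes every file is named "<base>.shm" with an optional ".<generation>" suffix, as in the names shown here.

```python
import re
from collections import defaultdict

# Assumed pattern: "<base>.shm" optionally followed by ".<generation>".
_NAME = re.compile(r"^(?P<base>.+\.shm)(?:\.(?P<gen>\d+))?$")


def group_by_base(filenames):
    """Map each base SHM name to the sorted generations seen (0 = no suffix)."""
    groups = defaultdict(list)
    for name in filenames:
        match = _NAME.match(name)
        if match is None:
            continue  # not a slotmem file name of the assumed form
        groups[match["base"]].append(int(match["gen"] or 0))
    return {base: sorted(gens) for base, gens in groups.items()}
```

Any base name that maps to more than one generation has files left behind by earlier restarts, which makes the growth from step to step easy to count.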
Comment 19 Yann Ylavic 2018-05-09 11:26:31 UTC
(In reply to Yann Ylavic from comment #17)
> Created attachment 35906 [details]
> Change SHM filenames when not reusable (v5)

mod_slotmem_shm.so provided by Steffen (thanks!) for this version:
http://people.apache.org/~steffenal/VC15/Patches/VC15-patch-v5-mod_slotmem_shm.rar

How does it work for you?
Comment 20 dnie 2018-05-09 14:30:48 UTC
Looks great with regard to the slotmem files. (Tested with VC15-patch-v5-mod_slotmem_shm.rar and my test scripts.)

We had a few terminations in our test environment, which simulates traffic while modifying the config. We have not been able to reproduce them since that day, but we are still trying.
That was tested with VC15-patch-3-mod_slotmem_shm.rar. We will repeat this test with VC15-patch-v5-mod_slotmem_shm.rar and post the result (after my vacation).
Comment 21 dnie 2018-05-16 08:46:53 UTC
Created attachment 35936 [details]
Logfile after changing the port for a BalancerMember

Now tested with v5

The terminations in our test environment occurred after changing the port for a BalancerMember, even without traffic.
Comment 22 Yann Ylavic 2018-05-18 17:38:50 UTC
Created attachment 35940 [details]
One SHM (filename) per generation (v6)

A new approach (based off 2.4.29 code) which creates new slotmems by generation/restart.

Each SHM should be created according to the needs of each (re)startup, and still be cleaned up when the last child process using it exits.
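
A toy Python model of this lifecycle, for illustration only. The GenerationRegistry class is hypothetical, the "<base>.<generation>" naming is an assumption, and a plain counter stands in for whatever bookkeeping the patch really uses.

```python
class GenerationRegistry:
    """Toy model: one SHM (file name) per generation, removed with its last user."""

    def __init__(self, base):
        self.base = base
        self.users = {}  # file name -> number of attached child processes

    def create(self, generation):
        """Register a fresh segment for this generation and return its name."""
        name = f"{self.base}.{generation}"
        self.users[name] = 0
        return name

    def attach(self, name):
        """A child process starts using the segment."""
        self.users[name] += 1

    def detach(self, name):
        """A child process exits; the last one out removes the segment."""
        self.users[name] -= 1
        if self.users[name] == 0:
            del self.users[name]
```

In this model the old generation's segment stays available while its children drain, and disappears once the last of them detaches.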

This solves the case where a BalancerMember is renamed (e.g. port change), while previously there was no room available in SHM for what is considered a new member (without knowing whether the old one still exists at this point...).
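
To make the renamed-member case concrete, here is a hedged Python sketch. Both helpers are hypothetical, and it assumes a member is identified by its name string (so a port change produces a different name) and that a reused SHM still holds the old entries.

```python
def new_member_names(old_names, new_names):
    """Names in the new configuration that the old one did not contain."""
    known = set(old_names)
    return [name for name in new_names if name not in known]


def fits_on_reuse(capacity, old_names, new_names):
    """True if every new name finds a free slot while old entries keep theirs."""
    needed = len(old_names) + len(new_member_names(old_names, new_names))
    return needed <= capacity
```

With one slot, one old member "http://app.example:8080" and one new member "http://app.example:8081", fits_on_reuse() is False even though the member count did not change, which is the situation a fresh SHM per generation avoids.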

Anyway, it passes all my tests on Linux, could you please (re-)run yours?
Thanks!
Comment 24 dnie 2018-05-22 07:53:45 UTC
Now it works completely. I do not have any issues with v6.

Thanks!
Comment 25 Yann Ylavic 2018-05-22 08:07:28 UTC
Thanks for your complete testing!

If possible, could you please provide your tests/scripts so that we can integrate or adapt them in our test suite?
Comment 26 dnie 2018-05-22 09:27:01 UTC
Created attachment 35942 [details]
Test script

My test script is a Windows batch file located in the "bin" folder. It uses the files in the "conf/test" folder (1.conf, 2.conf, ...) to modify the active httpd configuration at "conf/extra/registrations" and then restarts httpd. That's all.
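
For those without the attachment at hand, the loop described above could look roughly like this in Python. This is a sketch under assumptions, not the attached batch file: run_scenario() and its restart parameter are hypothetical, "registrations" is treated as a single file, and the default restart command is the "httpd.exe -k restart" quoted in the description.

```python
import shutil
import subprocess
from pathlib import Path


def _httpd_restart(root):
    # Graceful restart command as quoted in this report (Windows layout).
    subprocess.run([str(root / "bin" / "httpd.exe"), "-k", "restart"], check=True)


def run_scenario(server_root, restart=None):
    """Apply conf/test/1.conf, 2.conf, ... in numeric order, restarting after each."""
    root = Path(server_root)
    # Assumption: the active configuration is one file named "registrations".
    active = root / "conf" / "extra" / "registrations"
    if restart is None:
        restart = lambda: _httpd_restart(root)
    steps = sorted(
        (p for p in (root / "conf" / "test").glob("*.conf") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    for step in steps:
        shutil.copyfile(step, active)  # swap in the next configuration
        restart()
    return [step.name for step in steps]
```

Passing a custom restart callable makes it possible to dry-run the sequence, or to add a sleep or a liveness check between restarts, without touching a real server.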
Comment 27 Yann Ylavic 2018-07-02 21:22:04 UTC
Backported to upcoming 2.4.34 (r1834887).