Bug 61666 - regression on httpd reload (could not bind to address 0.0.0.0:80)
Summary: regression on httpd reload (could not bind to address 0.0.0.0:80)
Status: CLOSED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mpm_event (show other bugs)
Version: 2.4.29
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-25 15:03 UTC by Thomas Jarosch
Modified: 2017-11-23 09:00 UTC (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Jarosch 2017-10-25 15:03:24 UTC
We have an autotest based automated test system that automatically tests various httpd and mod_proxy configurations. A few tests failed 100% of the time after updating from 2.4.25 to 2.4.29.

As soon as the script issues 

As soon as a script issues a "httpd reload", this appears in the logs:

[mpm_event:notice] [pid 2452:tid 3072501504] AH00494: SIGHUP received.  Attempting to restart
(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
[mpm_event:alert] [pid 2452:tid 3072501504] no listening sockets available, shutting down


I've tracked it down to this change in httpd 2.4.26:

  *) Don't set SO_REUSEPORT unless ListenCoresBucketsRatio is greater
     than zero.  [Eric Covener]

Commit in questions is this one:
https://github.com/apache/httpd/commit/d3e0ad6e2ba88f465765e0a8c315189c311583a5

Once I revert it, "httpd reload" works just fine.

Looks like graceful shutdowns now cause trouble when doing httpd reloads?
Comment 1 Thomas Jarosch 2017-10-25 15:03:47 UTC
PS: There's no 2.4.29 version in bugzilla yet, 2.4.28 is the highest as of writing.
Comment 2 Eric Covener 2017-10-25 16:23:05 UTC
Do you set ListenCoresBucketsRatio? Were you by any chance running tests work prior to 2.4.17 (prior to SO_REUSEPORT)?
Comment 3 Thomas Jarosch 2017-10-25 16:37:03 UTC
ListenCoresBucketsRatio is not explicitly set as far as I can tell.

Relevant part of httpd.conf:
-------------------
Timeout 300
KeepAliveTimeout 300
GracefulShutdownTimeout 1800

<IfModule mpm_event_module>
  # MinSpareThreads         10
  # MaxSpareThreads         250
  #
  # Number of concurrent connections is: ServerLimit * ThreadsPerChild
  # Result: 160 * 5 -> 800
  #
  StartServers 1
  ServerLimit 160
  ThreadLimit 5
  ThreadsPerChild 5
  MaxConnectionsPerChild  5000
</IfModule>

Listen 192.168.1.254:80
Listen 192.168.1.254:443
Listen 127.0.0.1:80
Listen 127.0.0.1:443
Listen 447
-------------------


I'll dig through our git history tomorrow at which pointed we switched from httpd 2.2.x to 2.4.x. There was also another switch later on from prefork mpm to the event mpm. 2.4.17 was released around September 2015, we'll see tomorrow.
Comment 4 Eric Covener 2017-10-25 16:46:45 UTC
> Listen 192.168.1.254:80
> Listen 192.168.1.254:443
> Listen 127.0.0.1:80
> Listen 127.0.0.1:443
> Listen 447
> -------------------

Interesting, did you redact? This is not consistent with the failure msg (0.0.0.0:80 / 80)
Comment 5 Thomas Jarosch 2017-10-26 08:41:11 UTC
(In reply to Eric Covener from comment #4)
> > Listen 192.168.1.254:80
> > Listen 192.168.1.254:443
> > Listen 127.0.0.1:80
> > Listen 127.0.0.1:443
> > Listen 447
> > -------------------
> 
> Interesting, did you redact? This is not consistent with the failure msg
> (0.0.0.0:80 / 80)

sorry, I've grabbed the config from another VM while the original VM was executing a different test.

This is how the config looks like:

----
Listen 80
Listen 447
----

The thread limits etc. are identical.

Regarding our upgrade history: We went straight from 2.0.64 to 2.4.7 in January 2014. The switch from the prefork mpm to the event mpm was done already with version 2.4.18 in June 2016.

But it looks like I hit the issue last night again while doing a stress test on mod_proxy with the reverted patch. It just doesn't appear every time.
I'll now double check the patch was in there. If so, I 'll take version 2.4.27 instead of 2.4.29 to make sure we are not searching for a different issue.
Comment 6 Joe Orton 2017-10-26 12:13:47 UTC
For a config like:

Listen 127.0.0.1:80
Listen 80

httpd should fail to start by default.  This was true up to 2.4.16, but the behaviour accidentally regressed in 2.4.17 with the listener bucket code.  From 2.4.26 up it will fail again.

It's possible you had a config working with (2.4.17..25) which now fails, but this is not a regression.

Alternatively, it's possible you have some unrelated issue.  Please provide output from:

strace -ebind ./bin/httpd
Comment 7 Thomas Jarosch 2017-10-26 12:35:48 UTC
(In reply to Joe Orton from comment #6)
> For a config like:
> 
> Listen 127.0.0.1:80
> Listen 80

I don't think we are having a mix of "Listen XX" with an explicit IP address and one without on the same port number. That's a different thing.

In the first config I've posted, it's port 443 and *447*.
(we use 447 to serve a different SSL cert)
Comment 8 Joe Orton 2017-10-26 13:59:12 UTC
All you've shown us is a conflict error message from :80.  If there is some issue here other than a config error you need to show full Listen config & strace output I indicated above.
Comment 9 Thomas Jarosch 2017-10-26 16:32:50 UTC
(In reply to Joe Orton from comment #8)
> All you've shown us is a conflict error message from :80.  If there is some
> issue here other than a config error you need to show full Listen config &
> strace output I indicated above.

the "Listen" lines have been provided in comment #5, here they are again:

Listen 80
Listen 447

# strace -ebind httpd -f /etc/httpd/conf/httpd.conf 
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(4, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(5, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(5, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(5, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(5, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
bind(4, {sa_family=AF_INET, sin_port=htons(447), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
bind(5, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0

The output is from httpd 2.4.29 + revert of the commit mentioned in comment #1.


I've been running the mod_proxy tests on two machines in a loop all afternoon. So far no single failure. If I stick to vanilla httpd 2.4.29 or 2.4.27, it fails on "httpd reload". Note: This is no *startup* failure, httpd starts fine.
The graceful restart during "httpd reload" -> SIGHUP dies.

What's the normal control flow once a SIGHUP is received in the event mpm?
Comment 10 Thomas Jarosch 2017-10-27 17:06:01 UTC
Finally tracked it down:

The automated system changes the "Listen" directives for a few tests and issues a "httpd reload". While this worked fine in httpd 2.4.25, it stopped working once the SO_REUSEPORT handling was changed in 2.4.26+.

I've adapted the tests to issue a "httpd restart" if the Listen directives change and a quick test finished without errors. I'll run the full testsuite over the weekend.

Here's a diff of the config file before the "httpd reload":

-Listen 172.18.0.1:80
-Listen 172.18.0.1:443
-Listen 192.168.2.1:80
-Listen 192.168.2.1:443
-Listen 127.0.0.1:80
-Listen 127.0.0.1:443
+Listen 80
 Listen 447

So Joe was on the right track in comment #6. Changing from "Listen 127.0.0.1:80" to just "Listen 80" is a pretty uncommon (stupid?) thing.

For me the matter is resolved, I've adapted the code and all should be fine.
If someone else ever stumbles upon this issue, we could think about a warning message in the error log that changing the Listen directives in this way is unsupported during a SIGHUP.

My thanks to everyone involved!

Have a nice weekend,
Thomas
Comment 11 Joe Orton 2017-11-23 09:00:26 UTC
(In reply to Thomas Jarosch from comment #10)
> So Joe was on the right track in comment #6. Changing from "Listen
> 127.0.0.1:80" to just "Listen 80" is a pretty uncommon (stupid?) thing.

That's a pretty interesting special case, at least I wasn't aware of it, so thanks a lot for reporting back!

I documented this in r1816112

https://httpd.apache.org/docs/trunk/bind.html#reload