Created attachment 26073 (Patch that kind-of fixes this issue) Hello, I've noticed that after adding 10,000 or more vhosts, Apache takes an abnormally long time to restart. I ran oprofile while a restart was in progress, and here is where it spent 94% of its time: http://gdr.pastebin.pl/28191 It was looking for the last non-null element of a leaf list in the directive tree. Please note that I know the patch I'm proposing probably isn't applicable to the mainline source; I mainly want to indicate that there is a problem. Anyway, the patch description follows: Because changing the tree structure to something else would break compatibility with modules, I've decided to address it by extending ap_directive_t with an extra field, "last". In the first leaf on a given level, leaf->last keeps a pointer to the last known non-null element. It may not be the last non-null element in that list, but it's still closer to the end than the first element. This decreased restart time from minutes to several seconds (6 seconds for 30k vhosts). If you would rather browse the source than read the patch, it's at http://github.com/gjedeer/httpd
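The idea described above can be illustrated with a small, self-contained sketch. This is not the attached patch and does not use httpd's real ap_directive_t; the struct `node` and the functions `append_naive`, `append_hinted`, `build` and `length` are hypothetical names made up for the illustration. It only shows why walking from the head on every append is quadratic overall, and how a "last" hint stored in the first element removes most of the walking while staying correct even when the hint is stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a directive-tree node (NOT httpd's ap_directive_t). */
typedef struct node {
    int line_num;
    struct node *next;
    struct node *last; /* hint; only meaningful on the first node of a list */
} node;

node *new_node(int line_num) {
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->line_num = line_num;
    return n;
}

/* Walk from the head to the tail on every append: O(n) per call.
 * Returns the number of links followed. */
long append_naive(node *head, node *n) {
    long steps = 0;
    node *cur = head;
    while (cur->next != NULL) {
        cur = cur->next;
        steps++;
    }
    cur->next = n;
    return steps;
}

/* Start from the hint kept in the head. The hint may be stale (not the
 * real tail), so we still walk forward from it until next is NULL. */
long append_hinted(node *head, node *n) {
    long steps = 0;
    node *cur = head->last != NULL ? head->last : head;
    while (cur->next != NULL) {
        cur = cur->next;
        steps++;
    }
    cur->next = n;
    head->last = n; /* remember the new tail for the next append */
    return steps;
}

/* Build a list of `count` nodes; returns the total links followed. */
long build(int count, int hinted, node **out_head) {
    node *head = new_node(0);
    long total = 0;
    for (int i = 1; i < count; i++) {
        node *n = new_node(i);
        total += hinted ? append_hinted(head, n) : append_naive(head, n);
    }
    *out_head = head;
    return total;
}

/* Count the nodes in a list. */
int length(const node *head) {
    int len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}
```

With the naive version the total work grows with the square of the list length, while the hinted version does almost no walking as long as every append goes through it. If some other code appends without updating the hint, the hinted version simply walks the few remaining links, which matches the remark that the hint "may not be the last non-null element".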
You haven't actually described a bug. Please reopen with details if you want to describe something different from 41887. *** This bug has been marked as a duplicate of bug 41887 ***
This is not the same bug. What slows things down here can be described as follows: Apache restarts. It kills all the subprocesses and re-reads the configs. Re-reading the configs (stored on a local, very fast disk) takes little I/O and 100% CPU. For 30k vhosts, it takes 3 to 5 minutes before this phase finishes. During that time, the server is inaccessible. This happens because of inefficient processing of large numbers of <VirtualHost> directives, and my patch addresses that as described previously. Bug 41887 is about lowering I/O by avoiding stat calls. This bug is about avoiding unnecessarily high CPU usage and lowering config reading time. HOW TO REPRODUCE: Generate a config file with 30,000 entries similar to this one: http://gdr.pastebin.pl/28193 They may all point to the same directory; it doesn't matter for this test. Restart httpd and measure the time during which sites are inaccessible (several minutes). Observe CPU usage (100% usage of 1 CPU core). Apply the patch and test again. The server is up in seconds.
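For anyone wanting to follow the reproduction steps, a config of that size can be generated with a few lines of code. The sketch below is only an assumption about what such entries might look like: the contents of the pastebin entry are not reproduced in this report, so the `<VirtualHost *:80>` block with `ServerName` and `DocumentRoot`, the host name pattern `siteN.example.test`, and the function name `write_vhosts` are all made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Write `count` minimal <VirtualHost> blocks to `out`, all pointing at the
 * same document root. The block layout and host names are assumptions,
 * not the entry from the original pastebin.
 * Returns the number of blocks written, or -1 on a write error. */
int write_vhosts(FILE *out, int count, const char *docroot) {
    assert(out != NULL && docroot != NULL);
    for (int i = 0; i < count; i++) {
        int rc = fprintf(out,
                         "<VirtualHost *:80>\n"
                         "    ServerName site%d.example.test\n"
                         "    DocumentRoot %s\n"
                         "</VirtualHost>\n",
                         i, docroot);
        if (rc < 0)
            return -1;
    }
    return count;
}
```

Calling `write_vhosts(f, 30000, "/some/docroot")` on a file opened for writing, and then including that file from the main httpd configuration, should give a test setup in the spirit of the one described above; the document root path here is a placeholder.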
(In reply to comment #2) > Apply patch, test again. The server is up in seconds. And what happens when you apply the pr41887 patch?
(In reply to comment #3) With patch 41887 and -T: http://gdr.pastebin.pl/28236 With patch 50002: http://gdr.pastebin.pl/28237 Unpatched httpd: http://gdr.pastebin.pl/28238 Please note that this is an idle test server; on production machines it really takes minutes, but for obvious reasons I won't do measurements there.
We've been running this patch for 2 days now on a production server with 29342 vhosts and I can see no side effects. Also, the restart time went down from 2:31 to 0:09. I'll be installing it on our other servers because everything seems to work well.
Thanks for the patch. Committed in r1003808. Though it would really make sense to use some mass virtual hosting module for this number of vhosts.
(In reply to comment #6) > Thanks for the patch. Committed in r1003808 > > Though it would really make sense to use some mass virtual hosting module for > this number of vhosts It would; however, sometimes it's just not applicable. I researched this before diving into profiling the core.
Fixed in 2.4.1.