50002 – Restart with many vhosts taking forever [patch]

Summary: Restart with many vhosts taking forever [patch]

Status:	RESOLVED FIXED

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	Core
Version:	2.2.16
Hardware:	PC Linux

Importance:	P2 minor
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:	FixedInTrunk

Depends on:
Blocks:

Reported:	2010-09-25 05:04 UTC by GDR!
Modified:	2020-04-09 18:59 UTC
CC List:	1 user

Attachments
Patch that kind-of fixes this issue (1.56 KB, patch) 2010-09-25 05:04 UTC, GDR!

Description GDR! 2010-09-25 05:04:10 UTC

Created attachment 26073 [details]
Patch that kind-of fixes this issue

Hello,

I've noticed that after adding 10 000 vhosts or more, Apache takes an abnormally long time to restart. I ran oprofile while the restart was in progress, and here is where it spent 94% of its time:

http://gdr.pastebin.pl/28191

It was looking for the last non-null element of a leaf list in the directive tree.

Please note that I know the patch I'm proposing probably isn't applicable to the mainline source; my main aim is to show that there is a problem. Anyway, a description of the patch follows:

Because changing the tree structure to something else would break compatibility with modules, I've decided to address it by extending ap_directive_t with an extra field, "last".

In the first leaf on a given level, leaf->last keeps a pointer to the last known non-null element. It may not be the actual last non-null element in that list, but it is still closer to the end than the first element.

The patch decreased restart time from minutes to several seconds (6 seconds for 30k vhosts).
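
To illustrate the idea, here is a minimal sketch of a singly linked list whose first node caches a "last" hint. It is not the attached patch: `node` is a hypothetical, stripped-down stand-in for ap_directive_t, and `append_slow` / `append_hinted` are names made up for this example. Both functions return the number of pointer hops they needed, so the difference can be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ap_directive_t. Only the "last"
 * field corresponds to the field described in the report; the rest is
 * reduced to the bare minimum for illustration. */
typedef struct node {
    const char *directive;
    struct node *next;
    struct node *last;   /* hint; only meaningful on the first node of a list */
} node;

/* Append without a hint: walks from the head every time, O(n) per append. */
static size_t append_slow(node *head, node *n)
{
    size_t hops = 0;
    node *cur = head;

    assert(head != NULL && n != NULL);
    while (cur->next != NULL) {
        cur = cur->next;
        hops++;
    }
    n->next = NULL;
    cur->next = n;
    return hops;
}

/* Append with a hint: starts from head->last when it is set. The hint may
 * be stale (not the true tail), so we still walk forward from it. */
static size_t append_hinted(node *head, node *n)
{
    size_t hops = 0;
    node *cur;

    assert(head != NULL && n != NULL);
    cur = head->last ? head->last : head;
    while (cur->next != NULL) {
        cur = cur->next;
        hops++;
    }
    n->next = NULL;
    cur->next = n;
    head->last = n;      /* remember the new tail for the next append */
    return hops;
}
```

Building a list of n nodes with `append_slow` costs roughly n²/2 hops in total, while `append_hinted` needs none as long as every append goes through it. That matches the kind of speedup reported here, assuming the tail search was the dominant cost.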

If you would rather browse the source than read the patch, it's at http://github.com/gjedeer/httpd

Comment 1 Nick Kew 2010-09-25 05:25:15 UTC

You haven't actually described a bug.  Please reopen with details if you want to describe something different from 41887.

*** This bug has been marked as a duplicate of bug 41887 ***

Comment 2 GDR! 2010-09-25 05:58:23 UTC

This is not the same bug.

What slows things down here can be described as follows:

Apache restarts. It kills all the subprocesses and re-reads the configs. Re-reading the configs (stored on a local, very fast disk) takes little I/O and 100% CPU. For 30k vhosts, it takes 3 to 5 minutes before this phase finishes. During that time, the server is inaccessible.

This happens because large numbers of <VirtualHost> directives are processed inefficiently, and my patch addresses that as described previously.

Bug 41887 is about lowering I/O by avoiding stat calls. This bug is about avoiding unnecessarily high CPU usage and lowering config reading time.

HOW TO REPRODUCE:

Generate a config file with 30 000 entries similar to this one:
http://gdr.pastebin.pl/28193
They may all point to the same directory; it doesn't matter for this test. Restart httpd and measure how long the sites are inaccessible (several minutes). Observe CPU usage (100% usage of 1 CPU core).
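
A test config of this kind could be generated with something like the sketch below. The exact entry from the pastebin link is not reproduced here, so the <VirtualHost> template, the vhostN.example.test hostnames, and the function name `write_vhosts` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Writes n name-based <VirtualHost> blocks to out, all sharing one
 * DocumentRoot. The block layout and hostnames are assumed, not taken
 * from the original report. Returns n on success, -1 on a write error. */
static int write_vhosts(FILE *out, int n, const char *docroot)
{
    assert(out != NULL && docroot != NULL);

    for (int i = 0; i < n; i++) {
        int rc = fprintf(out,
                         "<VirtualHost *:80>\n"
                         "    ServerName vhost%d.example.test\n"
                         "    DocumentRoot %s\n"
                         "</VirtualHost>\n",
                         i, docroot);
        if (rc < 0) {
            return -1;
        }
    }
    return n;
}
```

Calling `write_vhosts(stdout, 30000, "/var/www/test")` from a small main() and redirecting the output to a file that httpd.conf includes should give a config of roughly the size used in this test; the directory path is only a placeholder.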

Apply the patch and test again. The server is up in seconds.

Comment 3 Nick Kew 2010-09-25 06:55:16 UTC

(In reply to comment #2)

> Apply patch, test again. The server is up in seconds.

And what happens when you apply the pr41887 patch?

Comment 4 GDR! 2010-09-26 07:36:20 UTC

(In reply to comment #3)

With patch 41887 and -T:
http://gdr.pastebin.pl/28236

With patch 50002:
http://gdr.pastebin.pl/28237

Unpatched httpd:
http://gdr.pastebin.pl/28238

Please note that this is an idle test server. On production machines it really takes minutes, but for obvious reasons I won't take measurements there.

Comment 5 GDR! 2010-09-30 16:29:22 UTC

We've been running this patch for 2 days now on a production server with 29342 vhosts, and I can see no side effects. Also, the restart time went down from 2:31 to 0:09. I'll be installing it on our other servers because everything seems to work well.

Comment 6 Stefan Fritsch 2010-10-02 11:02:17 UTC

Thanks for the patch. Committed in r1003808.

Though it would really make sense to use some mass virtual hosting module for this number of vhosts.

Comment 7 GDR! 2010-10-04 05:46:37 UTC

(In reply to comment #6)
> Thanks for the patch. Committed in r1003808
> 
> Though it would really make sense to use some mass virtual hosting module for
> this number of vhosts

It would; however, sometimes it's just not applicable. I researched this before diving into profiling the core.

Comment 8 Stefan Fritsch 2012-02-26 17:08:33 UTC

Fixed in 2.4.1.