Bug 41111

Summary:	New option for filters: run only if there are spare CPU cycles
Product:	Apache httpd-2
Reporter:	Dan Harkless <apache-issues>
Component:	mod_deflate
Assignee:	Apache HTTPD Bugs Mailing List <bugs>
Status:	NEW ---
Severity:	enhancement
Priority:	P2
Version:	2.5-HEAD
Target Milestone:	---
Hardware:	All
OS:	All

Description Dan Harkless 2006-12-05 19:08:35 UTC

It would be nice if mod_deflate had a directive you could use to tell it to only
do compression if there are spare CPU cycles available.  I'm concerned about
turning on mod_deflate because of the increased load on my server, especially if
I were to be hit by badly-behaved robots, parallel downloaders, Slashdotting, etc.

Defining "spare CPU cycles available" could be a bit tricky, of course.  It'd be
great to be able to just tell it not to compress if doing so would peg the CPU
to 100% (or a user-definable threshold), but implementing that would be hard
since different content can be more or less CPU-intensive to attempt to
compress.  An average-case or worst-case guess could be used, but it'd still
vary by CPU type and speed as to how much of a dent that overhead would have on
available CPU cycles.  I suppose it'd have to benchmark itself to have good
predictive ability, which would be starting to get pretty complex.

A simpler approach could be to have the directive be called, e.g.
DeflateIfCPUUsageBelow.  If the user specified 'DeflateIfCPUUsageBelow 75',
mod_deflate would check to see if current CPU usage were below 75% in order to
compress the given content.  The user would be left to do their own measurements
to see how much overhead mod_deflate uses for compression and thus where to set
that threshold.
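
As a rough illustration of the check such a directive would perform, here is a
minimal Python sketch. The function name and the default of 75 are assumptions
made for the example; DeflateIfCPUUsageBelow is only a proposed directive, and
this is not how mod_deflate is implemented.

```python
def should_compress(cpu_usage_percent: float,
                    threshold_percent: float = 75.0) -> bool:
    """Decide whether to compress, given a measured CPU usage figure.

    Hypothetical helper mirroring 'DeflateIfCPUUsageBelow 75': compress
    only while the measured usage is strictly below the threshold.
    """
    if not 0.0 <= threshold_percent <= 100.0:
        raise ValueError("threshold must be between 0 and 100")
    return cpu_usage_percent < threshold_percent
```

How cpu_usage_percent is obtained is left open here, since that is the
platform-dependent part of the problem.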

And perhaps instead of basing the decision on instantaneous CPU usage for the
current CPU, it'd make more sense for the directive to work in terms of load
averages, although I've always found those somewhat fuzzy and hard to use as a
basis for decision-making.
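
For comparison, a load-average variant might look like the sketch below. It
assumes a Unix-like system (os.getloadavg() is not available on Windows), and
the per-CPU normalization and the 0.75 limit are illustrative choices, not
anything mod_deflate or mod_load_average is known to do.

```python
import os


def load_per_cpu(load_1min: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_1min / cpu_count


def system_load_per_cpu() -> float:
    """Read the 1-minute load average (Unix only) and normalize it."""
    load_1min, _, _ = os.getloadavg()  # raises OSError if unobtainable
    return load_per_cpu(load_1min, os.cpu_count() or 1)


def should_compress_by_load(per_cpu_load: float,
                            max_per_cpu_load: float = 0.75) -> bool:
    """Compress only while the normalized load is below the limit."""
    return per_cpu_load < max_per_cpu_load
```

Dividing by the CPU count at least makes the number comparable across
machines, which addresses some of the fuzziness of raw load averages.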

In any case, I think this would be a nice option because machines whose
bandwidth is constrained enough to make mod_deflate highly desirable are often
going to also not be the *fastest* machines in the world.

Comment 1 Nick Kew 2006-12-06 04:52:13 UTC

An interesting suggestion.  Are you aware of mod_load_average, which does a
similar job for handlers?

Your comment about mod_deflate could apply to other filters in a similar manner,
and a load_average check could apply in mod_filter.  Why isn't there a bugzilla
entry for mod_filter?

The difficulty here (as in mod_load_average) is finding a cross-platform way to
define load.

Also, bear in mind mod_cache for your own purposes.

Comment 2 Ruediger Pluem 2006-12-06 11:36:56 UTC

I agree with Nick. We should aim for a general solution here that could be used
via mod_filter. In my experience CPU usage is only usable if you use an average
value of CPU usage over a reasonable amount of time. Otherwise you just get
unreasonable flip flops. That's why I would regard load average as more reliable.
But as Nick pointed out, it is a problem to define and measure load in a
platform-independent way.
BTW: Shouldn't we move this discussion to dev@httpd? I think continuing this
discussion here is somewhat pointless.
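
To illustrate the flip-flop concern, here is a hedged Python sketch that
smooths CPU samples with an exponential moving average and uses two thresholds
(hysteresis) so the decision does not toggle on every sample. The class name,
thresholds and smoothing factor are all assumptions chosen for the example.

```python
class SmoothedGate:
    """Enable/disable compression based on a smoothed CPU usage value."""

    def __init__(self, low: float = 65.0, high: float = 75.0,
                 alpha: float = 0.2) -> None:
        if not low < high:
            raise ValueError("low must be below high")
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.low = low
        self.high = high
        self.alpha = alpha
        self.average = None   # no samples seen yet
        self.enabled = True   # start with compression on

    def update(self, sample: float) -> bool:
        """Feed one CPU usage sample (0-100) and return the decision."""
        if self.average is None:
            self.average = sample
        else:
            self.average = (self.alpha * sample
                            + (1.0 - self.alpha) * self.average)
        # Switch off at or above 'high'; switch back on only at or
        # below 'low'. Between the two, keep the previous decision.
        if self.enabled and self.average >= self.high:
            self.enabled = False
        elif not self.enabled and self.average <= self.low:
            self.enabled = True
        return self.enabled
```

The gap between the two thresholds is what prevents rapid toggling when the
average hovers near a single cut-off value.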

Comment 3 Dan Harkless 2006-12-06 12:20:35 UTC

Ah yes, I think I did see a reference to mod_load_average some time back when I
was researching Apache throttling options, but I'd forgotten about it.  Seems to
be undocumented and not supported by its author, though.  It's not featured with
his other modules on http://www.outoforder.cc/, for instance.

Sounds good to make this general-purpose for filters.

Thanks for the pointer to mod_cache.  It might be good to add a note about it to
the mod_deflate documentation, as it wouldn't necessarily be obvious to people
who aren't experts on Apache internals that it would cache output from
mod_deflate so it wouldn't have to be recompressed next time.  I was confused as
to how caching would work with mod_deflate (without having a separate caching
proxy instance) -- for instance, a lot of the stuff I read online (outside of
httpd.apache.org) about it claimed it did its *own* caching, but I couldn't find
any evidence of that in the documentation.

Yes, good point about "instantaneous CPU usage", a la 'top', being a misnomer,
since clearly it has to average over some time period to be meaningful -- it's
just a shorter time period than with load averages, without any extra factors
thrown into the calculation, and expressed in terms of an easy-to-understand
0-100%, rather than load averages which can go arbitrarily high.  And yeah, I
hadn't really thought about this on Windows -- load averages would be even
harder to understand for Windows jockeys.
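
To make the point about averaging concrete: a 'top'-style percentage can be
derived from two readings of cumulative tick counters taken some interval
apart. The sketch below assumes the caller supplies (busy_ticks, total_ticks)
pairs from whatever source the platform offers; the function name and the
tuple layout are hypothetical.

```python
def cpu_usage_percent(prev: tuple, curr: tuple) -> float:
    """Average CPU usage (0-100) between two counter samples.

    Each sample is a (busy_ticks, total_ticks) pair of cumulative
    counters; the result is the busy share of the elapsed ticks.
    """
    busy_delta = curr[0] - prev[0]
    total_delta = curr[1] - prev[1]
    if total_delta <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * busy_delta / total_delta
```

The length of the interval between the two samples is the averaging period,
so a longer interval gives a steadier but less responsive figure.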