Bug 52342 - ap_internal_redirect dropping filters means inconsistent behaviour for includes
Summary: ap_internal_redirect dropping filters means inconsistent behaviour for includes
Status: RESOLVED LATER
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_include (show other bugs)
Version: 2.2.21
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: MassUpdate
Depends on:
Blocks:
 
Reported: 2011-12-15 16:15 UTC by Matthew Byng-Maddick
Modified: 2018-11-07 21:09 UTC (History)
0 users



Attachments
mod_rewrite and mod_http patch to allow internal_redirect to retain filters and notes (9.40 KB, application/octet-stream)
2011-12-15 16:15 UTC, Matthew Byng-Maddick
Details
Allows SSI to consider following internal-redirects on its subrequests as valid (3.32 KB, patch)
2012-01-24 18:33 UTC, Matthew Byng-Maddick
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matthew Byng-Maddick 2011-12-15 16:15:47 UTC
Created attachment 28076 [details]
mod_rewrite and mod_http patch to allow internal_redirect to retain filters and notes

In our setup, we have people using .htaccess files (still). We also have lots of proxying between various different service groups of the website.

In order to make the proxying consistent, we have a global main config ProxyPass rule, which points to the correct service group for that environment (eg. live, stage, test etc).

So, in main config:
ProxyPass /_proxy_/pal/ http://pal.<environment>.<site>/

And then in .htaccess:
RewriteRule mything /_proxy_/pal/myotherthing

Which, because it's in .htaccess gets internal-redirected and properly handled by the mod_proxy stage. All good.

We *also* use mod_include extensively on this platform, and we have an extension "sssi" to which we make sure we add an INCLUDES Output filter.

What we end up with, however, is that if you have, in .htaccess:
RewriteRule ^mything\.sssi$ http://pal.<environment>.<site>/mysite/mything.sssi [P]
then the include gets parsed, as expected. If, on the other hand, you have:
RewriteRule ^mything\.sssi$ /_proxy_/pal/mysite/mything.sssi
then the includes filter gets stripped (described below) and the inclusion just gets injected into the page instead of being parsed as sssi.

This is basically due to the default behaviour of internal_internal_redirect() in modules/http/http_request.c, which strips output filters (except for the sub_req filter in the case that that's being used) and replaces a completely new r->notes table.

This is sensible behaviour when an internal redirect is being used to, say, generate an error page, or similar, but is less useful when you then get different behaviour in mod_rewrite depending on whether you put the RewriteRule in main config or .htaccess and where in main config it goes.

The attached patch adds a new ap_internal_redirect_filter() function in the httpd public API, which keeps the input and output filter chain and notes (while still replacing the environment) by changing the calling convention slightly for internal_internal_redirect() which isn't part of the public API, and adds a new RewriteOptions option to mod_rewrite which allows it to do the ap_internal_redirect_filter() when it redirects, rather than always doing the ap_internal_redirect() which will strip the filter.

(the reason for the 1<<3 as the value rather than 1<<2 is that we're also currently applying 1<<2 as part of the bug in 48304 - which is now merged with httpd-2.4.x)
Comment 1 Graham Leggett 2011-12-17 13:36:10 UTC
I'm smelling a bug - if you're using AddOuputFilter, that's all processed within mod_mime.c in the type_checker hook, which in turn runs before fixups, and this in turn is triggered from ap_process_request_internal(), which is in turn triggered from ap_internal_redirect(), which is called by mod_rewrite in your case, so the output filters should be re-added when the request comes round again, the trick is finding out why this isn't happening.

Further digging finds two possible causes, and we might be suffering both:

- Something inside mod_mime.c:find_ct is aborting early and isn't reaching the point where output_filters are added. Would it be possible to put a breakpoint here and see if we ever reach this code? We'll reach it at least once for the initial request, key though is that we should reach this again for the internal direct, if we don't, we'd need to see why it's bailing out early. Most specifically, do we reach this code:

           if (exinfo->output_filters && r->proxyreq == PROXYREQ_NONE) {
               const char *filter, *filters = exinfo->output_filters;
               while (*filters
                   && (filter = ap_getword(r->pool, &filters, ';'))) {
                   ap_add_output_filter(filter, NULL, r, r->connection);
               }
               if (conf->multimatch & MULTIMATCH_FILTERS) {
                   found = 1;
               }
           }

I suspect we might be reaching this code, only to find we've detected it as a proxy request, and therefore aren't adding filters (In other words "r->proxyreq == PROXYREQ_NONE" is false).

I can kinda-sorta see why we might want to not apply extensions to proxy requests, but then at the same time I also see cases like this one where we would want to - the URL ends with .sssi, whether it comes from a proxy or not, who cares, we want the INCLUDES filter.

I suspect the fix might be to add a config directive that allows the end user to ignore the "r->proxyreq == PROXYREQ_NONE" bit, or to just take that away entirely (in 2.4).

In fact, this might be the cause of the inconsistency between using a ProxyPass and the mod_rewrite [P] flag - in the ProxyPass case, r->proxyreq == PROXYREQ_NONE is false and this code is bypassed. In the [P] flag case, r->proxyreq == PROXYREQ_NONE is true because the [P] flag hasn't been evaluated yet (I *think* that happens 1 phase later, in fixups), so we get our filters in this case.

Try this patch in other words, does this make it work?

Index: modules/http/mod_mime.c
===================================================================
--- modules/http/mod_mime.c	(revision 1214429)
+++ modules/http/mod_mime.c	(working copy)
@@ -895,7 +895,7 @@
             * setting redundant filters.    2, we insert these in the types
             * config hook, which may be too early (dunno.)
             */
-            if (exinfo->input_filters && r->proxyreq == PROXYREQ_NONE) {
+            if (exinfo->input_filters) {
                const char *filter, *filters = exinfo->input_filters;
                while (*filters
                    && (filter = ap_getword(r->pool, &filters, ';'))) {
@@ -905,7 +905,7 @@
                    found = 1;
                }
            }
-            if (exinfo->output_filters && r->proxyreq == PROXYREQ_NONE) {
+            if (exinfo->output_filters) {
                const char *filter, *filters = exinfo->output_filters;
                while (*filters
                    && (filter = ap_getword(r->pool, &filters, ';'))) {

- AddOutputFilter doesn't actually mean "add" (and this is broken). Tracing the code, when we merge configs from one location/directory to another, we end up replacing the previous filter definition for that extension, instead of adding it. Or to put this another way, if you have this:

AddOutputFilter FOO html
<Location /baz>
 AddOutputFilter BAR html
</Location>

The BAR filter ends up replacing the FOO filter completely for the html type, which is broken - add should mean just that, add, not replace.

To test this, put a breakpoint inside mod_mime:overlay_extension_mappings, and see if you reach this code:

   if (overlay_info->output_filters) {
       new_info->output_filters = overlay_info->output_filters;
   }

In theory, you should reach this code twice. If this bug is present, you'll reach a point where new_info->output_filters evaluates to "INCLUDES" (or something containing INCLUDES), and overlay_info->output_filters evaluates to something else (like "BUFFER", or "DEFLATE", or...).
Comment 2 Matthew Byng-Maddick 2012-01-16 12:26:31 UTC
I've spent a lot more time thinking about this, and Graham (minfrin) and I have had some discussions about this in private email...

There's a definite issue in that a sub-request delegation, while it, obviously, creates a new request_rec doesn't actually scan through the filters and update f->r for each filter in the chain, if an internal_redirect happens, it creates the new request_rec, and does scan the filters and reset f->r to the redirected request_rec.

While it's obvious that if you're not in a sub-request context this is the right behaviour for the internal_redirect, the inconsistency between a plain sub-request and a sub-request that you end up internal_redirecting seems wrong. My patch is definitely wrong in (a) keeping the filter chain - in fact, my initial claims were wrong - though I don't like them, because of the order of processing, and (b) adjusting the notes.

Now, I suspect that we can't just change this unilaterally - because it is such a major change in behaviour - so the question is what should happen? Do we keep the separate "don't update the f->r s" function, and allow mod_rewrite to call that with a RewriteOption - this seems really ugly, as there are presumably other situations where the internal_redirect could happen in a sub-request and those won't necessarily call it? Do we have some flag set on any request_rec, but then you need to traverse the request_rec calling chain to find out what you're supposed to be doing? Other suggestions?

I may patch our local copy of mod_rewrite to solve my immediate problems with something like the above, and apply minfrin's patch within this ticket to drop the r->proxyreq check off mod_mime.c, but it would be nice if there were a sensible way of solving this in a consistent way...

Cheers

MBM
Comment 3 Joe Orton 2012-01-18 16:46:48 UTC
"The BAR filter ends up replacing the FOO filter completely for the html type,
which is broken - add should mean just that, add, not replace."

The AddBlah directives *all* work this way with mod_mime; I agree it is not particularly intuitive, but that it how they have always worked (and have been documented!).  If you are trying to set up the filter chain for a reverse proxy config surely SetOutputFilter is more appropriate?

Is it possible to create a minimal self-contained test case which demonstrates the issue you're seeing here, using built-in modules?  I'm struggling to understand what exactly the issue is here.
Comment 4 Matthew Byng-Maddick 2012-01-18 16:54:25 UTC
Hi Joe,

The proxy thing is perhaps confusing the issue. Maybe this is more easily done by doing something like:

<Proxy balancer://.../**/*.sssi>
  SetOutputFilter INCLUDES
</Proxy>

Or there are possibly other ways of working around this.

I'm actually more concerned, as I say above, by the difference in behaviour of the filters going:
main->sub_request, and main->(sub_request->internal_redirect), which I think is very broken.

Cheers

MBM
Comment 5 Joe Orton 2012-01-19 12:05:09 UTC
(In reply to comment #4)
> Hi Joe,
> 
> The proxy thing is perhaps confusing the issue. Maybe this is more easily done
> by doing something like:
> 
> <Proxy balancer://.../**/*.sssi>
>   SetOutputFilter INCLUDES
> </Proxy>

LocationMatch would be the obvious choice to me.

> I'm actually more concerned, as I say above, by the difference in behaviour of
> the filters going:
> main->sub_request, and main->(sub_request->internal_redirect), which I think is
> very broken.

I don't understand your explanation of what it is broken, I'm afraid.  You can try again, but a test case would be a way to communicate unambiguously.

In the case where, say:

1. main is uri /X
2. main requires a subreq for, say, SSI exec, so main->subreq is /X/Y
3. filter FOO is configured to apply to ONLY /X/Y, and is added to r->output_filters
4. main->subreq does an internal redirect to /X/Z
5. main->subreq's handler is run

at step (5), is your contention that FOO should still be in the filter chain for main->subreq's r->output_filters?  Or am I missing the point completely?
Comment 6 Matthew Byng-Maddick 2012-01-23 19:26:45 UTC
So, having dug a lot deeper, and with a lot more ap_log_rerror(APLOG_MARK, APLOG_DEBUG...) lines, I've finally found why this doesn't work, it's not actually to do with keeping the filters or not, though obviously keeping the existing filter, and the existing context in the internal-redirect case does help (but is wrong).

It's actually what happens in mod_include, basically, in handle_include() (which handles the <!--#include tag, we make the subrequest (as a file or uri), which creates a completely empty r->request_config, fine. Before we call ap_run_sub_req() on the new subrequest request_rec (which is actually called "rr" in that function) we do:
        /* See the Kludge in includes_filter for why.
         * Basically, it puts a bread crumb in here, then looks
         * for the crumb later to see if its been here.
         */
        if (rr) {
            ap_set_module_config(rr->request_config, &include_module, r);
        }

In the actual filtering call (includes_filter()), we find:
    if ((parent = ap_get_module_config(r->request_config, &include_module))) {
        /* Kludge --- for nested includes, we want to keep the subprocess
         * environment of the base document (for compatibility); that means
         * torquing our own last_modified date as well so that the
         * LAST_MODIFIED variable gets reset to the proper value if the
         * nested document resets <!--#config timefmt -->.
         */
        r->subprocess_env = r->main->subprocess_env;
        apr_pool_join(r->main->pool, r->pool);
        r->finfo.mtime = r->main->finfo.mtime;
    }
    else {
        /* we're not a nested include, so we create an initial
         * environment */
        ap_add_common_vars(r);
        ap_add_cgi_vars(r);
        add_include_vars(r);
    }

This is obviously absolutely horrid, and explains why this doesn't work. I'm guessing that the actual handling should be something like:
    initr=r;
    while(initr->prev) {
        initr = initr->prev;
    }
    if ((parent = ap_get_module_config(initr->request_config, &include_module))) {
        r->subprocess_env = parent->subprocess_env;
...
using parent (as the token stuck in) rather than r->main.

With that in place, it can still be perfectly possible to set filters based on existing stuff, without necessarily needing Graham's patches to mod_mime which I definitely have concerns about.

This seems like the right solution here, though because you don't have a proper r->request_config set up, I'm wondering if it can be done with a switch on mod_include, so as not to change existing functionality generically.

Cheers

MBM
Comment 7 Matthew Byng-Maddick 2012-01-24 18:33:07 UTC
Created attachment 28201 [details]
Allows SSI to consider following internal-redirects on its subrequests as valid

The above has the kind of properties we want - that we follow the internal-redirect chain back to its source, which means that we can see the original sub-request's tagging (or not), and hence act just like we would within that sub-request. This is, of course, a kludge on a kludge, and in many ways I don't like it.

There are other ways of making sure that the mod_include filter actually gets activated, of course, which mean that applying stuff by extension where you're using mod_proxy possibly isn't the right thing after all.

That the location that the hack acts on is the source location rather than the destination is a bit of an annoyance, but if you're using mod_rewrite and .htaccess (and probably SSI) the source is likely to be the one you have more control over...

Cheers

MBM
Comment 8 William A. Rowe Jr. 2018-11-07 21:09:05 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.