49328 – mod_filter mishandles contexts in subrequests, corrupt output with mod_deflate and mod_include

Bug 49328 - mod_filter mishandles contexts in subrequests, corrupt output with mod_deflate and mod_include

Summary: mod_filter mishandles contexts in subrequests, corrupt output with mod_deflat...

Status:	RESOLVED FIXED

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	mod_filter (show other bugs)
Version:	2.2.14
Hardware:	All All

Importance:	P2 normal with 3 votes (vote)
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-05-23 00:37 UTC by Alex Docauer
Modified:	2010-10-25 06:46 UTC (History)
CC List:	2 users (show)

Attachments
Prevent filter_init in mod_filter from destroying it's own harness context (574 bytes, application/octet-stream) 2010-05-23 00:37 UTC, Alex Docauer	Details
patch which fixes ap_invoke_filter_init (1.08 KB, text/plain) 2010-06-08 12:20 UTC, Joe Orton	Details
updated patch to fix ap_invoke_filter_init (2.88 KB, patch) 2010-06-09 04:40 UTC, Joe Orton	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Alex Docauer 2010-05-23 00:37:20 UTC

Created attachment 25472 [details]
Prevent filter_init in mod_filter from destroying it's own harness context

Currently, mod_filter can't be used as a harness for mod_deflate when a 
subrequest may be initiated, specifically from mod_include.  Attempting to 
do so causes data in the response to be lost and/or sent uncompressed even
though the Content-Encoding header specifies a "gzip" encoding.

I originally thought this was related to bug 17629, and comments on that 
bug which describe similar symptoms indicate that many others do, too.  
However, the solution I proposed for this issue failed to address the 
original problem described in that bug, so I no longer think they are related.
I don't believe this is related to bug 43939 either, as mod_filter wasn't being 
used in that case.  A repeatable example of this issue follows.


The httpd configuration loads mod_filter and mod_include and contains these 
directives:

AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

FilterDeclare smart_compress CONTENT_SET
FilterProvider smart_compress DEFLATE resp=Content-Type $text/
FilterChain smart_compress


Contents of example.shtml:

<HTML>
<BODY>
Included text below line
<HR />
<!--#include virtual="/includeme.shtml"-->
</BODY>
</HTML>


Contents of includeme.shtml:

This text was included


Command:

( echo 'GET /example.shtml HTTP/1.1' ; \
         echo 'Host: localhost' ; \
         echo 'Accept-Encoding: gzip' ; \
         echo '' ) | nc localhost 3080


Output:

HTTP/1.1 200 OK
Date: Sun, 23 May 2010 03:01:39 GMT
Server: Apache/2.2.14 (Unix)
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Transfer-Encoding: chunked
Content-Type: text/html

17
This text was included

11

</BODY>
</HTML>

0



Output should be (as viewed in less):

HTTP/1.1 200 OK
Date: Sun, 23 May 2010 03:05:12 GMT
Server: Apache/2.2.14 (Unix)
Last-Modified: Sun, 23 May 2010 02:51:50 GMT
ETag: "2cb08c0-68-4873a01821d80"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 105
Content-Type: text/html

^_<8B>^H^@^@^@^@^@^@^C<B3><F1>^H<F1><F5><B1><E3><B2>q<F2>w<89><B4><E3><F2><CC>K<CE>)MIMQ(I<AD>(QH<CA><C9>/W<C8><C9><CC>K<E5><B2>
<F1>^HR<D0>^G<AA>R<D4><D5>U΄<A8>Q(<CB>,*)M̱U<U+0487><8A><E4><A6><EA>^Ug<94><E4><E6>(<E9><EA>^B<95><EA>CL<B4>чX^@^@<B5>r<8E>Gh^@^@^@



mod_filter appears to be the only module that uses the filter_init_func field
of the ap_filter_rec_t structure.  This member is a pointer to function that is
to be run right after the insert_filter hooks are run and right before the
content handler is invoked.  As most modules do their initialization at the
time they are called to process data, this feature is largely unused.  However,
mod_filter chooses to do its initialization here  so that it can correctly call
the filter_init_func of each provider if it exists.  

mod_filter sets filter_init_func to be a pointer to its filter_init function. 
In filter_init, it first creates the context for the harness, before calling
filter_init_func of each provider.  The initial context for each provider and
the ongoing context of the subsequently invoked provider are all stored in the
harness context.

The problem arises in that mod_filter intuitively assumes that filter_init will
only ever be called once during the lifetime of the filter harness.  However,
when mod_include creates a subrequest, all of the filters from the main request
get copied into the subrequest.  Before the content handler is invoked in the
subrequest, the filter_init_func of every filter in the subrequest gets called.
 This means that the filter_init_func of the filter harness gets called again,
one addition time for each subrequest.

The first thing that mod_filter does in filter_init is to assign the context
pointer to freshly allocated memory.  When called more than once, this
essentially destroys the existing harness context and any provider contexts
that it contains.  In the case of mod_deflate, it loses any data waiting in the
compression buffer.  Furthermore, when mod_deflate is invoked again without a
context, it assumes it is being called for the first time in that request. 
When it sees that Content-Encoding is already set to gzip, it removes itself
from the filter chain, leaving the remaining output uncompressed.

I don't know if it's considered correct or ideal operation that
filter_init_func can be called more than once, but as it stands, mod_filter
can't assume that it doesn't.  I am attaching a small patch that will make
filter_init first check for an existing harness context, simply returning if it
finds one already exists. I tested this patch with 2.2.14.

Comment 1 Alex Docauer 2010-05-23 07:31:34 UTC

I'm sorry. The correct output should look more like:

HTTP/1.1 200 OK
Date: Sun, 23 May 2010 11:29:29 GMT
Server: Apache/2.2.14 (Unix)
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 89
Content-Type: text/html

^_<8B>^H^@^@^@^@^@^@^C<B3><F1>^H<F1><F5><B1><E3><B2>q<F2>w<89><B4><E3><F2><CC>K<CE>)MIMQ(I<AD>(QHJ<CD><C9>/W<C8><C9><CC>K<E5><B2>
<F1>^HRз<E3>
<C9><C8>,<86>ȕ'^V+dBUsq<D9><E8>C<F4><DB><E8>C<8C>^C^@<A9>W^L<D9>V^@^@^@

Comment 2 Ruediger Pluem 2010-05-23 10:17:30 UTC

I don't think that your patch is correct. What about filters that expect their init function to be called a second time? The following patch should fix that. Could you please give it a try?

Index: modules/filters/mod_filter.c
===================================================================
--- modules/filters/mod_filter.c        (Revision 947412)
+++ modules/filters/mod_filter.c        (Arbeitskopie)
@@ -100,10 +100,16 @@
 {
     ap_filter_provider_t *p;
     provider_ctx *pctx;
+    provider_ctx *pctx_head;
     int err;
     ap_filter_rec_t *filter = f->frec;
+    harness_ctx *fctx = f->ctx;

-    harness_ctx *fctx = apr_pcalloc(f->r->pool, sizeof(harness_ctx));
+    /* Only allocate if not done previously */
+    if (!fctx) {
+       fctx = apr_pcalloc(f->r->pool, sizeof(harness_ctx));
+    }
+    pctx_head = fctx->init_ctx;
     for (p = filter->providers; p; p = p->next) {
         if (p->frec->filter_init_func == filter_init) {
             ap_log_cerror(APLOG_MARK, APLOG_ERR, 0, f->c,
@@ -112,6 +118,15 @@
         }
         else if (p->frec->filter_init_func) {
             f->ctx = NULL;
+            /*
+             * Check if we already had a context for this filter and if
+             * yes restore it.
+             */
+            for (pctx = pctx_head; pctx; pctx = pctx->next) {
+                if (pctx->provider == p) {
+                    f->ctx = pctx->ctx ;
+                }
+            }
             if ((err = p->frec->filter_init_func(f)) != OK) {
                 ap_log_cerror(APLOG_MARK, APLOG_ERR, 0, f->c,
                               "filter_init for %s failed", p->frec->name);

Comment 3 Alex Docauer 2010-05-23 12:22:00 UTC

(In reply to comment #2)
> I don't think that your patch is correct. What about filters that expect their
> init function to be called a second time? The following patch should fix that.
> Could you please give it a try?

I guess the question is whether filter_init_func _should_ be called more than 
once. It's description in util_filter.h is:

/** The function to call before the handlers are invoked. Notice
 * that this function is called only for filters participating in
 * the http protocol. Filters for other protocols are to be
 * initialized by the protocols themselves.
 */

I read "before the handlers are invoked" to mean that I should never expect 
the filter_init_func to be called after the handler has been invoked.  I think the 
best fix to this problem would be to make it so that Apache never calls 
filter_init_func more than once.  However, that might require a change to the
request_rec, and thus can only be incorporated into 2.3 or later.

So, I would argue:

1) If it's decided that the behavior of calling filter_init_func a second time in a 
subrequest needs to be preserved for compatibility with existing 2.2 modules, 
then I would only apply your patch to the 2.2 branch while in trunk eliminating
multiple calls to filter_init_func instead.

2) Any filter that relies on it's context being preserved in a subrequest is already 
broken when used with mod_filter.  After fixing mod_filter's context handling, 
failing to call filter_init_func a second time shouldn't introduce new negative
behavior.

3) I can't think of a case where one might actually expect filter_init_func to 
be called twice.  Modules in the wild today that use filter_init_func most likely 
break in subrequests.

Comment 4 Joe Orton 2010-05-24 06:47:08 UTC

I've added a test case for this, http://svn.apache.org/viewvc?rev=947593&view=rev

(will look at the patches shortly)

Comment 5 Joe Orton 2010-06-08 12:20:11 UTC

Created attachment 25550 [details]
patch which fixes ap_invoke_filter_init

(Foreword: great analysis of both this and bug 17629, Alex - thanks)

1) I don't think it makes any sense for a filter to expect a second "filter_init_func" invocation here merely because a subrequest has been created and changed some independent part of the filter chain.

2) Although there is a vague caveat regarding this in the header, the code will AFAICT re-run the "init" functions for any protocol/connection-level filters for each request and each sub-request therein, which would be kind of insane if it were the intended API.

So I would argue we should simply "clarify" the API description here and fix the bug, rather than attempt to work around it.  Fixing the bug is fairly simple, patch is attached and fixes the test case without causing any other regressions in the test suite, for what that's worth.

I was hoping for an all-in-one solution for both this and bug 17629, by more cleanly separating the subreq's filter chain from the main request_rec, as mentioned in dev@ thread today, but couldn't get that working without regressions.

Comment 6 Joe Orton 2010-06-09 04:40:10 UTC

Created attachment 25554 [details]
updated patch to fix ap_invoke_filter_init

Finished off version of patch to fix the filter_init process.

I'd hope this is uncontroversial for trunk, and will commit soonish unless I hear otherwise.

I'll propose for 2.2.x but we could go with either Alex's or Ruediger's patch there instead if there is concern; I guess if there's concern it must be to retain the multi-init "feature" so Ruediger's patch is the way.

Comment 7 Ruediger Pluem 2010-06-09 11:00:04 UTC

(In reply to comment #6)
> Created an attachment (id=25554) [details]
> updated patch to fix ap_invoke_filter_init
> 
> Finished off version of patch to fix the filter_init process.
> 
> I'd hope this is uncontroversial for trunk, and will commit soonish unless I
> hear otherwise.

+1

> 
> I'll propose for 2.2.x but we could go with either Alex's or Ruediger's patch
> there instead if there is concern; I guess if there's concern it must be to
> retain the multi-init "feature" so Ruediger's patch is the way.

I think this is even fine for backporting to 2.2.x as I am now convinced that calling it on a subrequest does not make sense.

Comment 8 Joe Orton 2010-06-10 08:56:48 UTC

Thanks for reviewing, Ruediger.

trunk: http://svn.apache.org/viewvc?view=revision&revision=953311

Comment 9 Alex Docauer 2010-09-30 15:06:23 UTC

I just wanted to say that we've validated that this patch resolves our issue and are moving to production with 2.2.16.  As far as I'm concerned, this bug can be closed.

Comment 10 Nick Kew 2010-10-03 22:13:15 UTC

Alex, given that it's your bug and you're happy with the conclusion, it would've made sense for you to close it.  Don't be shy if there's a next time!

Comment 11 Rainer Perske 2010-10-13 09:48:27 UTC

Sorry, I disagree. I have verified that the patch is part of 2.2.16, but the
bug is not fixed nevertheless:

Situation:

error.shtml contains: <h1>TEST</h1><!--#include virtual="error.php" -->

error.php contains: HELLO

Relevant lines from httpd.conf:

ScriptAlias /sys-bin/ "/www/sys-bin/"
Action php5-script /sys-bin/php5/php
AddHandler php5-script php
AddType text/html .shtml
AddOutputFilter includes;deflate .shtml

(When removing ";deflate" from the AddOutputFilter line, the problem is circumvented and the output delivered uncompressed to the browser.)

Request:

GET /WWWTest/error.shtml HTTP/1.1
Host: www
User-Agent: Dummy/1.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

Response (via Cut&Paste, thus not exactly):

HTTP/1.1 200 OK
Date: Wed, 13 Oct 2010 13:40:51 GMT
Server: Apache
Last-Modified: Wed, 13 Oct 2010 13:20:21 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Type: text/html
Keep-Alive: timeout=1, max=500
Connection: Keep-Alive
Transfer-Encoding: chunked

6
HELLO

20
���
��2��3�
0

Obviously the output of the PHP handler does not get compressed and thus the output is broken.

Comment 12 Rainer Perske 2010-10-25 06:33:37 UTC

With 2.2.17 my test case does not produce the bug any longer. Can it be that the fix for Bug 43939 by accident fixed my test case, too?

Comment 13 Nick Kew 2010-10-25 06:46:00 UTC

(In reply to comment #12)
> With 2.2.17 my test case does not produce the bug any longer. Can it be that
> the fix for Bug 43939 by accident fixed my test case, too?

No.  It's because you hijacked a *different* bug report that just happened to have similar symptoms to 43939 (and is in fact a much older issue).