Bug 14451 - Using mod_deflate on an internally redirected request results in an extra 20 byte gzip header appended to the response.
Status: CLOSED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: All
Version: 2.0-HEAD
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-11-11 16:06 UTC by Bruno Wolff III
Modified: 2008-01-07 08:36 UTC
Description Bruno Wolff III 2002-11-11 16:06:20 UTC
When I filter CGI output with the DEFLATE filter (with or without
also using INCLUDES) I get a content-length header with a byte count
off by 20 bytes. The ones I have observed have been low in GET
requests and high in HEAD requests (for the latter a count of 20
was returned even though there was no body data returned).
You can see a sample of this using
http://www.schroepl.net/cgi-bin/http_trace.pl
to fetch the page http://wolff.to/area/G_776.html . I currently am
not compressing output for browsers that supply the string MSIE in
the user-agent header.
I am using a CVS version of 2.0.44 from last week.
Comment 1 Bruno Wolff III 2002-11-15 22:43:07 UTC
I saw that some other fixes to mod_deflate were applied recently
and retested on current CVS and the problem still exists.
Comment 2 Bruno Wolff III 2002-11-22 16:01:59 UTC
Is there some additional information that I can provide that would
help you guys confirm this bug?
I have tried looking through the bucket passing code in the past
when reporting mod_deflate bugs and wasn't able to understand
enough to be able to spot where the errors are.
I see that you are possibly going to change the status of mod_deflate
to be a normal module instead of experimental. I have found that
people running IE 5.0 and IE 6.0 have been having a lot of problems
viewing pages generated by cgi scripts and filtered by mod_deflate.
I suspect that the problem is related to the content-length header
issue, but I am not absolutely sure of this. If confirming this would
help increase the priority on figuring this problem out, I could try
using mod_headers to strip the content-length headers and see if that
solves the problem for the IE users. However, I have to ask others
to do the testing and it might take a few days before I get help after
I have some modified pages for them to look at.
Comment 3 Jeff Trawick 2002-11-22 19:13:17 UTC
Supplying the simplest testcase you can come up with that shows the problem
would be very helpful.  I've tried to duplicate the problem with GET requests on
current code but I haven't been successful.  I haven't looked at the HEAD issue yet.
Comment 4 Bruno Wolff III 2002-11-23 03:12:12 UTC
I think I have figured out enough so that you should be able to
reproduce the problem. The missing part is that I do an internal
redirect and that is needed to make the problem show up.
http://wolff.to/area/htaccess (no period) will show the .htaccess
file. The first line (the filter list) and the last line (the
relevant redirect) should be all that matters.
http://wolff.to/area/test.pl , http://wolff.to/area/test.cgi and
http://wolff.to/area/test.html all point to the same file. The first
will give you the source, the second will work correctly and the
third has a content-length header off by 20. However, note that in
the second case the file is 20 bytes longer.
Comment 5 Bruno Wolff III 2002-11-23 17:50:06 UTC
I looked at the difference between the compressed output returned
by test.cgi and test.html and it appears that test.html has an
extra 20 bytes of what appears to be a gzip header tacked on to the
end.
I also updated to use the latest cvs yesterday so I am now running
a 2.1.0 version of httpd.
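
One way to check for trailing data of this kind, as a minimal Python sketch rather than anything from the httpd tree: decompress the first gzip member of the response body and look at what is left over. The helper name split_first_gzip_member and the sample payloads are made up for illustration, and the broken response is only simulated here by appending an empty gzip stream to a valid one.

```python
import gzip
import zlib


def split_first_gzip_member(payload):
    """Decompress the first gzip member; return (body, leftover bytes)."""
    d = zlib.decompressobj(wbits=31)  # 31 = expect gzip header and trailer
    body = d.decompress(payload)
    body += d.flush()
    # Anything after the end of the first gzip stream lands in unused_data.
    return body, d.unused_data


good = gzip.compress(b"hello\n")
bad = good + gzip.compress(b"")  # simulated response with extra trailing bytes

body, extra = split_first_gzip_member(bad)
print(len(extra), extra[:2].hex())  # 20 1f8b (gzip magic number)
```

A clean response leaves nothing behind, so a non-empty leftover that starts with 1f 8b is a hint that a second gzip stream was appended.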
Comment 6 Bruno Wolff III 2002-11-27 13:49:40 UTC
The HEAD and GET problems appear to be separate. The extra 20 bytes
get added when there is an internal redirect (using mod_rewrite) and
both the new and old extension are subject to mod_deflate filtering.
I added another rewrite rule to the htaccess file and a test.txt file
accessible through a redirect as test1.txt to illustrate this case.
I will see if I can do a better job of isolating the bad
content-length header returns on HEAD requests.
Comment 7 Bruno Wolff III 2002-11-27 14:38:42 UTC
I figured out what was happening with the odd content-length header
values on HEAD requests. The short story is that it isn't a bug.
The script I was using only returns a content-type header on HEAD
requests (this is probably broken behavior) to save doing database
lookups that won't be used. Even though the body is 0 bytes, it still
gets gzip encoded and this encoding takes up 20 bytes. That is why
there is a content-length of 20.
So the only issue is the extra 20 bytes being tacked on to the body
of responses when both the initial and final filename extensions are
filtered by mod_deflate.
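
The 20-byte size of a gzip-encoded empty body can be reproduced outside httpd. A small Python sketch, assuming the standard gzip module with no optional header fields (mod_deflate's exact header bytes may differ, but the sizes should come out the same):

```python
import gzip

empty = gzip.compress(b"")

# Fixed 10-byte header, the deflate data itself, and an 8-byte trailer
# holding the CRC-32 and the uncompressed length.
header, deflate_data, trailer = empty[:10], empty[10:-8], empty[-8:]

print(len(empty))                                     # 20
print(len(header), len(deflate_data), len(trailer))   # 10 2 8
```

So even with nothing to compress, the framing alone accounts for 10 + 2 + 8 = 20 bytes.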
Comment 8 Bruno Wolff III 2002-11-27 15:11:19 UTC
This is sort of a related note. Normally a content-length header
isn't sent when an includes filter is used or when the output
comes from a cgi script. However if mod_deflate is used, then a
content-length header is included for either or both of these cases.
Comment 9 Bruno Wolff III 2002-12-01 22:35:52 UTC
I tried using addoutputfilterbytype to see if that would work around
the problem, but it didn't help.
Comment 10 Bruno Wolff III 2002-12-08 15:24:28 UTC
Is there anything else I can do to help get this bug verified?
I think I have included enough information for someone to check
on it, but haven't heard back either way since figuring out that
combining internal redirects and mod_deflate exhibits the problem.
Comment 11 Bruno Wolff III 2002-12-16 18:35:48 UTC
I was able to find a configuration that avoided the problem in my
setup. I have changed things to reflect this and have removed some
of the test pages I had up.
The successful way to handle *.html URLs that were redirected to
*.cgi URLs (with internal redirects) so that the output was passed
both through INCLUDES and DEFLATE without getting the extra 20 bytes
of gzip header or messing up *.html files that aren't redirected is:
AddOutputFilter INCLUDES;DEFLATE .html
AddOutputFilter DEFLATE .txt .pl .sql .cgi
AddOutputFilterByType INCLUDES text/html
This is now all being done at the top level of my document root.
Previously there was an addoutputfilter for .cgi files that only
specified DEFLATE at the top level and this was overridden in a
subdirectory (area) where cgi output was to get passed through both
INCLUDES and DEFLATE. This may be related to the problem I was having,
as I can't seem to reproduce the issue with the output filter
directives only at the top level.
By using the addoutputfilterbytype directive I can just do INCLUDES
processing for cgi output that returns text/html (as opposed to
text/plain) and don't need to have different rules in different
directories.
So, I am not completely sure what is needed to make the problem show
up. I was able to get the problem to occur with just plain html
files (no cgi script) subject to a redirect in the subdirectory.
I am asking one of my users to see if this fixes the problem with
accessing the pages with IE.
Comment 12 Bruno Wolff III 2002-12-17 17:08:45 UTC
It turns out my new configuration didn't fix things.
http://www.schroepl.net/cgi-bin/http_trace.pl started reporting a
data returned size that matched the content length header, but
according to my access_log another 20 bytes were actually sent.
I noticed this when I got a report back from an IE user that
they still couldn't get compressed responses to work. So what I think
changed was the testing service, not the content being served.
Comment 13 Bruno Wolff III 2002-12-22 19:53:44 UTC
I think I have something that will point out where I am seeing the
problem. In mod_deflate.c there is code to not do compression for
subrequests. I think this check should be extended to not do it
for internal redirects. (I am not sure why the check that the request
doesn't already have a gzip encoding doesn't catch this.) I checked
r->next to see if there will be more processing after this request
is completed. I am not sure this is correct, because the .h that
defines request_rec has a comment that this link refers to external
requests. I suspect that that is a typo. Anyway the patch seems to
work for my problem. I will let you know if this fixes the IE problem
when I hear back from the person who has been having problems with
compressed responses.
The diff versus mod_deflate.c follows:
*** mod_deflate.c.orig  Sun Dec 22 13:53:19 2002
--- mod_deflate.c       Sun Dec 22 13:36:55 2002
***************
*** 260,266 ****
          const char *encoding, *accepts;

          /* only work on main request/no subrequests */
!         if (r->main) {
              ap_remove_output_filter(f);
              return ap_pass_brigade(f->next, bb);
          }
--- 260,266 ----
          const char *encoding, *accepts;

          /* only work on main request/no subrequests */
!         if (r->main || r->next) {
              ap_remove_output_filter(f);
              return ap_pass_brigade(f->next, bb);
          }
Comment 14 Bruno Wolff III 2002-12-23 15:12:16 UTC
I heard back from the IE user who was having problems and now that
there aren't an extra 20 bytes being tacked on to compressed
responses, things are working OK.
Comment 15 André Malo 2003-02-13 02:03:29 UTC
I'll try to have a closer look at the problem and the patch these days, but one
word anyway:

Sending different headers for GET and HEAD is wrong. You should _not_ handle
the two methods differently. Just send the content regardless of the method.
Apache will do the right thing and discard the body if necessary, and it is
then able to maintain the Content-Length header (or Transfer-Encoding).
Comment 16 Bruno Wolff III 2003-02-13 02:28:11 UTC
Thanks for looking at this!
While working on this issue I did figure out that suppressing the
content for the head method was wrong. I haven't gotten around to
changing the code yet, but I plan to. The original idea was to save
resources by not doing database calls if the content was going to
be thrown away anyway, but it turned out that keeps the head method
from being as useful as it should. I also got confused by this as
I was expecting the content-length to be zero since there was no
body, not realizing that a gzip'd version of empty content takes
up 20 bytes.
GET requests do seem to be broken though. And since mod_deflate checks
to make sure it isn't run twice, I am pretty sure there is something
wrong where I have patched the code. (Especially since it fixes things
for me.) I am just not sure that the fix is really correct for all
cases.
Comment 17 André Malo 2003-02-17 02:51:06 UTC
strange ...

Seems to happen only when redirecting to the cgi-handler. Can someone confirm or
disprove this?

Thanks anyway for your patience with us :)
Comment 18 André Malo 2003-02-18 00:39:06 UTC
errr, forget my last comment. I did the wrong tests ;-)

However, your patch worked around the actual problem, it did not solve it.
The problem was that after finalizing the (redirected) request, the original
request sent an(other) EOS bucket down the filter chain, which caused
mod_deflate to init zlib and exit zlib with the result of 20 extra bytes.

The following patch should solve the problem entirely:
<http://cvs.apache.org/viewcvs.cgi/httpd-2.0/server/util_filter.c.diff?r1=1.94&r2=1.95>

It's proposed for backport and may be in the next 2.0 release.

Thanks again for your patience and your detailed reports!
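
The mechanism described in this comment can be illustrated with a toy model. This is a hedged Python sketch, not httpd code: ToyDeflateFilter, its handle method and the drop_repeated_eos guard are all hypothetical names, and the guard is only one plausible way to suppress a repeated end-of-stream, not a description of the committed util_filter.c change.

```python
import zlib


class ToyDeflateFilter:
    """Toy stand-in for a compressing output filter; not httpd code."""

    def __init__(self, drop_repeated_eos=False):
        self.drop_repeated_eos = drop_repeated_eos  # hypothetical guard
        self.eos_seen = False
        self.ctx = None          # zlib state, created on first use
        self.out = bytearray()   # bytes that would go to the client

    def handle(self, data, eos):
        if eos and self.eos_seen and self.drop_repeated_eos:
            return               # guard: ignore a second end-of-stream
        if self.ctx is None:
            self.ctx = zlib.compressobj(wbits=31)  # 31 = gzip framing
        self.out += self.ctx.compress(data)
        if eos:
            self.out += self.ctx.flush()  # finish the gzip stream
            self.ctx = None               # state is torn down after EOS
            self.eos_seen = True


def extra_bytes_after_second_eos(drop_repeated_eos):
    f = ToyDeflateFilter(drop_repeated_eos)
    f.handle(b"hello world\n", eos=True)  # redirected request finishes
    sent = len(f.out)
    f.handle(b"", eos=True)               # original request sends EOS again
    return len(f.out) - sent


print(extra_bytes_after_second_eos(False))  # 20
print(extra_bytes_after_second_eos(True))   # 0
```

In the unguarded case the second end-of-stream starts and immediately finishes a fresh gzip stream, which is where the 20 extra bytes come from in this model.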
Comment 19 Bruno Wolff III 2003-02-18 01:42:29 UTC
I removed my patched mod_deflate and resynced with current CVS
and retested for the problem. It seems to be fixed now. Thanks!
Comment 20 André Malo 2003-02-22 20:30:50 UTC
FYI: The fix will be in 2.0.45.
Comment 21 Steven Grimm 2003-06-13 23:09:48 UTC
*** Bug 17629 has been marked as a duplicate of this bug. ***
Comment 22 Joe Orton 2003-07-21 14:09:41 UTC
*** Bug 14678 has been marked as a duplicate of this bug. ***
Comment 23 Ruediger Pluem 2008-01-07 08:36:31 UTC
*** Bug 14678 has been marked as a duplicate of this bug. ***