Bug 34526

Summary: Truncated content in decompressed requests from mod_deflate
Product: Tomcat Connectors
Component: mod_jk
Status: RESOLVED FIXED
Severity: normal
Priority: P2
Version: unspecified
Target Milestone: ---
Hardware: Sun
OS: Solaris
Reporter: Michael Klepikov <mike_bos>
Assignee: Tomcat Developers Mailing List <dev>
CC: jorton, lafeuil, nd

Description Michael Klepikov 2005-04-19 19:31:03 UTC
Brief summary: if a request is compressed (Content-Encoding: gzip), has
Content-Length corresponding to the compressed length, and if Apache's
mod_deflate is configured to decompress such requests, then the servlet request
input stream signals EOF at Content-Length bytes of decompressed content instead
of returning the entire decompressed content.

To reproduce:
- Enable Apache mod_deflate request decompression:
  <Location /servlet/MyTest>
    SetInputFilter DEFLATE
  </Location>
- Send a compressed request, e.g. compress a file with gzip and send it with cURL:
  gzip -9c some_file | curl -H 'Content-Encoding: gzip' --data-binary @- \
    http://host/servlet/MyTest

The servlet will get truncated data.
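
A minimal sketch of the symptom, assuming Python's gzip module as a stand-in for mod_deflate and a hypothetical read_like_buggy_bridge() helper as a stand-in for whatever stops reading at Content-Length. It only simulates the byte counts and is not mod_jk's actual code:

```python
import gzip
import io


def read_like_buggy_bridge(decompressed_stream, content_length):
    """Hypothetical helper: stop after content_length bytes, as if the
    header described the decompressed stream."""
    return decompressed_stream.read(content_length)


original = b"line of request data\n" * 200
compressed = gzip.compress(original)

# The client correctly declares the size of the compressed body.
content_length = len(compressed)

# Stand-in for the output of the DEFLATE input filter.
inflated = io.BytesIO(gzip.decompress(compressed))

seen_by_servlet = read_like_buggy_bridge(inflated, content_length)

print("original bytes:       ", len(original))
print("Content-Length sent:  ", content_length)
print("bytes seen by servlet:", len(seen_by_servlet))
```

The servlet side sees only a prefix of the decompressed data, exactly Content-Length bytes long.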

It's OK if the servlet sees a Content-Length that differs from the actual number
of bytes in the request stream. Servlets shouldn't trust Content-Length anyway,
and under chunked encoding Content-Length is not there at all, so IMHO a useless
Content-Length value is a non-issue.

The only workaround so far is to write a servlet filter to decompress requests,
but that puts additional load on Tomcat and complicates web app configuration.

Possibly related HTTP server bug:
http://issues.apache.org/bugzilla/show_bug.cgi?id=23287
See my comment there (Michael Klepikov) for additional details.
Comment 1 Henri Gomez 2005-04-20 10:54:12 UTC
I'm using such settings in an XML-RPC system, using Apache 2.x, mod_jk 1.2.x and
Tomcat 3.3.2 (so with a not too old jtc), and didn't have a problem if the
content length provided on the client side is set to -1.

In that case, JTC/Tomcat will be able to get the complete data stream.

Take a look at the XML-RPC HEAD CommonsXmlRpcTransport.java, where you can see
the following code:

        ByteArrayOutputStream lBo = new ByteArrayOutputStream();
        GZIPOutputStream lGzo = new GZIPOutputStream(lBo);
        lGzo.write(request);
        lGzo.finish();
        lGzo.close();
        byte[] lArray = lBo.toByteArray();
        method.setRequestBody(new ByteArrayInputStream(lArray));
        method.setRequestContentLength(-1);
Comment 2 Michael Klepikov 2005-04-21 18:16:37 UTC
To Henri Gomez: I have no control over Content-Length that the client sends. The
C++ program that sends the request uses Internet Explorer API (URLMON), it sets
Content-Length to the actual number of bytes in the request content, and there
is no way to change it.
Comment 3 Mladen Turk 2005-07-03 10:44:32 UTC
This is more your client's problem than a mod_jk one.
The Servlet spec is explicit about content-length, and we can
not cheat on that.
Comment 4 Michael Klepikov 2005-07-09 06:08:16 UTC
The client sends correct Content-Length equal to the compressed request size,
that's what it's supposed to be per HTTP 1.1, and I don't see where the client
has a problem here. Apache correctly reads the request, and mod_deflate
correctly decompresses it, only the servlet receives truncated content. Where
does the servlet spec imply that behaviour? The original Content-Length header
value cannot be trusted after passing through HTTP Server, specifically because
filters like mod_deflate may render it invalid.

Setting Content-Length to -1 might work as a workaround, but my point was that
it should also work with a correct positive Content-Length.

If I don't use mod_deflate's decompression and decompress with a servlet filter
instead, the servlet gets complete content, while the Content-Length header of
course remains with the original value, which is fine functionality-wise, but I would
much rather use mod_deflate's decompression for scalability and load
distribution reasons, and mod_deflate is likely faster than GZIPInputStream.

I initially thought it might be a mod_deflate problem, but an HTTP Server
person (André Malo) said he is certain it should be fixed in mod_jk. I suppose
he implied that instead of trusting Content-Length, there is another more
reliable way to determine end of request stream from Apache. Please refer to the
HTTP Server bug linked in the original description.
Comment 5 william.barker 2005-07-09 22:37:54 UTC
This is clearly a mod_deflate problem.  If mod_deflate would only set the 
clength field in the request_rec to the correct uncompressed value then mod_jk 
will work correctly.  Otherwise, mod_jk has no way of knowing that mod_deflate 
is going to be changing the number of bytes available on input.
Comment 6 Jess Holle 2005-07-09 23:37:58 UTC
As a user of mod_deflate and mod_jk, I'll play stupid for a moment:

As long as the client Content-Length is either not present or correct, users of
these two software components don't care who is at fault, mod_jk or mod_deflate
-- we just want it fixed.

That said, I understand that this is an interface between development groups in
addition to between software components, so I quite understand a short game of
"hot potato".  All the same, it would be best if mod_jk and mod_deflate folk
could work this out sooner rather than later...
Comment 7 Michael Klepikov 2005-07-11 03:44:56 UTC
To william.barker@wilshire.com: by nature of compressed content, mod_deflate
cannot know decompressed length in advance when it just begins streaming the
content to mod_jk. There has to be some kind of an end of stream indicator
independent of any lengths known at the beginning of the exchange. Maybe it
already exists, I do not know. Either way I fully agree with Jess Holle: it
would be in everyone's best interest if mod_jk people discuss it directly with
the HTTP Server people... Thanks.
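
A minimal sketch of this point, assuming Python's zlib as a stand-in for mod_deflate's inflate step (inflate_stream is a hypothetical name, not an httpd or mod_jk API). The consumer receives decompressed pieces as they arrive, and the total size is known only once the gzip trailer has been read:

```python
import gzip
import zlib


def inflate_stream(chunks):
    """Yield decompressed pieces; the total size is only known at the end."""
    # 16 + MAX_WBITS tells zlib to expect gzip framing (header and trailer).
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        piece = decomp.decompress(chunk)
        if piece:
            yield piece
    tail = decomp.flush()
    if tail:
        yield tail
    # eof becomes True only after the complete gzip trailer has been seen;
    # this is the end-of-stream indicator, independent of any header value.
    if not decomp.eof:
        raise ValueError("gzip stream ended prematurely")


payload = b"x" * 10000 + b"end"
compressed = gzip.compress(payload)
# Feed the compressed body in small network-sized chunks.
chunks = [compressed[i:i + 16] for i in range(0, len(compressed), 16)]

result = b"".join(inflate_stream(chunks))
print(len(compressed), "compressed bytes ->", len(result), "decompressed bytes")
```

A reader built this way relies on the end-of-stream signal from the decompressor rather than on any length announced at the start of the exchange.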
Comment 8 Joe Orton 2005-08-09 15:41:25 UTC
It looks like the AJP protocol cannot stream request bodies of unknown length. 
There are two cases where this might happen: for a chunked request body, or if
there are request input filters in use such as mod_deflate.

The correct thing to do here in mod_jk is:

 if (r->proto_input_filters != r->input_filters
     || apr_table_get(r->headers_in, "Transfer-Encoding")) {
     /* refuse to handle the body */
 }
Comment 9 Jess Holle 2005-08-09 15:48:25 UTC
Does this mean mod_jk will simply not accept mod_deflate'd or chunked uploads? 
If so, that would buy us nothing...
Comment 10 william.barker 2005-08-09 17:04:35 UTC
(In reply to comment #8)
> It looks like the AJP protocol cannot stream request bodies of unknown
> length.

The AJP protocol handles request bodies of unknown length perfectly well.  It 
doesn't handle rogue modules like mod_deflate lying to it.
Comment 11 Jess Holle 2005-08-09 17:12:59 UTC
So does this really mean that mod_deflate should simply not give a
content-length when it does not know it?  That would seem (quite) reasonable
(and correct).
Comment 12 André Malo 2005-08-30 09:38:13 UTC
(In reply to comment #11)
> So does this really mean that mod_deflate should simply not give a
> content-length when it does not know it?  That would seem (quite) reasonable
> (and correct).

mod_deflate cannot change the content length, because

(a) it does not know it before uncompressing the whole stream. It would have to buffer
the whole inflated content to determine a new length, which is not going to happen.
(b) Anyway, the content length reflects the value sent by the client.
mod_deflate is not the authority to change it.

Note that CGIs are also broken by this behaviour, but there's not much we can do
about it for now. Someone might some day write a file bucket type, so we can
inflate the content into a file.

My conclusion is that the ajp handler should just unset the content length if
the protocol depends on it (meaning "CL reflects the actual content sent to the
servlet or is not set at all").
Comment 13 Rainer Jung 2008-01-03 10:44:28 UTC
There is a fix for mod_deflate in httpd 2.2.6+. I'm now investigating what we
need to do to make it work with mod_jk.

1) Your example won't work, since mod_deflate can't inflate compressed content
which uses FLaGs in the sense of RFC 1952. If I use gzip like you did to
compress a file, it includes the file name as a flag and mod_deflate can not
handle that. A better test case is

cat myfile | gzip -9c | curl ...

2) In mod_jk we extract the Content-Length early in the content handler, which
is before the filter is running. So we still get the incorrect length. Let's
see if we can do something better about that...
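
For point 1, a minimal sketch of how to check whether a gzip payload carries the file name flag, assuming Python's gzip module to build the two variants. has_fname_flag and gzip_with_name are hypothetical helper names; the bit position (FLG bit 3, FNAME) is taken from RFC 1952:

```python
import gzip
import io

FNAME = 0x08  # RFC 1952, FLG bit 3: an original file name follows the header


def has_fname_flag(gz_bytes):
    """Return True if the first gzip member stores an original file name."""
    if len(gz_bytes) < 10 or gz_bytes[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip member")
    # Byte 3 of the fixed 10-byte header is the FLG field.
    return bool(gz_bytes[3] & FNAME)


def gzip_with_name(data, name):
    """Compress data and record a file name, like `gzip -9c some_file`."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf) as gz:
        gz.write(data)
    return buf.getvalue()


named = gzip_with_name(b"hello", "some_file")
anonymous = gzip.compress(b"hello")  # no name stored, like `cat myfile | gzip -9c`

print("named payload has FNAME:    ", has_fname_flag(named))
print("anonymous payload has FNAME:", has_fname_flag(anonymous))
```

Checking a test payload this way before sending it helps rule out the flag issue when a request body is not inflated as expected.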
Comment 14 Rainer Jung 2015-01-02 15:21:36 UTC
The situation has been improved in r1649064.

Unfortunately it doesn't seem to be possible to detect request body inflation by mod_deflate before mod_jk actually starts to read the body. Because it needs to send the request headers to the backend before reading the body, there's no easy way to detect the situation. We would need to implement some buffering and change the flow of processing in mod_jk quite a lot.

I decided to implement a workaround: you can set the new Apache environment variable JK_IGNORE_CL instead, to tell mod_jk that it should ignore an existing Content-Length request header.

All body data that can be read from the web server will then be sent to the backend. No Content-Length header will be sent to the backend.

The environment variable can be set using mod_setenvif or mod_rewrite as usual. You should choose conditions for setting the variable that match the requests for which you have configured request body inflation by mod_deflate.

Setting the variable for other requests as well should work, but might lead to less efficient behavior and maybe also bugs we are not yet aware of.
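
A minimal configuration sketch of the workaround, assuming the location from the original reproduction and mod_setenvif's SetEnvIf directive matching on the Content-Encoding request header. The value "1" is an assumption, since the comment above only says the variable has to be set; check the mod_jk documentation for your version before relying on it:

```
<Location /servlet/MyTest>
    # Inflate gzip-compressed request bodies with mod_deflate.
    SetInputFilter DEFLATE
    # Assumption: any value works; tell mod_jk to ignore Content-Length
    # only for requests that announce a gzip-compressed body.
    SetEnvIf Content-Encoding gzip JK_IGNORE_CL=1
</Location>
```

Restricting the condition to compressed requests follows the advice above not to set the variable more broadly than needed.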