Brief summary: if a request is compressed (Content-Encoding: gzip), its Content-Length corresponds to the compressed length, and Apache's mod_deflate is configured to decompress such requests, then the servlet request input stream signals EOF after Content-Length bytes of decompressed content instead of returning the entire decompressed content.

To reproduce:

- Enable Apache mod_deflate request decompression:

    <Location /servlet/MyTest>
        SetInputFilter DEFLATE
    </Location>

- Send a compressed request, e.g. compress a file with gzip and send it with cURL:

    gzip -9c some_file | curl -H 'Content-Encoding: gzip' --data-binary @- http://host/servlet/MyTest

The servlet will get truncated data.

It's OK if the servlet sees a Content-Length different from the actual number of bytes in the request stream. Servlets shouldn't trust Content-Length anyway, and under chunked encoding Content-Length is not there at all, so IMHO a useless Content-Length value is a non-issue.

The only workaround so far is to write a servlet filter to decompress requests, but that puts additional load on Tomcat and complicates web app configuration.

Possibly related HTTP Server bug: http://issues.apache.org/bugzilla/show_bug.cgi?id=23287 See my comment there (Michael Klepikov) for additional details.
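The truncation described in this report can be imitated without Apache or Tomcat. The following is only a minimal sketch using the JDK's java.util.zip classes; the class and method names (GzipTruncationDemo, readUpTo, readToEof) are hypothetical and do not come from mod_jk, mod_deflate, or Tomcat. It assumes the reader of the decompressed stream stops after "Content-Length" bytes, where that value is the compressed size, which is the symptom reported above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; it only imitates the symptom, not mod_jk itself.
class GzipTruncationDemo {

    // Compress a payload the way a client would before sending it.
    static byte[] gzip(byte[] plain) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(plain);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read at most 'limit' bytes of DECOMPRESSED data, then stop.
    // With limit = compressed size this mimics a reader that trusts
    // the client's Content-Length after the body has been inflated.
    static byte[] readUpTo(byte[] compressed, int limit) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int remaining = limit;
            while (remaining > 0) {
                int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
                if (n < 0) {
                    break; // real end of the decompressed stream
                }
                out.write(chunk, 0, n);
                remaining -= n;
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read until the stream itself signals EOF, ignoring any declared length.
    static byte[] readToEof(byte[] compressed) {
        return readUpTo(compressed, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        byte[] plain = new byte[10_000];     // all zeros, compresses very well
        byte[] compressed = gzip(plain);     // its length plays the role of Content-Length
        System.out.println("compressed size:   " + compressed.length);
        System.out.println("length-limited:    " + readUpTo(compressed, compressed.length).length);
        System.out.println("read until EOF:    " + readToEof(compressed).length);
    }
}
```

The length-limited read returns only as many bytes as the compressed body had, while the read-until-EOF variant recovers the whole payload, which is the behaviour the report asks for.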
I'm using such settings in an XML-RPC system, with Apache 2.x, mod_jk 1.2.x and Tomcat 3.3.2 (so with a not too old JTC), and I didn't have a problem as long as the content length provided on the client side is set to -1. In that case, JTC/Tomcat is able to get the complete data stream. Take a look at CommonsXmlRpcTransport.java in XML-RPC HEAD, where you can see the following code:

    ByteArrayOutputStream lBo = new ByteArrayOutputStream();
    GZIPOutputStream lGzo = new GZIPOutputStream(lBo);
    lGzo.write(request);
    lGzo.finish();
    lGzo.close();
    byte[] lArray = lBo.toByteArray();
    method.setRequestBody(new ByteArrayInputStream(lArray));
    method.setRequestContentLength(-1);
To Henri Gomez: I have no control over the Content-Length that the client sends. The C++ program that sends the request uses the Internet Explorer API (URLMON), which sets Content-Length to the actual number of bytes in the request content, and there is no way to change it.
This is more a problem with your client than with mod_jk. The Servlet spec is explicit about Content-Length, and we cannot cheat on that.
The client sends a correct Content-Length equal to the compressed request size, which is what it is supposed to be per HTTP 1.1, and I don't see where the client has a problem here. Apache correctly reads the request, and mod_deflate correctly decompresses it; only the servlet receives truncated content. Where does the servlet spec imply that behaviour? The original Content-Length header value cannot be trusted after passing through HTTP Server, specifically because filters like mod_deflate may render it invalid. Setting Content-Length to -1 might work as a workaround, but my point was that it should also work with a correct positive Content-Length. If I don't use mod_deflate's decompression and decompress with a servlet filter instead, the servlet gets the complete content, while the Content-Length header of course keeps its original value. That is fine functionality-wise, but I would much rather use mod_deflate's decompression for scalability and load distribution reasons, and mod_deflate is likely faster than GZIPInputStream. I initially thought it might be a mod_deflate problem, but an HTTP Server person (André Malo) said he is certain it should be fixed in mod_jk. I suppose he implied that, instead of trusting Content-Length, there is another, more reliable way to determine the end of the request stream from Apache. Please refer to the HTTP Server bug linked in the original description.
This is clearly a mod_deflate problem. If mod_deflate only set the clength field in the request_rec to the correct uncompressed value, mod_jk would work correctly. Otherwise, mod_jk has no way of knowing that mod_deflate is going to change the number of bytes available on input.
As a user of mod_deflate and mod_jk, I'll play stupid for a moment: As long as the client Content-Length is either not present or correct, users of these two software components don't care who is at fault, mod_jk or mod_deflate -- we just want it fixed. That said, I understand that this is an interface between development groups in addition to between software components, so I quite understand a short game of "hot potato". All the same, it would be best if mod_jk and mod_deflate folk could work this out sooner rather than later...
To william.barker@wilshire.com: by nature of compressed content, mod_deflate cannot know decompressed length in advance when it just begins streaming the content to mod_jk. There has to be some kind of an end of stream indicator independent of any lengths known at the beginning of the exchange. Maybe it already exists, I do not know. Either way I fully agree with Jess Holle: it would be in everyone's best interest if mod_jk people discuss it directly with the HTTP Server people... Thanks.
It looks like the AJP protocol cannot stream request bodies of unknown length. There are two cases where this might happen: for a chunked request body, or if there are request input filters in use such as mod_deflate. The correct thing to do here in mod_jk is:

    if (r->proto_input_filters != r->input_filters
        || apr_table_get(r->headers_in, "Transfer-Encoding")) {
        /* refuse to handle the body */
    }
Does this mean mod_jk will simply not accept mod_deflate'd or chunked uploads? If so, that would buy us nothing...
(In reply to comment #8)
> It looks like the AJP protocol cannot stream request bodies of unknown length.

The AJP protocol handles request bodies of unknown length perfectly well. It doesn't handle rogue modules like mod_deflate lying to it.
So does this really mean that mod_deflate should simply not give a content-length when it does not know it? That would seem (quite) reasonable (and correct).
(In reply to comment #11)
> So does this really mean that mod_deflate should simply not give a
> content-length when it does not know it? That would seem (quite) reasonable
> (and correct).

mod_deflate cannot change the content length, because

(a) it does not know it before uncompressing the whole stream. It would have to buffer the whole inflated content to determine a new length, which is not going to happen.

(b) Anyway, the content length reflects the value sent by the client. mod_deflate is not the authority to change it.

Note that CGIs are also broken by this behaviour, but there's not much we can do about it for now. Someone might some day write a file bucket type, so we can inflate the content into a file.

My conclusion is that the AJP handler should just unset the content length if the protocol depends on it (meaning "CL reflects the actual content sent to the servlet or is not set at all").
There is a fix for mod_deflate in httpd 2.2.6+. I'm now investigating what we need to do to make it work with mod_jk.

1) Your example won't work, since mod_deflate can't inflate compressed content which uses FLaGs in the sense of RFC 1952. If I use gzip like you did to compress a file, it includes the file name as a flag and mod_deflate cannot handle that. A better test case is

    cat myfile | gzip -9c | curl ...

2) In mod_jk we extract the Content-Length early in the content handler, which is before the filter runs. So we still get the incorrect length. Let's see if we can do something better about that...
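To check whether a given test payload carries the file name flag mentioned in point 1, one can inspect the gzip header directly. The sketch below relies only on the header layout from RFC 1952 (magic bytes 0x1f 0x8b, FLG in the fourth byte, FNAME as bit 3); the class and method names are hypothetical, and it makes no claim about which flags a particular mod_deflate version accepts.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for inspecting the RFC 1952 header of a gzip payload.
class GzipHeaderFlags {

    // RFC 1952: bit 3 of the FLG byte means an original file name follows the header.
    static final int FNAME = 0x08;

    // Return the FLG byte (header offset 3) of a gzip stream.
    static int flags(byte[] data) {
        if (data.length < 10
                || (data[0] & 0xff) != 0x1f
                || (data[1] & 0xff) != 0x8b) {
            throw new IllegalArgumentException("not a gzip stream");
        }
        return data[3] & 0xff;
    }

    // True if the payload embeds the original file name.
    static boolean hasFileName(byte[] data) {
        return (flags(data) & FNAME) != 0;
    }

    // Compress with the JDK, whose GZIPOutputStream writes a header without optional fields.
    static byte[] gzip(byte[] plain) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(plain);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = gzip("hello".getBytes());
        System.out.println("FLG byte:      " + flags(body));
        System.out.println("has file name: " + hasFileName(body));
    }
}
```

Running the same check on the bytes produced by the two gzip command lines in this thread would show whether the request body includes a file name field before it is sent.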
The situation has been improved in r1649064. Unfortunately it doesn't seem to be possible to detect request body inflation by mod_deflate before mod_jk actually starts to read the body. Because mod_jk needs to send the request headers to the backend before reading the body, there's no easy way to detect the situation. We would need to implement some buffering and change the flow of processing in mod_jk quite a lot. I decided to implement a workaround instead: you can set the new Apache environment variable JK_IGNORE_CL to tell mod_jk that it should ignore an existing Content-Length request header. All body data that can be read from the web server will then be sent to the backend. No Content-Length header will be sent to the backend. The environment variable can be set using mod_setenvif or mod_rewrite as usual. You should choose conditions for setting the variable that match the requests for which you have configured request body inflation by mod_deflate. Setting the variable for other requests as well should work, but might lead to less efficient behavior and maybe also to bugs we are not yet aware of.