Bug 52121 - corrupt gzip output when compression=on in server.xml
corrupt gzip output when compression=on in server.xml
Status: RESOLVED FIXED
Product: Tomcat 6
Classification: Unclassified
Component: Connectors
6.0.33
PC Linux
: P2 normal (vote)
: default
Assigned To: Tomcat Developers Mailing List
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2011-11-01 22:15 UTC by David Marcks
Modified: 2011-11-12 05:31 UTC (History)
0 users



Attachments
Gzip Corruption Repro WAR (61.07 KB, application/octet-stream)
2011-11-01 22:15 UTC, David Marcks
Details
Isolated Reproduction without using Tomcat (60.60 KB, application/octet-stream)
2011-11-01 22:18 UTC, David Marcks
Details
2011-11-07_tc6_FlushableGZIPOutputStream.patch (3.69 KB, patch)
2011-11-07 02:45 UTC, Konstantin Kolinko
Details | Diff
2011-11-11_tc6_FlushableGZIPOutputStream-v2.patch (4.44 KB, patch)
2011-11-11 02:38 UTC, Konstantin Kolinko
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Marcks 2011-11-01 22:15:16 UTC
Created attachment 27886 [details]
Gzip Corruption Repro WAR

This appears to be same issue as posted by Christopher Taylor
Jun 30, 2011 at 8:05 am to Tomcat Users. The subject line has been preserved above.

As mentioned in the post, this seems to have arisen with this issue:
https://issues.apache.org/bugzilla/show_bug.cgi?id=48738

The change introduced this class: FlushableGZIPOutputStream. When we started using this, we noticed that certain string flush sequences would result in corrupt gzip output. 

The attached WAR provides a repro. The sequence of writes to the output stream that would’ve been done by our server is mimicked in this case using a file system. There are 40 files each representing a flush. URL:

"http://localhost:8080/TomcatBugReportWAR"

I’ve allowed the ‘start’ and ‘end’ request parameters to control to the starting and ending points. I’ve also allowed an ‘append’ request parameter dump anything to the outputstream.

The defaults are 0, 40, and “” for start end and append respectively.

The following query string appears particularly interesting as it seems to be the smallest sequence that causes the corruption and the subsequent string that is appended after the output is the only thing that appears corrupted:

"http://localhost:8080/TomcatBugReportWAR?start=19&end=30&append=something”

It’s also possible to reproduce this using only the FlushableGZIPOutputStream without using Tomcat at all. See attached JAR.
Comment 1 David Marcks 2011-11-01 22:18:25 UTC
Created attachment 27887 [details]
Isolated Reproduction without using Tomcat
Comment 2 Mark Thomas 2011-11-03 23:14:00 UTC
Many thanks for the excellent test case. It looks like it was very difficult to track down. We really do appreciate your efforts on this.

I have converted the stand-alone reproduction JAR to a Tomcat test case. It isn't yet in the standard test suite as it currently always fails.

I noticed that the following is sufficient to reproduce the issue:
- send parts 19 to 28, flush, 29, flush, close

I have therefore merged parts 19 to 28 for the test case.

The failure is a CRC error.

At this point this looks like a JVM bug.

We don't normally address JVM bugs in the Tomcat codebase but if we can figure out exactly what causes this bug (it seems to be related to a very precise state) then we might be able to detect the state and prevent the flush that triggers the bug. Failing that, it should be possible to make use of the flushable GZIP filter configurable.

As a workaround, you may be able to add an explicit flush() to the page(s) where this occurs.
Comment 3 Konstantin Kolinko 2011-11-03 23:53:23 UTC
Note that you can run the test by adding the following line to the build.properties file:
test.entry=org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream

Just for reference:
If I run the test through Ant it fails in an odd way:

[[[
Testsuite: org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream
Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0,031 sec

Testcase: testBug52121 took 0,016 sec
	Caused an ERROR
null
java.lang.NullPointerException
	at org.apache.catalina.util.IOTools.flow(IOTools.java:74)
	at org.apache.catalina.util.IOTools.flow(IOTools.java:85)
	at org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream.testBug52121(TesterFlushableGZIPOutputStream.java:47)
]]]

I cannot explain this NPE.

If I run it as JUnit test from with Eclipse IDE I get the IO error:

[[[
java.io.IOException: Corrupt GZIP trailer
	at java.util.zip.GZIPInputStream.readTrailer(GZIPInputStream.java:203)
	at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:94)
	at java.io.FilterInputStream.read(FilterInputStream.java:90)
	at org.apache.catalina.util.IOTools.flow(IOTools.java:74)
	at org.apache.catalina.util.IOTools.flow(IOTools.java:85)
	at org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream.testBug52121(TesterFlushableGZIPOutputStream.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
]]]

The sources are trunk@1197344, both were run with jdk 6u29 (build 1.6.0_29-b11) on Windows XP 32-bit.
Comment 4 Konstantin Kolinko 2011-11-04 01:35:22 UTC
I think I fixed it with r1197382.


Regarding the NPE when testing with Ant: it is a fault in the test case.
The resources are not copied when compiling the test classes. Thus classloader.getResourceAsStream() does not find the resource and returns null.
Comment 5 Mark Thomas 2011-11-04 10:13:57 UTC
Unfortunately, that hasn't fixed it. It just avoids the call that triggers the problem. I've added an additional write to expose the issue again. I also found a way to crash the JVM. This makes me more convinced this is as JVM bug.
Comment 6 Konstantin Kolinko 2011-11-04 12:08:34 UTC
Several additional notes

A link to the thread by Christopher Taylor mentioned by OP:
http://tomcat.markmail.org/thread/v56xpi5uz4end7cd

Note, that the Christopher Taylor's thread contains an attachment but only containing the corrupted gzip stream.

1. I can observe the CRC error with 3-rd party tool, 7-zip.

It cannot decompress the file in Christopher Taylor's message, nether it can the one produced by the test case for this issue before I committed my changes. It reports CRC error.

Christopher Taylor's file was produced by Tomcat 7.0.14.

Thus not only Java report that the stream is broken.

2. According to the sources of deflate.c of zlib library that can be found in many placed, changing compression level which is performed by "deflateParams" call indeed causes deflate(strm, Z_PARTIAL_FLUSH) flushing the buffer.

That is in some old version of the code. In more recent ones it is just a deflate(strm, Z_BLOCK).


3. Looking at the content of the streams produced by compression. Both the stream in Christopher Taylor's message and the one produced by the JUnit test case for this bug have uncompressed data near the end of the stream.

That is "</id><version>"... or "<div "... respectively - about 200 bytes of plain text.

It just means that the flush in deflateParams() call failed to flush the whole stream.

That happens when the output buffer used by deflate.c (that is configured by 'size' parameter in GZIPOutputStream constructor) is too small. There is a loop that deals with this in DeflaterOutputStream.deflate(). Because the effective compression level at that time is Deflater.NO_COMPRESSION the data is flushed uncompressed.

The following follows:
1) If I increase size of the buffer by the following change in FlushableGZIPOutputStream constructor, the problem reported here disappears:

-        super(os);
+        super(os, 64*1024);


4. The Java bugs 4255743 and 4813885 that are mentioned in the Javadoc for the FlushableGZIPOutputStream class are both fixed in Java 7.

There is an additional boolean argument to GZIPOutputStream and DeflaterOutputStream constructors in Java 7, "syncFlush". If it is set to true, then calling mere flush() results in flushing the compressor.

5. Some documentation mentions that there is a trick in gzip when partial flush is performed. It sends either one block or two blocks.

http://www.bolet.org/~pornin/deflate-flush.html
Comment 7 Konstantin Kolinko 2011-11-04 14:14:25 UTC
Further fixed in r1197578

I removed the trick with empty byte array and direct call to def.setInput(). Calling setInput() with empty data never happens in the standard write() method.
Comment 8 Mark Thomas 2011-11-04 19:57:40 UTC
The latest fix looks good to me. I have applied it to 7.0.x and it will be included in 7.0.23 onwards.

Moving this issue to 6.0.x as it also needs to be fixed there.
Comment 9 Konstantin Kolinko 2011-11-07 02:45:47 UTC
Created attachment 27905 [details]
2011-11-07_tc6_FlushableGZIPOutputStream.patch

Patch for Tomcat 6.0
Comment 10 Konstantin Kolinko 2011-11-10 22:36:52 UTC
Applied to 6.0 with r1200603 and will be in 6.0.34.
Comment 11 Konstantin Kolinko 2011-11-11 00:11:03 UTC
Reverted in 6.0 in r1200620. It is broken.
Browsers display empty page with compression="on".
Wget says "No data received".
As if connection has been closed.


Debugging with trunk, there is problem in OutputBuffer.close().
It does:
        doFlush(false);
        closed = true;

It closes underlying stream without closing FlushableGZIPOutputStream first.
Consequently the attempt to write out the last byte at FlushableGZIPOutputStream#close() fails with an IOException.
Comment 12 Konstantin Kolinko 2011-11-11 02:28:26 UTC
The issue in comment 11 fixed in trunk and 7.0 with r1200696 and r1200698 respectively, will be in 7.0.23.
Comment 13 Konstantin Kolinko 2011-11-11 02:38:54 UTC
Created attachment 27924 [details]
2011-11-11_tc6_FlushableGZIPOutputStream-v2.patch

Updated patch for 6.0.x.
Comment 14 Konstantin Kolinko 2011-11-11 04:34:39 UTC
(In reply to comment #11)
>> Debugging with trunk, there is problem in OutputBuffer.close().
>> It does:
>>         doFlush(false);
>>         closed = true;
>> 
>> It closes underlying stream without closing FlushableGZIPOutputStream first.
>> Consequently the attempt to write out the last byte at
>> FlushableGZIPOutputStream#close() fails with an IOException.

Just to clear misinformation: the statements quoted above are wrong.

It behaved as if the stream were closed, but the real cause was finished Deflater. It has nothing to do with the underlying buffer and the code fragment above. See r1200696
Comment 15 Konstantin Kolinko 2011-11-12 05:31:34 UTC
Fixed in 6.0 with r1201193 and will be in 6.0.34.