In our application we used to use Tomcat 5, writing our content directly to the response writer in our servlet:

Writer w = response.getWriter(); w.append(...)

We regularly measure the performance of our application under stress with multiple simultaneous requests per second. We recently upgraded to Tomcat 7 and immediately observed a performance degradation of 10% on the request round trip, on the same machine and with the same tests, compared to our application deployed on Tomcat 5.

Profiling with JProfiler showed a massive amount of transient {{java.nio.HeapCharBuffer}} objects being produced - 400MB+ within a few minutes. We write the response with multiple {{append}} calls of small strings or single chars, which used to work properly.

As an experiment, we wrapped the response writer in a {{BufferedWriter}} with a buffer of 64k (see the sketch below). This brought our application's performance back to normal.

We see this as either:
- an anti-pattern usage since Tomcat 6/7, in which case please provide some documentation about that, or
- a bug in the C2BConverter that is applied on every call to the CoyoteWriter write methods, which generates the garbage buffers on every call.

We know that the response stream has an 8k buffer by default. Nevertheless, this buffer comes after the encoding step and as such it is too late to help with the garbage. Perhaps Tomcat could buffer before it encodes? Still, so many transient objects for a perfectly eligible usage pattern of the Servlet API looks like a bug to me.
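For reference, a minimal sketch of the workaround (the servlet class name and the content written are invented for illustration; the 64k buffer size matches our experiment):

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BufferedResponseServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Collect the many small appends in memory and hand them to the
        // container (and its char-to-byte converter) in large chunks.
        Writer w = new BufferedWriter(resp.getWriter(), 64 * 1024);
        w.append("many").append(' ').append("small").append(' ').append("writes");
        w.flush(); // push the buffered chars to the response before returning
    }
}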
Some more information on why we chose this workaround, which may also hint at how this can be resolved inside Tomcat: the documentation at http://docs.oracle.com/javase/6/docs/api/java/io/OutputStreamWriter.html recommends wrapping the writer in a BufferedWriter "to avoid frequent converter invocations", which have a cost - in this case, the observed garbage.
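The pattern recommended by that Javadoc is essentially the following (trivial standalone example, not our application code):

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class JavadocPattern {
    public static void main(String[] args) throws IOException {
        // Put a BufferedWriter in front of the OutputStreamWriter so the
        // charset converter is invoked on big chunks rather than per write.
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out));
        out.write("hello");
        out.flush();
    }
}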
r411577 looks to be the commit that triggered this issue.
Looking at this some more, I think we'll need to be careful around recycling, resetting and flushing, but the most efficient way to fix this is to remove the calls to conv.flush() after every write. The C2BConverter already has a buffer, so let's use it. I'll start looking at a patch along these lines and doing some testing.
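To illustrate the cost being discussed - this is not Tomcat's actual code, just a rough sketch of the two patterns, with invented class and variable names:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class ConverterCost {
    public static void main(String[] args) {
        CharsetEncoder enc = Charset.forName("UTF-8").newEncoder();
        ByteBuffer out = ByteBuffer.allocate(16 * 1024);

        // Expensive pattern: wrap, encode and flush once per character - this
        // is roughly what produces the transient HeapCharBuffer garbage.
        for (char c : "hello".toCharArray()) {
            enc.reset();
            enc.encode(CharBuffer.wrap(new char[] { c }), out, true);
            enc.flush(out);
        }

        // Cheaper pattern: accumulate chars and run the converter once per chunk.
        enc.reset();
        enc.encode(CharBuffer.wrap("hello".toCharArray()), out, true);
        enc.flush(out);
    }
}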
(In reply to comment #3)
> the most efficient way to fix this is to remove the
> calls to conv.flush() after every write. The C2BConvertor already has a buffer,
> so lets use it.

Interesting, but maybe cut 7.0.24 first? The change sounds risky so it will need some time to review.

I do not see what flush() calls you are talking about. Maybe that will be more clear from the patch.
Created attachment 28132 [details]
Initial ideas

My initial idea (attached) was quite simple. There isn't the buffering I thought there was in the Converter, so there is scope to add some buffering and further improve performance.
Forgot to add the performance numbers for adding 100,000 single characters through a Writer.

Before:
- 129ms without wrapping the Writer in a BufferedWriter
- 37ms with wrapping the Writer in a BufferedWriter

After:
- 77ms without wrapping
- 28ms with wrapping

I'm still looking at the best place to insert a buffer before the conversion.
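For anyone wanting to reproduce the shape of that micro-benchmark outside the container, a standalone approximation (an OutputStreamWriter over a byte sink stands in for the CoyoteWriter, so absolute numbers will differ):

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class SingleCharWriteBenchmark {
    static long time(Writer w) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < 100000; i++) {
            w.write('x');           // one tiny write per call, as in the report
        }
        w.flush();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Writer plain = new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8");
        System.out.println("without wrapping: " + time(plain) + "ns");

        Writer wrapped = new BufferedWriter(
                new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8"), 64 * 1024);
        System.out.println("with wrapping:    " + time(wrapped) + "ns");
    }
}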
If I insert a buffer before the converter then the figures drop to:
- 20ms without wrapping
- 7ms with wrapping

Clearly the CoyoteWriter has some built-in overhead that adding a buffer helps with. I don't want to provide a buffer inside CoyoteWriter as that adds more complexity than I want to deal with right now.
That initial patch is no good. It triggers a bunch of unit test failures. I am working on an improved patch now.
Unit tests now work on trunk. I'm just running the TCK before I look at back-porting it.
JSP and Servlet TCK pass for trunk (apart from the Servlet signature tests that are expected to fail since some changes have already been made for Servlet 3.1). I'll look at back-porting these tomorrow.

Nikolay, what are the chances of you being able to test your app with these changes? I can provide a snapshot release build if necessary.
Yes, Mark, we can test it pretty quickly. If you provide us with a patched distribution package, we will test:
- query response times in our application (having removed our buffering wrapper) deployed on the patched Tomcat;
- memory consumption (whether we still get too many transient objects of the originally observed type).

Cheers,
Nikolay
Excellent. I'll put a dev build up on my people.apache.org space and provide a link when it is available.

For reference, both the flushing changes and adding an internal buffer appear to help performance in all cases. Here are some figures from the latest version of the test case, which I'll commit shortly:

Trunk without internal buffer but with flushing patch
Write length: 1, Buffered: n, Time: 86143369ns
Write length: 1, Buffered: y, Time: 18922557ns
Write length: 10, Buffered: n, Time: 4281841ns
Write length: 10, Buffered: y, Time: 1817266ns
Write length: 100, Buffered: n, Time: 1687881ns
Write length: 100, Buffered: y, Time: 1185908ns
Write length: 1000, Buffered: n, Time: 1243862ns
Write length: 1000, Buffered: y, Time: 1160367ns
Write length: 10000, Buffered: n, Time: 1147934ns
Write length: 10000, Buffered: y, Time: 1281448ns
Write length: 100000, Buffered: n, Time: 1204381ns
Write length: 100000, Buffered: y, Time: 1218129ns

Trunk with internal buffer and flushing patch
Write length: 1, Buffered: n, Time: 27663008ns
Write length: 1, Buffered: y, Time: 15468331ns
Write length: 10, Buffered: n, Time: 1718582ns
Write length: 10, Buffered: y, Time: 1938994ns
Write length: 100, Buffered: n, Time: 1110883ns
Write length: 100, Buffered: y, Time: 1233152ns
Write length: 1000, Buffered: n, Time: 933836ns
Write length: 1000, Buffered: y, Time: 1259017ns
Write length: 10000, Buffered: n, Time: 1024246ns
Write length: 10000, Buffered: y, Time: 1168732ns
Write length: 100000, Buffered: n, Time: 1015159ns
Write length: 100000, Buffered: y, Time: 1074527ns

7.0.x with no patches
Write length: 1, Buffered: n, Time: 137147272ns
Write length: 1, Buffered: y, Time: 18676195ns
Write length: 10, Buffered: n, Time: 10390213ns
Write length: 10, Buffered: y, Time: 2038188ns
Write length: 100, Buffered: n, Time: 2008351ns
Write length: 100, Buffered: y, Time: 1200196ns
Write length: 1000, Buffered: n, Time: 1261317ns
Write length: 1000, Buffered: y, Time: 1217747ns
Write length: 10000, Buffered: n, Time: 1117065ns
Write length: 10000, Buffered: y, Time: 1081037ns
Write length: 100000, Buffered: n, Time: 1471004ns
Write length: 100000, Buffered: y, Time: 1134857ns
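The test case is parametrized on the write length and on whether the application wraps the Writer itself; a standalone approximation of its shape is below (the real test runs against the container's Writer, so the figures above will not be reproduced exactly):

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Arrays;

public class WriteLengthBenchmark {
    public static void main(String[] args) throws IOException {
        char[] data = new char[100000];
        Arrays.fill(data, 'x');

        for (int len : new int[] { 1, 10, 100, 1000, 10000, 100000 }) {
            for (int b = 0; b < 2; b++) {
                boolean buffered = (b == 1);
                Writer w = new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8");
                if (buffered) {
                    w = new BufferedWriter(w, 64 * 1024);
                }
                long start = System.nanoTime();
                // Write the same total amount of data in chunks of 'len' chars.
                for (int off = 0; off < data.length; off += len) {
                    w.write(data, off, len);
                }
                w.flush();
                System.out.println("Write length: " + len + ", Buffered: "
                        + (buffered ? "y" : "n") + ", Time: "
                        + (System.nanoTime() - start) + "ns");
            }
        }
    }
}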
Numbers look great! Can't wait to try it out. If all goes well, we would appreciate information about when to expect a 7.0.x release with the patch included.
Give it 30 minutes and there should be a full Tomcat 7 distribution (src, bin, docs) uploaded to:
http://people.apache.org/~markt/dev/tomcat-7/v7.0.24-r1230243/

Please note the usual caveats for builds such as this:
- This is *NOT* an ASF release.
- This is a development build intended solely for testing this issue.
- There has been zero testing of this build.

Regarding a 7.0.24 release: I intend to roll it just as soon as the open 7.0.x bugs are all resolved. How long that takes depends on how many bugs are raised, how hard they are to fix and how much life gets in the way. Hint: the more help there is investigating and fixing the currently open 7.0.x bugs (excluding enhancements), the quicker there will be a 7.0.24 release.

I am marking this as fixed, as is usual once a patch has been applied to the release branch. If any issues are discovered, we can re-open the issue.
Regarding the artifact: we will test it. Do I understand correctly that this artifact contains a current 7.0.24 (candidate) + the fix?

We will test this to the best of our procedure. For a clean-cut relative performance measurement (no noise from other changes), we would ideally have preferred to test a stable tag (e.g., 7.0.23) + the fix. Nevertheless, the degradation before was quite large, so I expect to confirm a clear improvement from this fix.

The fix is not critical for our product - we have a sufficiently good workaround, although the additional flushing fix will further improve the situation. I'll make a note to watch for 7.0.24 so that we update to it in a subsequent release of our product.
(In reply to comment #15)
> Regarding the artifact, we will test it. Do I understand correctly that this
> artifact contains a current 7.0.24 (candidate) + the fix?

Correct.

> We will test this to the best of our procedure. For a clean cut (no noise from
> other stuff) relative performance measurement however, ideally we would have
> preferred to test a stable tag (e.g., 7.0.23) + the fix. Nevertheless, the
> degradation before showed quite a big number, so I expect to confirm a clear
> improvement from this fix.

Fair point. Reviewing the changelog for 7.0.24, there are quite a few changes but none that jump out as having a performance impact, so the results should be good enough. Let us know how you get on.
OK, so we confirm that our application plus the patched Tomcat 7 is back to the performance levels of our application on Tomcat 5. We did not see any functional regressions either, so from our side it looks good. Our current workaround for the unpatched Tomcat 7 yields the same numbers, so we are safe until a release comes out.

Thanks for the quick reaction! You may want to add to the general pre-release QA procedure a performance test similar to the unit tests you made, just to avoid bugs like this dragging their dirty feet through two major releases :D.