In our application we used to use Tomcat 5, writing our content directly to the response writer in our servlet:

Writer w = response.getWriter(); w.append(...)

We regularly measure the performance of our application under stress with multiple simultaneous requests per second. We recently upgraded to Tomcat 7 and immediately observed a performance degradation of 10% on the request round trip, on the same machine and with the same tests, compared to our application deployed on Tomcat 5.

Profiling with JProfiler showed a massive amount of transient {{java.nio.HeapCharBuffer}} objects being produced - 400MB+ within a few minutes. We write the response with multiple {{append}} calls of small strings or single chars, which used to work properly.

As an experiment, we wrapped the response writer in a {{BufferedWriter}} with a buffer of 64k (see the sketch below). This brought our application's performance back to normal.

We see this as either:
- an anti-pattern usage since Tomcat 6/7, in which case please provide some documentation about that, or
- a bug in the C2BConverter that is applied on every call to the CoyoteWriter write methods, which generates the garbage buffers on every call.

We know that the response stream has an 8k buffer by default. Nevertheless, this buffer comes after the encoding step and as such it is too late to help with the garbage. Perhaps Tomcat could buffer before it encodes? Still, so many transient objects for a perfectly eligible usage pattern of the Servlet API looks like a bug to me.
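For reference, a minimal sketch of the workaround (the servlet class name and the content written are invented for illustration; the 64k buffer size matches our experiment):

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BufferedResponseServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Collect the many small appends in memory and hand them to the
        // container (and its char-to-byte converter) in large chunks.
        Writer w = new BufferedWriter(resp.getWriter(), 64 * 1024);
        w.append("many").append(' ').append("small").append(' ').append("writes");
        w.flush(); // push the buffered chars to the response before returning
    }
}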
Some more information on why we chose this workaround, which may also hint at how this can be resolved inside Tomcat: the documentation at http://docs.oracle.com/javase/6/docs/api/java/io/OutputStreamWriter.html recommends wrapping the writer in a BufferedWriter "to avoid frequent converter invocations", which have a cost - in this case, the observed garbage.
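The pattern recommended by that Javadoc is essentially the following (trivial standalone example, not our application code):

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class JavadocPattern {
    public static void main(String[] args) throws IOException {
        // Put a BufferedWriter in front of the OutputStreamWriter so the
        // charset converter is invoked on big chunks rather than per write.
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out));
        out.write("hello");
        out.flush();
    }
}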
r411577 looks to be the commit that triggered this issue.
Looking at this some more, I think we'll need to be careful around recycling, resetting and flushing, but the most efficient way to fix this is to remove the calls to conv.flush() after every write. The C2BConverter already has a buffer, so let's use it. I'll start looking at a patch along these lines and doing some testing.
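To illustrate the cost being discussed - this is not Tomcat's actual code, just a rough sketch of the two patterns, with invented class and variable names:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class ConverterCost {
    public static void main(String[] args) {
        CharsetEncoder enc = Charset.forName("UTF-8").newEncoder();
        ByteBuffer out = ByteBuffer.allocate(16 * 1024);

        // Expensive pattern: wrap, encode and flush once per character - this
        // is roughly what produces the transient HeapCharBuffer garbage.
        for (char c : "hello".toCharArray()) {
            enc.reset();
            enc.encode(CharBuffer.wrap(new char[] { c }), out, true);
            enc.flush(out);
        }

        // Cheaper pattern: accumulate chars and run the converter once per chunk.
        enc.reset();
        enc.encode(CharBuffer.wrap("hello".toCharArray()), out, true);
        enc.flush(out);
    }
}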
(In reply to comment #3)
> the most efficient way to fix this is to remove the
> calls to conv.flush() after every write. The C2BConvertor already has a buffer,
> so lets use it.

Interesting, but maybe cut 7.0.24 first? The change sounds risky so it will need some time to review.

I do not see what flush() calls you are talking about. Maybe that will be more clear from the patch.
Created attachment 28132 [details]
Initial ideas

My initial idea (attached) was quite simple. There isn't the buffering I thought there was in the Converter, so there is scope to add some buffering and further improve performance.
Forgot to add the performance numbers for adding 100,000 single characters through a Writer.

Before:
- 129ms without wrapping the Writer in a BufferedWriter
- 37ms with wrapping the Writer in a BufferedWriter

After:
- 77ms without wrapping
- 28ms with wrapping

I'm still looking at the best place to insert a buffer before the conversion.
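For anyone wanting to reproduce the shape of that micro-benchmark outside the container, a standalone approximation (an OutputStreamWriter over a byte sink stands in for the CoyoteWriter, so absolute numbers will differ):

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class SingleCharWriteBenchmark {
    static long time(Writer w) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < 100000; i++) {
            w.write('x');           // one tiny write per call, as in the report
        }
        w.flush();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Writer plain = new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8");
        System.out.println("without wrapping: " + time(plain) + "ns");

        Writer wrapped = new BufferedWriter(
                new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8"), 64 * 1024);
        System.out.println("with wrapping:    " + time(wrapped) + "ns");
    }
}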
If I insert a buffer before the converter then the figures drop to:
- 20ms without wrapping
- 7ms with wrapping

Clearly the CoyoteWriter has some built-in overhead that adding a buffer helps with. I don't want to provide a buffer inside CoyoteWriter as that adds more complexity than I want to deal with right now.
That initial patch is no good. It triggers a bunch of unit test failures. I am working on an improved patch now.
Unit tests now work on trunk. I'm just running the TCK before I look at back-porting it.
JSP and Servlet TCK pass for trunk (apart from the Servlet signature tests that are expected to fail since some changes have already been made for Servlet 3.1). I'll look at back-porting these tomorrow.

Nikolay, what are the chances of you being able to test your app with these changes? I can provide a snapshot release build if necessary.
Yes, Mark, we can test it pretty quickly. If you provide us with a patched distribution package, we will test:
- query response times in our application (having removed our buffering wrapper) deployed on the patched Tomcat;
- memory consumption (whether we still get too many transient objects of the originally observed type).

Cheers,
Nikolay
Excellent. I'll put a dev build up on my people.apache.org space and provide a link when it is available.

For reference, both the flushing changes and adding an internal buffer appear to help performance in all cases. Here are some figures from the latest version of the test case, which I'll commit shortly:

Trunk without internal buffer but with flushing patch
Write length: 1, Buffered: n, Time: 86143369ns
Write length: 1, Buffered: y, Time: 18922557ns
Write length: 10, Buffered: n, Time: 4281841ns
Write length: 10, Buffered: y, Time: 1817266ns
Write length: 100, Buffered: n, Time: 1687881ns
Write length: 100, Buffered: y, Time: 1185908ns
Write length: 1000, Buffered: n, Time: 1243862ns
Write length: 1000, Buffered: y, Time: 1160367ns
Write length: 10000, Buffered: n, Time: 1147934ns
Write length: 10000, Buffered: y, Time: 1281448ns
Write length: 100000, Buffered: n, Time: 1204381ns
Write length: 100000, Buffered: y, Time: 1218129ns

Trunk with internal buffer and flushing patch
Write length: 1, Buffered: n, Time: 27663008ns
Write length: 1, Buffered: y, Time: 15468331ns
Write length: 10, Buffered: n, Time: 1718582ns
Write length: 10, Buffered: y, Time: 1938994ns
Write length: 100, Buffered: n, Time: 1110883ns
Write length: 100, Buffered: y, Time: 1233152ns
Write length: 1000, Buffered: n, Time: 933836ns
Write length: 1000, Buffered: y, Time: 1259017ns
Write length: 10000, Buffered: n, Time: 1024246ns
Write length: 10000, Buffered: y, Time: 1168732ns
Write length: 100000, Buffered: n, Time: 1015159ns
Write length: 100000, Buffered: y, Time: 1074527ns

7.0.x with no patches
Write length: 1, Buffered: n, Time: 137147272ns
Write length: 1, Buffered: y, Time: 18676195ns
Write length: 10, Buffered: n, Time: 10390213ns
Write length: 10, Buffered: y, Time: 2038188ns
Write length: 100, Buffered: n, Time: 2008351ns
Write length: 100, Buffered: y, Time: 1200196ns
Write length: 1000, Buffered: n, Time: 1261317ns
Write length: 1000, Buffered: y, Time: 1217747ns
Write length: 10000, Buffered: n, Time: 1117065ns
Write length: 10000, Buffered: y, Time: 1081037ns
Write length: 100000, Buffered: n, Time: 1471004ns
Write length: 100000, Buffered: y, Time: 1134857ns
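The test case is parametrized on the write length and on whether the application wraps the Writer itself; a standalone approximation of its shape is below (the real test runs against the container's Writer, so the figures above will not be reproduced exactly):

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Arrays;

public class WriteLengthBenchmark {
    public static void main(String[] args) throws IOException {
        char[] data = new char[100000];
        Arrays.fill(data, 'x');

        for (int len : new int[] { 1, 10, 100, 1000, 10000, 100000 }) {
            for (int b = 0; b < 2; b++) {
                boolean buffered = (b == 1);
                Writer w = new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8");
                if (buffered) {
                    w = new BufferedWriter(w, 64 * 1024);
                }
                long start = System.nanoTime();
                // Write the same total amount of data in chunks of 'len' chars.
                for (int off = 0; off < data.length; off += len) {
                    w.write(data, off, len);
                }
                w.flush();
                System.out.println("Write length: " + len + ", Buffered: "
                        + (buffered ? "y" : "n") + ", Time: "
                        + (System.nanoTime() - start) + "ns");
            }
        }
    }
}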
Numbers look great! Can't wait to try it out. If all goes well, we would appreciate information about when to expect a 7.0.x release with the patch included.
Give it 30 minutes and there should be a full Tomcat 7 distribution (src, bin, docs) uploaded to:
http://people.apache.org/~markt/dev/tomcat-7/v7.0.24-r1230243/

Please note the usual caveats for builds such as this:
- This is *NOT* an ASF release.
- This is a development build intended solely for testing this issue.
- There has been zero testing of this build.

Regarding a 7.0.24 release: I intend to roll it just as soon as the open 7.0.x bugs are all resolved. How long that takes depends on how many bugs are raised, how hard they are to fix and how much life gets in the way. Hint: the more help there is investigating and fixing the currently open 7.0.x bugs (excluding enhancements), the quicker there will be a 7.0.24 release.

I am marking this as fixed, as is usual once a patch has been applied to the release branch. If any issues are discovered, we can re-open the issue.
Regarding the artifact: we will test it. Do I understand correctly that this artifact contains a current 7.0.24 (candidate) + the fix?

We will test this to the best of our procedure. For a clean-cut relative performance measurement (no noise from other changes), we would ideally have preferred to test a stable tag (e.g., 7.0.23) + the fix. Nevertheless, the degradation before was quite large, so I expect to confirm a clear improvement from this fix.

The fix is not critical for our product - we have a sufficiently good workaround, although the additional flushing fix will further improve the situation. I'll make a note to watch for 7.0.24 so that we update to it in a subsequent release of our product.
(In reply to comment #15)
> Regarding the artifact, we will test it. Do I understand correctly that this
> artifact contains a current 7.0.24 (candidate) + the fix?

Correct.

> We will test this to the best of our procedure. For a clean cut (no noise from
> other stuff) relative performance measurement however, ideally we would have
> preferred to test a stable tag (e.g., 7.0.23) + the fix. Nevertheless, the
> degradation before showed quite a big number, so I expect to confirm a clear
> improvement from this fix.

Fair point. Reviewing the changelog for 7.0.24, there are quite a few changes but none that jump out as having a performance impact, so the results should be good enough. Let us know how you get on.
OK, so we confirm that our application plus the patched Tomcat 7 is back to the performance levels of our application on Tomcat 5. We did not see any functional regressions either, so from our side it looks good. Our current workaround for the unpatched Tomcat 7 yields the same numbers, so we are safe until a release comes out.

Thanks for the quick reaction! You may want to add to the general pre-release QA procedure a performance test similar to the unit tests you made, just to avoid bugs like this dragging their dirty feet through two major releases :D.