43925 – org.apache.jasper.runtime.BodyContentImpl causing huge memory allocations

Bug 43925 - org.apache.jasper.runtime.BodyContentImpl causing huge memory allocations

Summary: org.apache.jasper.runtime.BodyContentImpl causing huge memory allocations

Status:	RESOLVED FIXED

Alias:	None

Product:	Tomcat 8
Classification:	Unclassified
Component:	Jasper (show other bugs)
Version:	8.0.9
Hardware:	All All

Importance:	P3 enhancement with 1 vote (vote)
Target Milestone:	----
Assignee:	Tomcat Developers Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-11-21 06:23 UTC by Brian Remmington
Modified:	2018-05-04 09:29 UTC (History)
CC List:	1 user (show)

Attachments
Proposed changes to BodyContentImpl (14.49 KB, application/java-archive) 2008-02-08 07:33 UTC, Brian Remmington	Details
Test case to compare default and suggested implementation (8.98 KB, application/x-bzip) 2014-07-22 10:07 UTC, Issa Gorissen	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Brian Remmington 2007-11-21 06:23:58 UTC

BodyContentImpl buffers all output from a custom tag ready to be written when
the tag execution ends. However, the way in which it grows this buffer is
extremely inefficient and has two undesirable effects:

- garbage collection is triggered very frequently to tidy up the waste.
- CPU load ramps up as large, unnecessary array copies take place.

All that's needed is a more intelligent buffer-management algorithm. I have
rewritten this class and can forward it if that would be useful (can't see a way
of attaching it here).

Comment 1 Mark Thomas 2007-11-23 12:11:42 UTC

There is an option to attach patches just above comment box on the bugzilla page
for each bug.

Comment 2 Brian Remmington 2008-02-08 07:33:57 UTC

Created attachment 21500 [details]
Proposed changes to BodyContentImpl

The attachment "buzilla43925.jar" contains a new implementation of the
BodyContentImpl class (named "NewBodyContentImpl") and a new class on which it
depends named "CharBuffer". I have also included two JUnit test cases.

This new implementation performs 45% faster than the current implementation on
my test machine and is substantially more efficient in memory usage, causing no
character arrays to be left to the garbage collector unnecessarily.

I would greatly appreciate it if people would review this code, and let me have
any comments or suggestions.

Comment 3 Remy Maucherat 2008-02-10 18:20:23 UTC

Ok, so I not sure I got everything, but:
- I don't see where memory usage is bounded in the code (if it's not, you should
compare performance with the unbounded mode of Jasper, which will never
reallocate anything); that's a problem
- The opportunity is the ability to share the buffer list per thread, so the
CharBuffer.bufList should be ThreadLocal<LinkedList> (and should be limited in
size, which should solve item 1); as a configuration option, it could be a
concurrent static structure, for use with thread intensive connectors
- CharBuffer.toArray is expensive (no other option, though), so you should try
to show a (real) test result

Comment 4 Brian Remmington 2008-02-12 01:44:29 UTC

Thanks very much, Remy.

> - I don't see where memory usage is bounded in the code (if it's not, you should
> compare performance with the unbounded mode of Jasper, which will never
> reallocate anything); that's a problem

Perhaps you could help me here. The memory usage doesn't appear to be bounded in
the current implementation of BodyContentImpl which *always* appears to
reallocate exponentially-larger blocks (in the reAllocBuff method at the
bottom). I must be missing something, but am not sure what. I'm looking at the
head of 5.5, which is the version affected by this change.


> - The opportunity is the ability to share the buffer list per thread, so the
> CharBuffer.bufList should be ThreadLocal<LinkedList> (and should be limited in
> size, which should solve item 1); as a configuration option, it could be a
> concurrent static structure, for use with thread intensive connectors

This would require CharBuffer objects to be pooled to be of use. At the moment
they aren't (and neither are BodyContentImpl objects, as far as I can see). If I
understand you correctly, this would have the undesirable effect of holding onto
blocks of memory once allocated rather than making them available for garbage
collection. This would certainly require a bounded memory pool approach, but its
debatable whether that would be preferable. It would certainly be a bigger change.

> - CharBuffer.toArray is expensive (no other option, though), so you should try
> to show a (real) test result

I will put together a suitable test case for this. Is there a set of test cases
for the current implementation? If so I could just add new ones to it.


Regards
Brian

Comment 5 Remy Maucherat 2008-02-14 05:35:08 UTC

About the upper memory bound, I had missed the clear method. I thought the
original purpose was to keep the list of buffers around, now it looks to me the
memory usage is equivalent to the LIMIT_BUFFER mode.

Comment 6 Thomas Jonsson 2008-04-02 23:42:44 UTC

What is the status of this "bug"? This causes problems in our jsf based application with a couple of hundreds users. I tried Brians fix with no improvement, memory is still occupied and not gc:d. However, I deployed the application in Glassfish and there the charcater array gets gc:d. Any ideas what to do?

Comment 7 Tim Funk 2008-04-03 04:07:15 UTC


*** This bug has been marked as a duplicate of bug 37793 ***

Comment 8 Brian Remmington 2008-04-06 00:10:46 UTC

This is not a duplicate of 37793.

That one is related to the unbounded growth of cb without the ability for the garbage collector to kick in to collect it. The LIMIT_BUFFER option goes some way to resolve that problem. Thomas (comment #6) does seem to be suffering from 37793 rather than this one.

*This* problem, however, is about the way in which, while cb grows, large numbers of exponentially-larger char arrays are reallocated, copied, and discarded.

Every time chars are written that cause cb to grow, the following process occurs:

a. Allocate a new char array that is twice as large as the current "cb";
b. Copy the contents of cb into the new array;
c. Assign the new array to cb;
d. Garbage collector collects the old cb array.

This, I think, is obviously inefficient. It fills up the heap with large, unwanted character arrays and forces the garbage collector to activate more frequently, thus reducing the CPU available for doing useful work. It also uses CPU to copy exponentially-growing arrays from one block of memory to another. It is all unnecessary.

My suggested resolution is to improve the way this buffering is managed. No array copies occur. No unnecessarily large heap allocations occur. Nothing is left to the garbage collector that doesn't have to be. The performance of this part of the code is improved. The drain that this code places on system resources is reduced.

Given that widespread use of various tag libraries, I still feel that this is a very worthwhile change.

Regards
Brian

Comment 9 Mark Thomas 2011-12-20 20:35:50 UTC

This Tomcat 5 enhancement request has been moved to Tomcat 7 (the latest version) since Tomcat 5 development is limited and focussed on bugs and security issues whereas Tomcat 7 is still seeing new feature development.

Comment 10 Issa Gorissen 2014-06-23 15:29:42 UTC

Hi,

For this issue,

- is there already a chosen new architecture for the new version of the buffering ?

or

- should it be discussed ?

or

- you let the numbers speak for any new implementation ?


I'm willing to spend time on this for Tomcat 7, so answers from Tomcat devs are expected.

Thx
--
Issa

Comment 11 Mark Thomas 2014-06-23 20:16:54 UTC

I don't recall any solutions being discussed but you should search the archives and review any discussions that there may have been about this feature.

It is up to you whether to discuss it first. Generally, the more complex and invasive the change the more useful early discussion is.

A patch for this enhancement will be judged like any other. It isn't just about raw performance we also take into account the complexity of the resulting code, the invasiveness of the patch, the usefulness of the feature and so on.

Larger changes are best reviewed as a series of smaller patches. See my WebSocket compression changes earlier today as an example of breaking a large patch down into a series of smaller patches to make it easier to review.

Note that you should provide patches against tomcat trunk and they will then be back-ported to the release branches as appropriate.

Comment 12 Issa Gorissen 2014-06-27 14:48:02 UTC

Brian, I've taken a look into the code you've submitted but I failed to understand how this code reduce the number of array copies in comparison with Tomcat's current implementation. I've ran the test cases with different values for the iteration and can't see real improvements.

Calls to write(String) [1] will always induce a array copy, unfortunately, because they are immutable.

Calls to write(char[]) [2] are not improvable because we don't know if the caller will reuse the char array - so we need to copy it.

I don't know if calls to [1] are more frequent in a web app than calls to [2]. If it is the case, then trying to avoid the array copy from the String would be good overall. In fact, having to manage both char[] and Strings makes it harder to improve the whole class!



So as it seems that reducing the number of array copies is difficult while calling write() (even doable ?), maybe improving the way the buffer grows is the way to go - reducing the number of array copies and garbage creation.

Any comment on my reflections ?

Comment 13 Brian Remmington 2014-06-27 17:44:57 UTC

Hi Issa

It's been a while since I raised this (6 years), so I had to refresh my memory.

I've just checked out the trunk of Tomcat 8, and the same problem exists, and I still think my code addresses it.

Your final sentence describes exactly what my suggested solution does: it improves the way the buffer grows to reduce array copies and reduce garbage collection. Your other comments don't relate to my proposal.

In the current Tomcat class (org.apache.jasper.runtime.BodyContentImpl), the problem is with the method reAllocBuff(int) which is right at the bottom of the class. This is called when the buffer array (cb) needs to grow, and it's terribly inefficient.

My code replaces cb with an instance (named "buffer") of a new class CharBuffer which manages a LinkedList of char arrays to ensure that, upon growth (CharBuffer.grow), no array copy occurs, and nothing is left to the garbage collector.

Comment 14 Brian Remmington 2014-06-28 10:37:36 UTC

I've just run the tests I provided again against Tomcat 8, ramping up the iterations in the BodyContentTest steadily to 1500000 (hardware is *so* much better now...).

My version is consistently 40% faster than the current version, however both are pretty quick these days. My machine can chew through 1000000 iterations in 1378ms with the current version (BodyContentImpl) and 824ms with my version (NewBodyContentImpl).

Speed isn't the main point of this, however. It's the very wasteful use of memory that was causing the real problem when I initially raised this issue. The garbage collector was kicking in constantly, completely draining CPU cycles.

For example, take the iterations up to 1500000 on my test machine, and the current version blows up with an OutOfMemory error. My version continues to work and scales fairly linearly (1341ms). Take it up further to 2000000 iterations, and mine starts to plateau (2393ms), but it still continues to work. At 3000000 iterations, my version completes in 4543ms.

Comment 15 Issa Gorissen 2014-07-01 09:35:09 UTC

I made some changes to the test case you provided to excluded the random (so that the random numbers are always the same for both cases) and run the test for one target only. JVM is Java HotSpot(TM) 64-Bit Server VM (25.5-b02) for linux-amd64 JRE (1.8.0_05-b13), ran with settings -server -Xloggc:gc.log -Xmx4G.

I don't get the same results as yours and GC is running a lot more on your class than Tomcat's. (I can send you the results if you'd like)

It's true I can increase the number of iteration with your class before getting a OOE more than with Tomcat's.

But is this test case emulating real usage ? My next tests will be to run both class under Tomcat with a test case of several pages with tags triggered by jmeter. This will take me some time so I will not be back before several weeks.

Thx

Comment 16 Issa Gorissen 2014-07-22 10:07:26 UTC

Created attachment 31841 [details]
Test case to compare default and suggested implementation

I've ran some tests with JMeter 2.11 against some small custom tags running under Tomcat 8.0.9.

The results from the Jmeter test in file show that the default implementation is a little bit faster than suggested one (number of jmeter samples being higher for the same amout of running time).

Can someone run the tests I provided and report the results ? If any improvement or suggestion is found for the tests I provide, please report as well. Thx!

Comment 17 Mark Thomas 2018-01-11 20:09:29 UTC

This is being addressed under the following pull request:
https://github.com/apache/tomcat70/pull/4

I'm therefore marking this as resolved as I can't mark it as a duplicate of a PR.

Comment 18 Konstantin Kolinko 2018-05-04 09:29:54 UTC

(Noting the versions where this fix was applied, 3 months ago)

The fix is available in the following versions and onwards:
9.0.4, 8.5.27, 8.0.49, 7.0.84.


See also bug 62350.