Bug 41921 - JDBC sampler : Add hashing of Data to avoid storing all output into memory when result is arbitrarily large
Status: NEEDINFO
Alias: None
Product: JMeter
Classification: Unclassified
Component: Main
Version: 2.2
Hardware: All
OS: All
Importance: P1 enhancement
Target Milestone: ---
Assignee: JMeter issues mailing list
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2007-03-21 10:29 UTC by Nathan Bryant
Modified: 2019-07-11 13:02 UTC



Description Nathan Bryant 2007-03-21 10:29:40 UTC
JDBCSampler (and, I presume, other samplers) stores all the output received
from its test action. For example:

Data data = getDataFromResultSet(rs);
res.setResponseData(data.toString().getBytes());

This is poor software design because the data could be arbitrarily large and
fill memory. It is causing OutOfMemoryErrors for us, even with not very many
threads. This is major or even critical because it prevents JMeter from being
used to generate significant load. All samplers should be rewritten to just
build an MD5 hash iteratively. The hash should be updated one buffer or row at a
time instead of in bulk.
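The iterative hashing the report describes could be sketched as follows. This is an illustration using java.security.MessageDigest, not code from JMeter; the class and method names are made up, and an Iterable of strings stands in for the rows coming back from the database:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the proposal: feed each row (or buffer) into a MessageDigest as
// it arrives, so memory use stays constant regardless of result set size.
public class RowHashSketch {

    public static String hashRows(Iterable<String> rows) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String row : rows) {
                md.update(row.getBytes()); // one row at a time, never the whole response
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every JVM
        }
    }

    public static void main(String[] args) {
        // Incremental updates give the same digest as hashing the data in bulk.
        System.out.println(hashRows(java.util.List.of("row1", "row2")));
    }
}
```

Because MessageDigest.update() is incremental, splitting the input into rows does not change the resulting digest, which is what makes per-row hashing a drop-in replacement for hashing the full stored response.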
Comment 1 Sebb 2007-03-21 10:59:04 UTC
The full sample results are needed for some purposes - e.g. the Tree View 
Listener can display the results of an HTTP Sample, and Assertions need the 
response to be present - so it would not make sense to _always_ throw away the 
response data.

And the data needs to be retrieved, otherwise the sample time will not be 
representative.

However sometimes it is not ideal to store all the response data.

As a workaround you could perhaps do one of the following:
* change the query to limit the data returned
* add a BeanShell Post-Processor to zap the responseData field.

As to how to fix this: there could be an option to limit the size of the stored 
data. That should be fairly easy to do.
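The size-limit option suggested above could amount to truncating the stored copy while still reading the full response, so the sample time stays representative. A minimal sketch, assuming a configurable limit; this is illustrative code, not the actual sampler, and `maxStoredBytes` is an invented name:

```java
import java.util.Arrays;

// Sketch of the "limit the size of the stored data" option: the sampler
// would still read the full response, but keep only the first
// maxStoredBytes bytes for listeners and assertions.
public class StoredDataLimitSketch {

    public static byte[] limitStored(byte[] responseData, int maxStoredBytes) {
        if (responseData.length <= maxStoredBytes) {
            return responseData; // small responses are kept whole
        }
        return Arrays.copyOf(responseData, maxStoredBytes); // drop the tail
    }

    public static void main(String[] args) {
        System.out.println(limitStored(new byte[2048], 1024).length); // prints 1024
    }
}
```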
Comment 2 Nathan Bryant 2007-03-21 11:47:00 UTC
An MD5 or similar hash would be preferable to just storing part of the data,
for people who are using the data for functional testing. Then they could
compare everything for identity at least. I'm not doing functional testing so I
don't care, but I would recommend adding a configuration checkbox for an MD5 mode.
Comment 3 Sebb 2008-04-07 08:49:44 UTC
I've been looking into how to add hashing to the JDBC sampler.

It would be easy enough to collect all the response data and convert it to a hash just before storing it. Would that be enough for your tests? The disadvantage of this approach is that JMeter would need enough memory to store the whole response - but at least it would be only temporary.

A better solution would be to hash the data as it is retrieved. However this is not particularly easy to do, as the data is all fetched and then formatted into lines and columns.

Also, is it important that the hash is the same as the one that would be obtained by hashing the result data after download? Or does it just need to contain all the response data in a predictable order? This would be easier to do, as there would be no need for the second formatting stage.

Any other suggestions for how to process the JDBC data are welcome...
Comment 4 Gregg 2009-06-19 12:59:55 UTC
For my own curiosity, what magnitude of data is being dealt with here?  Are we talking hundreds of megabytes? Gigabytes?  Tens or hundreds of gigabytes?  The reason I ask is because my first thought was to simply have the user increase the maximum heap size of the JVM.  What is the user currently using as the maximum heap size?
Comment 5 Philippe Mouawad 2011-11-14 12:12:14 UTC
Still missing in 2.5.1
Comment 6 Evan M 2012-06-25 15:43:25 UTC
Gregg: Increasing the JVM memory does not help.  The order of magnitude is gigabytes of data for me, but it doesn't really matter, because the application just ramps up memory until it runs out.  I should be able to run a test for an arbitrarily long amount of time if I don't need to store the result data.

For my use case, I want to test the maximum throughput of a large select statement from my webserver to my database, but the application caps out its memory before I can get any useful data.  If I don't have any listeners that need the response data, it should not be cached.

I am having this issue running 2.7 r1342410 on Windows Server 2008.
Comment 7 Franz Schwab 2019-05-06 14:06:01 UTC
How about enhancing the JDBC sampler to discard a certain number of rows?

I am thinking of enhancing it along the lines of this answer:
https://stackoverflow.com/questions/43901408/jmeter-jdbc-sampler-fails-on-large-resultset

Any feedback on this before I start working on it?
Comment 8 Philippe Mouawad 2019-05-06 14:11:59 UTC
Hi Franz,
Thanks for contributing.

What is your use case?
The SO answer seems to fetch only the first row, right?

Regards
Comment 9 Franz Schwab 2019-05-07 14:18:58 UTC
Hi Philippe!
Thank you for your quick response.

My use case is database load testing. In 99% of the cases I use the JDBC sampler for, I am only interested in the time it took the database to run the query.
I am not interested in the time it took the client (JMeter in that case) to fetch the result set.
A BI client for example might run a query with a big result set, but maybe only fetch the first 100 rows and not the whole result set.
Currently, there is no option in JMeter to do so.
Even when you set the "Count Records" option in JMeter, the whole result set is fetched (in order to count the rows). There is no way to get the result set size without fetching it (this is standard JDBC behaviour).
It is not an option to add a LIMIT clause at the end of the query, as databases might have an optimization for that.
For the same reason, it is also not an option to use the JDBC method Statement.setMaxRows(int).
I am only interested in knowing that the query has been processed successfully (= didn't throw an error).

Yes, the code provided in the S.O. link only fetches one row.

Best regards,
Franz
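The fetch-only-the-first-row idea from the linked Stack Overflow answer can be sketched as below. So that the example runs without a database, a Supplier of an Iterator stands in for Statement.executeQuery() and its ResultSet; every name here is illustrative, not from JMeter or the eventual patch:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Sketch of the approach: execute the query and touch only the first row,
// so the client never pays the cost of fetching (or storing) the full
// result set. Success here just means the query ran without throwing.
public class FirstRowOnlySketch {

    public static boolean querySucceeded(Supplier<Iterator<String>> runQuery) {
        Iterator<String> rows = runQuery.get(); // the part whose time matters
        if (!rows.hasNext()) {
            return true; // an empty result is still a successful query
        }
        rows.next(); // fetch the first row only; the rest is never read
        return true;
    }

    public static void main(String[] args) {
        boolean ok = querySucceeded(() -> List.of("row1", "row2", "row3").iterator());
        System.out.println(ok); // prints true
    }
}
```

In the real sampler the timing would wrap the executeQuery() call, matching the use case of measuring the database's query time rather than the client's fetch time.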
Comment 10 Franz Schwab 2019-05-14 15:02:39 UTC
current work status from my side:
https://github.com/frschwab/jmeter/commit/ca96394e0e4913f7b2407a6bcf7f843d92959310

still need to update documentation.

thanks for any comments!
Comment 11 Philippe Mouawad 2019-05-14 15:16:28 UTC
(In reply to Franz Schwab from comment #10)
> current work status from my side:
> https://github.com/frschwab/jmeter/commit/
> ca96394e0e4913f7b2407a6bcf7f843d92959310
> 
> still need to update documentation.
> 
> thanks for any comments!

Thanks for the contribution.
Would it be possible to also add JUnit test code?

Thanks
Comment 12 Franz Schwab 2019-05-14 15:20:10 UTC
Yes - I can do that. Are all the tests written in Groovy?
I just had a quick look at the code.

Could you also have a look at this one, as nobody replied yet:
https://bz.apache.org/bugzilla/show_bug.cgi?id=63406

Thanks for feedback!

Franz
Comment 13 Philippe Mouawad 2019-05-14 15:57:20 UTC
(In reply to Franz Schwab from comment #12)
> Yes - I can do that. Are all the tests written in groovy?
> I just had a quick look at the code.
> 
> Could you also have a look at this one, as nobody replied yet:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=63406
> 
> Thanks for feedback!
> 
> Franz

I have reviewed it and left a comment
Comment 14 Philippe Mouawad 2019-05-14 15:57:51 UTC
(In reply to Philippe Mouawad from comment #13)
> (In reply to Franz Schwab from comment #12)
> > Yes - I can do that. Are all the tests written in groovy?
> > I just had a quick look at the code.
> > 
> > Could you also have a look at this one, as nobody replied yet:
> > https://bz.apache.org/bugzilla/show_bug.cgi?id=63406
> > 
> > Thanks for feedback!
> > 
> > Franz
> 
> I have reviewed it and left a comment

You can write test using Spock Framework + Groovy or JUnit 4.
Comment 15 Philippe Mouawad 2019-06-08 13:03:09 UTC
Hello Franz,

Will you submit a PR?
Thanks
Comment 16 Franz Schwab 2019-06-13 07:23:13 UTC
Hi Philippe,

Yes, I still want to contribute a PR.
I just didn't find the time (yet) to write some basic tests.
By when would it need to be ready for the PR to make it into the next release?

By the way - I just realised that the topic of this issue is about hashing the data (on the client side), not about limiting the transfer of the result set from server to client (which is what I implemented). Should I create a new issue and continue there?
Comment 17 Philippe Mouawad 2019-06-13 19:34:36 UTC
(In reply to Franz Schwab from comment #16)
> Hi Philippe,
> 
> yes I still want to contribute a pr.
> I just didn't find the time (yet) to write some basic tests.
> Until when would that be necessary to see the PR in the next release?
> 
> By the way - I just realise that the topic of this issue is about hashing
> the data (on client side), not about limiting the transfer of the result set
> from server to client (what I implemented). Should I create a new issue and
> continue there?

Thanks, yes please create another issue.
Thanks
Comment 18 Franz Schwab 2019-07-11 13:02:05 UTC
Ok, I created a new bug:
https://bz.apache.org/bugzilla/show_bug.cgi?id=63561