54717 – BatchSampleSender/StatisticalSampleSender slows thread handling down

Bug 54717 - BatchSampleSender/StatisticalSampleSender slows thread handling down

Summary: BatchSampleSender/StatisticalSampleSender slows thread handling down

Status:	NEEDINFO

Alias:	None

Product:	JMeter - Now in Github
Classification:	Unclassified
Component:	Main (show other bugs)
Version:	2.9
Hardware:	All All

Importance:	P2 enhancement (vote)
Target Milestone:	---
Assignee:	JMeter issues mailing list

URL:
Keywords:	PatchAvailable

Depends on:
Blocks:

Reported:	2013-03-18 10:59 UTC by Danny Lade
Modified:	2013-08-16 07:52 UTC (History)
CC List:	1 user (show)

Attachments
example code for a totally decoupled statistical sample sender (2.95 KB, application/octet-stream) 2013-03-18 11:14 UTC, Danny Lade	Details
Patch to send SampleEvents outside of synchronized block (1.71 KB, patch) 2013-08-12 21:17 UTC, Philippe Mouawad	Details \| Diff
example code for a totally decoupled statistical sample sender (4.00 KB, application/octet-stream) 2013-08-14 06:33 UTC, Danny Lade	Details
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Danny Lade 2013-03-18 10:59:01 UTC

Because the BatchSampleSender/StatisticalSampleSender synchronizes the whole sampleOccurred() method it blocks the threads of each thread group. 

The differences of the generate load can be seen using the DiskStoreSampleSender. In our case it was a factor of 3 faster. But this one could not be used striped therefore the request body is stored and will be send at the end (in our case several GB of data).

Although the AsynchSampleSender blocks several times on queue.offer() if the traffic could not be send fast enough to the client - which might also be a problem of sending the big request body too.

I suggest to decouple all sample senders from working threads, using a rather simple idea:
Store the (stripped) sample in a ConcurrentHashMap grouped by the current thread name and send it by one (ore more) worker thread(s) to the client.

There is no need to block the load generating threads doing their job.

Comment 1 Danny Lade 2013-03-18 11:14:24 UTC

Created attachment 30065 [details]
example code for a totally decoupled statistical sample sender

Comment 2 Philippe Mouawad 2013-08-12 21:17:59 UTC

Created attachment 30724 [details]
Patch to send SampleEvents outside of synchronized block

Comment 3 Philippe Mouawad 2013-08-12 21:18:51 UTC

Hello,
Could you give the attached patch a try ?
It does not yet do what you propose but it should improve the blocking time.

Thanks

Comment 4 Philippe Mouawad 2013-08-12 21:30:26 UTC

we could also create a StrippedAsyncBatchMode

Finally I think the Disruptor Framework would be a good answer for the async sample sender.

Comment 5 Sebb 2013-08-12 22:53:26 UTC

(In reply to Philippe Mouawad from comment #3)
> It does not yet do what you propose but it should improve the blocking time.

The sampleStore needs to be synchronised for both the clone() and clear() methods otherwise a sample can be added between the clone() and the clear(). Or possibly the store can be updated during the clone(). In either case, samples can be lost if the code is not synchronised.

That's partly why the existing code is synchronised - to make sure samples are not lost.

Comment 6 Sebb 2013-08-12 22:56:20 UTC

(In reply to Sebb from comment #5)
> (In reply to Philippe Mouawad from comment #3)
> > It does not yet do what you propose but it should improve the blocking time.
> 
> The sampleStore needs to be synchronised for both the clone() and clear()
> methods otherwise a sample can be added between the clone() and the clear().
> Or possibly the store can be updated during the clone(). In either case,
> samples can be lost if the code is not synchronised.
> 
> That's partly why the existing code is synchronised - to make sure samples
> are not lost.

Sorry, I see now that the code is still synchronised - the idea of the clone() is presumably to decouple the sending from the extracting of the samples.

Comment 7 Danny Lade 2013-08-14 06:33:33 UTC

Created attachment 30729 [details]
example code for a totally decoupled statistical sample sender

Comment 8 Danny Lade 2013-08-14 06:41:13 UTC

I've uploaded a new version of our implementation, because we've made several bugfixes. Now it works fine and completely without synchronization at all.

In my opinion it will be much harder to find a solution for batched samplers, but for hold samplers this solution seems to be the best.

note: 
Ignore the "marker"-stuff in the code. We're using it to measure different phases of the load test.

Comment 9 Danny Lade 2013-08-14 07:05:52 UTC

About your patch:
It still synchronizes the sample store and this is the most time consuming operation. 

IMHO this is not necessary to synchronize the sample store if you're using an own  store for each thread (see my attachment). 
The only catch is, that testEnded() runs in the main thread. This is why I use the queue of maps to store all maps of the threads on one central point. But the queue is only accessed two times 1) at the initializing for each thread (see statisticalMapSelector) and 2) at the testEnded() method.

This is why I don't need to synchronize or lock at all.

Comment 10 Sebb 2013-08-14 07:23:36 UTC

(In reply to Danny Lade from comment #9)
> 
> IMHO this is not necessary to synchronize the sample store if you're using
> an own  store for each thread (see my attachment). 

Only true if there is no access from other threads.

> The only catch is, that testEnded() runs in the main thread. This is why I
> use the queue of maps to store all maps of the threads on one central point.
> But the queue is only accessed two times 1) at the initializing for each
> thread (see statisticalMapSelector) and 2) at the testEnded() method.
> 
> This is why I don't need to synchronize or lock at all.

Synchronisation is not only needed to guarantee single-threaded access to critical points in the code. It is also needed to ensure safe publication across threads. The Java memory model allows threads cache values locally; threads only have to update/fetch main memory at a memory synch. point. If one thread updates a field, another thread may not see the new value unless both the threads synch. on the same lock (volatile can also be used in some cases).

Comment 11 Danny Lade 2013-08-14 08:14:20 UTC

That's all true and well known, but where is the connection to the SampleSender?

Each Thread (which produces load) can store and send the sample data by it's own using the SampleSender. The only catch is the end of the tests where the main thread calls testEnded() method. 
When does one thread need access to the sample data of another thread, especially inside the SampleSender???

There might be several scenarios where one thread need access to the sample events of another thread. But IMO this is bad practise for a high load test because one would provoke synchronization/blocking.
(we never use these technics, each thread has it's own data, nothing shared)

However, if the data is already given to the SampleSender, to send it to the jmeter (remote) client, no thread needs to share the data stored INSIDE the SampleSender (fire and forget).

As I already mentioned, the solution we've made works quite well in our high load scenarios - believe me.

Comment 12 Sebb 2013-08-14 09:08:15 UTC

(In reply to Danny Lade from comment #11)
> That's all true and well known, but where is the connection to the
> SampleSender?

In comment #9 you wrote:

> But the queue is only accessed two times 1) at the initializing for each
> thread (see statisticalMapSelector) and 2) at the testEnded() method.
 
One thread initialises the queue, another processes it.

> As I already mentioned, the solution we've made works quite well in our high
> load scenarios - believe me.

Unfortunately that does not prove anything.

Comment 13 Danny Lade 2013-08-14 10:14:52 UTC

> In comment #9 you wrote:
> 
> > But the queue is only accessed two times 1) at the initializing for each
> > thread (see statisticalMapSelector) and 2) at the testEnded() method.
>  
> One thread initialises the queue, another processes it.
> 
No, the same thread does this. 

Main: create SampleSender, therefore create queue
Main: start threads
Thread 1: append map to queue
Thread 1: add sample data to own map
...
Thread N: append map to queue
Thread 1: add sample data to own map
...
Thread N: add sample data to own map
Main: all threads ended
Main: call testEnded(), therefore read queue and send sample data

> > As I already mentioned, the solution we've made works quite well in our high
> > load scenarios - believe me.
> 
> Unfortunately that does not prove anything.
>
No it doesn't, but it also sounds like 'Don't trust anyone' to me. 

So, how to prove? What do you have in mind?

Comment 14 Sebb 2013-08-14 10:50:58 UTC

(In reply to Danny Lade from comment #13)
> > In comment #9 you wrote:
> > 
> > > But the queue is only accessed two times 1) at the initializing for each
> > > thread (see statisticalMapSelector) and 2) at the testEnded() method.
> >  
> > One thread initialises the queue, another processes it.
> > 
> No, the same thread does this. 
> 
> Main: create SampleSender, therefore create queue
> Main: start threads
> Thread 1: append map to queue

The queue entry is set up (i.e. initialised) by a different thread.

Also Thread 1 adds items to the queue, which are later accessed by a different thread in testEnded(). This also needs to be done in a way that ensures safe publication, e.g. by using a thread-safe queue (which seems to be the case).

> Thread 1: add sample data to own map
> ...
> Thread N: append map to queue
> Thread 1: add sample data to own map
> ...
> Thread N: add sample data to own map
> Main: all threads ended
> Main: call testEnded(), therefore read queue and send sample data
> 
> > > As I already mentioned, the solution we've made works quite well in our high
> > > load scenarios - believe me.
> > 
> > Unfortunately that does not prove anything.
> >
> No it doesn't, but it also sounds like 'Don't trust anyone' to me. 

I'm just saying it proves nothing.

> So, how to prove? What do you have in mind?

Code inspection should be used to show that objects shared between threads are protected by a common lock.

Comment 15 Danny Lade 2013-08-14 11:28:58 UTC

> > Main: create SampleSender, therefore create queue
> > Main: start threads
> > Thread 1: append map to queue
> 
> The queue entry is set up (i.e. initialised) by a different thread.
> 
> Also Thread 1 adds items to the queue, which are later accessed by a
> different thread in testEnded(). This also needs to be done in a way that
> ensures safe publication, e.g. by using a thread-safe queue (which seems to
> be the case).
> 
I see, yes it is a ConcurrentLinkedQueue.

> > So, how to prove? What do you have in mind?
> 
> Code inspection should be used to show that objects shared between threads
> are protected by a common lock.

As you already found out the lock is only necessary at the beginning (init). Feel free to inspect the ConcurrentLinkedQueue (java.util.concurrent) about it.

However, this code works only for a HoldSampleSender, which sends all data at the end at once. 
I was also thinking about a BatchSampleSender using the same principles in the past. But my project does not needed one, therefore I got no time to implement it.

Comment 16 Philippe Mouawad 2013-08-15 22:34:35 UTC

Looking at your code, I think it is fine regarding multi-threading.
There is something misleading in the comments of this Bugzilla as you spoke about BatchSampleSender at first.

Thanks for the patch, although not in expected format and with too many dependencies that could be avoided.

Comment 17 Danny Lade 2013-08-16 07:52:43 UTC

(In reply to Philippe Mouawad from comment #16)
> Looking at your code, I think it is fine regarding multi-threading.
> There is something misleading in the comments of this Bugzilla as you spoke
> about BatchSampleSender at first.
> 
The BatchSampleSender has exactly the same problem like all samplers which are synchronizing in sampleOccurred() method. We just wrote the  StatisticalHoldSampleSender because it is much easier to implement.

> Thanks for the patch, although not in expected format and with too many
> dependencies that could be avoided.

Sorry for that, but we just use the implementation above as it is by patching the jmeter.properties for "mode=net.bigpoint.jmeter.samplers.StatisticalHoldSampleSender". It should have been just an example how one may solve the problem described first.

Comment 18 The ASF infrastructure team 2022-09-24 20:37:53 UTC

This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/3075