Bug 63723 - Distributed testing: JMeter master ends distributed test though some threads still are active
Summary: Distributed testing: JMeter master ends distributed test though some threads ...
Status: RESOLVED FIXED
Alias: None
Product: JMeter - Now in Github
Classification: Unclassified
Component: Main (show other bugs)
Version: 5.1.1
Hardware: All All
: P2 major (vote)
Target Milestone: JMETER_5.2
Assignee: JMeter issues mailing list
URL:
Keywords: FixedInTrunk
: 63722 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-09-04 04:48 UTC by isuk
Modified: 2019-10-28 20:17 UTC (History)
1 user (show)



Attachments
Master and four slaves load distribution (332.52 KB, image/png)
2019-09-04 04:48 UTC, isuk
Details
master log file (14.15 KB, text/plain)
2019-09-04 05:41 UTC, isuk
Details
slave 1 - log file (6.65 KB, text/plain)
2019-09-04 05:42 UTC, isuk
Details
slave 2 - log file (3.79 KB, text/plain)
2019-09-04 05:42 UTC, isuk
Details
Capture23 - master and two slaves (196.07 KB, image/png)
2019-09-10 05:17 UTC, isuk
Details
Test Plan (266.41 KB, application/xml)
2019-09-10 05:17 UTC, isuk
Details
Master log file (12.97 KB, text/plain)
2019-09-10 05:18 UTC, isuk
Details
VM1 log file (6.65 KB, text/plain)
2019-09-10 05:19 UTC, isuk
Details
VM2 log file (3.79 KB, text/plain)
2019-09-10 05:20 UTC, isuk
Details
with 4VMs (331.16 KB, image/png)
2019-09-11 04:27 UTC, isuk
Details
Schematic file (136.29 KB, text/html)
2019-09-11 06:45 UTC, isuk
Details
VM1 thread dump (7.80 KB, text/plain)
2019-09-12 01:24 UTC, isuk
Details
VM3 thread dump (7.82 KB, text/plain)
2019-09-12 01:25 UTC, isuk
Details
VM1_jmeter_server file (2.86 KB, text/plain)
2019-09-17 03:56 UTC, isuk
Details
VM3_jmeter_server file (2.82 KB, text/plain)
2019-09-17 03:57 UTC, isuk
Details
VM4_jmeter_server file (2.82 KB, text/plain)
2019-09-17 04:01 UTC, isuk
Details
VM2_jmeter_server file (46.92 KB, application/zip)
2019-09-17 04:02 UTC, isuk
Details
2019-09-24 (293.55 KB, application/zip)
2019-09-24 02:45 UTC, isuk
Details
Master and slave scenario 2 (246.68 KB, image/png)
2019-09-24 05:01 UTC, isuk
Details
28th October (426.34 KB, image/png)
2019-10-27 23:27 UTC, isuk
Details

Note You need to log in before you can comment on or make changes to this bug.
Description isuk 2019-09-04 04:48:57 UTC
Created attachment 36754 [details]
Master and four slaves load distribution

- I'm using Jmeter 5.1.1.
- I have set up a distributed load test environment using windows 10 machines (master/my local with four VMs (as slaves) running on the same subnet)
- Non-GUI mode is used.
- trying to run 1000 concurrent threads on 4VMs (250VU load each).
- JMeter-server batch files are opened on all slaves/hosts.
- Jmeter command is executed to run the distributed load:-

Actual result:
- all the slaves have started the test (JVM) and running
- Load test finishes. Master runs the load ONLY for e.g. 953VUs. Leaving 47 active threads behind (See attachment). Not sure what is happening for the rest 47 threads(tried this case many times still occurs-with different completed threads. E.g. 972VUs completed 28 remained and etc.

NOTE: Have increased the Java HEAP on Master and on all four slaves (min to 2g to XMX 8g). So no memory resource issue at all either.


NOTE: Marked this as a major - as it is critical for our business work.
Comment 1 Felix Schumacher 2019-09-04 05:04:45 UTC
Please give us more information.

Can you provide a minimal test plan, so that we can reproduce this?

Are there any error or warning messages in the log files? Probably best, if you could provide those from the server and the controlled nodes. 

Are you using any third party plugins?
Comment 2 isuk 2019-09-04 05:40:30 UTC
- no third party plugins are used. 
- Have added Master and slave jmeter.log files herewith  ( i can't see jmeter.log files for the other two VM for some weird reason)
Comment 3 isuk 2019-09-04 05:41:39 UTC
Created attachment 36755 [details]
master log file
Comment 4 isuk 2019-09-04 05:42:00 UTC
Created attachment 36756 [details]
slave 1 - log file
Comment 5 isuk 2019-09-04 05:42:17 UTC
Created attachment 36757 [details]
slave 2 - log file
Comment 6 Philippe Mouawad 2019-09-05 20:19:28 UTC
*** Bug 63722 has been marked as a duplicate of this bug. ***
Comment 7 Philippe Mouawad 2019-09-05 20:38:59 UTC
Hello,
Unless you provide all logs and a test plan it will be hard to help.

Please reproduce again your problem and attach required files.

Thank you
Comment 8 isuk 2019-09-10 05:16:01 UTC
Please see the attached. I have run a concurrent user thread of 1000VUs (running on two VMs).

On VM one it says - threads still running
On VM two it says - threads are still running

Master at the bottom has finished running the threads and has received the results from hosts (you can see two Active threads are yet to run only 998 are finished.)

Test Plan, Log files attached.
Comment 9 isuk 2019-09-10 05:17:07 UTC
Created attachment 36772 [details]
Capture23 - master and two slaves
Comment 10 isuk 2019-09-10 05:17:40 UTC
Created attachment 36773 [details]
Test Plan
Comment 11 isuk 2019-09-10 05:18:09 UTC
Created attachment 36774 [details]
Master log file
Comment 12 isuk 2019-09-10 05:19:50 UTC
Created attachment 36775 [details]
VM1 log file
Comment 13 isuk 2019-09-10 05:20:57 UTC
Created attachment 36776 [details]
VM2 log file
Comment 14 isuk 2019-09-11 04:26:51 UTC
Other observations I would like to add is:
- Threads abruptly halfway - times - and load test ends for just 576 users.
- Jmeter is so unpredictable and this kind of behaviors was observed by team members as well.
- I even distribute the load across 4 VMs (this time with 960VUs). It just hangs forever sometimes and does nothing (even after good 5 hours). See attachment "with 4VMs". Its just craps never finish correctly.
Comment 15 isuk 2019-09-11 04:27:31 UTC
Created attachment 36777 [details]
with 4VMs
Comment 16 Felix Schumacher 2019-09-11 04:58:38 UTC
Can you give us thread dumps of the workers that don't stop correctly?

It is strange that there are no exceptions logged in the log files you uploaded. Are there any other log files with exceptions?
Comment 17 Philippe Mouawad 2019-09-11 05:48:47 UTC
Can you open your plan in JMeter and run Tools > Schematic overview and attach the resulting html file ?

Thanks
Comment 18 isuk 2019-09-11 06:44:19 UTC
Schematic file attached herewith.
Comment 19 isuk 2019-09-11 06:45:12 UTC
Created attachment 36778 [details]
Schematic file
Comment 20 isuk 2019-09-11 06:48:00 UTC
There aren't any log files with exceptions.
Comment 21 Felix Schumacher 2019-09-11 09:14:30 UTC
(In reply to isuk from comment #20)
> There aren't any log files with exceptions.

Can you provide thread dumps?
Comment 22 Felix Schumacher 2019-09-11 11:28:54 UTC
Note: You are testing a lot of services, that are most probably not yours and should be left out (like a font service or facebook).

Another thing I noticed is, that you are not defining timeouts on the http samplers. To set them for all, go to the advanced tab of the http default settings and set a reasonable timeout for connect and response.

And remember to give us the thread dumps of the hanging nodes.
Comment 23 isuk 2019-09-12 01:24:22 UTC
thanks for the shout. so the services facebook, font cannot be excluded due to the nature of how the application is developed. We have to live with it at least for the time been.

Thread dumps for VM1 and VM3 are attached herewith. 

Thanks.
Comment 24 isuk 2019-09-12 01:24:49 UTC
Created attachment 36780 [details]
VM1 thread dump
Comment 25 isuk 2019-09-12 01:25:07 UTC
Created attachment 36781 [details]
VM3 thread dump
Comment 26 Felix Schumacher 2019-09-12 19:43:13 UTC
(In reply to isuk from comment #23)
> thanks for the shout. so the services facebook, font cannot be excluded due
> to the nature of how the application is developed. We have to live with it
> at least for the time been.

But it seems, that you are not doing anything with those requests. For example the mozilla site request -- the first sampler -- is usually sent by a browser to check whether you have internet access. The test doesn't check the result nor uses it anything from it. Why do you have to make the request in this test?

> 
> Thread dumps for VM1 and VM3 are attached herewith. 

There are "RMI Reaper" threads that are not stopping. I found an old mail thread, in which sebb chimed in posted a solution for this -- which is already be implemented in JMeter. Bugzilla Id 50659, git commit 76087e7e4dbf358be801e042e89dd828eb2d50c5.

But this is done on the controlling node, if I interpret the code correctly and your worker nodes seem to be hanging. So maybe we have to add such a workaround on the workers, too.

What is strange, is that your logs from the workers seem to be incomplete. There should be info about started threads and RMI and such. Are there any other new files generated on the worker nodes or the controlling node? hprof files, log files, anything?

> 
> Thanks.
Comment 27 Felix Schumacher 2019-09-12 20:03:21 UTC
Re my last comment. Forget everything I said about RMI Reaper not exiting :) I should have thought a bit longer and looked into a normally running worker node. The thread dumps are expected to contain an RMI Reaper.

So it is back to the truncated logs from the worker nodes and the absence of any exceptions in those.
Comment 28 isuk 2019-09-16 05:18:54 UTC
I have more issues on top of this issue. I will try to find any other log files on Nodes. But before doing that. I need to understand why I'm getting:

Problem with SSL certificate for url for 'qa.studentportal.acu.edu.au'? Ensure browser is set to accept the JMeter proxy cert: Remote host terminated the handshake


for no reason when I've correctly set up the proxy certificate. Can't proceed with anything in the proper way as we're getting so many weird issues with JMeter.
Comment 29 isuk 2019-09-17 03:55:09 UTC
@Felix,


I couldn't locate any logs for the workers. Only thing I can see is jmeter_server files. Which I'm attaching herewith (for the 4 VMs).
Comment 30 isuk 2019-09-17 03:56:43 UTC
Created attachment 36785 [details]
VM1_jmeter_server file
Comment 31 isuk 2019-09-17 03:57:54 UTC
Created attachment 36786 [details]
VM3_jmeter_server file
Comment 32 isuk 2019-09-17 04:01:37 UTC
Created attachment 36787 [details]
VM4_jmeter_server file
Comment 33 isuk 2019-09-17 04:02:54 UTC
Created attachment 36788 [details]
VM2_jmeter_server file

This file was too large to attach. SO had to ZIP it (VM2_jmeter_server file)
Comment 34 Felix Schumacher 2019-09-17 14:34:03 UTC
(In reply to isuk from comment #33)
> Created attachment 36788 [details]
> VM2_jmeter_server file
> 
> This file was too large to attach. SO had to ZIP it (VM2_jmeter_server file)

In this log file are exceptions. First one is (there are many more)

2019-09-11 14:39:01,120 ERROR o.a.j.s.StandardSampleSender: sampleOccurred
java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is:.
>-------java.net.SocketException: Connection reset
>-------at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:236) ~[?:?]
>-------at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161) ~[?:?]
>-------at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:209) ~[?:?]
>-------at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:161) ~[?:?]
>-------at com.sun.proxy.$Proxy19.sampleOccurred(Unknown Source) ~[?:?]
>-------at org.apache.jmeter.samplers.StandardSampleSender.sampleOccurred(StandardSampleSender.java:67) [ApacheJMeter_core.jar:5.1.1 r1855137]
>-------at org.apache.jmeter.samplers.RemoteListenerWrapper.sampleOccurred(RemoteListenerWrapper.java:95) [ApacheJMeter_core.jar:5.1.1 r1855137]
>-------at org.apache.jmeter.threads.ListenerNotifier.notifyListeners(ListenerNotifier.java:70) [ApacheJMeter_core.jar:5.1.1 r1855137]
>-------at org.apache.jmeter.threads.JMeterThread.notifyListeners(JMeterThread.java:980) [ApacheJMeter_core.jar:5.1.1 r1855137]
>-------at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:569) [ApacheJMeter_core.jar:5.1.1 r1855137]
>-------at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:486) [ApacheJMeter_core.jar:5.1.1 r1855137]
>-------at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:253) [ApacheJMeter_core.jar:5.1.1 r1855137]
>-------at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.net.SocketException: Connection reset
>-------at java.net.SocketInputStream.read(SocketInputStream.java:186) ~[?:?]
>-------at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
>-------at java.io.BufferedInputStream.fill(BufferedInputStream.java:252) ~[?:?]
>-------at java.io.BufferedInputStream.read(BufferedInputStream.java:271) ~[?:?]
>-------at java.io.DataInputStream.readByte(DataInputStream.java:270) ~[?:?]
>-------at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222) ~[?:?]
>-------... 12 more
Comment 35 isuk 2019-09-18 05:37:48 UTC
@Felix

Thanks for coming back. I can't see any relevance these exceptions and the application under test. Could you let us know what might be the issue for this? OR any suggestions for us to follow? We were running so many test runs one after the other just to get the things right....maybe this server-log file was something related to that.....

Any thoughts as we are still struggling to run a full load test run.

Thanks.
Comment 36 isuk 2019-09-24 02:44:54 UTC
In order to help you dig this issue, I have done another load test run. This time for 1000 VUs across 4VMs each with a 250VM load. Load test ends when 376 threads are still active. Also, if you noticed 1st VM is still running (rest is finished). 

I have included all the files for you to investigate (log files, thread dumps, server files and etc.). See attachment name "2019-09-24"

There aren't any exceptions that I can find during this run. Yet, load runs don't achieve what we needed/expected.

If you can kindly look into this, please.

Thanks
Comment 37 isuk 2019-09-24 02:45:28 UTC
Created attachment 36795 [details]
2019-09-24
Comment 38 isuk 2019-09-24 04:59:58 UTC
Same above scenario ran again and observed "Master and slave scenario 2". See attached.
Comment 39 isuk 2019-09-24 05:01:11 UTC
Created attachment 36796 [details]
Master and slave scenario 2
Comment 40 Philippe Mouawad 2019-09-30 20:03:14 UTC
pmouawad@apache.org
	
10:02 PM (0 minutes ago)
	
to commits@jmeter.apache.org
This is an automated email from the ASF dual-hosted git repository.

pmouawad pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/jmeter.git


The following commit(s) were added to refs/heads/master by this push:
     new 105a399  This commit fixes bugs 63614 and 63723 Bug 63614 - Distributed testing: Unable to generate Dashboard report at end of load test
105a399 is described below

commit 105a3999f44c73200fb08c249e120937840ee799
Author: pmouawad <p.mouawad@ubik-ingenierie.com>
AuthorDate: Mon Sep 30 22:02:20 2019 +0200

    This commit fixes bugs 63614 and 63723
    Bug 63614 - Distributed testing: Unable to generate Dashboard report at
    end of load test

    Bug 63723 - JMeter master ends distributed test though some threads
    still are active
---
 .../src/main/java/org/apache/jmeter/JMeter.java    | 136 +++++++++++----------
 .../apache/jmeter/engine/ClientJMeterEngine.java   |   5 +
 .../apache/jmeter/engine/DistributedRunner.java    |  27 +++-
 xdocs/changes.xml                                  |   2 +
 4 files changed, 99 insertions(+), 71 deletions(-)
Comment 41 Philippe Mouawad 2019-10-11 21:54:27 UTC
Hello,
Any chance to have feedback on this fix ?

Thank you
Regards
Comment 42 isuk 2019-10-15 03:36:14 UTC
(In reply to Philippe Mouawad from comment #41)
> Hello,
> Any chance to have feedback on this fix ?
> 
> Thank you
> Regards

Hi,

Sorry for the delay in replying. I was away from work. I had a look and I have to say I will wait for the official 5.2 release. Simply because nightly build page says"The nightly builds are interim builds that are untested and unsupported. Use at your own risk!"


Otherwise, our work will be doubled if we need to re-install 5.11 if things are not fine.

Let me know.

Thank you.
Comment 43 isuk 2019-10-27 06:23:31 UTC
bug still exists. In fact, there are plenty of other issues on top of this. No results are returned to Master at all (now) and slaves finished running tests. Sorry, we are moving away from Jmeter simply because it is time-consuming with finding answers on-line for issues we're facing every day, tool not accurate at all. We are evaluating another tool at the moment and team is going to go ahead with it. Thanks again for your help.
Comment 44 Philippe Mouawad 2019-10-27 07:05:01 UTC
(In reply to isuk from comment #43)
> bug still exists. 

I don’ think so as I tested this scenario.
We’re still waiting for elements anyway.


> In fact, there are plenty of other issues on top of this.

I guess you won’t report any, so ...

> No results are returned to Master at all (now) and slaves finished running
> tests. Sorry, we are moving away from Jmeter simply because it is
> time-consuming with finding answers on-line for issues we're facing every
> day, tool not accurate at all. 

Do as you feel is right for you

> We are evaluating another tool at the moment
> and team is going to go ahead with it. Thanks again for your help.

good luck ...
Comment 45 Philippe Mouawad 2019-10-27 07:06:39 UTC
At team, anybody could double check this ?

Thanks
Comment 46 isuk 2019-10-27 23:27:12 UTC
@Philippe Mouawad

I have no personal issues with you or with Jmeter as a tool. But, you seem taking this personal. As an institution, we're trying to get performance answers for our software not waiting and querying for issues like this forever.

With distributed load testing what we as a team did mostly tried finding answers to JMeter Master/slave issues rather get our work done. We simply couldn't get any satisfying answer yet. There is some issue somewhere with JMeter. 

Last Friday (25th Oct), we couldn't get any single result back to Master though slaves have stopped/finished running. Today (28th Oct)it seems we are back to square one. See attachment. Slaves have Finished running, Master yet have unfinished threads. 

Is there an issue with the way we have set up this nightly build of yours? We just deleted the previous Jmeter. But the folder still indicates 5.1.1. Maybe the files are not correct. We can't say what is what
Comment 47 isuk 2019-10-27 23:27:49 UTC
Created attachment 36865 [details]
28th October
Comment 48 Philippe Mouawad 2019-10-28 06:51:26 UTC
I cannot help if your team doesn’t know how to upgrade software as I don’t have information about your setup.
Installing jmeter is just about unzipping version in a folder and using it.

Regarding non ending threads, as per Felix previous answer weeks ago, you didn’t set connect and read timeout on your Http Request, as a result if a connection hangs due to non responsive server, the corresponding thread ends up being stuck waiting infinitely for response.

Set connect and read timeout on HTTP request default element so that it applies to all HTTP Requests and this problem will be sorted out.

As of now, there is no evidence of any issue in JMeter and your assumption about lot of bugs in JMeter is pointless.

Provide your test plan if you want precise feedback or switch to another tool if you find it better.
Comment 49 Vladimir Sitnikov 2019-10-28 08:48:58 UTC
"VM1 thread dump" and "VM3 thread dump" contain "DestroyJavaVM" which means the JVM was trying to shut down.

In order to shutdown, it needs to wait all the non-daemon threads.
Both thread dumps include "RMI Reaper" thread which is not marked with "daemon".


"VM1..4 server file" does not contain "Exiting" log message => the master did not send "exit" command.


I don't follow the fix, however, it should improve the logging at least.

> Is there an issue with the way we have set up this nightly build of yours? We >just deleted the previous Jmeter. But the folder still indicates 5.1.1. Maybe >the files are not correct. We can't say what is what

Where did you download the nightly from?
https://jmeter.apache.org/nightly.html lists archives that have apache-jmeter-5.2-SNAPSHOT inside.
Comment 50 Vladimir Sitnikov 2019-10-28 08:48:58 UTC
"VM1 thread dump" and "VM3 thread dump" contain "DestroyJavaVM" which means the JVM was trying to shut down.

In order to shutdown, it needs to wait all the non-daemon threads.
Both thread dumps include "RMI Reaper" thread which is not marked with "daemon".


"VM1..4 server file" does not contain "Exiting" log message => the master did not send "exit" command.


I don't follow the fix, however, it should improve the logging at least.

> Is there an issue with the way we have set up this nightly build of yours? We >just deleted the previous Jmeter. But the folder still indicates 5.1.1. Maybe >the files are not correct. We can't say what is what

Where did you download the nightly from?
https://jmeter.apache.org/nightly.html lists archives that have apache-jmeter-5.2-SNAPSHOT inside.
Comment 51 Philippe Mouawad 2019-10-28 20:12:44 UTC
(In reply to isuk from comment #47)
> Created attachment 36865 [details]
> 28th October

From log messages of screenshot,
 you are using 5.1.1 not 5.2 ..., or if using a nightly 5.2, not the intermediate version that contains the bugfix.

So regarding "bug still exists. In fact, there are plenty of other issues on top of this", I'll let you think about it.
Comment 52 Philippe Mouawad 2019-10-28 20:17:24 UTC
(In reply to Vladimir Sitnikov from comment #50)
> "VM1 thread dump" and "VM3 thread dump" contain "DestroyJavaVM" which means
> the JVM was trying to shut down.
> 
> In order to shutdown, it needs to wait all the non-daemon threads.
> Both thread dumps include "RMI Reaper" thread which is not marked with
> "daemon".
> 
> 
> "VM1..4 server file" does not contain "Exiting" log message => the master
> did not send "exit" command.
> 
> 
> I don't follow the fix, however, it should improve the logging at least.

If you look at old code (https://github.com/apache/jmeter/blob/v5_1_1/src/core/org/apache/jmeter/JMeter.java), ListenToTest#startedEngines would always be equal to 0, so stopper thread would be triggered on first testEnded call.



> 
> > Is there an issue with the way we have set up this nightly build of yours? We >just deleted the previous Jmeter. But the folder still indicates 5.1.1. Maybe >the files are not correct. We can't say what is what
> 
> Where did you download the nightly from?
> https://jmeter.apache.org/nightly.html lists archives that have
> apache-jmeter-5.2-SNAPSHOT inside.
Comment 53 The ASF infrastructure team 2022-09-24 20:38:17 UTC
This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/5153