Bug 51888 - Occasional deadlock when stopping a testplan
Occasional deadlock when stopping a testplan
Status: RESOLVED FIXED
Product: JMeter
Classification: Unclassified
Component: HTTP
2.5.1
All All
: P2 normal (vote)
: ---
Assigned To: JMeter issues mailing list
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2011-09-25 11:39 UTC by Sebb
Modified: 2011-09-27 14:43 UTC (History)
1 user (show)



Attachments
Sample dump with deadlock (185.10 KB, text/plain)
2011-09-25 11:39 UTC, Sebb
Details
Thread Dump Of Issue (100.30 KB, text/plain)
2011-09-26 07:48 UTC, Philippe Mouawad
Details
Log File of test that generates issue (29.01 KB, text/plain)
2011-09-26 07:48 UTC, Philippe Mouawad
Details
Test Case that produced Deadlock (38.58 KB, application/xml)
2011-09-26 07:50 UTC, Philippe Mouawad
Details
Log in DEBUG mode (532.67 KB, application/octet-stream)
2011-09-26 07:55 UTC, Philippe Mouawad
Details
Fix to the issue (2.45 KB, patch)
2011-09-26 08:30 UTC, Philippe Mouawad
Details | Diff
Fix to JMeterThread (3.75 KB, patch)
2011-09-26 21:05 UTC, Philippe Mouawad
Details | Diff
2 Thread Dumps (19.80 KB, text/plain)
2011-09-27 08:18 UTC, Philippe Mouawad
Details
Thread dump showing interrupt holds lock for end of life (7.24 KB, text/plain)
2011-09-27 09:02 UTC, Philippe Mouawad
Details
Log file (7.95 KB, text/plain)
2011-09-27 09:03 UTC, Philippe Mouawad
Details
Patch to put unlock in a finally (1.56 KB, patch)
2011-09-27 14:20 UTC, Philippe Mouawad
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sebb 2011-09-25 11:39:17 UTC
Created attachment 27584 [details]
Sample dump with deadlock

Have see occasional deadlocks when stopping a test plan with multiple HC4 test samplers.

The cause of the deadlock may be due to HC4 - see https://issues.apache.org/jira/browse/HTTPCLIENT-1127 - but it's not yet clear why JMeter appears to be running threadFinished when interrupt is still active. See attached dump.
Comment 1 Philippe Mouawad 2011-09-26 07:48:13 UTC
Created attachment 27587 [details]
Thread Dump Of Issue
Comment 2 Philippe Mouawad 2011-09-26 07:48:48 UTC
Created attachment 27588 [details]
Log File of test that generates issue
Comment 3 Philippe Mouawad 2011-09-26 07:50:06 UTC
Created attachment 27589 [details]
Test Case that produced Deadlock

Hello,
I reproduced by One run the deadlock.
Just add a mirror server on 8081 and run test case.
When number of Running Threads reaches 40, call stop, deadlock occurs.

Regards
Philippe Mouawad
Comment 4 Philippe Mouawad 2011-09-26 07:55:17 UTC
Created attachment 27590 [details]
Log in DEBUG mode

Zip that contains log file in DEBUG mode
Comment 5 Philippe Mouawad 2011-09-26 08:25:04 UTC
Hello,
For me the issue is an HTTPClient one.
It is not a problem that appears to be running threadFinished when interrupt is still active.
The explanation for this would be this kind of sequence:
1) Stopper thread (Thread-77 in my Thread Dump) first calls StandardJMeterEngine.tellThreadsToStop which calls stop() => Thread is suspended here
2) "Thread Group 1-32" gets stop call , run thread is stopped and we enter threadFinished => Thread is suspended
3) Stopper thread enters interrupt call and calls abort which is suspended just before AbstractClientConnAdapter.abortConnection
4) "Thread Group 1-32" locks  org.apache.http.impl.conn.SingleClientConnManager through shutdown call

Deadlock occurs.

Regards
Philippe Mouawad
Comment 6 Philippe Mouawad 2011-09-26 08:30:14 UTC
Created attachment 27591 [details]
Fix to the issue

Hello,
Here is my fix that should be removed once bug is fixed in HTTPCLient.

To workaround the pb related to a different sequence of calls on shutdown from HTTPHC4Impl, I synchronize both request.abort(); and the for loop that shutdowns the connection managers of a thread.

It should not have an important performance impact since it only concerns stop and shutdown sequences.

Since I applied it I never reproduce issue.
Regards
Philippe Mouawad
Comment 7 stef 2011-09-26 09:12:28 UTC
Hi,

I don't think the bug is only related to HTTPClient as it appears also (every times) with the DummySampler from our plugins (http://code.google.com/p/jmeter-plugins/). This sampler is basic and defined as "public class DummySampler extends AbstractSampler".

The problem was introduced with 2.5.1, it was working fine with 2.4 and 2.5.

Maybe this info can help to investigate the problem...

Stef
Comment 8 Sebb 2011-09-26 11:23:20 UTC
(In reply to comment #6)
> Created attachment 27591 [details]
> Fix to the issue
> 
> Hello,
> Here is my fix that should be removed once bug is fixed in HTTPCLient.
> 
> To workaround the pb related to a different sequence of calls on shutdown from
> HTTPHC4Impl, I synchronize both request.abort(); and the for loop that
> shutdowns the connection managers of a thread.
> 
> It should not have an important performance impact since it only concerns stop
> and shutdown sequences.
>
> Since I applied it I never reproduce issue.

That's a neat solution to the HC4 deadlock issue; thanks.
Comment 9 Sebb 2011-09-26 11:24:33 UTC
(In reply to comment #7)
> Hi,
> 
> I don't think the bug is only related to HTTPClient as it appears also (every
> times) with the DummySampler from our plugins
> (http://code.google.com/p/jmeter-plugins/). This sampler is basic and defined
> as "public class DummySampler extends AbstractSampler".
> 
> The problem was introduced with 2.5.1, it was working fine with 2.4 and 2.5.
> 
> Maybe this info can help to investigate the problem...

What would really help would be the thread dump and jmeter log file for a deadlock.
Comment 10 Sebb 2011-09-26 16:03:55 UTC
What appears to be the issue is that the interrupt may(In reply to comment #5)
> Hello,
> For me the issue is an HTTPClient one.
> It is not a problem that appears to be running threadFinished when interrupt is
> still active.

Actually, I think this is a general problem; however so far only HC4 seems to be affected by it.

> The explanation for this would be this kind of sequence:
> 1) Stopper thread (Thread-77 in my Thread Dump) first calls
> StandardJMeterEngine.tellThreadsToStop which calls stop() => Thread is
> suspended here
> 2) "Thread Group 1-32" gets stop call , run thread is stopped and we enter
> threadFinished => Thread is suspended
> 3) Stopper thread enters interrupt call and calls abort which is suspended just
> before AbstractClientConnAdapter.abortConnection
> 4) "Thread Group 1-32" locks  org.apache.http.impl.conn.SingleClientConnManager
> through shutdown call

The explanation I have is slightly different; threads are S (engine) and T (JMeterThread); numbers are time sequence:

T1) active sample in progress
S2) interrupt active sample
T3) sample completes
S4) [interrupt cleanup in progress]
T5) thread completes
S6) [interrupt cleanup still in progress]
T7) shutdown invoked

I tried adding a short pause between S6 and T7 and that fixed the problem for me, as it allowed the interrupt to complete.

The fix in comment 6 works because it causes the JMeterThread code to wait until the engine interrupt has finished.

I think the fix actually needs to be applied to JMeterThread instead so it applies to all samplers, not just HC4.
Comment 11 Philippe Mouawad 2011-09-26 20:08:55 UTC
Hello Sebb,
I think issue (Deadlock) only occurs with HC4 implementation because in this one we use for interrupt  request#abort while in HC3 we use the same method used in shutdown.

My fix was just for deadlock, by locking a HttpHC4Impl level, I ensure sequence that request.abort and ConnectionManager().shutdown() are never called at the same time which would cause the deadlock.

I am not sure to understand the general issue you are talking about:
- Is it that interrupt might be called after shutdown ?


Regards
Philippe Mouawad
Comment 12 Sebb 2011-09-26 20:29:38 UTC
(In reply to comment #11)
> I am not sure to understand the general issue you are talking about:
> - Is it that interrupt might be called after shutdown ?

No, the issue is that shutdown might be called before interrupt has finished.

But I don't think it's sensible to allow interrupt to be called after shutdown either.
Comment 13 Philippe Mouawad 2011-09-26 21:05:40 UTC
Created attachment 27599 [details]
Fix to JMeterThread

Hello Sebb,
Ok I agree.

Here is my proposition of a patch to ensure interrupt goes to the end before shutdown gets called.

I tested by removing my previous patch and it works.


Regards
Philippe Mouawad
Comment 14 Sebb 2011-09-26 21:36:04 UTC
(In reply to comment #13)
> Created attachment 27599 [details]
...
> Here is my proposition of a patch to ensure interrupt goes to the end before
> shutdown gets called.

Thanks, but it's interrupt() that needs to be protected.

> I tested by removing my previous patch and it works.

Yes, that will work because stop() happens to be called before interrupt() in this case.

However, interrupt() is a public method and could be called from elsewhere without a preceeding stop() call.

No need to redo the patch, I'm testing one now.
Comment 15 Sebb 2011-09-26 22:49:19 UTC
Patched JMeterThread:

URL: http://svn.apache.org/viewvc?rev=1176077&view=rev
Log:
Bug 51888 - Occasional deadlock when stopping a testplan

Modified:
   jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java
   jakarta/jmeter/trunk/xdocs/changes.xml

==

It would be helpful to have a DummySampler test case that fails on 2.5.1 RC2.

I've been unable to generate deadlocks with DummySampler using either 2.5.1 RC2 or the current revision r1176077 (available from nightly builds dir)
Comment 16 stef 2011-09-27 05:07:13 UTC
(In reply to comment #15)
> Patched JMeterThread:
> 
> URL: http://svn.apache.org/viewvc?rev=1176077&view=rev
> Log:
> Bug 51888 - Occasional deadlock when stopping a testplan
> 
> Modified:
>    jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java
>    jakarta/jmeter/trunk/xdocs/changes.xml
> 
> ==
> 
> It would be helpful to have a DummySampler test case that fails on 2.5.1 RC2.
> 
> I've been unable to generate deadlocks with DummySampler using either 2.5.1 RC2
> or the current revision r1176077 (available from nightly builds dir)

Hi,

Sorry I was wrong the problem with the DummySampler is not a deadlock, but rather related to Bug 51880 which I reopened and submitted logs/dumps. I have no deadlock, just threads waiting end of rampup phase before shuting down.

Stef
Comment 17 Philippe Mouawad 2011-09-27 07:09:14 UTC
Hello Sebb,
I tested the fix and for me it introduces an issue:
- I get the popup saying : "One or more test threads won't exit; see log file."


To reproduce issue , I run with 3 threads on 15 s rampup.
I let 2 thread start and call stop at that time, you will see that the stopper thread may stay alive.


Furthermore I think there is something wrong in the fix:
- number of lock/unlock don't match, is this regular ?
- what forbidds stop from taking the lock before interrupt ? for me the latch did it because it was incremented in stop so run thread would be stopped until interrupt released the latch (I agree making that on public method may no be the best) that interrupt would countdown. But here if stop goes faster it will take the lock and interrupt will just wait, or maybe I have missed something.

Regards
Philippe Mouawad
Comment 18 Sebb 2011-09-27 08:11:58 UTC
(In reply to comment #17)
> Hello Sebb,
> I tested the fix and for me it introduces an issue:
> - I get the popup saying : "One or more test threads won't exit; see log file."
> 
> 
> To reproduce issue , I run with 3 threads on 15 s rampup.
> I let 2 thread start and call stop at that time, you will see that the stopper
> thread may stay alive.

I don't get the problem. Are you running against the mirror, or a live site?

Can you attach a thread dump and jmeter log (INFO will do)?

> 
> Furthermore I think there is something wrong in the fix:
> - number of lock/unlock don't match, is this regular ?

The shutdown code takes the lock but does not release it. As the thread then exits, it should not matter, but we could add an unlock at the end.

> - what forbidds stop from taking the lock before interrupt ? for me the latch
> did it because it was incremented in stop so run thread would be stopped until
> interrupt released the latch (I agree making that on public method may no be
> the best) that interrupt would countdown. But here if stop goes faster it will
> take the lock and interrupt will just wait, or maybe I have missed something.

stop does not take the lock; interrupt does. 

It rechecks the currentSampler within the lock to ensure it cannot run after shutdown starts - if interrupt thread ran slowly it could see the value was non-null but the main thread could start shutdown before the interrupt lock started. The double-check prevents interrupt running after shutdown.
Comment 19 Philippe Mouawad 2011-09-27 08:18:48 UTC
Created attachment 27604 [details]
2 Thread Dumps

Hello,
First thread dump was done while interrupt was still going on after I called stop.
Then after some time threads ended but as you can see interrupt still waiting on lock.

Regards
Philippe
Comment 20 Philippe Mouawad 2011-09-27 08:24:00 UTC
I test with TestPlan and mirror server on 8081.
Comment 21 Sebb 2011-09-27 08:49:07 UTC
(In reply to comment #19)
> Created attachment 27604 [details]
> 2 Thread Dumps
> 
> Hello,
> First thread dump was done while interrupt was still going on after I called
> stop.
> Then after some time threads ended but as you can see interrupt still waiting
> on lock.

I see; I'd not considered that.

What happens if you call unlock at the end of the JMeterThread shutdown section?
i.e. at the end of the finally block around lines 308-316.

Does that allow interrupt to finish?

> Regards
> Philippe
Comment 22 Philippe Mouawad 2011-09-27 09:00:58 UTC
Hello Sebb,
I have finally the scenario that shows that fix does not make stop wait for interrupt.

To reproduce it:
1) point to a web facing site (not mirror because Eclipse will go crazy) and configure 3 threads
2) Start JMeter in Debug mode
3) Put a breakpoint at line 310 and 585
4) Start test
5) When number of threads reaches 2, call stop
6) Eclipse will stop you at both 310 and 585 and depending on your choice interrupt will run before stop (OK) or stop will run before interrupt (NOT OK)


I will attach log file, thread dump.
Regards
Philippe Mouawad
Comment 23 Philippe Mouawad 2011-09-27 09:02:31 UTC
Created attachment 27605 [details]
Thread dump showing interrupt holds lock for end of life
Comment 24 Philippe Mouawad 2011-09-27 09:03:34 UTC
Created attachment 27606 [details]
Log file
Comment 25 Sebb 2011-09-27 09:18:04 UTC
Added fix to release the lock:

URL: http://svn.apache.org/viewvc?rev=1176291&view=rev
Log:
Release the interrupt lock to allow late interrupts to complete

Modified:
   jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java


Can you try the test again with this version?

(It's in nightly builds).
Comment 26 Philippe Mouawad 2011-09-27 09:24:52 UTC
This time no lock but shutdown of HttpClient Conn Manager occurs before interrupt so I get:
2011/09/27 11:22:06 WARN  - jmeter.threads.JMeterThread: Caught Exception interrupting sampler: java.lang.NullPointerException


To reproduce, do same scenario as described and wait 3 sec after thread 2 has started before calling stop.
Regards
Philippe
Comment 27 Sebb 2011-09-27 10:16:00 UTC
(In reply to comment #26)
> This time no lock but shutdown of HttpClient Conn Manager occurs before
> interrupt so I get:
> 2011/09/27 11:22:06 WARN  - jmeter.threads.JMeterThread: Caught Exception
> interrupting sampler: java.lang.NullPointerException

What is the actual NPE?

It may be in the console.

> 
> To reproduce, do same scenario as described and wait 3 sec after thread 2 has
> started before calling stop.
> Regards
> Philippe
Comment 28 Philippe Mouawad 2011-09-27 10:28:30 UTC
I added exception as parameter to logger.warn:
2011/09/27 12:26:10 WARN  - jmeter.threads.JMeterThread: Caught Exception interrupting sampler: java.lang.NullPointerException java.lang.NullPointerException
	at org.apache.jmeter.threads.JMeterThread.interrupt(JMeterThread.java:602)
	at org.apache.jmeter.engine.StandardJMeterEngine.tellThreadsToStop(StandardJMeterEngine.java:552)
	at org.apache.jmeter.engine.StandardJMeterEngine.access$2(StandardJMeterEngine.java:547)
	at org.apache.jmeter.engine.StandardJMeterEngine$StopTest.run(StandardJMeterEngine.java:284)
	at java.lang.Thread.run(Thread.java:680)


It's currentSampler that has been nullified by run finally block while interrupt is executing.
Regards
Philippe
Comment 29 Sebb 2011-09-27 10:49:29 UTC
(In reply to comment #28)
> I added exception as parameter to logger.warn:
> 2011/09/27 12:26:10 WARN  - jmeter.threads.JMeterThread: Caught Exception
> interrupting sampler: java.lang.NullPointerException
> java.lang.NullPointerException
>     at org.apache.jmeter.threads.JMeterThread.interrupt(JMeterThread.java:602)
>     at
> org.apache.jmeter.engine.StandardJMeterEngine.tellThreadsToStop(StandardJMeterEngine.java:552)
>     at
> org.apache.jmeter.engine.StandardJMeterEngine.access$2(StandardJMeterEngine.java:547)
>     at
> org.apache.jmeter.engine.StandardJMeterEngine$StopTest.run(StandardJMeterEngine.java:284)
>     at java.lang.Thread.run(Thread.java:680)
> 
> 
> It's currentSampler that has been nullified by run finally block while
> interrupt is executing.

The code fetches the the value again, but fails to check it for null. 
That's a mistake; there should be a null check, which is used to skip the code when shutdown runs before interrup.
No wonder there's an NPE...

Committed another fix; hopefully that's enough now:

URL: http://svn.apache.org/viewvc?rev=1176320&view=rev
Log:
Forgot to check if currentSampler had been set to null, e.g. by shutdown

Modified:
   jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java
Comment 30 Philippe Mouawad 2011-09-27 11:51:30 UTC
Works now, great, thanks.

Although Sebb, we can still have shutdown called before interrupt, I thought you thought it was an issue.

Regards
Philippe
Comment 31 Sebb 2011-09-27 12:01:18 UTC
(In reply to comment #30)
> Works now, great, thanks.
> 
> Although Sebb, we can still have shutdown called before interrupt, I thought
> you thought it was an issue.

The thread sets currentSampler to null before running shutdown.

So if interrupt runs afterwards, it will do nothing.

Or do you see a problem with that?
Comment 32 Philippe Mouawad 2011-09-27 12:19:05 UTC
It means shutdown of Conn Manager can occur while a sample is running (if server takes some time to respond).
This will give an error like this I think:
2011/09/27 13:50:15 INFO  - org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: I/O exception (java.net.SocketException) caught when connecting to the target host: Software caused connection abort 
2011/09/27 13:50:15 INFO  - org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: Retrying connect 


To reproduce put a Breakpoint in org.apache.http.impl.client.DefaultRequestDirector line 393 just before stopping test, then let run() method end, add breakpoint line 323 of JMeterThread and let JMeterThread end then let DefaultRequestDirector go and you will get Error% > 0.

So it impacts a little the error % but I think it is acceptable.
Philippe
Comment 33 Sebb 2011-09-27 12:44:40 UTC
(In reply to comment #32)
> It means shutdown of Conn Manager can occur while a sample is running (if
> server takes some time to respond).
> This will give an error like this I think:
> 2011/09/27 13:50:15 INFO  -
> org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: I/O exception
> (java.net.SocketException) caught when connecting to the target host: Software
> caused connection abort 
> 2011/09/27 13:50:15 INFO  -
> org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: Retrying connect 

I think that's a bug in HC4 - it should not retry the connect once the request has been aborted, which is what I think is happening here.

> 
> To reproduce put a Breakpoint in
> org.apache.http.impl.client.DefaultRequestDirector line 393 just before
> stopping test, then let run() method end, add breakpoint line 323 of
> JMeterThread and let JMeterThread end then let DefaultRequestDirector go and
> you will get Error% > 0.
> 
> So it impacts a little the error % but I think it is acceptable.

Yes, I'm pretty sure that the error will go away when the HC4 bug is fixed.

It should be permissible to call abort() then shutdown() without generating a connect error message from HC4.

The hack I added fixes the problem for request retry; I'm hoping I might be able to extend the fix to connect retries.
Comment 34 Philippe Mouawad 2011-09-27 14:20:17 UTC
Created attachment 27609 [details]
Patch to put unlock in a finally

Hello Sebb,
Then it is OK for me.
Just a little note, I would put unlock inside a finally to ensure it is executed.
Attached is the little patch for this.

From my side, you can close issue as resolved.
Regards
Philippe Mouawad
Comment 35 Sebb 2011-09-27 14:41:51 UTC
(In reply to comment #34)
> Created attachment 27609 [details]
> Patch to put unlock in a finally
> 
> Hello Sebb,
> Then it is OK for me.
> Just a little note, I would put unlock inside a finally to ensure it is
> executed.
> Attached is the little patch for this.

URL: http://svn.apache.org/viewvc?rev=1176398&view=rev
Log:
Ensure unlock is called

Modified:
   jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java
Comment 36 Sebb 2011-09-27 14:42:44 UTC
(In reply to comment #33)
> 
> The hack I added fixes the problem for request retry; I'm hoping I might be
> able to extend the fix to connect retries.


URL: http://svn.apache.org/viewvc?rev=1176404&view=rev
Log:
More workrounds for HC4 bug

Modified:
   jakarta/jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/sampler/HTTPHC4Impl.java
Comment 37 Sebb 2011-09-27 14:43:35 UTC
I hope this is now finally resolved.