Bug 55827 - Iteration over a view on a synchronized collection without obtaining a lock on the collection
Summary: Iteration over a view on a synchronized collection without obtaining a lock o...
Status: REOPENED
Alias: None
Product: JMeter
Classification: Unclassified
Component: Main (show other bugs)
Version: 5.0
Hardware: PC Linux
: P3 major (vote)
Target Milestone: JMETER_5.0
Assignee: JMeter issues mailing list
URL:
Keywords: APIBug, FAQ, NeedsReleaseNote
Depends on:
Blocks:
 
Reported: 2013-11-28 16:44 UTC by Bob Atkey
Modified: 2022-03-08 21:36 UTC (History)
2 users (show)



Attachments
Jmeter Test Error When Itr next and remove (530.54 KB, patch)
2022-03-07 12:07 UTC, TaoLu
Details | Diff
Slow down recoverRunningVersion to show problems (2.79 KB, patch)
2022-03-08 20:29 UTC, Felix Schumacher
Details | Diff
Slow down recoverRunningVersion to show problems (3.06 KB, patch)
2022-03-08 21:01 UTC, Felix Schumacher
Details | Diff
sl (3.19 KB, patch)
2022-03-08 21:17 UTC, Felix Schumacher
Details | Diff
Slow down recoverRunningVersion to show problems (3.19 KB, patch)
2022-03-08 21:24 UTC, Felix Schumacher
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Bob Atkey 2013-11-28 16:44:13 UTC
We ran our static analysis tool ThreadSafe [1] on version 2.10 of JMeter, which appeared to uncover a couple of concurrency issues. One of the most interesting was the possibility of an unsynchronised iteration over a synchronised collection in the class 'AbstractTestElement'.

The relevant line is 499:
   Iterator<Map.Entry<String, JMeterProperty>>  iter = propMap.entrySet().iterator();

where the propMap field always contains a synchronised collection created at line 57 in the same file.

The JDK documentation for Collections.synchronizedMap states that:

>  It is imperative that the user manually synchronize on the
> returned map when iterating over any of its collection views:
>
>  Map m = Collections.synchronizedMap(new HashMap());
>      ...
>  Set s = m.keySet();  // Needn't be in synchronized block
>      ...
>  synchronized (m) {  // Synchronizing on m, not s!
>      Iterator i = s.iterator(); // Must be in synchronized block
>      while (i.hasNext())
>          foo(i.next());
>  }
>
> Failure to follow this advice may result in non-deterministic behavior. 

It appears that the code on line 499 does not correctly synchronize on propMap, leading to the possibility of the non-deterministic behaviour the JDK documentation warns about.

We're not sure that this can actually result in a user-visible bug, but we thought you'd like to know.

We are also planning to use this finding as an example of Android-related concurrency mistakes in an article about ThreadSafe. Obviously, if you, as the developers of JMeter, have any objections to our using this as an example, then we won't.

[1] ThreadSafe is a static analysis tool for Java concurrency, developed by Contemplate Ltd.:
     http://www.contemplateltd.com/
Comment 1 Bob Atkey 2013-11-28 16:46:22 UTC
I wrote:

> We are also planning to use this finding as an example of Android-related 
> concurrency mistakes in an article about ThreadSafe.

Obviously, "Android-related" was a mistake here. I meant to write "Java-related". Sorry about that.
Comment 2 Philippe Mouawad 2013-12-06 21:27:01 UTC
Thanks for report, I was wondering why your tool does not also report line 456:
return new PropertyIteratorImpl(propMap.values());


Did you report all findings of your tool or only some examples ?
Comment 3 Bob Atkey 2013-12-11 20:12:18 UTC
> Thanks for report, I was wondering why your tool does not also report line 456:
> return new PropertyIteratorImpl(propMap.values());

Actually, our tool doesn't report the instance on line 456 because it doesn't look into the implementation of PropertyIteratorImpl while analysing the AbstractTestElement class. Therefore, it doesn't realise that everything that is passed in will have .iterator() called on it. This is certainly something that we will look at adding to ThreadSafe in the future. Thanks!

> Did you report all findings of your tool or only some examples ?

I only reported two examples of findings that I thought looked interesting. If I get more time soon I will report the rest of the interesting looking reports that ThreadSafe gives.
Comment 4 Graham Russell 2017-01-29 10:14:01 UTC
Created PR for the example identified in this issue: https://github.com/apache/jmeter/pull/260

Are there any more that the tool found (in the latest code) which should be addressed?
Comment 5 Vladimir Sitnikov 2017-02-07 22:04:45 UTC
As far as I can see, the syncrhonizedMap was added in attempt to fix https://bz.apache.org/bugzilla/show_bug.cgi?id=16304
The latest reply suggests that the synchronization did not help.

So I do not see much reason to complicate code.
I don't think it would hurt, yet I see no reason how it would help.
Comment 6 Philippe Mouawad 2017-02-07 22:29:19 UTC
(In reply to Vladimir Sitnikov from comment #5)
> As far as I can see, the syncrhonizedMap was added in attempt to fix
> https://bz.apache.org/bugzilla/show_bug.cgi?id=16304
> The latest reply suggests that the synchronization did not help.
> 
> So I do not see much reason to complicate code.
> I don't think it would hurt, yet I see no reason how it would help.

Thanks Felix for digging, interesting indeed.

A question for you, I have a doubt. In my understanding, calling synchronized in monothreaded call has a cost:
http://stackoverflow.com/questions/973518/are-synchronized-methods-slower-in-single-threaded-applications

Do you confirm this from benchmarks you did ?
Comment 7 Graham Russell 2017-02-24 22:23:41 UTC
(In reply to Vladimir Sitnikov from comment #5)
> As far as I can see, the syncrhonizedMap was added in attempt to fix
> https://bz.apache.org/bugzilla/show_bug.cgi?id=16304
> The latest reply suggests that the synchronization did not help.
> 
> So I do not see much reason to complicate code.
> I don't think it would hurt, yet I see no reason how it would help.

It does say the problem still exists in the comment yet the issue was set to resolved.

Maybe we should remove the synchronizedMap as we are accessing it in an unsafe way anyway - or just comment the code and close this bz.
Comment 8 TaoLu 2022-03-07 12:07:29 UTC
Created attachment 38216 [details]
Jmeter Test Error When Itr next and remove

    public void recoverRunningVersion() {
        Iterator iter = this.propMap.entrySet().iterator();

        while(iter.hasNext()) {
            Entry<String, JMeterProperty> entry = (Entry)iter.next();
            JMeterProperty prop = (JMeterProperty)entry.getValue();
            if (this.isTemporary(prop)) {
                iter.remove();
                this.clearTemporary(prop);
            } else {
                prop.recoverRunningVersion(this);
            }
        }

        this.emptyTemporary();
    }
Comment 9 TaoLu 2022-03-07 12:19:12 UTC
For the 10 slave Linux machines used in the test environment, each 10 thread is 100 concurrent, and most of them report an error.ConcurrentModificationException, The sampler is a downloaded soap sampler. Most of the machines report this error. The  script includes two CSV config and some listeners. I don't know whether the component is used incorrectly or where there is a problem. I haven't found the problem for a day, Some people say that problem is 'recycle on EOF' set by CSV config is false. Is it the problem caused by soap sampler, but there is no error when running locally in windows os
Comment 10 Vladimir Sitnikov 2022-03-07 14:01:10 UTC
@TaoLu, could you please share the script that reproduces the issue?
Comment 11 Felix Schumacher 2022-03-08 20:29:42 UTC
Created attachment 38218 [details]
Slow down recoverRunningVersion to show problems

I think we have to use a lock around the clean-up of the temporary properties like Graham did.

With the attached patch I can reproduce an overlap in the recoverRunningVersion method between more than one thread.

The exceptions (with are for example:
2022-03-08 21:23:14,089 WARN o.a.j.t.AbstractTestElement: Recover in different threads: [Thread[Thread-33,6,main], Thread[Thread-19,6,main]]; Element: org.apache.jmeter.sampler.DebugSampler@7ea357f0
java.lang.RuntimeException: Thread mismatch
>-------at org.apache.jmeter.testelement.AbstractTestElement.recoverRunningVersion(AbstractTestElement.java:511) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.SamplePackage.recoverRunningVersion(SamplePackage.java:126) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.TestCompiler.done(TestCompiler.java:123) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:593) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:501) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:268) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201]
2022-03-08 21:23:14,579 WARN o.a.j.t.AbstractTestElement: Recover in different threads: [Thread[Thread-34,6,main], Thread[Thread-20,6,main]]; Element: org.apache.jmeter.sampler.DebugSampler@521c345a
java.lang.RuntimeException: Thread mismatch
>-------at org.apache.jmeter.testelement.AbstractTestElement.recoverRunningVersion(AbstractTestElement.java:511) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.SamplePackage.recoverRunningVersion(SamplePackage.java:126) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.TestCompiler.done(TestCompiler.java:123) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:593) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:501) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:268) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201]
2022-03-08 21:23:14,590 WARN o.a.j.t.AbstractTestElement: Recover in different threads: [Thread[Thread-35,6,main], Thread[Thread-30,6,main], Thread[Thread-21,6,main], Thread[Thread-16,6,main]]; Element: org.apache.jmeter.control.LoopController@4b4d66fb
java.lang.RuntimeException: Thread mismatch
>-------at org.apache.jmeter.testelement.AbstractTestElement.recoverRunningVersion(AbstractTestElement.java:511) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.control.LoopController.reInitialize(LoopController.java:203) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.control.LoopController.nextIsNull(LoopController.java:157) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.control.GenericController.next(GenericController.java:170) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.control.LoopController.next(LoopController.java:134) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.AbstractThreadGroup.next(AbstractThreadGroup.java:99) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:303) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201]
Comment 12 Felix Schumacher 2022-03-08 20:43:12 UTC
Forget my last comment for now, as I forgot to remove the thread from the stack.
Comment 13 Felix Schumacher 2022-03-08 21:01:19 UTC
Created attachment 38219 [details]
Slow down recoverRunningVersion to show problems

Now a working version of the "slow" recoverRunningVersion method.

When the threads are removed at the end (in my simple test plan) one Test Element is getting flagged, only.

2022-03-08 21:55:40,771 WARN o.a.j.t.AbstractTestElement: Recover in different threads: [Thread[Thread-53,6,main], Thread[Thread-54,6,main]]; Element: org.apache.jmeter.reporters.ResultCollector@42f6fa63
java.lang.RuntimeException: Thread mismatch
>-------at org.apache.jmeter.testelement.AbstractTestElement.recoverRunningVersion(AbstractTestElement.java:513) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.SamplePackage.recoverRunningVersion(SamplePackage.java:110) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.SamplePackage.recoverRunningVersion(SamplePackage.java:120) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.TestCompiler.done(TestCompiler.java:123) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:593) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:501) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:268) ~[ApacheJMeter_core.jar:5.5-SNAPSHOT]
>-------at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201]

The code for ResultCollector explicitly mentions, that it has to be thread-safe (which recoverRunningVersion is not).

So maybe, we can make it for ResultCollector thread-safe instead of all? But on the other hand, if we obtain a lock for a single thread, it is not that expensive any more. So I think we could try to add it.
Comment 14 Felix Schumacher 2022-03-08 21:17:38 UTC
Created attachment 38220 [details]
sl
Comment 15 Felix Schumacher 2022-03-08 21:24:30 UTC
Created attachment 38221 [details]
Slow down recoverRunningVersion to show problems

Adding more debug output (size of the temporary properties), it does look like ResultCollector has no temporary properties and should not be the culprit.

So, the question is, which Element is it and where does the concurrent access comes from?

@TaoLu, are you using a third party plugin? What elements do you use apart from CSV Data Source? How big are your CSV sources, how many elements do they have?
Comment 16 Felix Schumacher 2022-03-08 21:36:07 UTC
For comparison I did a simple test plan (dummy sampler + csv data source) with 2000 threads. First time with additional locking and second time without. The numbers are not that different:

(with locking)
$ ./bin/jmeter -t LAST -n
Creating summariser <summary>
Created the tree successfully using /home/felix/jmeter-tests/csv-thread-mismatch.jmx
Starting standalone test @ Tue Mar 08 22:29:44 CET 2022 (1646774984565)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary + 889238 in 00:00:15 = 58664.6/s Avg:     0 Min:     0 Max:    26 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary + 2046430 in 00:00:30 = 68214.3/s Avg:     0 Min:     0 Max:    21 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary = 2935668 in 00:00:45 = 65008.8/s Avg:     0 Min:     0 Max:    26 Err:     0 (0.00%)
summary + 2015928 in 00:00:30 = 67197.6/s Avg:     0 Min:     0 Max:    20 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary = 4951596 in 00:01:15 = 65882.5/s Avg:     0 Min:     0 Max:    26 Err:     0 (0.00%)
summary + 2011866 in 00:00:30 = 67062.2/s Avg:     0 Min:     0 Max:    18 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary = 6963462 in 00:01:45 = 66219.0/s Avg:     0 Min:     0 Max:    26 Err:     0 (0.00%)
summary + 1128224 in 00:00:17 = 65874.0/s Avg:     0 Min:     0 Max:    27 Err:     0 (0.00%) Active: 0 Started: 2000 Finished: 2000
summary = 8091686 in 00:02:02 = 66170.7/s Avg:     0 Min:     0 Max:    27 Err:     0 (0.00%)
Tidying up ...    @ Tue Mar 08 22:31:47 CET 2022 (1646775107127)
... end of run

(without locking):
$ ./bin/jmeter -t LAST -n
Creating summariser <summary>
Created the tree successfully using /home/felix/jmeter-tests/csv-thread-mismatch.jmx
Starting standalone test @ Tue Mar 08 22:32:12 CET 2022 (1646775132163)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary + 1197546 in 00:00:18 = 68232.4/s Avg:     0 Min:     0 Max:    27 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary + 2029381 in 00:00:30 = 67670.8/s Avg:     0 Min:     0 Max:    26 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary = 3226927 in 00:00:48 = 67878.1/s Avg:     0 Min:     0 Max:    27 Err:     0 (0.00%)
summary + 1888713 in 00:00:30 = 62957.1/s Avg:     0 Min:     0 Max:    23 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary = 5115640 in 00:01:18 = 65974.2/s Avg:     0 Min:     0 Max:    27 Err:     0 (0.00%)
summary + 1769647 in 00:00:30 = 58988.2/s Avg:     0 Min:     0 Max:    28 Err:     0 (0.00%) Active: 2000 Started: 2000 Finished: 0
summary = 6885287 in 00:01:48 = 64025.4/s Avg:     0 Min:     0 Max:    28 Err:     0 (0.00%)
summary + 977769 in 00:00:14 = 67736.0/s Avg:     0 Min:     0 Max:    16 Err:     0 (0.00%) Active: 0 Started: 2000 Finished: 2000
summary = 7863056 in 00:02:02 = 64464.5/s Avg:     0 Min:     0 Max:    28 Err:     0 (0.00%)
Tidying up ...    @ Tue Mar 08 22:34:14 CET 2022 (1646775254435)
... end of run