Created attachment 37447 [details] Fix of the deadlock We experienced a (reproducable) deadlock in https://github.com/apache/ant/blob/7e6cc33e04ecf6fae6f51d24f67c21bd321d13ee/src/main/org/apache/tools/ant/taskdefs/optional/junitlauncher/LauncherSupport.java#L406 when the IOException has been triggered in https://github.com/apache/ant/blob/7e6cc33e04ecf6fae6f51d24f67c21bd321d13ee/src/main/org/apache/tools/ant/taskdefs/optional/junitlauncher/LauncherSupport.java#L513 and a test writes something to System.err This happens because PipedOutputStream & PipedInputStream has been used in a single thread which is not recommended (see https://stackoverflow.com/a/9171708/2523032). I've attached a patch that fixes the problem and will attach a Testcase to reproduce it. This is how we can reproduce the deadlock on every run: - Add "throw new IOException("TEST");" after line https://github.com/apache/ant/blob/7e6cc33e04ecf6fae6f51d24f67c21bd321d13ee/src/main/org/apache/tools/ant/taskdefs/optional/junitlauncher/LauncherSupport.java#L511 (I did not find any other solution to trigger a IOException reproducable) - Run my attached TestCase with the attached Ant script (I've created a zip-file with all the libs, too) - The ant process should now hang in the deadlock - After applying the patch the deadlock should be gone Here is the threaddump of the problem: "junitlauncher-syserr-stream-reader" #6070 daemon prio=5 os_prio=0 tid=0x000000005739e800 nid=0x1778 in Object.wait() [0x000000005eabe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:273) at java.io.PipedInputStream.receive(PipedInputStream.java:231) - locked <0x00000000c749a770> (a java.io.PipedInputStream) at java.io.PipedOutputStream.write(PipedOutputStream.java:149) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000c74c1fe0> (a java.io.PrintStream) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000c74c1fc0> (a java.io.PrintStream) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) - locked <0x00000000c74c2140> (a java.io.OutputStreamWriter) at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.write(PrintStream.java:527) - locked <0x00000000c74c1fc0> (a java.io.PrintStream) at java.io.PrintStream.print(PrintStream.java:669) at java.io.PrintStream.println(PrintStream.java:823) - locked <0x00000000c74c1fc0> (a java.io.PrintStream) at java.lang.Throwable$WrappedPrintStream.println(Throwable.java:748) at java.lang.Throwable.printStackTrace(Throwable.java:658) - locked <0x00000000c74c1fc0> (a java.io.PrintStream) at java.lang.Throwable.printStackTrace(Throwable.java:643) at java.lang.Throwable.printStackTrace(Throwable.java:634) at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport.log(LauncherSupport.java:406) at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport.access$0(LauncherSupport.java:396) at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:514) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - None "main" #1 prio=5 os_prio=0 tid=0x000000000210a800 nid=0x25fc waiting for monitor entry [0x000000000231c000] java.lang.Thread.State: BLOCKED (on object monitor) at java.io.PrintStream.println(PrintStream.java:805) - waiting to lock <0x00000000c74c1fc0> (a java.io.PrintStream) at de.guhsoft.eDHR.regressionTest.tools.PrepareDatabase.removeWorkplaceConfiguration(PrepareDatabase.java:499) at de.guhsoft.eDHR.regressionTest.production.AbstractProductionTest.resetWorkplaceConfiguration(AbstractProductionTest.java:816) at de.guhsoft.eDHR.regressionTest.production.packaging.PackagingTest.testSuccessWithBoxSizes(PackagingTest.java:2656) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:686) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$186/555826066.apply(Unknown Source) at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall$$Lambda$187/1386767190.apply(Unknown Source) at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) at org.junit.jupiter.engine.execution.ExecutableInvoker$$Lambda$248/815992954.apply(Unknown Source) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:205) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$304/2024542466.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:201) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:135) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$112/1286783232.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$111/66233253.invoke(Unknown Source) at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$110/1393931310.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122) at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80) at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService$$Lambda$202/2142003995.accept(Unknown Source) at java.util.ArrayList.forEach(ArrayList.java:1257) at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$112/1286783232.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$111/66233253.invoke(Unknown Source) at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$110/1393931310.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122) at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80) at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService$$Lambda$202/2142003995.accept(Unknown Source) at java.util.ArrayList.forEach(ArrayList.java:1257) at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$112/1286783232.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$111/66233253.invoke(Unknown Source) at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123) at org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$110/1393931310.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122) at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80) at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32) at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57) at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51) at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:248) at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$5(DefaultLauncher.java:211) at org.junit.platform.launcher.core.DefaultLauncher$$Lambda$76/1205044462.accept(Unknown Source) at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:226) at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:199) at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:132) at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport.launch(LauncherSupport.java:141) at org.apache.tools.ant.taskdefs.optional.junitlauncher.StandaloneLauncher.main(StandaloneLauncher.java:114) Locked ownable synchronizers: - None
Created attachment 37448 [details] Test Case to trigger the deadlock (read explanation)
Created attachment 37449 [details] example ant script
Created attachment 37450 [details] Archive with everything included to reproduce the bug (read explanation)
One more thing. This is the exception i get after patching ant: Failed while streaming syserr data java.io.IOException: Pipe broken at java.io.PipedInputStream.read(PipedInputStream.java:321) at java.io.PipedInputStream.read(PipedInputStream.java:377) at java.io.InputStream.read(InputStream.java:101) at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:539) at java.lang.Thread.run(Thread.java:748)
Whether the deadlock occurs or not might depend on the JVM. I've been unable to reproduce it with Ant 1.10.9 on Linux (tried OpenJDK 8 and 14). This doesn't mean I doubt you've identified a genuine bug.
We are using Adoptopenjdk 8 & 11 to run around 1.600 tests per run with mutliple runs on vm's overnight. Every night i get around a dozen of these "deadlock exceptions" (that happen after the deadlock fix). Might depend on JVM or computer power, but the code can only lead to a deadlock when an IOException is thrown because PipedOutputStream & PipedInputStream are used in a single thread,
yes, I see that. An alternative approach that occurred to me would be to capture the current sysout/err Readers in instance fields (likely final; set inside the constructor) and make log switch streams (if necessary) explicitly.
I tried a few things (including what you suggested about explicitly throwing an IOException just for the sake of reproducing this issue), but I haven't been able to reproduce this deadlock. But like Stefan says, it doesn't mean that this code has an issue. So I went ahead and added a potential fix to this in upstream master branch. The commit is https://github.com/apache/ant/commit/71c230dd93ab98df16c61cc949fd8ffe3c8cbaa2. Given that I haven't been able to reproduce this, I won't be able to verify if this indeed works. We publish an Ant nightly build here https://ci-builds.apache.org/job/Ant/job/Ant_Nightly/lastSuccessfulBuild/artifact/. An hour or two from now, it should have the nightly distribution which contains this fix. It would be great if you could give tihs a try on your setup where this seems to be happening regularly. If you do give it a try there, please do report back with your observations. For now, I'm marking this issue as resolved. If however, your tests show that this hasn't fixed the issue, please feel free to reopen.
> But like Stefan says, it doesn't mean that this code has an issue. I can't find the edit button and reading that sentence again, I'm not sure I was clear on what I meant. So rephrasing - it's clear that the code in question has a bug although we haven't been able to trigger that deadlock.
Thank you. I'll give it a try and will report back if that solves the deadlock
I can confirm the dealock is fixed. I got the following output per "previous deadlock": [junitlauncher] Failed in syserr streaming [junitlauncher] java.io.UncheckedIOException: java.io.IOException: Pipe broken [junitlauncher] at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:531) [junitlauncher] at java.base/java.lang.Thread.run(Thread.java:836) [junitlauncher] Caused by: java.io.IOException: Pipe broken [junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:321) [junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377) [junitlauncher] at java.base/java.io.InputStream.read(InputStream.java:205) [junitlauncher] at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:525) [junitlauncher] ... 1 more
Thank you for testing this and posting the results. It's good to know that the deadlock no longer exists. So this bug is actually fixed. As for the underlying IOException which was triggering this deadlock, which is now reported back to the System.err: [junitlauncher] Caused by: java.io.IOException: Pipe broken [junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:321) [junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377) [junitlauncher] at java.base/java.io.InputStream.read(InputStream.java:205) [junitlauncher] at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:525) [junitlauncher] ... 1 more I am not sure why that would be the case. Is that impacting any of the test runs? I will pay a closer attention to the code in this task to see if there's a possibility which could trigger this error and if so whether it's harmless enough to be ignored.
> I will pay a closer attention to the code in this task to see if there's a possibility which could trigger this error I think I found the cause of this IOException. I have pushed a commit which I think should fix this IOException https://github.com/apache/ant/commit/04be9be9ab98776e2faf5cad7f2a3823a9f723b2. As usual, this will be available in our nightly build in the next hour or so (the build with this change has already started). If you get a chance, please do test this fix too and let us know how it goes. Thank you for your help and patience so far.
(In reply to Jaikiran Pai from comment #13) > I think I found the cause of this IOException. I have pushed a commit which > I think should fix this IOException > https://github.com/apache/ant/commit/ > 04be9be9ab98776e2faf5cad7f2a3823a9f723b2. As usual, this will be available > in our nightly build in the next hour or so (the build with this change has > already started). If you get a chance, please do test this fix too and let > us know how it goes. Thank you for your help and patience so far. Thank you. I will do a test soon and report back
The IOException is not fixed with your last commit. Still got [junitlauncher] Failed in syserr streaming [junitlauncher] java.io.UncheckedIOException: java.io.IOException: Pipe broken [junitlauncher] at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:555) [junitlauncher] at java.base/java.lang.Thread.run(Thread.java:836) [junitlauncher] Caused by: java.io.IOException: Pipe broken [junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:321) [junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377) [junitlauncher] at java.base/java.io.InputStream.read(InputStream.java:205) [junitlauncher] at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:549) [junitlauncher] ... 1 more with the newest nightly build (#42)
Thank you for testing it. It's interesting that it still runs into this error. I'll spend some time on this later. Is this exception affecting the test runs in any way in your setup? That will let me decide how critical this issue is.
(In reply to Jaikiran Pai from comment #16) > Thank you for testing it. It's interesting that it still runs into this > error. I'll spend some time on this later. Is this exception affecting the > test runs in any way in your setup? That will let me decide how critical > this issue is. After the deadlock fix it does not affect our tests (except we have this stack traces in the log, but that's not the biggest problem). Would a simple testcase where i can reproduce the "IOException: Pipe broken" help you?
> After the deadlock fix it does not affect our tests (except we have this stack traces in the log, but that's not the biggest problem). That's good to hear. That way we can proceed with the Ant release, if necessary, even if this IOException issue isn't solved in the meantime. > Would a simple testcase where i can reproduce the "IOException: Pipe broken" help you? Yes that will definitely help. Although, looking at the JDK code which deals with the PipedInputStream/PipedOutputStream, I think I see what the issue could be - the way the implementation of those classes decides about the "writerThread" can lead to issues like these where the tests could be run in different threads and each one of them writing to System.out/System.err although not concurrently. I'm not too sure I can fix this soon, but at least having a testcase which reproduces this will help us.
>[junitlauncher] Caused by: java.io.IOException: Pipe broken >[junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:321) >[junitlauncher] at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377) >[junitlauncher] at java.base/java.io.InputStream.read(InputStream.java:205) >[junitlauncher] at org.apache.tools.ant.taskdefs.optional.junitlauncher.LauncherSupport$SysOutErrStreamReader.run(LauncherSupport.java:549) >[junitlauncher] ... 1 more Hello Michael, I've pushed a commit to upstream Ant repo yesterday which I think should help prevent these IOExceptions that you are seeing. If you get a chance please test our latest Ant nightly build https://ci-builds.apache.org/job/Ant/job/Ant_Nightly/lastSuccessfulBuild/artifact/ and let us know how it goes.
Hi Jaikiran, seems like the "IOException: Pipe broken" is gone after your last commit. Thanks for your work.
That's really good to hear. Thank you for testing this.