Steps to produce: 1. Install nghttp (https://github.com/http2/http2-spec/wiki/Tools) 2. Compile the following java code. https://github.com/dapengzhang0/servlettest/blob/master/src/main/java/AsyncReadEmbedded.java (The program runs a simple servlet using Embedded Tomcat server. It creates a file with 271828 bytes, and use nghttp to send a POST request over HTTP/2 to the servlet uploading the data from the file. It will fail if the request timeout or any exception is thrown, otherwise it succeeds.) 3. Write a script to repeatedly run the code until it fails. (My experiments showed it sporadically fails once every less than 100 runs. The failure is read timeout near the end of request data, when there is half close from the client and a window update) Thanks.
Many, many thanks for the test case. It makes fixing bugs so much easier when a reliable test case is provided. I can confirm that I can reproduce the issue locally. Unfortunately, when I enable debug logging on the Tomcat side the issue does not occur. At least that points to some form of timing issue.
Thank you for the confirmation. I'm not able to reproduce it when turning on logging in this simple test case either. I can reproduce it in my own project (which is very complex) though even when turning on logging. I can show you the logs that I saw. Hope that could help a bit. Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable FINEST: [io.grpc.servlet.ServletAdapter-7] inbound data: length = 4,096, bytes = 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...00000000000000000000000000000000 Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable FINEST: [io.grpc.servlet.ServletAdapter-7] inbound data: length = 4,095, bytes = 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...00000000000000000000000000000000 Nov 08, 2018 11:20:32 AM org.apache.tomcat.util.net.SocketWrapperBase populateReadBuffer FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], Read from buffer: [8183] Nov 08, 2018 11:20:32 AM org.apache.coyote.http2.StreamStateMachine stateChange FINE: Connection [0], Stream [3], State changed from [OPEN] to [HALF_CLOSED_REMOTE] Nov 08, 2018 11:20:32 AM org.apache.coyote.http2.Stream$StreamInputBuffer onDataAvailable FINE: Data added to inBuffer when read thread is waiting. Signalling that thread to continue Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable FINEST: [io.grpc.servlet.ServletAdapter-7] thread = Thread[http-nio-auto-1-exec-4,5,FailOnTimeoutGroup] Nov 08, 2018 11:20:32 AM org.apache.coyote.http2.Http2UpgradeHandler upgradeDispatch FINE: Exit, Connection [0], SocketState [UPGRADED] Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable FINEST: [io.grpc.servlet.ServletAdapter-7] EXIT Nov 08, 2018 11:20:32 AM org.apache.coyote.AbstractProcessorLight process FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], Status in: [OPEN_READ], State out: [UPGRADED] Nov 08, 2018 11:20:32 AM org.apache.coyote.AbstractProcessorLight process FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], Status in: [OPEN_READ], State out: [LONG] Nov 08, 2018 11:20:32 AM org.apache.coyote.AbstractProcessorLight process FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], State after async post processing: [LONG] Nov 08, 2018 11:20:33 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704833217; nextExpiration=1541704833216; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:33 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704833258; nextExpiration=1541704832769; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:34 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704834219; nextExpiration=1541704834217; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:34 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704834260; nextExpiration=1541704834258; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:35 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704835220; nextExpiration=1541704835219; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:35 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704835261; nextExpiration=1541704835260; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:36 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704836221; nextExpiration=1541704836220; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:36 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704836263; nextExpiration=1541704836261; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:37 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704837223; nextExpiration=1541704837221; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:37 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704837264; nextExpiration=1541704837263; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:38 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704838224; nextExpiration=1541704838223; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:38 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704838265; nextExpiration=1541704838264; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:39 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704839225; nextExpiration=1541704839224; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:39 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704839267; nextExpiration=1541704839265; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:40 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704840227; nextExpiration=1541704840225; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:40 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704840268; nextExpiration=1541704840267; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:41 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704841228; nextExpiration=1541704841227; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:41 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704841269; nextExpiration=1541704841268; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:42 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704842229; nextExpiration=1541704842228; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:42 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704842271; nextExpiration=1541704842269; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:43 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704843231; nextExpiration=1541704843229; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:43 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704843272; nextExpiration=1541704843271; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:44 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704844232; nextExpiration=1541704844231; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:44 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704844273; nextExpiration=1541704844272; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:45 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704845233; nextExpiration=1541704845232; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:45 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=1; now=1541704845275; nextExpiration=1541704845273; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:46 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1541704846234; nextExpiration=1541704846233; keyCount=0; hasEvents=false; eval=false Nov 08, 2018 11:20:46 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
I believe I have found the root cause and I have applied a fix to 9.0.x. If you are able to build from source and confirm that fix that would be very helpful. Alternatively, a snapshot build can be provided for you to test with on request.
I tested with your patch (commit afdadae11b6876f3df25d3105230556b3ab27ab7) locally with my own project. Both on Linux and Mac. I didn't see such timeout failure any more. Thank you for the fix! However, on Mac, unlike testing with tomcat-embed-core-9.0.12 or tomcat-embed-core-9.0.13, testing with the patch I have to add an extra runtime dependency for the artifact https://search.maven.org/search?q=a:javax.annotation-api otherwise, my test will see ClassNotFoundException: Exception in thread "main" org.apache.catalina.LifecycleException: A child container failed during start at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:928) at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:261) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.core.StandardService.startInternal(StandardService.java:523) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:801) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.startup.Tomcat.start(Tomcat.java:394) at AsyncReadEmbedded.main(AsyncReadEmbedded.java:76) ...... Caused by: java.lang.ClassNotFoundException: javax.annotation.Resource at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
Thanks for the confirmation. I'll back-port the fix to 8.5.x. You'll see that CNFE if you run on later versions of Java that either don't provide those classes by default or don't provide them at all.
Fixed in: - trunk for 9.0.14 onwards - 8.5.x for 8.5.36 onwards