Bug 62899 - Async servlet over HTTP/2 on embedded server sporadically timeout forever when reading near end of request
Summary: Async servlet over HTTP/2 on embedded server sporadically timeout forever whe...
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Servlet (show other bugs)
Version: 9.0.12
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-09 18:43 UTC by Dapeng Zhang
Modified: 2018-11-14 14:52 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dapeng Zhang 2018-11-09 18:43:07 UTC
Steps to produce:

1. Install nghttp (https://github.com/http2/http2-spec/wiki/Tools)

2. Compile the following java code. 

https://github.com/dapengzhang0/servlettest/blob/master/src/main/java/AsyncReadEmbedded.java

(The program runs a simple servlet using Embedded Tomcat server. It creates a file with 271828 bytes, and use nghttp to send a POST request over HTTP/2 to the servlet uploading the data from the file. It will fail if the request timeout or any exception is thrown, otherwise it succeeds.)

3. Write a script to repeatedly run the code until it fails. (My experiments showed it sporadically fails once every less than 100 runs. The failure is read timeout near the end of request data, when there is half close from the client and a window update)

Thanks.
Comment 1 Mark Thomas 2018-11-12 15:28:56 UTC
Many, many thanks for the test case. It makes fixing bugs so much easier when a reliable test case is provided.

I can confirm that I can reproduce the issue locally. Unfortunately, when I enable debug logging on the Tomcat side the issue does not occur. At least that points to some form of timing issue.
Comment 2 Dapeng Zhang 2018-11-12 18:43:26 UTC
Thank you for the confirmation. I'm not able to reproduce it when turning on logging in this simple test case either. I can reproduce it in my own project (which is very complex) though even when turning on logging. I can show you the logs that I saw. Hope that could help a bit.

Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable
FINEST: [io.grpc.servlet.ServletAdapter-7] inbound data: length = 4,096, bytes = 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...00000000000000000000000000000000
Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable
FINEST: [io.grpc.servlet.ServletAdapter-7] inbound data: length = 4,095, bytes = 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...00000000000000000000000000000000
Nov 08, 2018 11:20:32 AM org.apache.tomcat.util.net.SocketWrapperBase populateReadBuffer
FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], Read from buffer: [8183]
Nov 08, 2018 11:20:32 AM org.apache.coyote.http2.StreamStateMachine stateChange
FINE: Connection [0], Stream [3], State changed from [OPEN] to [HALF_CLOSED_REMOTE]
Nov 08, 2018 11:20:32 AM org.apache.coyote.http2.Stream$StreamInputBuffer onDataAvailable
FINE: Data added to inBuffer when read thread is waiting. Signalling that thread to continue
Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable
FINEST: [io.grpc.servlet.ServletAdapter-7] thread = Thread[http-nio-auto-1-exec-4,5,FailOnTimeoutGroup]
Nov 08, 2018 11:20:32 AM org.apache.coyote.http2.Http2UpgradeHandler upgradeDispatch
FINE: Exit, Connection [0], SocketState [UPGRADED]
Nov 08, 2018 11:20:32 AM io.grpc.servlet.ServletAdapter$GrpcReadListener onDataAvailable
FINEST: [io.grpc.servlet.ServletAdapter-7] EXIT
Nov 08, 2018 11:20:32 AM org.apache.coyote.AbstractProcessorLight process
FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], Status in: [OPEN_READ], State out: [UPGRADED]
Nov 08, 2018 11:20:32 AM org.apache.coyote.AbstractProcessorLight process
FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], Status in: [OPEN_READ], State out: [LONG]
Nov 08, 2018 11:20:32 AM org.apache.coyote.AbstractProcessorLight process
FINE: Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3cc365d2:org.apache.tomcat.util.net.NioChannel@33ef3bbf:java.nio.channels.SocketChannel[connected local=/127.0.0.1:38401 remote=/127.0.0.1:46436]], State after async post processing: [LONG]
Nov 08, 2018 11:20:33 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704833217; nextExpiration=1541704833216; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:33 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704833258; nextExpiration=1541704832769; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:34 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704834219; nextExpiration=1541704834217; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:34 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704834260; nextExpiration=1541704834258; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:35 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704835220; nextExpiration=1541704835219; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:35 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704835261; nextExpiration=1541704835260; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:36 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704836221; nextExpiration=1541704836220; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:36 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704836263; nextExpiration=1541704836261; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:37 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704837223; nextExpiration=1541704837221; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:37 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704837264; nextExpiration=1541704837263; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:38 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704838224; nextExpiration=1541704838223; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:38 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704838265; nextExpiration=1541704838264; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:39 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704839225; nextExpiration=1541704839224; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:39 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704839267; nextExpiration=1541704839265; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:40 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704840227; nextExpiration=1541704840225; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:40 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704840268; nextExpiration=1541704840267; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:41 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704841228; nextExpiration=1541704841227; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:41 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704841269; nextExpiration=1541704841268; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:42 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704842229; nextExpiration=1541704842228; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:42 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704842271; nextExpiration=1541704842269; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:43 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704843231; nextExpiration=1541704843229; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:43 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704843272; nextExpiration=1541704843271; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:44 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704844232; nextExpiration=1541704844231; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:44 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704844273; nextExpiration=1541704844272; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:45 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704845233; nextExpiration=1541704845232; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:45 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=1; now=1541704845275; nextExpiration=1541704845273; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:46 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1541704846234; nextExpiration=1541704846233; keyCount=0; hasEvents=false; eval=false
Nov 08, 2018 11:20:46 AM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
Comment 3 Mark Thomas 2018-11-13 23:14:11 UTC
I believe I have found the root cause and I have applied a fix to 9.0.x. If you are able to build from source and confirm that fix that would be very helpful. Alternatively, a snapshot build can be provided for you to test with on request.
Comment 4 Dapeng Zhang 2018-11-14 00:32:44 UTC
I tested with your patch (commit afdadae11b6876f3df25d3105230556b3ab27ab7) locally with my own project. Both on Linux and Mac. I didn't see such timeout failure any more. Thank you for the fix!

However, on Mac, unlike testing with tomcat-embed-core-9.0.12 or tomcat-embed-core-9.0.13, testing with the patch I have to add an extra runtime dependency for the artifact

https://search.maven.org/search?q=a:javax.annotation-api

otherwise, my test will see ClassNotFoundException:

Exception in thread "main" org.apache.catalina.LifecycleException: A child container failed during start
	at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:928)
	at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:261)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
	at org.apache.catalina.core.StandardService.startInternal(StandardService.java:523)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
	at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:801)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
	at org.apache.catalina.startup.Tomcat.start(Tomcat.java:394)
	at AsyncReadEmbedded.main(AsyncReadEmbedded.java:76)
......
Caused by: java.lang.ClassNotFoundException: javax.annotation.Resource
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
Comment 5 Mark Thomas 2018-11-14 08:50:29 UTC
Thanks for the confirmation. I'll back-port the fix to 8.5.x.

You'll see that CNFE if you run on later versions of Java that either don't provide those classes by default or don't provide them at all.
Comment 6 Mark Thomas 2018-11-14 14:52:18 UTC
Fixed in:
- trunk for 9.0.14 onwards
- 8.5.x for 8.5.36 onwards