Bug 65118 - Spurious null pointer exception in Http2UpgradeHandler when reloading web page
Summary: Spurious null pointer exception in Http2UpgradeHandler when reloading web page
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 9.0.41
Hardware: PC All
: P2 normal (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-01 09:43 UTC by Mattias
Modified: 2021-02-22 16:35 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mattias 2021-02-01 09:43:40 UTC
We have a web application using Tomcat configured to use HTTP/2

Sometimes (about 1 out of 10) when reloading a web page one or more resources fails to load, typically style.css, with error connection reset reported by web browser.

In the catalina log file the following it logged:
>27-Jan-2021 16:59:02.435 FINE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch Entry, Connection [2], SocketStatus [OPEN_READ]
>27-Jan-2021 16:59:02.435 FINE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.http2.Http2UpgradeHandler.init Connection [2], State [CONNECTED]
>27-Jan-2021 16:59:02.436 FINE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.http2.Http2Parser.validateFrame Connection [2], Stream [1019], Frame type [HEADERS], Flags [37], Payload size [44]
>27-Jan-2021 16:59:02.436 FINE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [2] Starting pruning of old streams. Limit is [100] + 10% and there are currently [509] streams.
>27-Jan-2021 16:59:02.436 FINE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [2] Pruned completed stream [1]
>27-Jan-2021 17:00:23.798 FINE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [2] Pruned completed stream [3]
>27-Jan-2021 17:00:57.370 FINE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [2] Pruned completed stream [1]
>27-Jan-2021 17:01:18.505 SEVERE [https-openssl-apr-0.0.0.0-8443-exec-2] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
>	java.lang.NullPointerException
>		at org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams(Http2UpgradeHandler.java:1249)
>		at org.apache.coyote.http2.Http2UpgradeHandler.createRemoteStream(Http2UpgradeHandler.java:1129)
>		at org.apache.coyote.http2.Http2UpgradeHandler.headersStart(Http2UpgradeHandler.java:1508)
>		at org.apache.coyote.http2.Http2Parser.readHeadersFrame(Http2Parser.java:225)
>		at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:99)
>		at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:71)
>		at org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:339)
>		at org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:60)
>		at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59)
>		at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:888)
>		at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:1991)
>		at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
>		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>		at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>		at java.lang.Thread.run(Thread.java:748)

We see this with multiple web browsers including Chrome (88.0.4324.104) and Firefox (84.0.2)

Tomcat is configured to use HTTP/2 with default values.

In order to reproduce this issue we continuously reload a web page until issue occurs.

After examining the logs and the code (Http2UpgradeHandler.java:1249) it seems to me that the same stream is pruned twice, in the same go, and the second time its parent is already removed resulting in a NPE due to variable parent is null.
Comment 1 Mark Thomas 2021-02-01 11:04:23 UTC
The issue looks like concurrent modification of Http2UpgradeHandler.streams to me. I think pruning needs to be performed while holding the priorityTreeLock but I want to try and recreate this to confirm that theory before looking at a fix.
Comment 2 Mark Thomas 2021-02-01 15:37:35 UTC
Fixed in:
- 10.0.x for 10.0.3 onwards
- 9.0.x for 9.0.44 onwards
- 8.5.x for 8.5.63 onwards

7.0.x is not affected.
Comment 3 Mattias 2021-02-08 13:43:15 UTC
I have now tested the fix for this issue using SNAPSHOT tomcat-9.0-20210208.120325-3060.

I am afraid that the fix does not solve the issue. I still get a null pointer exception at the same place in the code (however other row number 1252).

>08-Feb-2021 14:07:10.564 FINE [https-openssl-apr-0.0.0.0-8443-exec-34] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [1] Starting pruning of old streams. Limit is [500] and there are currently [509] streams.
>08-Feb-2021 14:07:10.564 FINE [https-openssl-apr-0.0.0.0-8443-exec-34] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [1] Pruned completed stream [1]
>08-Feb-2021 14:07:10.564 FINE [https-openssl-apr-0.0.0.0-8443-exec-34] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [1] Pruned completed stream [3]
>08-Feb-2021 14:07:10.564 FINE [https-openssl-apr-0.0.0.0-8443-exec-34] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [1] Pruned completed stream [5]
>08-Feb-2021 14:07:10.564 FINE [https-openssl-apr-0.0.0.0-8443-exec-34] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [1] Pruned completed stream [7]
>08-Feb-2021 14:07:10.564 FINE [https-openssl-apr-0.0.0.0-8443-exec-34] org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams Connection [1] Pruned completed stream [5]
>08-Feb-2021 14:07:10.565 SEVERE [https-openssl-apr-0.0.0.0-8443-exec-34] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
>	java.lang.NullPointerException
>		at org.apache.coyote.http2.Http2UpgradeHandler.pruneClosedStreams(Http2UpgradeHandler.java:1252)
>		at org.apache.coyote.http2.Http2UpgradeHandler.createRemoteStream(Http2UpgradeHandler.java:1129)
>		at org.apache.coyote.http2.Http2UpgradeHandler.headersStart(Http2UpgradeHandler.java:1512)
>		at org.apache.coyote.http2.Http2Parser.readHeadersFrame(Http2Parser.java:225)
>		at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:99)
>		at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:71)
>		at org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:339)
>		at org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:60)
>		at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59)
>		at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
>		at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:1991)
>		at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
>		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>		at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>		at java.lang.Thread.run(Thread.java:748)


For testing purposes I changed the source code in Http2UpgradeHandler line 1252. 

From 
> while (toClose > 0 && parent.getIdAsInt() > 0 && parent.getIdAsInt() < stream.getIdAsInt() && parent.getChildStreams().isEmpty()) {
To
>  while (toClose > 0 && parent != null && parent.getIdAsInt() > 0 && parent.getIdAsInt() < stream.getIdAsInt() && parent.getChildStreams().isEmpty()) {
In order to avoid the NPE. This fix solved the issue. At least no NPE was thrown and no problem with loading resources could be seen in the web browsers. 
I hope this can help you pin point the root cause.
Comment 4 Mark Thomas 2021-02-18 09:38:47 UTC
That patch looks to be tackling the symptom rather than the root cause. My concern is that the root cause may trigger other errors.

What would really help here is a test case (ideally a simple WAR) that demonstrates the issue.
Comment 5 Mattias 2021-02-18 09:41:08 UTC
(In reply to Mark Thomas from comment #4)
> That patch looks to be tackling the symptom rather than the root cause. My
> concern is that the root cause may trigger other errors.
> 
> What would really help here is a test case (ideally a simple WAR) that
> demonstrates the issue.

I understand. I will try to create a test case that you can use.
Comment 6 Mark Thomas 2021-02-22 13:06:54 UTC
Moving to NEEDINFO as we need a test case for this.
Comment 7 Mark Thomas 2021-02-22 13:51:39 UTC
Woot! I managed to trigger this locally just with Tomcat's default home page. Investigating...
Comment 8 Mark Thomas 2021-02-22 16:35:27 UTC
Found it. The priority hierarchy was being corrupted when a closed stream was replaced with a RecycledStream instance.

Fixed in:
- 10.0.x for 10.0.3 onwards
- 9.0.x for 9.0.44 onwards
- 8.5.x for 8.5.63 onwards

7.0.x is not affected.