Bug 64839 - HTTP2: Exception in thread "http-nio-x.y.z-1090-ClientPoller" java.lang.NullPointerException
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina
Version: 9.0.38
Hardware: HP Linux
Importance: P2 normal
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 68908
Depends on:
Blocks:
 
Reported: 2020-10-22 07:05 UTC by Arshiya
Modified: 2024-04-17 08:44 UTC
Description Arshiya 2020-10-22 07:05:30 UTC
Sub-Component - Coyote

OS : Redhat Linux

Overview:

Embedded Tomcat 9.0.38 is used to transport HTTP/2 traffic between two systems (h2c connection). The number of threads configured in Tomcat is 200; all other values are the Tomcat defaults.

We see the NullPointerException below printed, after which the poller thread is killed and Tomcat stops processing requests.

Exception in thread "http-nio-x.y.z-1090-ClientPoller" java.lang.NullPointerException

We did not see any other exceptions.

We suspect this is because the NullPointerException is not handled in the org.apache.tomcat.util.net.NioEndpoint$Poller run() method.

In what scenarios does this occur?
Can you please help fix this issue?

Build:
Embedded tomcat 9.0.38

Thanks in Advance!!
Comment 1 Remy Maucherat 2020-10-22 08:35:15 UTC
Can you provide the full stack trace of the exception?
Comment 2 Mark Thomas 2020-10-22 08:37:18 UTC
We need the full stack trace to investigate this further.

Note: maxThreads="200" is the Tomcat default. If you are referring to a different setting please be explicit about which one.
Comment 3 Arshiya 2020-10-22 09:09:31 UTC
This was the only line printed in the production logs, and we do not know why it happens.
Comment 4 Mark Thomas 2020-10-22 09:46:10 UTC
Without the stack trace there isn't much we can do. The most that looks possible at this stage is to add some additional try/catch blocks with logging of exceptions.

It would be helpful to see the following:

- hardware specification
- exact OS version
- exact JRE vendor and version

Given our inability so far to recreate these issues with the provided test cases and that the issues don't seem to be occurring for other users, I am beginning to suspect an issue in a component other than Tomcat.
Comment 5 Remy Maucherat 2020-10-22 13:24:30 UTC
(In reply to Mark Thomas from comment #4)
> The most that looks
> possible at this stage is to add some additional try/catch blocks with
> logging of exceptions.

I don't see any legitimate place where a NPE can occur, nor any place to add such a try/catch in a useful way.

Would setting -XX:-OmitStackTraceInFastThrow allow getting the NPE trace?

If not, then IMO it's a NIO error with the JVM and this should be closed. And actually, I think it's probably good to check a new JVM (11 or more recent, lots of fixes and refactorings in NIO).
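
For context on the JVM option mentioned above: after the same implicit exception (such as an NPE from a null dereference) has been thrown many times from JIT-compiled code, HotSpot can start throwing a preallocated exception that carries no stack trace. -XX:-OmitStackTraceInFastThrow turns that optimisation off. The following is a minimal sketch for observing the effect. It is not Tomcat code, the class and method names are made up for illustration, and whether a trace-less NPE appears at all depends on the JVM version and on JIT behaviour.

```java
// Illustrative sketch only; FastThrowDemo and firstTracelessNpe are hypothetical names.
class FastThrowDemo {

    // Dereferences 'value' in a hot loop. Returns the index of the first
    // NullPointerException that arrived without a stack trace, or -1 if no
    // such exception was seen (including when 'value' is not null).
    static int firstTracelessNpe(String value, int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                value.length(); // implicit NPE raised by the JVM when value is null
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int index = firstTracelessNpe(null, 500_000);
        System.out.println(index < 0
                ? "every NPE carried a stack trace"
                : "first trace-less NPE at iteration " + index);
    }
}
```

Running it once with default options and once with the option set (java -XX:-OmitStackTraceInFastThrow FastThrowDemo) should show the difference on JVMs that apply the optimisation. With the option set, every NPE is expected to carry a trace.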
Comment 6 Mark Thomas 2020-10-22 14:03:48 UTC
+1 to adding that JVM option and then reviewing the full stack trace as the next step.
Comment 7 Arshiya 2020-10-22 14:56:26 UTC
The BufferOverflowException was printed about 4 times, followed by the NullPointerException. The exact timestamp of the trace is not known.

Exception in thread "http-nio-x.y.z-1090-exec-20" java.nio.BufferOverflowException
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
        at org.apache.tomcat.util.net.SocketBufferHandler.unReadReadBuffer(SocketBufferHandler.java:100)
        at org.apache.tomcat.util.net.SocketWrapperBase.unRead(SocketWrapperBase.java:401)
        at org.apache.coyote.http2.Http2AsyncParser$FrameCompletionHandler.completed(Http2AsyncParser.java:306)
        at org.apache.coyote.http2.Http2AsyncParser$FrameCompletionHandler.completed(Http2AsyncParser.java:163)
        at org.apache.tomcat.util.net.SocketWrapperBase$VectoredIOCompletionHandler.completed(SocketWrapperBase.java:1087)
        at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper$NioOperationState.run(NioEndpoint.java:1511)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

Please let us know if this is of any help.

Please find the hardware specs and Java version below.
Hardware Spec
Environment - OpenStack Compute hosted on VM
RAM - 119478416 - 119 GB
Cores of CPU - 12
OS - RHEL 7.4
Kernel version - 3.10.0-693.58.1.el7.x86_64
Java Version - 1.8.0_241
Comment 8 Remy Maucherat 2020-10-23 12:53:11 UTC
You previously reported this BufferOverflowException on the user list, and Mark found and fixed the (actually totally harmless) issue in Tomcat 9.0.39.

Please provide details on the NPE using the JVM setting.
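
As background on the exception type only: java.nio.ByteBuffer.put throws BufferOverflowException when the data being written does not fit in the buffer's remaining space. The sketch below shows that behaviour and a simple size check in front of the write. It is not the code from SocketBufferHandler and not the 9.0.39 fix; the class and method names are hypothetical.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Illustrative sketch only; not Tomcat code. All names are hypothetical.
class BufferOverflowSketch {

    // Returns true if writing 'length' bytes into a fresh buffer of the given
    // capacity raises BufferOverflowException.
    static boolean overflows(int capacity, int length) {
        ByteBuffer target = ByteBuffer.allocate(capacity);
        try {
            target.put(new byte[length]);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    // Copies 'data' into 'target' only when it fits; returns false otherwise
    // and leaves both buffers untouched.
    static boolean putIfFits(ByteBuffer target, ByteBuffer data) {
        if (data.remaining() > target.remaining()) {
            return false;
        }
        target.put(data);
        return true;
    }

    public static void main(String[] args) {
        System.out.println("8 bytes into 4: overflow = " + overflows(4, 8));
        System.out.println("8 bytes into 8: overflow = " + overflows(8, 8));
    }
}
```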
Comment 9 Arshiya 2020-10-27 11:51:05 UTC
Please find the trace of the NullPointerException:

Exception in thread "http-nio-x.y.x-1090-ClientPoller" java.lang.NullPointerException
at org.apache.tomcat.util.net.NioEndpoint$Poller.events(NioEndpoint.java:614)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:730)
at java.lang.Thread.run(Thread.java:748)
Comment 10 Remy Maucherat 2020-10-27 14:27:56 UTC
Ok, so this NPE will be caught and logged with no major consequences. There is normally no way it could happen, however (the SocketChannel of the NioChannel is null, which only happens for the closed channel, which is not supposed to be in the poller).
Comment 11 Arshiya 2020-10-27 15:12:52 UTC
Thank you for the swift response Remy.

For a few hours the application accepts and processes requests fine, but suddenly, after this exception is logged, no requests are accepted (TPS drops to 0).

Is this because the external client closes the connection, causing this issue in the Poller?

If the issue is due to the environment, what should we suspect?
Comment 12 Remy Maucherat 2020-10-27 15:42:16 UTC
I made a mistake, I was not looking at the right call to events(). So this should be tightened up [although I don't see how it can end up in this situation].
Comment 13 Remy Maucherat 2020-10-27 16:30:03 UTC
A previous refactoring of the Poller.events() could cause this uncaught exception to occur. This is fixed in 10.0.0-M10 and 9.0.40 where the NPE will be logged properly and Tomcat should be able to continue processing requests.
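
The general idea of "log the exception and keep processing" can be sketched as below. This is a generic illustration of guarding each unit of work inside a long-running loop, under the assumption that a failing event can simply be skipped. It is not the actual change made to NioEndpoint, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; not Tomcat's NioEndpoint code. All names are hypothetical.
class GuardedPollerSketch {

    // Runs each event on a dedicated "poller" thread. A failing event is logged
    // and skipped so that later events are still processed. Returns the number
    // of events that completed normally.
    static int runGuarded(List<Runnable> events) {
        AtomicInteger completed = new AtomicInteger();
        Thread poller = new Thread(() -> {
            for (Runnable event : events) {
                try {
                    event.run();
                    completed.incrementAndGet();
                } catch (RuntimeException e) {
                    // Log and carry on instead of letting the exception end run()
                    System.err.println("Event failed: " + e);
                }
            }
        }, "sketch-poller");
        poller.start();
        try {
            poller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        List<Runnable> events = new ArrayList<>();
        events.add(() -> { });
        events.add(() -> { throw new NullPointerException("simulated"); });
        events.add(() -> { });
        System.out.println("completed: " + runGuarded(events));
    }
}
```

In this sketch the event after the simulated NPE still runs, which is the behaviour described for the fixed versions.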
Comment 14 Arshiya 2020-10-27 16:36:19 UTC
Thanks a ton Remy!

Is there any update on the tentative release date of 9.0.40, please?
Comment 15 Mark Thomas 2020-10-27 17:12:14 UTC
Also back-ported to 8.5.x for 8.5.60 onwards.

As always, the next round of releases will start around the beginning of the month once all the open issues have been addressed.
Comment 16 Venkat 2020-10-28 06:27:08 UTC
(In reply to Remy Maucherat from comment #13)
> A previous refactoring of the Poller.events() could cause this uncaught
> exception to occur. This is fixed in 10.0.0-M10 and 9.0.40 where the NPE
> will be logged properly and Tomcat should be able to continue processing
> requests.

Iterator<SelectionKey> iterator =
        keyCount > 0 ? selector.selectedKeys().iterator() : null;
// Walk through the collection of ready keys and dispatch
// any active event.
while (iterator != null && iterator.hasNext()) {
    SelectionKey sk = iterator.next();
    iterator.remove();
    NioSocketWrapper socketWrapper = (NioSocketWrapper) sk.attachment();
    // Attachment may be null if another thread has called
    // cancelledKey()
    if (socketWrapper != null) {
        processKey(sk, socketWrapper);
    }
}

// Process timeouts
timeout(keyCount,hasEvents);

This piece of code in Poller run() is still not inside a try/catch, so any unexpected exception can cause this thread to die. Are you taking care of this as well?

Are there any hints on how to reproduce and test this NPE in Poller run()?
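
The concern about thread death can be illustrated in isolation. The sketch below does not reproduce the Tomcat issue. It only shows, with hypothetical names, that an exception escaping an unguarded loop ends the thread, and that a Thread.UncaughtExceptionHandler can be used to observe it. A null list entry stands in for whatever unexpected state triggers the NPE.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; not Tomcat code. All names are hypothetical.
class UnguardedLoopSketch {

    // Outcome of one run: events handled before the thread ended, and the
    // exception that ended it (null if the loop finished normally).
    static final class Result {
        final int handled;
        final Throwable fatal;

        Result(int handled, Throwable fatal) {
            this.handled = handled;
            this.fatal = fatal;
        }
    }

    static Result run(List<String> events) {
        AtomicInteger handled = new AtomicInteger();
        AtomicReference<Throwable> fatal = new AtomicReference<>();
        Thread loop = new Thread(() -> {
            for (String event : events) {
                event.length(); // throws NPE for a null entry; nothing catches it
                handled.incrementAndGet();
            }
        }, "sketch-unguarded-loop");
        // Replaces the default "Exception in thread ..." output for this thread
        loop.setUncaughtExceptionHandler((thread, error) -> fatal.set(error));
        loop.start();
        try {
            loop.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(handled.get(), fatal.get());
    }

    public static void main(String[] args) {
        Result result = run(Arrays.asList("a", null, "b"));
        System.out.println("handled=" + result.handled + ", fatal=" + result.fatal);
    }
}
```

Here the entry after the null one is never handled, because the thread has already terminated by then.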
Comment 17 Remy Maucherat 2020-10-28 07:47:29 UTC
Please do not reopen the BZ.

No problem has been reported with this code, which never had any try/catch, so no try/catch needed.
Comment 18 Mark Thomas 2024-04-17 08:44:37 UTC
*** Bug 68908 has been marked as a duplicate of this bug. ***