Bug 54513 - NullPointerException in AprEndpoint$Poller.run
Summary: NullPointerException in AprEndpoint$Poller.run
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 8.0.x-trunk
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-31 23:15 UTC by Konstantin Kolinko
Modified: 2013-02-03 10:09 UTC (History)
0 users



Attachments
TEST-org.apache.catalina.loader.TestVirtualContext.APR.txt (6.46 KB, text/plain)
2013-01-31 23:18 UTC, Konstantin Kolinko
Details
thread_dump_2013-02-01.txt (14.90 KB, text/plain)
2013-01-31 23:20 UTC, Konstantin Kolinko
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Konstantin Kolinko 2013-01-31 23:15:59 UTC
Trunk at r1441188, running the test suite with APR connector, Native 1.1.26 Windows 32-bit with JDK 7u11.

I encountered the following when running org.apache.catalina.loader.TestVirtualContext test

On second re-run of the test the same issue happened, so it is as if there is something specific here. I have not encountered the issue with the rest of testsuite yet.

[[[
Feb 01, 2013 2:07:26 AM org.apache.tomcat.util.net.AprEndpoint$Poller run
WARNING: Unexpected poller error
java.lang.NullPointerException
	at org.apache.tomcat.util.net.AprEndpoint$AprSocketWrapper.access$200(AprEndpoint.java:2198)
	at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1632)
	at java.lang.Thread.run(Thread.java:722)
]]]

The code at AprEndpoint$Poller.run(AprEndpoint.java:1632) is

1628 for (int n = 0; n < rv; n++) {
1629  timeouts.remove(desc[n*2+1]);
1630  AprSocketWrapper wrapper = connections.get(
1631          Long.valueOf(desc[n*2+1]));
1632  wrapper.pollerFlags = wrapper.pollerFlags & ~((int) desc[n*2]);

I suspect that "wrapper" is null here.

The above code is specific to Tomcat trunk and is not present in 7.0.x.


When this happened, the test hung for 15 minutes (1000 secs) waiting on a socket read. When time out passed, it continued to run all the other tests.

[[[
Testcase: testAdditionalWebInfClassesPaths took 1 006,735 sec
	Caused an ERROR
Read timed out
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:150)
	at java.net.SocketInputStream.read(SocketInputStream.java:121)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
	at org.apache.catalina.startup.TomcatBaseTest.methodUrl(TomcatBaseTest.java:247)
	at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:219)
	at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:213)
	at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:202)
	at org.apache.catalina.loader.TestVirtualContext.assertPageContains(TestVirtualContext.java:326)
	at org.apache.catalina.loader.TestVirtualContext.assertPageContains(TestVirtualContext.java:320)
	at org.apache.catalina.loader.TestVirtualContext.testAdditionalWebInfClassesPaths(TestVirtualContext.java:311)
]]]
Comment 1 Konstantin Kolinko 2013-01-31 23:18:25 UTC
Created attachment 29911 [details]
TEST-org.apache.catalina.loader.TestVirtualContext.APR.txt

JUnit test result file
Comment 2 Konstantin Kolinko 2013-01-31 23:20:09 UTC
Created attachment 29912 [details]
thread_dump_2013-02-01.txt

Thread dump, taken with jvisualvm during the 15-minutes pause
Comment 3 Konstantin Kolinko 2013-01-31 23:55:20 UTC
> I suspect that "wrapper" is null here.

Yes, it is null. I added the following code at line 1632: 

[[[
if (wrapper==null) throw new NullPointerException("wrapper is null, n: " + n + ", rv: "+rv+", desc[n*2+1]: " + desc[n*2+1]);
]]]

It results in the following on console:
    [junit] WARNING: Unexpected poller error
    [junit] java.lang.NullPointerException: wrapper is null, n: 0, rv: 1, desc[n*2+1]: 52898296
    [junit] 	at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1632)
    [junit] 	at java.lang.Thread.run(Thread.java:722)
Comment 4 Konstantin Kolinko 2013-02-01 01:16:12 UTC
Reproduced this issue with Native 1.1.24, but a bit different.
I was not able to reproduce the NPE, but SocketInputStream.socketRead0 in the test hung in the same way. There were two variants:
a) no error message is printed
b) Feb 01, 2013 H:MM:SS AM org.apache.tomcat.util.net.AprEndpoint$Poller run
WARNING: Error processing socket timeout


There were a pair of runs with 1.1.26 and with 1.1.24 when the issue did not reproduce.


>  thread_dump_2013-02-01.txt (14.90 KB, text/plain) 

What is odd in the thread dump is that there are two threads:
"http-apr-127.0.0.1-auto-2-Poller"
"http-apr-127.0.0.1-auto-1-Poller"

All other threads besides this poller one are "auto-2".
It is the second test, but the poller thread from the first test has not stopped.
Comment 5 Mark Thomas 2013-02-01 13:33:54 UTC
I can repeat this on 64-bit Windows.
Comment 6 Mark Thomas 2013-02-01 21:05:02 UTC
I believe I have found the root cause of this. There is a bug in tc native introduced in r1412919 in the implementation of Poll.pollset(). n is incremented twice inside the for each loop for each entry in the poller. That means the return value is 2 * number of entries rather than number of entries.
Comment 7 Mark Thomas 2013-02-01 21:10:33 UTC
Hmm. There may be more to this problem than just that. I still see issues even with a workaround for the Poll.poollset() return value problem.
Comment 8 Mladen Turk 2013-02-02 11:41:25 UTC
Hmm, reading the code it seems that a socket is member of two pollsets (timeouts and connections). Is this correct. Note that socket can be member of only one pollset and 1.1.26 kind of enforces that.
Comment 9 Mark Thomas 2013-02-02 18:49:35 UTC
(In reply to comment #8)
> Hmm, reading the code it seems that a socket is member of two pollsets
> (timeouts and connections). Is this correct. Note that socket can be member
> of only one pollset and 1.1.26 kind of enforces that.

Neither of those are pollsets. There are local lists of sockets.

There are a handful of bugs I have found in the new AprEndpoint. Fixes are on the way. In the meantime, I would be helpful if you could look at the return value of Poll.pollset. The use of n++ twice inside the loop looks like a bug to me.
Comment 10 Mladen Turk 2013-02-02 19:22:59 UTC
Not sure what you mean by double increment. Is it?

        p->set[n++]   = (jlong)(fd->rtnevents);
        p->set[n++]   = P2J(fd->client_data);

That should be fine.
The returned array consist of pair of values odd being event and even being the socket. It was always like that.

But you are right, there is a bug. Returned number should be n/2 not n.
I'll fix that.
If you create a shortcut in Java which calls .pollset(...)
and divide by two in case the value is > 0 doesn't that help?
Anyhow, I'll fix the native and create test 1.1.27-dev binaries for further test.
Comment 11 Mark Thomas 2013-02-02 19:30:07 UTC
(In reply to comment #10)
> Not sure what you mean by double increment. Is it?
> 
>         p->set[n++]   = (jlong)(fd->rtnevents);
>         p->set[n++]   = P2J(fd->client_data);
> 
> That should be fine.
> The returned array consist of pair of values odd being event and even being
> the socket. It was always like that.
> 
> But you are right, there is a bug. Returned number should be n/2 not n.
> I'll fix that.

Thanks.

> If you create a shortcut in Java which calls .pollset(...)
> and divide by two in case the value is > 0 doesn't that help?

Already doing this locally. That was how I moved past this issue and found the others.

> Anyhow, I'll fix the native and create test 1.1.27-dev binaries for further
> test.

Perfect. I see your commit, so I'm going to move this back to Tomact 8 to track the remaining AprEndpoint issies.
Comment 12 Mark Thomas 2013-02-02 20:41:02 UTC
The root cause of the initial issue is the tc native bug that Mladen has now fixed. This affected Tomcat 7 and 8.

The other issues (the poller threads not stopping, socket reads hanging) are Tomcat 8 specific and have been fixed in trunk.
Comment 13 Mladen Turk 2013-02-02 21:02:17 UTC
Please try with the binaries from
http://people.apache.org/~mturk/native/1.1.27-dev-20130202/
Comment 14 Mark Thomas 2013-02-02 21:10:28 UTC
Sure. Just as soon as you make those files world readable ;)
Comment 15 Mladen Turk 2013-02-03 05:21:12 UTC
Attrs should be fine now.
Comment 16 Mark Thomas 2013-02-03 10:09:33 UTC
Looks good to me. The failing tests pass. I haven't done a full test run though.