Trunk at r1441188, running the test suite with APR connector, Native 1.1.26 Windows 32-bit with JDK 7u11. I encountered the following when running org.apache.catalina.loader.TestVirtualContext test On second re-run of the test the same issue happened, so it is as if there is something specific here. I have not encountered the issue with the rest of testsuite yet. [[[ Feb 01, 2013 2:07:26 AM org.apache.tomcat.util.net.AprEndpoint$Poller run WARNING: Unexpected poller error java.lang.NullPointerException at org.apache.tomcat.util.net.AprEndpoint$AprSocketWrapper.access$200(AprEndpoint.java:2198) at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1632) at java.lang.Thread.run(Thread.java:722) ]]] The code at AprEndpoint$Poller.run(AprEndpoint.java:1632) is 1628 for (int n = 0; n < rv; n++) { 1629 timeouts.remove(desc[n*2+1]); 1630 AprSocketWrapper wrapper = connections.get( 1631 Long.valueOf(desc[n*2+1])); 1632 wrapper.pollerFlags = wrapper.pollerFlags & ~((int) desc[n*2]); I suspect that "wrapper" is null here. The above code is specific to Tomcat trunk and is not present in 7.0.x. When this happened, the test hung for 15 minutes (1000 secs) waiting on a socket read. When time out passed, it continued to run all the other tests. [[[ Testcase: testAdditionalWebInfClassesPaths took 1 006,735 sec Caused an ERROR Read timed out java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:150) at java.net.SocketInputStream.read(SocketInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) at org.apache.catalina.startup.TomcatBaseTest.methodUrl(TomcatBaseTest.java:247) at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:219) at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:213) at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:202) at org.apache.catalina.loader.TestVirtualContext.assertPageContains(TestVirtualContext.java:326) at org.apache.catalina.loader.TestVirtualContext.assertPageContains(TestVirtualContext.java:320) at org.apache.catalina.loader.TestVirtualContext.testAdditionalWebInfClassesPaths(TestVirtualContext.java:311) ]]]
Created attachment 29911 [details] TEST-org.apache.catalina.loader.TestVirtualContext.APR.txt JUnit test result file
Created attachment 29912 [details] thread_dump_2013-02-01.txt Thread dump, taken with jvisualvm during the 15-minutes pause
> I suspect that "wrapper" is null here. Yes, it is null. I added the following code at line 1632: [[[ if (wrapper==null) throw new NullPointerException("wrapper is null, n: " + n + ", rv: "+rv+", desc[n*2+1]: " + desc[n*2+1]); ]]] It results in the following on console: [junit] WARNING: Unexpected poller error [junit] java.lang.NullPointerException: wrapper is null, n: 0, rv: 1, desc[n*2+1]: 52898296 [junit] at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1632) [junit] at java.lang.Thread.run(Thread.java:722)
Reproduced this issue with Native 1.1.24, but a bit different. I was not able to reproduce the NPE, but SocketInputStream.socketRead0 in the test hung in the same way. There were two variants: a) no error message is printed b) Feb 01, 2013 H:MM:SS AM org.apache.tomcat.util.net.AprEndpoint$Poller run WARNING: Error processing socket timeout There were a pair of runs with 1.1.26 and with 1.1.24 when the issue did not reproduce. > thread_dump_2013-02-01.txt (14.90 KB, text/plain) What is odd in the thread dump is that there are two threads: "http-apr-127.0.0.1-auto-2-Poller" "http-apr-127.0.0.1-auto-1-Poller" All other threads besides this poller one are "auto-2". It is the second test, but the poller thread from the first test has not stopped.
I can repeat this on 64-bit Windows.
I believe I have found the root cause of this. There is a bug in tc native introduced in r1412919 in the implementation of Poll.pollset(). n is incremented twice inside the for each loop for each entry in the poller. That means the return value is 2 * number of entries rather than number of entries.
Hmm. There may be more to this problem than just that. I still see issues even with a workaround for the Poll.poollset() return value problem.
Hmm, reading the code it seems that a socket is member of two pollsets (timeouts and connections). Is this correct. Note that socket can be member of only one pollset and 1.1.26 kind of enforces that.
(In reply to comment #8) > Hmm, reading the code it seems that a socket is member of two pollsets > (timeouts and connections). Is this correct. Note that socket can be member > of only one pollset and 1.1.26 kind of enforces that. Neither of those are pollsets. There are local lists of sockets. There are a handful of bugs I have found in the new AprEndpoint. Fixes are on the way. In the meantime, I would be helpful if you could look at the return value of Poll.pollset. The use of n++ twice inside the loop looks like a bug to me.
Not sure what you mean by double increment. Is it? p->set[n++] = (jlong)(fd->rtnevents); p->set[n++] = P2J(fd->client_data); That should be fine. The returned array consist of pair of values odd being event and even being the socket. It was always like that. But you are right, there is a bug. Returned number should be n/2 not n. I'll fix that. If you create a shortcut in Java which calls .pollset(...) and divide by two in case the value is > 0 doesn't that help? Anyhow, I'll fix the native and create test 1.1.27-dev binaries for further test.
(In reply to comment #10) > Not sure what you mean by double increment. Is it? > > p->set[n++] = (jlong)(fd->rtnevents); > p->set[n++] = P2J(fd->client_data); > > That should be fine. > The returned array consist of pair of values odd being event and even being > the socket. It was always like that. > > But you are right, there is a bug. Returned number should be n/2 not n. > I'll fix that. Thanks. > If you create a shortcut in Java which calls .pollset(...) > and divide by two in case the value is > 0 doesn't that help? Already doing this locally. That was how I moved past this issue and found the others. > Anyhow, I'll fix the native and create test 1.1.27-dev binaries for further > test. Perfect. I see your commit, so I'm going to move this back to Tomact 8 to track the remaining AprEndpoint issies.
The root cause of the initial issue is the tc native bug that Mladen has now fixed. This affected Tomcat 7 and 8. The other issues (the poller threads not stopping, socket reads hanging) are Tomcat 8 specific and have been fixed in trunk.
Please try with the binaries from http://people.apache.org/~mturk/native/1.1.27-dev-20130202/
Sure. Just as soon as you make those files world readable ;)
Attrs should be fine now.
Looks good to me. The failing tests pass. I haven't done a full test run though.