Summary: | http NIO connector crash after update from 8.0.27 to 8.0.30 | ||
---|---|---|---|
Product: | Tomcat 8 | Reporter: | slash |
Component: | Connectors | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | julien, reda.housnialaoui |
Priority: | P2 | ||
Version: | 8.0.30 | ||
Target Milestone: | ---- | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: | Graph of network connection status during the crash of the connector; Thread dump of a tomcat 8.0.30 with http connector frozen | ||
Description
slash
2016-02-04 17:18:07 UTC
Thread dump when the problem occurs and logs leading up to the problem please. Best guess at this point is that the Poller thread stopped, but without information that is nothing more than a wild guess.

I know it's difficult to debug like this. Unfortunately I had to roll back production to 8.0.27 for now to restore our websocket services. I'll see what I can do to give you relevant logs and a thread dump.

Created attachment 33732
Thread dump of a tomcat 8.0.30 with http connector frozen
Hello,
Please find the requested thread dump attached.
Thread dump of a tomcat 8.0.30 with a frozen http nio connector.
Regards

The dump looks slightly weird (lots of APR AJP, which seems more active to me than the NIO connector). However, the NIO connector is indeed stuck on its max connections, which have probably been leaked due to the Atmosphere use, which may or may not be doing bad things. maxConnections is 10000 and often does not make sense (I disabled it by default for the NIO2 connector). So I'll switch it back to need info since there's no proof this is valid (or the same issue that was originally reported, although I'd say it's likely).

I am sorry, I wasn't clear enough. Slash and I work at the same company, so I can assure you that the uploaded thread dump is about this issue. We have a lot of traffic on AJP and less on http NIO because all non-websocket traffic goes through httpd mod_jk and then the AJP connector. Since mod_jk can't deal with websocket connections, the http NIO connector is there only to manage websocket traffic.

Here is what we do to systematically reproduce the issue:
- From a nodejs application we try to establish 20 000 atmosphere connections using websocket transport to the app running in tomcat 8.0.30
- Once we hit the max connections, we wait about 1 minute
- Then we forcibly kill the node application and relaunch it to establish 20 000 new atmosphere connections
- If the http connector is still alive, we repeat the whole operation

It takes about 3 attempts to crash the http connector. In the end, the node app is totally stopped, there are no more connections to the tomcat http nio connector, and yet the connector is totally frozen.

Comparing a thread dump from a healthy tomcat with one from a tomcat with a frozen connector, I can see that when the connector is frozen, all http nio acceptor threads are in PARKING status.

I don't know if you can see this in the thread dump, but we are using the JSR356 websocket implementation.

The problem is with the current connection count tracking.
There are code paths where this isn't being decremented when a connection closes in error. I'm currently looking for a reliable way to track the open connection count.

I (think I) found the root cause.

This has been fixed in:
- 9.0.x for 9.0.0.M5
- 8.5.x for 8.5.1
- 8.0.x for 8.0.34
- 7.0.x for 7.0.70

Thank you for the fix. When can we expect the 8.0.34 release? Would it be wise to use the current 8.0.34 snapshot in production?

Simply set maxConnections to unlimited (-1) in your configuration and you're done.

Hello,
We still have the issue on tomcat 8.0.37 and 8.0.38 with the same configuration. New jstack attached.

The dump is too big to be attached. Here is a link to download it: http://s000.tinyupload.com/index.php?file_id=00903516386387493654
The dump comes from a tomcat 8.0.38 with a crashed http connector.

Do the same reproduction steps still create the issue? Can you provide a web application and client (as simple as possible) that we can use to recreate this problem?

I still don't understand whether this is caused by maxConnections or not. Can the unlimited setting be tried and/or the connection count be monitored? Usually unplugging a network cable is the worst test, since the other server may never actually notice that the network connection is dead. The server connectionTimeout should handle this, but it doesn't necessarily apply in all cases (websockets, etc., which is precisely the scenario here).

No further response from OP, no info on how to reproduce this and no similar reports from other users. If you believe you are experiencing this issue or a similar one, please open a new issue with the steps to reproduce it on a clean install of the latest 7.0.x, 8.0.x, 8.5.x or 9.0.x release.
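
For anyone reading this later, the diagnosis given in this thread (a connection count that is not decremented when a connection closes in error, leaving the acceptor threads parked) can be illustrated with a minimal sketch. This is not Tomcat code: `ConnectionLimitSketch` and its methods are hypothetical names, and `java.util.concurrent.Semaphore` is used here only as a stand-in for whatever the connector uses internally to enforce maxConnections.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the Tomcat source.
class ConnectionLimitSketch {
    // One permit per allowed connection (stand-in for the maxConnections limit).
    private final Semaphore slots;

    ConnectionLimitSketch(int maxConnections) {
        this.slots = new Semaphore(maxConnections);
    }

    // Acceptor side: wait up to timeoutMillis for a free slot.
    // A real acceptor would wait indefinitely, which is what a parked thread looks like.
    boolean tryAccept(long timeoutMillis) {
        try {
            return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Clean close: the slot is handed back.
    void closeNormally() {
        slots.release();
    }

    // Assumed buggy error path: the connection is gone but the slot is never returned.
    void closeInErrorWithoutDecrement() {
        // intentionally empty
    }

    // Fills the limit, closes every connection, then reports whether a new accept succeeds.
    static boolean canAcceptAfterClosingAll(int max, boolean leakOnClose) {
        ConnectionLimitSketch limit = new ConnectionLimitSketch(max);
        for (int i = 0; i < max; i++) {
            limit.tryAccept(10);
        }
        for (int i = 0; i < max; i++) {
            if (leakOnClose) {
                limit.closeInErrorWithoutDecrement();
            } else {
                limit.closeNormally();
            }
        }
        return limit.tryAccept(50);
    }

    public static void main(String[] args) {
        System.out.println("clean closes, can accept: " + canAcceptAfterClosingAll(3, false));
        System.out.println("leaky closes, can accept: " + canAcceptAfterClosingAll(3, true));
    }
}
```

In the leaky case no connection is open any more, yet a new accept still cannot get a slot, which matches the symptom reported above (no remaining client connections, connector frozen). Under the same assumption, removing the limit (maxConnections="-1") would stop a miscount from blocking accepts, but it would not correct the miscount itself.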