Bug 51881

Summary: Server incorrectly parses incoming HTTP requests and loses them
Product: Tomcat 7
Reporter: Edward Alexandrov <stfighter>
Component: Catalina
Assignee: Tomcat Developers Mailing List <dev>
Severity: critical    
Priority: P2    
Version: 7.0.21   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Screenshots from my debugger
Test case
war file

Description Edward Alexandrov 2011-09-23 11:55:42 UTC
We are developing a Comet-based web application with native Tomcat Comet support via the Atmosphere framework. We use HTTP streaming to transfer data from our servlet to the client. In our scenario, the user connects to the server with an XMLHttpRequest GET. In response, we suspend his connection and then, when needed, send data back to him. When the user opens another browser tab and connects to our web application, the application should close the connection in the first (previously opened) tab and then suspend and use the connection from the second tab. So, my servlet detects that another GET is incoming in the same session, finds the previous connection, and closes it (internally using CometEvent.close()). This works as intended. However, the next HTTP request to the web server is not correctly handled, and in my servlet I receive a strange HttpServletRequest with the previous GET query string and duplicate request headers from the initial GET and the following request.

I traced through the Tomcat code and noticed that in org.apache.coyote.http11.Http11Processor.process() the request field is usually empty (contains no HTTP method, request headers, etc.). However, after the Comet close this object is not empty and contains the previous HTTP request. Later, the request headers are parsed and appended to the fields existing in the previous request object. After that, this invalid object is passed to my servlet.
I suppose that clearing the request object and all associated data when closing the comet event in Http11NioProcessor.actionInternal() could solve this issue.
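
The "one live connection per session" pattern described in this report can be sketched as follows. This is only an illustrative model, not the actual Tomcat or Atmosphere API: CometLikeEvent is a stand-in for Tomcat's CometEvent, and the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the pattern described above: when a new GET arrives
// for a session, close the previously suspended connection and register
// the new one. All names here are hypothetical, not the real Tomcat API.
public class OneConnectionPerSession {

    interface CometLikeEvent {
        void close();   // terminate the suspended connection (stand-in for CometEvent.close())
    }

    // sessionId -> currently suspended connection
    private final Map<String, CometLikeEvent> live = new ConcurrentHashMap<>();

    // Called when a new GET arrives in an existing session.
    public void onNewGet(String sessionId, CometLikeEvent event) {
        CometLikeEvent previous = live.put(sessionId, event);
        if (previous != null) {
            previous.close();   // the old tab's connection is closed; the new tab takes over
        }
    }
}
```

The bug reported here is about what happens on the connection *after* such a close() call, which is outside this sketch.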
Comment 1 Edward Alexandrov 2011-09-23 11:57:20 UTC
Created attachment 27576 [details]
Screenshots from my debugger
Comment 2 Mark Thomas 2011-09-27 11:22:26 UTC
The relevant objects (Processor, Request, Response) are always recycled between requests. From the description of the problem, it sounds like something is keeping a reference to one or more of these between requests and that is triggering the mix-up.

To investigate this further you'll need to provide the simplest possible web application (or unit test) that demonstrates this issue. Without the ability to re-create the issue there is little the Tomcat team can do. Also, the majority of issues of this nature turn out to be application issues, so without the ability to recreate the problem the assumption will be that this is also an application issue and it will be closed as invalid.
Comment 3 Edward Alexandrov 2011-09-27 14:22:12 UTC
Created attachment 27610 [details]
Test case

This is a test servlet which reproduces the issue. It can be used as follows: run the servlet under a debugger and set a breakpoint on TransportServlet.java:102. Access it in Chrome in a tab. Click 'Reflect message' a couple of times. Open another tab (without closing the first one) and access the servlet. The breakpoint should trigger, indicating the broken request.

Ideally, you should see Error:204 in the first tab, after which the 'Reflect message' button takes no action, and a working app in the second tab.

When I deploy my server on Glassfish, it works as intended.
Comment 4 Edward Alexandrov 2011-09-27 14:23:06 UTC
Testcase included
Comment 5 Mark Thomas 2011-09-27 15:20:55 UTC
Thanks for that, I'll take a look now.
Comment 6 Mark Thomas 2011-09-27 16:10:02 UTC
A couple of notes on the sample:
- The class name is TransportHandler not TransportServlet
- The <web-app> element in web.xml isn't quite right and Tomcat complains if validation is enabled
- There is an invalid <comet-support-enabled> tag that Tomcat complains about if validation is enabled
- The <init-param> tags are too late in the <servlet> block and Tomcat complains if validation is enabled

I have tested the supplied test case and am unable to recreate the issue.
The test environment was:
OS: Windows 64-bit
Java: 1.6.0_27 64-bit
Chrome: 14.0.835.186 m
Tomcat: 7.0.21 with NIO and APR, trunk with NIO

In order to look at this further you are going to have to provide exact steps that recreate this issue on a clean install of either the latest Tomcat release or the latest code from svn trunk. It may also help if you provide a complete WAR file to ensure commonality between environments.
Comment 7 Edward Alexandrov 2011-09-28 06:06:05 UTC
The issue reproduces 100% on all Tomcat installations I have (7.0.21 on my Windows 32-bit machine, 7.0.19 on an Ubuntu Server 32-bit test server, and 7.0.16 on my test VM). On all of them Tomcat is cleanly installed, and in server.xml the default HTTP/1.1 connector is changed to

<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
               redirectPort="8443" />

Also, a management user is added. All other settings are left at their defaults.

Do you run the sample under the debugger? If not, it appears to work because of my workaround method validateRequest(), which drops invalid requests; it shouldn't be in production code. Even with it, you can encounter delayed request processing, which is highly undesirable. The point of the sample I provided is to set a breakpoint on TransportHandler.java:102, inspect the HttpServletRequest.coyoteRequest object, and see the incorrect headers, as in the screenshots I attached earlier. I will also comment out this workaround method and attach the war; you will see that both connections and both tabs are closed (error 204), which should never happen.
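
The actual validateRequest() workaround is only in the attached test case and is not shown in this report; as a rough, hypothetical illustration of what such a guard could check, a corrupted request of the kind described here carries each header name twice (once from the stale GET, once from the new request), so a duplicate-header-name check could flag it:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: the real validateRequest() in the test
// case is not shown in this report. A request merged with the previous one
// carries duplicate header names, which this check detects. Note that some
// headers may legitimately repeat in real HTTP traffic, so a production
// check would need to be more careful.
public class RequestGuard {

    // Returns false if any header name appears more than once.
    public static boolean looksValid(List<String> headerNames) {
        Set<String> seen = new HashSet<>();
        for (String name : headerNames) {
            if (!seen.add(name.toLowerCase())) {
                return false;   // duplicate header name -> likely a merged request
            }
        }
        return true;
    }
}
```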

So, to reproduce the issue, you need to do a clean Tomcat install, reconfigure the Connector in server.xml, open the project in an IDE (we use IntelliJ IDEA for development; project files are included), and set the breakpoint I mentioned.

Then, follow the steps I described earlier.

Thanks for your comments on the sample. The class is TransportHandler because it is not the servlet class itself (which is AtmosphereServlet, built into the Atmosphere library) but an Atmosphere handler. I will also validate the web.xml; however, it is generated almost automatically by my IDE.
Comment 8 Edward Alexandrov 2011-09-28 06:25:23 UTC
Created attachment 27620 [details]
war file

Please also note the message flow. Normally, when opening the page, you should see 3 messages under 'Received messages':
1. State: messageReceived, Transport: polling, Body: { "success": true}
2. Pushing body: { "command": "player/reauth", "params": { "token": "ABC" } }
3. State: messageReceived, Transport: polling, Body: { "command": "system/hello" ...

And when you open the second tab, when the issue arises you will see only two of them:
1. Pushing body: { "command": "player/reauth", "params": { "token": "ABC" } }
2. State: messageReceived, Transport: polling, Body: { "command": "system/hello"

This is because the incoming POST (for Pushing body: { "command": "player/reauth") is "swallowed" by Tomcat and presented to TransportHandler as a GET with duplicate headers. And if you wait about 20-30 sec, you will see the third line appear, maybe because the request is handled a second time after the timeout.
Comment 9 Konstantin Kolinko 2011-09-28 07:07:12 UTC
Can you test with 7.0.22 release candidate?
The download link is in the VOTE thread on dev@.
Comment 10 Mark Thomas 2011-09-28 08:53:19 UTC
Thanks for the additional detail. I can see the problem now. It does reproduce with the 7.0.22 tag. I am currently trying to track down the root cause.
Comment 11 Mark Thomas 2011-10-03 17:03:17 UTC
This took a while to track down but was eventually relatively simple.

When using NIO and a Comet request is closed from a different thread, the poller is used to trigger the processing of the close event. The problem was that the request was marked as non-Comet before this poller task was triggered, meaning that when the task ran the request was treated as a non-Comet request rather than a Comet one. This meant that the necessary clean-up was not performed, and the next request received on the connection (i.e. from the same client) was corrupted.
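
The ordering bug can be modeled in a simplified, single-threaded sketch. These are not Tomcat's actual classes; the names and the queue standing in for the poller are invented for illustration. The point is only the ordering: the Comet flag is cleared before the queued close task runs, so the task no longer sees a Comet request and skips the Comet-specific clean-up.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified, hypothetical model of the ordering bug described above.
// The close path clears the comet flag *before* the poller runs the close
// task, so the task skips the clean-up and stale request state leaks into
// the next request on the connection.
public class CometCloseRace {

    static class Request {
        boolean comet = true;           // request is in Comet mode
        String buffered = "GET /old";   // state that must be recycled
        void recycle() { buffered = ""; }
    }

    static final Queue<Runnable> pollerTasks = new ArrayDeque<>();

    static void closeViaPoller(Request req, boolean buggyOrder) {
        if (buggyOrder) {
            req.comet = false;          // bug: flag cleared too early
        }
        pollerTasks.add(() -> {
            if (req.comet) {            // only Comet requests get the clean-up
                req.recycle();
            }
            req.comet = false;
        });
    }

    static void runPoller() {
        Runnable task;
        while ((task = pollerTasks.poll()) != null) {
            task.run();
        }
    }
}
```

With the buggy ordering, buffered still holds the old request after the poller runs; with the correct ordering, it is cleared.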

This has been fixed - along with some additional clean-up - in trunk and 7.0.x and will be included in 7.0.23 onwards.

Thanks again for the test case. Bugs that involve multiple threads are always tricky to track down but the test case made it a lot easier.