Bug 57921 - HTTP/1.1 without keep-alive Connection response header uses infinite keep-alive and fails
Summary: HTTP/1.1 without keep-alive Connection response header uses infinite keep-ali...
Status: RESOLVED FIXED
Alias: None
Product: JMeter
Classification: Unclassified
Component: HTTP (show other bugs)
Version: 2.13
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: JMeter issues mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-12 09:23 UTC by Andrey Pokhilko
Modified: 2015-06-01 10:10 UTC (History)
1 user (show)



Attachments
packet capture for failed session (2.61 KB, application/vnd.tcpdump.pcap)
2015-05-12 09:23 UTC, Andrey Pokhilko
Details
Reproduction test plan (6.13 KB, application/xml)
2015-05-12 09:23 UTC, Andrey Pokhilko
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey Pokhilko 2015-05-12 09:23:21 UTC
Created attachment 32731 [details]
packet capture for failed session

The issue happens when pause between requests longer than server keep-alive timeout. See attached JMX for reproduction.

TCP session is:

1. JMeter establishes connection, makes first query and gets response
2. after 60 seconds server closes the connection by timeout
3. JMeter sleep time ends and it tries to make new request through old connection and predictably fails

JMeter should re-establish connection, as socket has received FIN packet. Packet capture file attached.

It fails both on Linux and Windows. I tried to upgrade HTTPClient libs to 4.4.1, it does not help.
Comment 1 Andrey Pokhilko 2015-05-12 09:23:55 UTC
Created attachment 32732 [details]
Reproduction test plan
Comment 2 Philippe Mouawad 2015-05-12 20:00:50 UTC
Hi Andrey,
I suggest asking question on httpclient user mailing list. 
Oleg usually helps a lot.
Thanks
Comment 3 Andrey Pokhilko 2015-05-12 20:01:44 UTC
Are we sure this is HTTPClient issue? How can we verify this?
Comment 4 Philippe Mouawad 2015-05-29 21:23:14 UTC
Maybe we should change retry defaults:
- https://bugzilla.mozilla.org/show_bug.cgi?id=92224
Comment 5 Andrey Pokhilko 2015-05-29 22:16:07 UTC
Changing retry default will affect all other situations. I think this is not desired. For example, I personally would not like to have it as my default, because it will masq possible server-side issues when it sometimes refuses to accept the connection once, but succeeds on retry.
Comment 6 Sebb 2015-05-30 00:11:57 UTC
Agreed, changing the default seems wrong
Comment 7 Rainer Jung 2015-05-30 13:00:52 UTC
(In reply to Philippe Mouawad from comment #4)
> Maybe we should change retry defaults:
> - https://bugzilla.mozilla.org/show_bug.cgi?id=92224

That is about a special situation for requests including a request body.

The problem here is about a request (with or without body) that tried to use a stale connection. Stale in the sense that the HTTP keepa-live timeout was reached or almost reached.

It is also not about general retries. It would be good, if we could fix the keep-alive timeout issue. To me it seems that HTTPClient should handle keep alives itself well. After the keep alive timeout happened, it should not resuse a connection. There's a DefaultConnectionKeepAliveStrategy that supports that. Maybe we have to activate it or so.

But there's always a small chance for a race condition (timeout not yet over when sending data but timeout reached before data arrives at server), it might be best to not reuse a keep alive connection if at the moment we want to use it the keep alive timeout is nearly over.

It seems that HTTPClient allows to define a ConnectionKeepAliveStrategy which can be derived from DefaultConnectionKeepAliveStrategy and would set the keep alive timeout e.g. to the one determined by DefaultConnectionKeepAliveStrategy from the server header minus e.g. 200 ms (configurable). Not perfect but should reduce those race conditions a lot.

I think we first have to find out whether and why the general keep alive case is broken and then can tackle the rare race condition.
Comment 8 Philippe Mouawad 2015-05-30 13:20:16 UTC
(In reply to Rainer Jung from comment #7)
> (In reply to Philippe Mouawad from comment #4)
> > Maybe we should change retry defaults:
> > - https://bugzilla.mozilla.org/show_bug.cgi?id=92224
> 
> That is about a special situation for requests including a request body.
> 
> The problem here is about a request (with or without body) that tried to use
> a stale connection. Stale in the sense that the HTTP keepa-live timeout was
> reached or almost reached.
> 
> It is also not about general retries. It would be good, if we could fix the
> keep-alive timeout issue. To me it seems that HTTPClient should handle keep
> alives itself well. After the keep alive timeout happened, it should not
> resuse a connection. There's a DefaultConnectionKeepAliveStrategy that
> supports that. Maybe we have to activate it or so.
> 
> But there's always a small chance for a race condition (timeout not yet over
> when sending data but timeout reached before data arrives at server), it
> might be best to not reuse a keep alive connection if at the moment we want
> to use it the keep alive timeout is nearly over.
> 
> It seems that HTTPClient allows to define a ConnectionKeepAliveStrategy
> which can be derived from DefaultConnectionKeepAliveStrategy and would set
> the keep alive timeout e.g. to the one determined by
> DefaultConnectionKeepAliveStrategy from the server header minus e.g. 200 ms
> (configurable). Not perfect but should reduce those race conditions a lot.
> 

sebb implemented in 2.11 I think a httpclient4.idletimeout property that if > 0, would setup custom DefaultConnectionKeepAliveStrategy that would artificially set a default keep alive if server didn't set it correctly.
Maybe we could enhance it
> I think we first have to find out whether and why the general keep alive
> case is broken and then can tackle the rare race condition.
Comment 9 Rainer Jung 2015-05-31 12:02:05 UTC
The test case and the capture belong to different client server communications.

Furthermore I couldn't reproduce with the test case, because at the time I tried the server seemed to use a much longer keep-alive timeout than the 60 seconds assumed in the test case (I observed 4 minutes timeout). Using the server from the original packet dump and increasing the timer sleep to 5 minutes, I could reproduce.

In this case the server uses HTTP/1.1 but does *not* send a Connection header in the reponse. The request contains one, but not the response.

Looking at the HttpClient code this situation means, that the default reuse strategy decides to reuse the connection (since it is HTTTP/1.1) and asks the keep alive strategy for how long. The default keep alive strategy when called without a Connection header will return the value "-1" which will disable the keep alive. But our own IDLE_STRATEGY will overwrite any value <=0 with the value 0 by default. Value 0 for HttpClient does not mean 0 seconds, but infinite keep-alive. So IMHO IDLE_STRATEGY needs fixing for the default case (value 0 for "httpclient4.idletimeout).

I'm running out of time right now, but will come back to this if noone beats me to it. The likely fix is something like

@@ -145,7 +145,7 @@
         @Override
         public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
             long duration = super.getKeepAliveDuration(response, context);
-            if (duration <= 0) {// none found by the superclass
+            if (duration <= 0 && IDLE_TIMEOUT > 0) {// none found by the superclass
                 log.debug("Setting keepalive to " + IDLE_TIMEOUT);
                 return IDLE_TIMEOUT;
             }
Comment 10 Andrey Pokhilko 2015-05-31 13:39:31 UTC
I've spenc several hours investigating the whole topic of closed keep-alive sockets with HTTPClient and I found:
  0. There is no way to tell if server has closed socket in Java without reading from socket
  1. HTTPClient throws NoHTTPResponse exception when it failed to read anything from server, most of the times this means server has closed the socket
  2. It is suggested by HTTPClient authors to do one retry in case of NoHTTPResponse to make sure the problem persists
  3. Since HTTPClient 4.3 there is special "RetryExec" class to cover this set of cases (JMeter using version 4.2)  

I see no efficient steps here until we will upgrade HTTPClient. Maybe just catching and retrying once on NoHTTPResponse might mitigate the case and separate it from other exception cases.
Comment 11 Rainer Jung 2015-05-31 19:15:53 UTC
I'm going to change the title of this issue to describe more precisely the original problem. To discuss further optimizations we can use the dev list or another specific ticket.

I don't think the problem is that big. The most common case that after applying the fix for this issue here (HTTP/1.1 with no Connection keep-alive header leads to infinite keep alive behavior) IMHO remains is the possible race between the client sending a followon request very shortly before the keep-alive ends and arriving at the server shortly after. This race could be made rare by subtracting a small delta from the keep-alive timeout determined by a connection header.

Another case would be a user who configures a fixed keep-alive timeout using IDLE_STRATEGY which is longer than what the server supports. That would then be a user error.

Finally a server could announce some timeout but then due to increasing load decide to close a connection earlier than announced. That indeed could be a case for RetryExec.
Comment 12 Rainer Jung 2015-06-01 10:10:34 UTC
Author: rjung
Date: Sun May 31 19:24:32 2015
New Revision: 1682775

URL: http://svn.apache.org/r1682775
Log:
Bug 57921 - HTTP/1.1 without keep-alive Connection response header uses infinite keep-alive and fails
Bugzilla Id: 57921

Modified:
    jmeter/trunk/bin/jmeter.properties
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/sampler/HTTPHC4Impl.java