This is a fairly impossible problem but I thought I'd raise it in case it matches something you've already seen. In the attached plan, if I run it without a timer, every request succeeds. However, if I enable the timer, the first upload request works but the second fails, then the third works, the fourth fails, and so on. The only difference is enabling or disabling the timer. Unfortunately, I can't actually give you a working jmx - the attached will always fail but hopefully it is still useful.

Failure is: org.apache.http.NoHttpResponseException: The target server failed to respond
Success: 204 - blank response

The type of timer makes no difference. The value in the timer DOES make a difference: 100ms and above = problem, 30ms = no problem. The request I am making to S3 is valid; were it not, I would get an error response. In this case I get NoHttpResponseException.
Created attachment 31292 [details] Screenshot of issue
I should also have stated that running this test with multiple threads does not change the per-thread behaviour:
Thread 1 > works, then fails, then works, then fails...
Thread 2 > works, then fails, then works, then fails...
Thread 3 > works, then fails, then works, then fails...
etc...
Created attachment 31293 [details] jmx
*** Bug 56118 has been marked as a duplicate of this bug. ***
Could you clarify which timer makes the test fail? Is it the Constant Timer or the Gaussian Random Timer? Why does the last request use a different server than the others? Could it be that activating an additional timer (if that is the case - see my first question) randomly leads to a timeout on the created access key? Although, reading the S3 docs, I don't see anything like this.
It fails with both timers. The last POST, which actually uploads the file, is different because the server (our server) only gives the client/browser some security tokens; we do not receive the uploaded file. The actual file transfer is done from the browser directly to Amazon S3, and this is the last request. See: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingHTTPPOST.html

There is an expiration on the credentials we pass back to the browser, but this is set to 5 minutes. See: http://docs.aws.amazon.com/AmazonS3/latest/dev/HTTPPOSTForms.html#HTTPPOSTExpiration

At first glance, I can see how this looks like a config issue with the request and, to be fair, it probably is, but it's weird that it relates to timing like this and that it can cause a null response - I was curious if there was anything you could think of. It's quite a minor problem, though: this will not occur in real life and we can easily work around it for testing.
Thanks for the feedback. The issue does not seem to be related to JMeter. Closing it, but I would be very interested in an explanation of the problem if you find one.
We're seeing this problem in other testplans now, where an external request fails every second attempt. This is not related to uploads but does appear to relate to calls to external servers. We ran using 2.9 & 2.10 and did not see the problem - it is 2.11-specific. If you contact me at oliver.lloyd@news.co.uk I can provide you with a testplan to replicate.
Hello,
After an exchange with Oliver it seems these 2 settings improve things:
1) Uncomment hc.parameters.file=hc.parameters in user.properties
2) Set this in hc.parameters: http.connection.stalecheck$Boolean=true
Answer from Oliver: "That improves things a lot. But not 100% - Now we see 1 in 10 requests fail (instead of 1 in 2)."
@Oliver,
Regarding the remaining 10% failure, if it's confirmed, are you sure you don't get it with 2.9 and 2.10? I think it might come from this behaviour of S3: http://aws.amazon.com/articles/1904
> "Also, don't overuse a connection. Amazon S3 will accept up to 100 requests before it closes a connection (resulting in 'connection reset'). Rather than having this happen, use a connection for 80-90 requests before closing and re-opening a new connection."
In this particular case, the settings change has a big impact on the test, as you will get a lot of stale connections. We may need to add the ability to discard a connection after a certain number of requests.
Finally, I am not sure we should roll back this change, but maybe document it better?
Also, note that the "Use keepalive" option can be deselected to force the next sampler to use a new connection.
No, we only see failures with 2.11. Also, we see the same issue with other testplans that go out to other external endpoints, not just S3. Plus, we start to get failures early on, after just 10 or 20 requests. The common factors are external calls, 2.11, and the use of some sort of timer - if we remove all delays between requests then we do not get failures. Sorry, it is hard to give a more reliable description for reproducing the problem. I will see if I can put something together next week to help.
Do you see the same behaviour if you use HttpClient 3.1 as the HTTP implementation?
@Oliver, after changing the 2 following settings:
1) Uncomment hc.parameters.file=hc.parameters in user.properties
2) Set this in hc.parameters: http.connection.stalecheck$Boolean=true
you should not see any difference between 2.10 and 2.11, as AFAIK there is no other change in the HTTP stack that could explain it. Do you confirm you see differences between 2.10 and 2.11? Or is it between 2.9 and 2.11? In 2.9, the retry count for the HttpClient 3.1 and HttpClient 4.x samplers was changed to 0. Previously the default was 1, which could also explain differences in your use case.
Okay, so coming back to this today I was not able to reproduce the problem (failed external calls) when using the settings advised by Philippe. We had previously seen lower failure rates of around 5% - 10% but it looks like that was an unfortunate coincidence. Apologies for that.
So, yes, the following changes:
1) Uncomment hc.parameters.file=hc.parameters in user.properties
2) Set this in hc.parameters: http.connection.stalecheck$Boolean=true
work around the problem of every other external request failing in 2.11.
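For reference, applying the two changes means the files in JMeter's bin directory contain the lines below (a sketch - only the lines named in the comments above are shown; the stale check appears to have been disabled by default in 2.11, which these settings re-enable):

```properties
# user.properties - uncomment this line so JMeter loads the HttpClient parameters file
hc.parameters.file=hc.parameters

# hc.parameters - re-enable the stale connection check before reusing a pooled connection
http.connection.stalecheck$Boolean=true
```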
Many thanks Oliver for your feedback. In the end, do you consider this a bug? I don't, as the current settings work for websites which do not have S3's way of invalidating connections after N requests - but I would be interested in your opinion. JMeter Team, what's your opinion? Should we roll back this setting change or not? At the very least we should document it, but where? Wiki or component reference?
My feeling is that this should be considered a bug. The two cases where we recently saw this problem were when going out to S3 and also when sending requests to Zuora, a payment provider. But I can think of several other scenarios where a web app might need to make a call to a third party and where this call might need to be included in a testplan; in fact, we do this all the time and I don't think we're unusual in this.
On the face of it, I don't think it is a bug - perhaps an enhancement request. I'm not sure I fully understand how the timer interacts with the staleCheck setting, unless there is some S3 setting that disallows long-lasting sessions? I think further tests are needed to establish exactly how enabling the staleCheck prevents the problem. It would be interesting to know if disabling keep-alive solved the issue in the absence of staleCheck.

Ideally we need a test server (either public or easy to set up locally) that shows the same behaviour as being reported here.

JMeter already supports closing a connection, i.e. forcing the next sample to use a new connection. It looks like the problem may be that the connection is expiring somehow. If this is indeed the case, then there might be a need to provide automated connection closing based on re-use count and/or idle time. But first we need to know if that is really what the problem is.
2.11, Use KeepAlive OFF - 100% Success
2.11, HTTPClient 3.1 - 100% Success
2.11, DEFAULT - 50% Failures
2.9, DEFAULT - 100% Success
2.10, DEFAULT - 100% Success

Note: this is a test NOT using S3, but another, different, external call.
Thanks. That shows the issue is something to do with connection re-use. However I don't understand why dropping the staleCheck should affect 50% of requests. Or indeed why HC 3.1 should be 100% successful - but perhaps that does retries; I don't know without looking.

I've just had another look at the error in the screenshot, which shows "Target Server failed to respond", yet the elapsed time seems to be close to zero. This suggests that the request may be failing before it gets to the server.

Is there a server against which we can test? Or can you run a very short test showing the errors with full wire logging (or using Wireshark)?
Testing with the privately provided test plan shows that the problem occurs on one of the hosts only. This host only occurs once in the plan, which means that its connection is relatively idle compared with the others. When you add in all the other samples and delays, the elapsed time between samples of this particular host is about 60 seconds.

I just tried a simple GET on the same host, and that also fails when the elapsed time between samples is around 15 secs or more (10 secs seems to be OK). I think this is the immediate cause of the issue.

The staleCheck option adds an extra check just before the real sample. This causes the old connection to be dropped by the server, but the stale check code does not report this: it just creates a new connection, which then works because it has not been idle. That is why disabling Keep-Alive works - there is no attempt to reuse the connection which has been idle for a while.

The work-round - if one does not want to use staleCheck for everything - is to disable keep-alive for samples that are likely to trigger the idle timeout.

We need to have a dev@ discussion on what we might want to do to fix this in JMeter. I'll start that off shortly.
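The behaviour described above can be sketched as follows. This is illustrative Java, not JMeter's or HttpClient's actual code; `PooledConnection` and `pickConnection` are hypothetical names. With the stale check enabled, a dead pooled connection is silently replaced before the request is sent; without it, the request goes down the dead socket and surfaces as NoHttpResponseException.

```java
// Illustrative sketch of what http.connection.stalecheck=true effectively does:
// probe a pooled keep-alive connection before reuse, and silently open a fresh
// connection if the server has already dropped it.
class StaleCheckSketch {

    interface PooledConnection {
        boolean isStale(); // e.g. a quick read on the socket that detects EOF
        void close();
    }

    static String pickConnection(PooledConnection pooled, boolean staleCheckEnabled) {
        if (pooled == null) {
            return "new"; // nothing pooled for this route yet
        }
        if (staleCheckEnabled && pooled.isStale()) {
            pooled.close();
            return "new"; // silently replaced; the sampler never sees an error
        }
        return "pooled"; // without the check, a dead connection gets reused
    }
}
```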
Just discovered something else. I tried testing timeouts against the ASF server. Even if the idle timeout is exceeded, the samples don't fail. Debug shows that this is because the ASF server returns the following header:

Keep-Alive: timeout=5, max=100

This is detected by the HC 4.x code:

2014/02/19 19:37:47:619 GMT [DEBUG] headers - << Keep-Alive: timeout=5, max=100
2014/02/19 19:37:47:619 GMT [DEBUG] headers - << Connection: Keep-Alive
2014/02/19 19:37:47:619 GMT [DEBUG] headers - << Content-Type: text/html; charset=utf-8
2014/02/19 19:37:47:697 GMT [DEBUG] BasicClientConnectionManager - Connection can be kept alive for 5000 MILLISECONDS
2014/02/19 19:39:17:712 GMT [DEBUG] BasicClientConnectionManager - Get connection for route {s}->https://www.apache.org
2014/02/19 19:39:17:712 GMT [DEBUG] BasicClientConnectionManager - Connection [id:0][route:{s}->https://www.apache.org][state:null] expired @ Wed Feb 19 19:37:52 GMT 2014
2014/02/19 19:39:17:712 GMT [DEBUG] DefaultClientConnection - Connection 0.0.0.0:4368<->192.87.106.229:443 closed
2014/02/19 19:39:17:712 GMT [DEBUG] DefaultClientConnectionOperator - Connecting to www.apache.org:443

Note that the connection is automatically dropped by the connection manager.

I have just reviewed the logs from the private test plan, and these don't return a Keep-Alive header, so the HC code assumes the connection is not timed out. The log shows the following:

BasicClientConnectionManager - Connection can be kept alive indefinitely

This means that the HC code does not know when to drop the connection. I think it is probably a configuration error on the server; it really ought to tell the client what the keep-alive period is. It probably explains why this issue has not been reported by others as well. It gives a possible solution for the test errors - fix the server config!
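The HC 4.x behaviour shown in the log can be sketched like this. This is illustrative Java, not HttpClient's actual implementation; `keepAliveMillis` is a hypothetical helper. It shows why a missing Keep-Alive header leads to "kept alive indefinitely": there is simply no timeout value to parse.

```java
// Illustrative sketch: derive a connection lifetime from a Keep-Alive response
// header value such as "timeout=5, max=100".
class KeepAliveSketch {

    // Returns the keep-alive duration in milliseconds, or -1 meaning
    // "indefinitely" - which is what happens when the server sends no
    // Keep-Alive header at all, as in the private test plan's logs.
    static long keepAliveMillis(String keepAliveHeader) {
        if (keepAliveHeader == null) {
            return -1; // no header: connection assumed reusable indefinitely
        }
        for (String part : keepAliveHeader.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    return Long.parseLong(kv[1].trim()) * 1000L;
                } catch (NumberFormatException ignored) {
                    // malformed value: treat as absent
                }
            }
        }
        return -1; // header present but no usable timeout parameter
    }
}
```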
We also see the same issue in another testplan, this one using S3. It's feasible that both servers are set up wrongly, but this suggests this server config error is common. That, or we're just unlucky!
Check the response headers from the servers. Do they include the Keep-Alive header?
No, I'm not seeing the Keep-Alive header in the response. I've emailed you the plan.

I see what you're saying: the server is sending the wrong response, so how can the client know how to act? Sure, I can work around this easily. My concern, though, is that not sending this header seems to be pretty common, so this same issue could impact others. It's not immediately obvious what the cause of the problem is, and I can see how others might first blame the external server - I know I did. It might be that the problem is happening for other people but they are not raising it here... Just a thought.
Looking here: http://en.wikipedia.org/wiki/HTTP_persistent_connection

It seems as though the Keep-Alive header is effectively deprecated for HTTP/1.1, as all connections are assumed persistent.
(In reply to oliver lloyd from comment #25) > my concern though is that not sending this header seems to be pretty common > so this same issue could impact others. AFAIK you are the first and only person to report it so far. I don't know whether that is because: + 2.11 has not been out for long, or + there are not many misbehaving servers or + not many test plans are using connections that can become idle.
(In reply to oliver lloyd from comment #26) > Looking here: http://en.wikipedia.org/wiki/HTTP_persistent_connection > > It seems as though the Keep-Alive header is effectively depreciated for > HTTP1.1 as all connections are assumed persistent. That is the Keep-Alive _request_ header. The server should send the Keep-Alive _response_ header which has a different (but related) function.
@Oliver, Really big thanks for your follow up and investigations on this. @sebb, I don't think Oliver is the only person who is facing this, I think I wrote somewhere here or on dev list that another person was facing this with 2.11 and tweeted about this. As discussed on dev list, we should: - write a wiki page about this setting - Select a way to invalidate idle connections or connections that have run N requests (s3 case) for non 'clean' servers
(In reply to Philippe Mouawad from comment #29)
> @Oliver,
> Really big thanks for your follow up and investigations on this.

Yes, thanks for the prompt responses to our queries.

> @sebb, I don't think Oliver is the only person who is facing this, I think I
> wrote somewhere here or on dev list that another person was facing this with
> 2.11 and tweeted about this.

OK, it would be interesting to know if they were also using S3.

> As discussed on dev list, we should:
> - write a wiki page about this setting
>
> - Select a way to invalidate idle connections or connections that have run N
> requests (s3 case) for non 'clean' servers

I have just run a test against Amazon.com. They also have a server that fails to send a Keep-Alive header. It seems to try to drop the connection after a certain number of requests - by sending the following header:

Cneonction: close

Yes, it really is spelt that way!

What is odd is that the connection seems to keep going anyway. In any case, I suspect there is not a problem handling connection drop after N requests, because that will be initiated by the server at the end of the response. But I suppose it may still be worth considering recycling connections after a certain number of requests.

The problem with the idle timeout is that it occurs at the start of a new request, when disconnects are not currently expected. Maybe we can handle this in JMeter, or it may need some help from HC.
The typical explanation for a Cneonction header is here: http://stackoverflow.com/questions/4798461/cneonction-and-nncoection-http-headers

They say it happens when the web server is fronted by a load balancer that does its own connection handling but doesn't want to do too much protocol adjustment work. By permuting some characters in the name of the Connection header, the client no longer reacts to the close message of the origin server, but the packet sizes and probably the TCP checksums do not change, so the implementation at the packet level is cheap. The LB reacts to the connection close itself but has decoupled the connection handling between client and LB from the connections between LB and origin server.

Now, if that is true, it might well be that the LB itself doesn't shut down the connections cleanly once it decides to close a connection from the client. That would fit sebb's observation. As a result, it should be OK to ignore those misspelled headers, because they are misspelled intentionally to get them ignored. And in this situation the next hop might in fact expose somewhat strange connection-close behaviour itself.
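For illustration, detecting such intentionally permuted header names is straightforward. A sketch in Java (the class and method names are hypothetical): a header is a disguised Connection header if its characters are a permutation of "Connection" but it is not spelt correctly.

```java
import java.util.Arrays;

// Illustrative sketch: spot header names such as "Cneonction" or "nnCoection"
// that are character permutations of "Connection", as sent by some load
// balancers so that clients ignore the origin server's close directive
// without changing packet sizes or checksums.
class PermutedHeader {

    static boolean isPermutedConnection(String name) {
        if (name == null || name.length() != "Connection".length()
                || name.equalsIgnoreCase("Connection")) {
            return false; // correctly spelt headers are not "disguised"
        }
        // Two strings are permutations of each other iff their sorted
        // character arrays are equal (case-insensitive here).
        char[] a = name.toLowerCase().toCharArray();
        char[] b = "connection".toCharArray();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }
}
```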
Added a work-round to provide an idle timeout value if the server does not send the Keep-Alive header. This works, but is not ideal as the same timeout applies to all hosts.
URL: http://svn.apache.org/r1572390
Log: File uploads fail every other attempt using timers. Enable idle timeouts for servers that don't send Keep-Alive headers.
Bugzilla Id: 56119
Modified:
jmeter/trunk/bin/jmeter.properties
jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/sampler/HTTPHC4Impl.java
jmeter/trunk/xdocs/changes.xml
Note: if you want to try a nightly build with this fix, it will be in builds after revision r1572390. Define the property httpclient4.idletimeout=5000 to set the timeout to 5 seconds for any servers that don't send Keep-Alive headers.
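The logic behind the httpclient4.idletimeout property can be sketched as follows. This is illustrative Java, not the actual HTTPHC4Impl change: when the server sends no Keep-Alive header, a configured idle timeout decides whether a pooled connection may still be reused.

```java
// Illustrative sketch of the fix's logic: track when a connection was last
// used and refuse to reuse it once it has been idle for longer than the
// configured timeout (e.g. httpclient4.idletimeout=5000). Timestamps are
// passed in explicitly to keep the sketch deterministic.
class IdleConnectionSketch {

    private final long idleTimeoutMillis;
    private long lastUsedMillis;

    IdleConnectionSketch(long idleTimeoutMillis, long nowMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.lastUsedMillis = nowMillis;
    }

    // Record that a request has just completed on this connection.
    void markUsed(long nowMillis) {
        lastUsedMillis = nowMillis;
    }

    // A non-positive timeout means "keep alive indefinitely" - the pre-fix
    // behaviour when the Keep-Alive response header is absent.
    boolean isReusable(long nowMillis) {
        return idleTimeoutMillis <= 0
                || nowMillis - lastUsedMillis <= idleTimeoutMillis;
    }
}
```

One limitation noted above carries over: a single timeout value applies to every host in the plan.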
Hello Oliver,
Any feedback? Thanks
*** Bug 56417 has been marked as a duplicate of this bug. ***
@pam <pramod.desai@abyeti.com>: please don't re-open an old issue without providing further details.
Hello, does anyone know why I get the exception "org.apache.commons.httpclient.NoHttpResponseException: The server <ip address> failed to respond", even though I am using JMeter 2.10?
Created attachment 35394 [details] I see Socket exception when i run my script in jmeter
(In reply to bindu from comment #39)
> Created attachment 35394 [details]
> I see Socket exception when i run my script in jmeter

@bindu (bindusreejava@gmail.com): please don't re-open an old issue. First ask on the mailing list and then open a new Bugzilla issue if needed.
This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/3316