Bug 56664 - Proxy retry timeout does not cope with system time going backwards.
Summary: Proxy retry timeout does not cope with system time going backwards.
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy (show other bugs)
Version: 2.4.9
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-24 13:06 UTC by John-Mark Bell
Modified: 2014-06-24 13:06 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John-Mark Bell 2014-06-24 13:06:53 UTC
If the backend service for a proxied request is not available, then the worker will be placed in the error state until the configured retry interval has elapsed. However, if the system time goes backwards (e.g. if NTP synchronises a badly skewed system clock) after the error time has been set, then all subsequent requests to the proxy will be rejected with a 503 response until system time catches up again. If system time is badly wrong, then this may take many hours. 

Here's an example error log (output via syslog, so there are timestamps):

2014-06-24T16:55:49.641+00:00 system apache2[3649]: [proxy:error] [pid 3649:tid 140284189755136] (111)Connection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:8009 (localhost) failed
2014-06-24T16:55:49.641+00:00 system apache2[3649]: [proxy:error] [pid 3649:tid 140284189755136] AH00959: ap_proxy_connect_backend disabling worker for (localhost) for 60s
2014-06-24T16:55:49.641+00:00 system apache2[3649]: [proxy_http:error] [pid 3649:tid 140284189755136] [client 10.44.99.70:58887] AH01114: HTTP: failed to make connection to backend: localhost
2014-06-24T16:55:54.649+00:00 system apache2[3648]: [proxy:error] [pid 3648:tid 140284189755136] (111)Connection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:8009 (localhost) failed
2014-06-24T16:55:54.649+00:00 system apache2[3648]: [proxy:error] [pid 3648:tid 140284189755136] AH00959: ap_proxy_connect_backend disabling worker for (localhost) for 60s
2014-06-24T16:55:54.649+00:00 system apache2[3648]: [proxy_http:error] [pid 3648:tid 140284189755136] [client 10.44.99.70:58888] AH01114: HTTP: failed to make connection to backend: localhost
2014-06-24T08:54:16.291+00:00 system apache2[3648]: [proxy:error] [pid 3648:tid 140284198156032] AH00940: HTTP: disabled connection for (localhost)
2014-06-24T08:54:21.301+00:00 system apache2[3648]: [proxy:error] [pid 3648:tid 140284181354240] AH00940: HTTP: disabled connection for (localhost)

Note that the system time is almost 6 hours adrift here (this was a virtual machine running on a misconfigured host) which is significantly bad, but it's not entirely uncommon to see clock drift of the order of a few minutes on system boot.

By inspecting the code, I see that the error time is stored[1] as the result of apr_time_now(), which returns the current UTC time. When the retry interval is checked, a simple greater-than comparison is performed[2], which does not cater for time going backwards. 

A fairly simple work around would be to ensure at the point the retry is attempted that the current UTC time is later than the stored error time and to permit the request if it isn't (on the basis that a subsequent failure would return the worker to the error state after the system time has been corrected).

1. http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/proxy/proxy_util.c?revision=1602989&view=markup#l2796

2. http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/proxy/proxy_util.c?revision=1602989&view=markup#l1881