Created attachment 35288 [details] mod_proxy_http patch For a POST request proxied through mod_proxy_http, it looks like a response timeout is always handled by this block even after 100-continue has been received: ap_log_rerror(APLOG_MARK, APLOG_ERR, rc, r, APLOGNO(01102) "error reading status line from remote " "server %s:%d", backend->hostname, backend->port); if (APR_STATUS_IS_TIMEUP(rc)) { apr_table_setn(r->notes, "proxy_timedout", "1"); ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r, APLOGNO(01103) "read timeout"); if (do_100_continue) { return ap_proxyerror(r, HTTP_SERVICE_UNAVAILABLE, "Timeout on 100-Continue"); } } For instance, logging shows a 100 response is received, but a "Timeout on 100-continue" is still indicated: [Fri Sep 01 15:59:25.264579 2017] [proxy_http:trace2] [pid 13455:tid 140308050339584] mod_proxy_http.c(1536): [client 127.0.0.1:55835] HTTP: received interim 100 response [Fri Sep 01 15:59:25.264583 2017] [dumpio:trace7] [pid 13455:tid 140308050339584] mod_dumpio.c(140): [remote 127.0.0.1:8080] mod_dumpio: dumpio_in [getline-blocking] 0 readbytes [Fri Sep 01 15:59:26.265697 2017] [dumpio:trace7] [pid 13455:tid 140308050339584] mod_dumpio.c(151): [remote 127.0.0.1:8080] mod_dumpio: dumpio_in - 70007 [Fri Sep 01 15:59:26.265753 2017] [proxy_http:error] [pid 13455:tid 140308050339584] (70007)The timeout specified has expired: [client 127.0.0.1:55835] AH01102: error reading status line from remote server 127.0.0.1:8080 [Fri Sep 01 15:59:26.265769 2017] [proxy_http:debug] [pid 13455:tid 140308050339584] mod_proxy_http.c(1264): [client 127.0.0.1:55835] AH01103: read timeout [Fri Sep 01 15:59:26.265785 2017] [proxy:error] [pid 13455:tid 140308050339584] [client 127.0.0.1:55835] AH00898: Timeout on 100-Continue returned by /app/page This can result in unwanted worker error states. Attached is a patch that addressed this on my end.
Fixed in trunk by r1807318
I found that Apache is “marking down” a JVM once ProxyTimeout elapsed. This is what happens: 1. A process got kicked off on a JVM. Let’s assume it is going to take lots of time(10 min) to complete. 2. While this processing is halfway, ProxyTimeout(5 min) elapsed. 3. Then Apache completely ignores default failontimeout=off setting and marks the JVM down for next 180 Sec(retry value). 4. Problem started! 5. Existing users logged on that JVM via that Apache instance This behavior sounds like bug to me because: - If you forcefully failed a HTTP GET request by elapsing ProxyTimeout, Apache *do not* mark the JVM down. It only fails that long running request with 502 error. That is expected. - If you do the same thing for a HTTP POST request, Apache *mark the JVM down*. I am pretty sure, this is *NOT* a desired behavior. I can easily reproduce the issue with Apache/2.4.10/2.4.25/2.4.28 versions. My questions are: Is this fix is also going to take care this ProxyTimeout and other different types of timeout scenarios? If yes, from which version this fix will be generally available? If not, please let me know if I can open a separate bug. Thanks! Suvendu Logs: [Thu Nov 02 19:25:39.810408 2017] [proxy_http:trace1] [pid 12484:tid 1204] mod_proxy_http.c(1904): [client ::1:59825] HTTP: serving URL http://localhost:8080/second.jsp, referer: http://localhost/ [Thu Nov 02 19:25:39.810408 2017] [proxy:debug] [pid 12484:tid 1204] proxy_util.c(2154): AH00942: HTTP: has acquired connection for (localhost) [Thu Nov 02 19:25:39.810408 2017] [proxy:debug] [pid 12484:tid 1204] proxy_util.c(2208): [client ::1:59825] AH00944: connecting http://localhost:8080/second.jsp to localhost:8080, referer: http://localhost/ [Thu Nov 02 19:25:39.810408 2017] [proxy:debug] [pid 12484:tid 1204] proxy_util.c(2417): [client ::1:59825] AH00947: connected /second.jsp to localhost:8080, referer: http://localhost/ [Thu Nov 02 19:25:39.811908 2017] [proxy_http:trace3] [pid 12484:tid 1204] mod_proxy_http.c(1375): [client ::1:59825] Status from backend: 100, referer: http://localhost/ [Thu Nov 02 19:25:39.811908 2017] [proxy_http:trace2] [pid 12484:tid 1204] mod_proxy_http.c(1536): [client ::1:59825] HTTP: received interim 100 response, referer: http://localhost/ [Thu Nov 02 19:25:49.812161 2017] [proxy_http:error] [pid 12484:tid 1204] (OS 10060)A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. : [client ::1:59825] AH01102: error reading status line from remote server localhost:8080, referer: http://localhost/ [Thu Nov 02 19:25:49.812161 2017] [proxy_http:debug] [pid 12484:tid 1204] mod_proxy_http.c(1265): [client ::1:59825] AH01103: read timeout, referer: http://localhost/ [Thu Nov 02 19:25:49.812161 2017] [proxy:error] [pid 12484:tid 1204] [client ::1:59825] AH00898: Timeout on 100-Continue returned by /second.jsp, referer: http://localhost/ [Thu Nov 02 19:25:49.812655 2017] [proxy:debug] [pid 12484:tid 1204] proxy_util.c(2169): AH00943: HTTP: has released connection for (localhost) [Thu Nov 02 19:25:49.812655 2017] [proxy:debug] [pid 12484:tid 1204] proxy_util.c(1904): AH00933: BALANCER: too soon to retry worker for (localhost) [Thu Nov 02 19:25:49.812655 2017] [proxy_balancer:debug] [pid 12484:tid 1204] mod_proxy_balancer.c(307): [client ::1:59825] AH01160: Found value DAE82EC9814CC59C66442A5925C32EE7.Tomcat_1 for stickysession JSESSIONID, referer: http://localhost/ [Thu Nov 02 19:25:49.813155 2017] [proxy_balancer:debug] [pid 12484:tid 1204] mod_proxy_balancer.c(320): [client ::1:59825] AH01161: Found route Tomcat_1, referer: http://localhost/ [Thu Nov 02 19:25:49.813155 2017] [proxy:debug] [pid 12484:tid 1204] proxy_util.c(1904): AH00933: BALANCER: too soon to retry worker for (localhost) [Thu Nov 02 19:25:49.813155 2017] [proxy_balancer:error] [pid 12484:tid 1204] [client ::1:59825] AH01167: balancer://lb-mycluster: All workers are in error state for route (Tomcat_1), referer: http://localhost/
Can someone please let me know when this fix will be GA'ed?
(In reply to suv3ndu from comment #3) > Can someone please let me know when this fix will be GA'ed? I believe this issue is not present in 2.4.x, especially because the patched code block is not present in 2.4.x anymore.