Summary: | mod_proxy_ws_tunnel - AH02447: err/hup on backconn | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Martin Studer <martin.studer> |
Component: | mod_proxy | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | martin.studer |
Priority: | P2 | Keywords: | FixedInTrunk |
Version: | 2.5-HEAD | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: |
read POLLHUPs
Return correct HTTP codes |
Description
Martin Studer
2014-03-21 16:46:40 UTC
Weird, the 2.4.9 change here did not introduce this error or increase its severity. I actually noticed that it seemed to be at an unnecesarrily high level, and the corresponding msg for the client side is TRACE1 instead of NOTICE. Was 2.4.8 just unusable for you so you never saw it? Just trying to make sure this is simply "is NOTICE too high?" vs. "broke in 2.4.9" Created attachment 31428 [details]
read POLLHUPs
Shouldn't POLLUP be handled the like POLLIN, I mean let the proxy read the socket (see attached 2.4.x patch)?
On some systems, EOF triggers POLLHUP, IMHO it's safe to let apr_socket_recv() detect the error vs EOF.
works here, probably need to tidy up the existing NOTICE msg That would also hide real (network/system) errors (often good to know). For others proxy modules (but proxy_connect where the same code is used), the level is error for socket failures. TRACE1 is surely enough for EOF, but real errors deverse more IMHO (maybe not ERR because it's just streaming, but at least a warning). We were using 2.4.7 before and ran into the "infinite loop" issue https://issues.apache.org/bugzilla/show_bug.cgi?id=56145 which led us to upgrade to 2.4.9 (without trying 2.4.8 first). The behavior we are observing is that websocket connections initially work, however that reopening the connection after a while leads to a "AH02447: err/hup on backconn". This leads to the following error on the client side (Google Chrome): WebSocket connection to 'wss://...' failed: Connection closed before receiving a handshake response (In reply to Eric Covener from comment #1) > Weird, the 2.4.9 change here did not introduce this error or increase its > severity. I actually noticed that it seemed to be at an unnecesarrily high > level, and the corresponding msg for the client side is TRACE1 instead of > NOTICE. > > Was 2.4.8 just unusable for you so you never saw it? Just trying to make > sure this is simply "is NOTICE too high?" vs. "broke in 2.4.9" Forgot to mention that after having received that error, retrying to connect generally seems to work (In reply to Martin Studer from comment #5) > We were using 2.4.7 before and ran into the "infinite loop" issue > https://issues.apache.org/bugzilla/show_bug.cgi?id=56145 which led us to > upgrade to 2.4.9 (without trying 2.4.8 first). > > The behavior we are observing is that websocket connections initially work, > however that reopening the connection after a while leads to a "AH02447: > err/hup on backconn". This leads to the following error on the client side > (Google Chrome): > > WebSocket connection to 'wss://...' failed: Connection closed before > receiving a handshake response > > > > (In reply to Eric Covener from comment #1) > > Weird, the 2.4.9 change here did not introduce this error or increase its > > severity. I actually noticed that it seemed to be at an unnecesarrily high > > level, and the corresponding msg for the client side is TRACE1 instead of > > NOTICE. > > > > Was 2.4.8 just unusable for you so you never saw it? Just trying to make > > sure this is simply "is NOTICE too high?" vs. "broke in 2.4.9" Created attachment 31444 [details]
Return correct HTTP codes
This patch fixes something I noticed that might be contributing to the problem. It's based on and includes Yann's patch (I couldn't figure out a good way to create a patch to be applied on top of his).
The problem I noticed was that 500 was returned any time any closure of the WebSocket server occured. When reading from the server in ap_proxy_wstunnel_request(), anything except for a SUCCESS is considered an internal error, including a legitimate EOF. 500 is also returned if POLLERR is received when polling the server; this is a little suspect, since we're not the one having the issue, the WebSocket server is. That's not even considering the fact that the flag doing all this is named client_error, and that it's set in the section where we read from the server. It seemed like a good idea to just remove the client_error flag.
However, doing so triggers another problem. Since ap_proxy_wstunnel_request() now returns OK, ap_discard_request_body() is run as part of the connection cleanup. On a normal Websocket close, this is fine, but something causes ap_discard_request_body() to block indefinitely on ap_get_brigade() when the connection is killed abormally (e.g. by SIGKILL on the server). I think it's something to do with either all the filters that get removed, or because the client's still hanging onto the request - I wasn't able to determine conclusively. The request's thread does eventually get killed by mod_reqtimeout.
The client_error flag "solved" this problem, but in an obviously unsatisfactory way. I've removed it and instead set c->keepalive = AP_CONN_CLOSE, which tells the server that it doesn't need to run the additional processing after the handler returns.
Committed Edwards patch, which included Yann's. Thanks for the report. Fixed and released in 2.4.10 I'm seeing some similar behavior in mod_proxy_connect with error AH01026. Looking at the code in mod_proxy_connect, it looks very similar to the unpatched code that was addressed here for mod_proxy_wstunnel. Does this patch need to be ported there as well? Or does the underlying issue that was fixed here not apply there? (and thus I have something else wrong I need to figure out) (In reply to Matt Hauck from comment #10) > I'm seeing some similar behavior in mod_proxy_connect with error AH01026. Since this PR about mod_proxy_wstunnel is resolved, please create a new one with mod_proxy_connect as component, so that the fix/changelog can reference it. (In reply to Yann Ylavic from comment #11) > Since this PR about mod_proxy_wstunnel is resolved, please create a new one > with mod_proxy_connect as component, so that the fix/changelog can reference > it. Roger. Filed #57168 |