Bug 36448 - mod_proxy hangs when the backend sends 100 Continue after 200 OK
Summary: mod_proxy hangs when the backend sends 100 Continue after 200 OK
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy
Version: 2.0.54
Hardware: All
OS: FreeBSD
Importance: P2 major
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2005-08-31 21:01 UTC by Vahe Khachikyan
Modified: 2008-12-26 11:10 UTC

Description Vahe Khachikyan 2005-08-31 21:01:29 UTC
In a reverse proxy configuration mod_proxy hangs if it encounters a buggy
backend which sends a 100 Continue after a 200 OK or some other non-100
response. There are some versions of IIS which send 100 Continue after they
have successfully sent the 200 OK with content. The 100 Continue response then
hangs around somewhere in the TCP stack.

During the next request, if Keep-Alive is used, mod_proxy tries to reuse the
socket and to ensure that the socket can be reused. For this it reads one byte
from the socket and checks the return status to see whether it indicates EOS.

What happens with the buggy IIS servers behaving as described above is the
following: the one-byte read really does return one byte, the 'H' of "HTTP/1.1
100 Continue", and the rest, "TTP/1.1 100 Continue", stays in the TCP stack.
mod_proxy now reuses the socket and sends the new request. When it tries to
read the response, "TTP/1.1 100 Continue" is read and the rest of the
processing goes wrong. Finally it hangs in poll until the poll returns with a
timeout. After coming back from the timeout (approx. 2 minutes) it packs the
rest of the response headers as content and sends it to the client browser (or
the upper proxy in the chain).

Below is a patch which seems to solve this problem.
*** proxy_http.c_orig   Wed Aug 31 19:46:59 2005
--- proxy_http.c        Wed Aug 31 19:52:26 2005
***************
*** 298,306 ****
          socket_status = apr_recv(p_conn->sock, test_buffer, &buffer_len);
          /* put back old timeout */
          apr_socket_timeout_set(p_conn->sock, current_timeout);
!         if ( APR_STATUS_IS_EOF(socket_status) ) {
              ap_log_error(APLOG_MARK, APLOG_INFO, 0, NULL,
!                          "proxy: previous connection is closed, creating a new connection.");
              new = 1;
          }
      }
--- 298,306 ----
          socket_status = apr_recv(p_conn->sock, test_buffer, &buffer_len);
          /* put back old timeout */
          apr_socket_timeout_set(p_conn->sock, current_timeout);
!         if ( APR_STATUS_IS_EOF(socket_status) || buffer_len > 0) {
              ap_log_error(APLOG_MARK, APLOG_INFO, 0, NULL,
!                          "proxy: previous connection is closed or broken, creating a new connection.");
              new = 1;
          }
      }
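
For illustration, here is a self-contained sketch of what that reuse probe does
with the proposed check applied, using current APR function names. The helper
name is made up for the example; in proxy_http.c the logic is inline.

#include "apr_network_io.h"

/* Hypothetical helper (not part of httpd): returns 1 if the kept-alive
 * backend socket looks safe to reuse, 0 if it is closed or has stray
 * bytes (such as a late "HTTP/1.1 100 Continue") waiting to be read. */
static int backend_socket_is_reusable(apr_socket_t *sock)
{
    char test_buffer[1];
    apr_size_t buffer_len = 1;
    apr_interval_time_t saved_timeout;
    apr_status_t rv;

    /* Probe with a zero timeout so a healthy idle socket returns
     * immediately instead of blocking the new request. */
    apr_socket_timeout_get(sock, &saved_timeout);
    apr_socket_timeout_set(sock, 0);
    rv = apr_socket_recv(sock, test_buffer, &buffer_len);
    apr_socket_timeout_set(sock, saved_timeout);

    /* EOF means the backend closed the connection; a byte actually read
     * means something is lingering in the socket, so request/response
     * framing on this connection can no longer be trusted. */
    if (APR_STATUS_IS_EOF(rv) || buffer_len > 0) {
        return 0;
    }
    return 1;
}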
Comment 1 William A. Rowe Jr. 2005-09-07 01:56:58 UTC
 "There is some versions of IIS which send 100 Continue after they successfully 
  sent the 200 OK with content."

 Well, no pity for wholly violating the RFC; however I have to 100% agree with
 your patch since it adds further possible identification of proxy response
 splitting and other RFC violations(!)
Comment 2 Joe Orton 2005-09-07 11:59:19 UTC
It's not *after* the last request, it's *before* the next request; it is 
legitimate behaviour, try this patch:

http://people.apache.org/~jorton/ap_proxy100.patch

*** This bug has been marked as a duplicate of 16518 ***
Comment 3 Vahe Khachikyan 2005-09-07 13:46:34 UTC
(In reply to comment #2)
> It's not *after* the last request, it's *before* the next request; it is 
> legitimate behaviour, try this patch:
> 
I hope there is no misunderstanding, because IMHO "after" or "before"
doesn't matter! The same thing will happen even if we imagine it is before.
I say "imagine" because my ktrace shows exactly that the "HTTP 100 Continue"
is in the TCP stack before we even perform the write for the next "GET /blah blah..."

> http://people.apache.org/~jorton/ap_proxy100.patch
I'll try your patch but I am afraid it can't solve this problem!
Take a look at this snippet of the ktrace
-------------
...
 71228 httpd    CALL  read(0x18,0xbfbfe7af,0x1)
 71228 httpd    GIO   fd 24 read 1 byte
       "H"
 71228 httpd    RET   read 1
 71228 httpd    CALL  writev(0x18,0xbfbfe610,0x3)
 71228 httpd    GIO   fd 24 wrote 1550 bytes
       "POST /Blah/blah.asmx HTTP/1.1\r
        Host: hostname.com\r
...
...
...
 71228 httpd    RET   writev 1550/0x60e
 71228 httpd    CALL  writev(0x18,0xbfbfe610,0x1)
 71228 httpd    GIO   fd 24 wrote 429 bytes
       "<soap:Envelope xmlns:soap="
....
....
....
 71228 httpd    RET   writev 429/0x1ad
 71228 httpd    CALL  read(0x18,0x81f6028,0x1f40)
 71228 httpd    GIO   fd 24 read 24 bytes
       "TTP/1.1 100 Continue\r
        \r
       "
 71228 httpd    RET   read 24/0x18
...
----------------
And the result of this last read "TTP/1.1 100 Continue" is what we later pass to 
apr_date_checkmask() (code below and line numbers are from proxy_http.c 2.0.54)

710       /* Is it an HTTP/1 response?
711        * This is buggy if we ever see an HTTP/1.10
712        */
713        if (apr_date_checkmask(buffer, "HTTP/#.# ###*")) {
...
We never come into this if block! Instead we land here, which is
plain wrong:
...
807        } else {
808            /* an http/0.9 response */
809            backasswards = 1;
810            r->status = 200;
811            r->status_line = "200 OK";
812            p_conn->close += 1;
813        }
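
To make the failure mode concrete, here is a tiny standalone program (assuming
apr-util's apr_date.h is available) showing how the truncated status line
fails the mask check quoted above:

#include <stdio.h>
#include "apr_date.h"   /* apr-util: apr_date_checkmask() */

int main(void)
{
    /* A well-formed status line matches the mask used at line 713 ... */
    printf("%d\n", apr_date_checkmask("HTTP/1.1 100 Continue",
                                      "HTTP/#.# ###*"));   /* prints 1 */

    /* ... but the line left over after the one-byte probe ate the 'H'
     * does not, so proxy_http.c falls into the HTTP/0.9 branch. */
    printf("%d\n", apr_date_checkmask("TTP/1.1 100 Continue",
                                      "HTTP/#.# ###*"));   /* prints 0 */
    return 0;
}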
Comment 4 Joe Orton 2005-09-07 14:02:43 UTC
That's interesting but not conclusive; it could just mean the proxy has failed
to consume multiple 100 responses from the previous request on that connection
(also quite valid).  ethereal/tcpdump (with -s0!) captures for both client/proxy
and proxy/backend connection would be useful if you're sure that the server is
really issuing a 100 *after* a non-1xx response.
Comment 5 Vahe Khachikyan 2005-09-07 15:05:39 UTC
(In reply to comment #4)
> That's interesting but not conclusive; it could just mean the proxy has failed
> to consume multiple 100 responses from the previous request on that connection
> (also quite valid).  ethereal/tcpdump (with -s0!) captures for both client/proxy
Ah, OK, that trace surely also has an earlier phase which looks like

 71228 httpd    CALL  read(0x18,0x81f4028,0x1f40)
 71228 httpd    GIO   fd 24 read 1658 bytes
       "HTTP/1.1 200 OK\r
...
...
 71228 httpd    RET   read 1658/0x67a

this is the last response from the backend before the next POST mentioned
in the trace in the previous comment. I think there is no way of selectively
leaving some content in the TCP stack. And there is only one place in
proxy_http.c where a one-byte read is performed...

> and proxy/backend connection would be useful if you're sure that the server is
> really issuing a 100 *after* a non-1xx response.
I am quite sure that the server does this! I also had the tcpdump traces for
this scenario. Reproducing this problem again is quite a time-consuming task;
however, I'll attach the trace results when I have redone this test.
Comment 6 Vahe Khachikyan 2005-09-07 15:14:05 UTC
> this is the last response from backend before the next POST mentioned
> in the trace in previous comment. 
And I forgot to mention that this is the last read on fd 24 before the
read which returns that one byte, "H".

Comment 7 William A. Rowe Jr. 2005-09-07 17:50:41 UTC
  Joe said "It's not *after* the last request, it's *before* the next request; 
  it is legitimate behaviour..."

  Ah ha :)  However, it's invalid prior to a REQUEST being sent to the origin
  server, no?  It would only be valid after the 2nd request was submitted
  (e.g. HEAD - test resource, POST - transmit resource).

  So if we did a look-ahead for any bytes to be read from the backend stream
  before we sent the subsequent request, and saw lingering bytes (other than
  perhaps redundant CR/LF pairs), then we could safely terminate the backend
  connection, presume it is poisoned, and begin the next request on another,
  new connection (roughly sketched after this comment).  If there were no
  bytes to be read from the backend, after we consume a few extra/redundant
  CR/LF gaps, then we would trust the backend had not split a response.

  The 100 responses are only valid between the moment a request is received
  and the moment the final result code has been sent.  So Joe's right, it
  could be that this 'extra' 100 Continue is in response to the subsequent
  request.  But it's altogether invalid as a 'keepalive indicator' between
  requests.
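
A rough sketch of that look-ahead, using current APR calls; the helper is
hypothetical and leaves aside how the drained bytes would interact with the
real brigade-based input handling:

#include "apr_network_io.h"

/* Hypothetical helper (not httpd code): before reusing a kept-alive
 * backend socket, drain any redundant CR/LF bytes; if anything else is
 * waiting to be read, treat the connection as poisoned. */
static int backend_looks_clean(apr_socket_t *sock)
{
    char c;
    apr_size_t len;
    apr_interval_time_t saved_timeout;
    apr_status_t rv;

    apr_socket_timeout_get(sock, &saved_timeout);
    apr_socket_timeout_set(sock, 0);          /* non-blocking probes */

    for (;;) {
        len = 1;
        rv = apr_socket_recv(sock, &c, &len);
        if (APR_STATUS_IS_EAGAIN(rv)) {       /* nothing pending: reusable */
            apr_socket_timeout_set(sock, saved_timeout);
            return 1;
        }
        if (rv != APR_SUCCESS || len == 0) {  /* EOF or error */
            break;
        }
        if (c != '\r' && c != '\n') {         /* lingering non-CR/LF byte */
            break;
        }
        /* redundant CR/LF between responses: keep draining */
    }

    apr_socket_timeout_set(sock, saved_timeout);
    return 0;                                 /* poisoned or closed */
}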
Comment 8 Vahe Khachikyan 2005-09-07 20:39:42 UTC
> So Joe's right, it could be this
>   'extra' 100 continue is in response to the subsequent request.  But it's
>   altogether invalid as a 'keepalive indicator' between requests.

Yep! The 1xx is valid between the request and the final response, but ^not^
after the final response and ^before^ the subsequent request. At least this
is my understanding of section 10.1 in RFC 2616.

I am just wondering if there is a better way to check whether the socket
communication is still alive without trying to read at least one byte from
the socket. IMHO there is no way of doing this, at least not in portable code.

Another solution to this problem could be, say, to try to read a byte and, if
one is read and it is not EOS, somehow hand it over to the processing module,
which would assemble it together with the bytes read later (a rough sketch
follows below). The advantage would be that we would ^probably^ be able to
reuse a keep-alive connection. But this advantage seems to come at quite a
high price! I would vote for closing the poisoned socket and opening a new
one...
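
A rough sketch of that hand-over idea using an APR bucket brigade; the helper
is hypothetical, and 'bb' stands for whatever brigade the response parser
reads from next:

#include "apr_buckets.h"

/* Hypothetical helper (not httpd code): push the byte consumed by the
 * probe back in front of the backend input brigade, so the response
 * parser still sees the complete "HTTP/1.1 ..." status line. */
static void push_back_probe_byte(apr_bucket_brigade *bb, char probed)
{
    /* With a NULL free function, apr_bucket_heap_create copies the byte
     * into heap memory owned by the bucket. */
    apr_bucket *b = apr_bucket_heap_create(&probed, 1, NULL,
                                           bb->bucket_alloc);
    APR_BRIGADE_INSERT_HEAD(bb, b);
}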
Comment 9 Nick Kew 2008-01-12 12:08:27 UTC
Does this still happen in 2.2.8?
Comment 10 Nick Kew 2008-12-26 11:10:02 UTC
Assuming fixed in recent 2.2 versions.  Please reopen if I'm wrong.