Internet Explorer opens up to four HTTP connections to the web server (configured as a reverse proxy). These are in ESTABLISHED state for the duration of the keep-alive timeout period. Then they enter CLOSE_WAIT state, where they stay for a few seconds. If the user tries to go to a new page AND "friendly http error messages" is enabled in Internet Explorer, the infamous "The page cannot be displayed" appears. This has been reproduced with httpd 2.0.52 and 2.0.53 with OpenSSL 0.9.7b, 0.9.7d and 0.9.7g. We were not able to reproduce the bug in 2.0.40 or 2.0.47 (all config files were equal), which makes us believe that some sort of bug or incompatibility with Explorer was introduced some time after 2.0.47. One has to go via a proxy in order to reproduce it. For performance reasons, it is not an option to disable keep-alive. Disabling SSLv3 "fixes" the problem.
There have been a couple of reports like this recently. Do you have a precise reproduction case with a specific version of MSIE, which only triggers in those particular httpd versions? What's logged to the error log for the SSL vhost? Anyway, the default SSL vhost configuration will disable keepalives for MSIE for the SSL vhost, using the SetEnvIf below; I take it you have disabled this?

    SetEnvIf User-Agent ".*MSIE.*" \
             nokeepalive ssl-unclean-shutdown \
             downgrade-1.0 force-response-1.0

Please also include the full SSL configuration you're using.
Created attachment 14715
SSL configuration
For performance reasons (throughput was halved without keep-alive) the default SetEnvIf has been disabled. As 95% of our users use IE, it is not an option to use this setting. We tried to include just SetEnvIf User-Agent ".*MSIE.*" ssl-unclean-shutdown, but it did not fix the problem. We have also tried to run without mod_deflate, but that didn't help either. My full IE version is 6.0.2800.1106.xpsp2_gdr.040517-1325CO with updates Q323308, Q832894, Q837009 and Q867801. We have received error reports from users with different versions, though. The full SSL config file is now attached (virtual section included).
Nothing is logged in the ssl logs on the server. In fact I don't think the request reaches the server at all, as Explorer seems to try to use a connection that is in CLOSE_WAIT state. One could argue that this seems like an Explorer bug, but as we easily reproduce this problem in 2.0.52/53 and not in 40/47, some unfortunate change seems to have been introduced after 47.
OK, more things it would be useful to try:

1. MSIE can be sensitive to session caching; try switching to the SSLSessionCache shmcb:... line

2. it would be useful to narrow down where the regression occurs; particularly, if it works with 2.0.48 and fails with 2.0.49, that's useful; there was a significant change to the SSL connection closure handling there, but it shouldn't take effect if you have ssl-unclean-shutdown configured for *MSIE*.

3. get some useful logs; add to the SSL vhost config:

    LogLevel debug
    ErrorLog logs/ssl_debug_log

and attach the resultant ssl_debug_log showing the reproduction of the failure (or better yet, a before-and-after with a version which works and one which doesn't)
Created attachment 14724
SSL debug log from 2.0.40 (no error)

Performed the following steps:
1. Navigated to https://10.110.64.26/ega/connectiontest/index.html (this page contains a lot of large images, so that Explorer sets up many connections)
2. Waited until all connections were in CLOSE_WAIT state (~30 secs)
3. Navigated to https://10.110.64.26/ega/

10.110.5.11 is the proxy server
10.110.64.26 is the reverse proxy server
10.110.64.6 is the web server
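Step 2 above depends on knowing when every client connection has reached CLOSE_WAIT. The following is a minimal sketch, not part of the original report, of a helper that counts connections in a given TCP state in netstat-style text captured on the client. The function name `count_connections` and the assumed line layout (peer address and state name appearing somewhere on the same line) are illustrative assumptions; real netstat output differs between platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the lines of netstat-style text that mention `peer` and are in
 * TCP state `state` (for example "CLOSE_WAIT"). Matching is by plain
 * substring, so pass the peer with its trailing ':' (or full port) to
 * avoid matching a longer address that merely starts the same way.
 * The text layout is an assumption, not a fixed format. */
int count_connections(const char *text, const char *peer, const char *state)
{
    int count = 0;

    assert(text != NULL && peer != NULL && state != NULL);

    while (*text != '\0') {
        /* Find the end of the current line. */
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[256];

        /* Copy the line (truncating very long ones) so that strstr
         * cannot match across line boundaries. */
        if (len >= sizeof(line)) {
            len = sizeof(line) - 1;
        }
        memcpy(line, text, len);
        line[len] = '\0';

        if (strstr(line, peer) != NULL && strstr(line, state) != NULL) {
            count++;
        }

        /* Advance to the next line, or to the terminating NUL. */
        text = eol ? eol + 1 : text + strlen(text);
    }
    return count;
}
```

Calling `count_connections(output, "10.110.5.11:", "CLOSE_WAIT")` on captured output, and comparing the result with the count for "ESTABLISHED", would indicate when it is time to perform step 3.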
Created attachment 14725
SSL debug log from 2.0.52 (page cannot be displayed)

Performed the following steps:
1. Navigated to https://10.110.64.26/ega/connectiontest/index.html (this page contains a lot of large images, so that Explorer sets up many connections)
2. Waited until all connections were in CLOSE_WAIT state (~30 secs)
3. Navigated to https://10.110.64.26/ega/

10.110.5.11 is the proxy server
10.110.64.26 is the reverse proxy server
10.110.64.6 is the web server
Created attachment 14726
SSL debug log from 2.0.52 ssl2 (no error)

Performed the following steps:
1. Navigated to https://10.110.64.26/ega/connectiontest/index.html (this page contains a lot of large images, so that Explorer sets up many connections)
2. Waited until all connections were in CLOSE_WAIT state (~30 secs)
3. Navigated to https://10.110.64.26/ega/

10.110.5.11 is the proxy server
10.110.64.26 is the reverse proxy server
10.110.64.6 is the web server
shmcb caching did not help. I have uploaded some log files from 2.0.52 and 2.0.40. If you still need more info I can try to narrow down where the problem occurs in a few days.
I have now verified that the error is reproducible in 2.0.49 but not in 2.0.48.
I looked at the code and it seems the problem is that ssl-unclean-shutdown is ignored. If you change the default: behaviour in the switch in ssl_filter_io_shutdown() in ssl_engine_io.c to unclean, the problem disappears. I guess sslconn->shutdown_type should be set by ssl_configure_env in ssl_engine_kernel.c, but it seems that this function is not run at all. I don't know the httpd architecture well enough to find out why not. The reason it worked in 2.0.48 is that the following block had not yet been inserted in ssl_io_filter_output in ssl_engine_io.c:

    else if (AP_BUCKET_IS_EOC(bucket)) {
        /* The special "EOC" bucket means a shutdown is needed;
         * - turn off buffering in bio_filter_out_write
         * - issue the SSL_shutdown
         */
        filter_ctx->nobuffer = 1;
        status = ssl_filter_io_shutdown(filter_ctx, f->c, 0);
        if (status != APR_SUCCESS) {
            ap_log_error(APLOG_MARK, APLOG_INFO, status, NULL,
                         "SSL filter error shutting down I/O");
        }
        if ((status = ap_pass_brigade(f->next, bb)) != APR_SUCCESS) {
            return status;
        }
        break;
    }

Does this make sense?
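To make the role of the switch in ssl_filter_io_shutdown() concrete, here is a simplified, self-contained model of how the per-connection shutdown type is understood to select the flags set before SSL_shutdown() is called. It is a sketch based on a reading of the mod_ssl source, not the actual code: the enum, the `SKETCH_` constants (stand-ins for OpenSSL's SSL_SENT_SHUTDOWN and SSL_RECEIVED_SHUTDOWN) and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for OpenSSL's SSL_SENT_SHUTDOWN and SSL_RECEIVED_SHUTDOWN,
 * redefined here so that the sketch compiles without OpenSSL. */
#define SKETCH_SENT_SHUTDOWN     1
#define SKETCH_RECEIVED_SHUTDOWN 2

/* Hypothetical mirror of the per-connection shutdown type. */
typedef enum {
    SHUTDOWN_UNSET,
    SHUTDOWN_STANDARD,
    SHUTDOWN_UNCLEAN,
    SHUTDOWN_ACCURATE
} shutdown_type_t;

/* Model of what ssl_configure_env is expected to do: derive the
 * shutdown type from the environment variable names on the request. */
shutdown_type_t shutdown_type_from_env(const char *const *keys, size_t n)
{
    shutdown_type_t type = SHUTDOWN_UNSET;
    size_t i;

    assert(keys != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (strcmp(keys[i], "ssl-unclean-shutdown") == 0) {
            type = SHUTDOWN_UNCLEAN;
        }
        else if (strcmp(keys[i], "ssl-accurate-shutdown") == 0) {
            type = SHUTDOWN_ACCURATE;
        }
    }
    return type;
}

/* Model of the switch: return the flag mask that marks which close
 * notify alerts are treated as already exchanged. */
int shutdown_flags(shutdown_type_t type)
{
    switch (type) {
    case SHUTDOWN_UNCLEAN:
        /* Both directions marked done: no close notify is sent and
         * none is awaited. */
        return SKETCH_SENT_SHUTDOWN | SKETCH_RECEIVED_SHUTDOWN;
    case SHUTDOWN_ACCURATE:
        /* Send close notify and wait for the client's close notify. */
        return 0;
    default:
        /* Unset or standard: send close notify, but do not wait for
         * the client's. This is the branch taken when the type is
         * never configured. */
        return SKETCH_RECEIVED_SHUTDOWN;
    }
}
```

In this model, a connection whose type is never set falls through to the default branch and still sends a close notify, which matches the observation that ssl-unclean-shutdown appeared to be ignored.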
That does make perfect sense, I was wondering whether that might be the issue. But we now need to work out why the shutdown_type is not getting set; I'll try some tests here. Thanks a lot!
Ugh, yes. ssl_configure_env is called from mod_ssl's ssl_hook_Translate, but that won't run if, e.g. mod_proxy's translate_name hook is run first and returns OK, as happens in the reverse-proxy case. I have no idea why that section of mod_ssl code needs to be in a translate_name hook, it's probably historical. If it can be moved somewhere more sensible this will work.
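The short-circuit described above can be illustrated with a toy model of a "run first" hook chain, in which the first hook that does not decline ends the chain. This is only a sketch: `run_first`, `proxy_translate`, `ssl_translate` and the `HOOK_` constants are hypothetical names, not httpd APIs.

```c
#include <assert.h>
#include <stddef.h>

#define HOOK_OK        0    /* hook handled the request; stop here */
#define HOOK_DECLINED (-1)  /* hook did not handle it; try the next */

/* A hook receives a flag it may set once the SSL environment has
 * been configured. */
typedef int (*hook_fn)(int *ssl_env_configured);

/* Stand-in for mod_proxy's translate_name hook in the reverse-proxy
 * case: it claims the request. */
int proxy_translate(int *ssl_env_configured)
{
    (void)ssl_env_configured;
    return HOOK_OK;
}

/* Stand-in for mod_ssl's translate_name hook: it configures the
 * connection as a side effect and then declines. */
int ssl_translate(int *ssl_env_configured)
{
    *ssl_env_configured = 1;
    return HOOK_DECLINED;
}

/* Run hooks in order until one returns something other than DECLINED. */
int run_first(const hook_fn *hooks, size_t n, int *state)
{
    size_t i;

    assert(hooks != NULL || n == 0);

    for (i = 0; i < n; i++) {
        int rv = hooks[i](state);
        if (rv != HOOK_DECLINED) {
            return rv;  /* later hooks never run */
        }
    }
    return HOOK_DECLINED;
}
```

With the order `{ proxy_translate, ssl_translate }` the flag stays 0, because the chain stops at the proxy hook; with `ssl_translate` alone it becomes 1. That is the difference between the reverse-proxy setup and a plain SSL vhost in this model.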
Created attachment 14750
proposed mod_ssl fix

Here's a patch which should fix this; it moves the ssl_configure_env call to the post_read_request hook, and runs said hook slightly later to ensure that it runs later than the mod_setenvif post_read_request hook (a quick hack for testing purposes, I'll do better when committing this). Patch should apply against 2.0.5[34]-ish.
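Why the ordering matters can be shown with a small model in which every hook in the chain runs, one after another. This is a sketch under assumptions, not the patch itself: `request_sketch`, `setenvif_post_read`, `ssl_post_read` and `run_all` are hypothetical stand-ins for the mod_setenvif and mod_ssl post_read_request hooks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced view of a request. */
typedef struct {
    const char *user_agent;
    int env_unclean_shutdown;   /* set by the setenvif stand-in */
    int conn_unclean_shutdown;  /* set by the mod_ssl stand-in  */
} request_sketch;

typedef void (*post_read_fn)(request_sketch *r);

/* Stand-in for SetEnvIf User-Agent ".*MSIE.*" ssl-unclean-shutdown. */
void setenvif_post_read(request_sketch *r)
{
    if (strstr(r->user_agent, "MSIE") != NULL) {
        r->env_unclean_shutdown = 1;
    }
}

/* Stand-in for ssl_configure_env: copy the environment setting onto
 * the connection. It only sees what earlier hooks have already set. */
void ssl_post_read(request_sketch *r)
{
    r->conn_unclean_shutdown = r->env_unclean_shutdown;
}

/* Run every hook in the given order. */
void run_all(const post_read_fn *hooks, size_t n, request_sketch *r)
{
    size_t i;

    assert(r != NULL && (hooks != NULL || n == 0));

    for (i = 0; i < n; i++) {
        hooks[i](r);
    }
}
```

Running `{ setenvif_post_read, ssl_post_read }` for an MSIE user agent leaves `conn_unclean_shutdown` at 1, while the reverse order leaves it at 0 even though the environment variable ends up set. This is why the patch runs the mod_ssl hook later than the mod_setenvif one.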
Created attachment 14804
equivalent patch for 2.0.x backport proposal

This patch is to be proposed for backport to 2.0.x and is essentially equivalent to the previous patch.
Fixed on the trunk: http://svn.apache.org/viewcvs?view=rev&rev=161958 and proposed for backport to 2.0.x. Thanks again for your help debugging this issue!
Does anybody know whether or not this fix was incorporated in the 2.2 tree? We recently upgraded from 2.0.48 to 2.2.0 and are now seeing a similar issue to what's reported here. Let me know if you require additional information.
This patch is also contained in 2.2.0.
*** Bug 20641 has been marked as a duplicate of this bug. ***