Apache HTTPD 2.0.53 + OpenSSL 0.9.7d compiled with gcc 3.3 on Solaris 8: any time I attempt to POST a large file (> 40 MB) from Internet Explorer, I receive the following lines in my access log, showing the initial GET request and the subsequent POST:

[01/Apr/2005:13:47:57 -0600] 4846040534 10.1.20.92 SSLv3 RC4-MD5 "GET /upload_error/upload.html HTTP/1.1" 200 257 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MGIC; .NET CLR 1.1.4322)"
[01/Apr/2005:13:48:02 -0600] 4846040534 10.1.20.92 SSLv3 RC4-MD5 "POST /upload_error/upload.cgi HTTP/1.1" 70014 539 "https://mytestsite.mgic.com/upload_error/upload.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MGIC; .NET CLR 1.1.4322)"

The transfer aborts with the file truncated, and Internet Explorer returns an unhelpful error page instead of the successful result. Note the return code of 70014 in the POST line -- an SSL EOF code, not a valid HTTP status.

Following is the CGI script I am using to reproduce this error:

#!/usr/bin/perl
# Read the POSTed body from STDIN in 1 KB chunks, save it to a temp file,
# then report how many bytes were actually received.
my $bytesread = 0;
my $i = 0;
my $buf;
open(OUTPUT, ">/tmp/upload.stdin.$$") or die "cannot open output file: $!";
while ($i = read(STDIN, $buf, 1024)) {
    print OUTPUT $buf;
    $bytesread += $i;
}
close(OUTPUT);
print "Content-type: text/plain\n\n";
print "$bytesread bytes successfully posted.";
exit;

This also fails with a simple:

while (<STDIN>) { print OUTPUT $_; }

I know the gut reaction to this is "Don't use Internet Explorer", but I have customers accessing this page -- I cannot control that.

This problem does NOT occur in the following configurations that I have tested:

XP Service Pack 2 + MSIE
XP Service Pack 2 + FireFox
XP Service Pack 1 + FireFox
XP Service Pack 1 + Mozilla
Linux + FireFox

So I am thinking we are looking at some form of Microsoft-induced weirdness in XP Service Pack 1 (I have not tested with any other MS OS + IE, as I do not have access to those configurations; it may fail in all versions of IE prior to the one shipped with XP SP2).

If there is any way to track down this oddity and provide a workaround, it would be appreciated. I can reproduce it easily and will happily provide additional information if requested.
Can you include the full configuration you use for the SSL vhost? Is anything logged to the SSL vhost's error_log?
<IfDefine SSL>
Listen 10.2.201.36:443
AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl .crl
SSLPassPhraseDialog builtin
SSLSessionCache shm:/logfiles/apache/rd.transfer.mgic.com/ssl_scache
SSLSessionCacheTimeout 300
#SSLMutex file:/logfiles/apache/rd.transfer.mgic.com/ssl_mutex
SSLMutex sem
SSLRandomSeed startup builtin
SSLRandomSeed connect builtin

<VirtualHost 10.2.201.36:443>
DocumentRoot "/home2/rd.transfer.mgic.com"
ServerName rd.transfer.mgic.com:443
ServerAdmin inet_admin@mgic.com
SSLEngine on
#SSLCipherSuite SSLv3:!EXP:!LOW
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
SSLCertificateFile /usr/local/apache2/conf/ssl.crt/rd.transfer.mgic.com.crt
SSLCertificateKeyFile /usr/local/apache2/conf/ssl.key/rd.transfer.mgic.com.key
SSLCertificateChainFile /usr/local/apache2/conf/ssl.crt/eappqa2-new-chain-cert
##SSLVerifyClient 0
SSLVerifyClient none
<Files ~ "\.(cgi|shtml|phtml|php3?)$">
    SSLOptions +StdEnvVars
</Files>
<Directory "/usr/local/apache2/cgi-bin">
    SSLOptions +StdEnvVars
</Directory>
#SetEnvIf User-Agent ".*MSIE.*" \
#    nokeepalive ssl-unclean-shutdown \
#    downgrade-1.0 force-response-1.0
BrowserMatch "MSIE" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
LogFormat "%t %u %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" sslcombined
CustomLog "|/usr/local/sbin/cronolog /logfiles/apache/rd.transfer.mgic.com/%Y%m%d-access.log" sslcombined
</VirtualHost>
</IfDefine>

And the error log, at LogLevel info:

[Mon Apr 04 09:26:31 2005] [info] Connection to child 3 closed with unclean shutdown (server rd.transfer.mgic.com:443, client 10.2.20.92)

LogLevel debug actually logs so much, because of the SSL dump, that it's difficult to tell whether the process is failing because httpd is churning that much data to disk or whether I'm reproducing the same issue. I commented out the ssl_io_data_dump call in ssl_engine_io.c to eliminate that, and there wasn't much useful additional data. The transfer just suddenly stops.
Any difference if you switch the session cache from shm: to shmcb:? (MSIE can be very sensitive to session caching.) Is this reproducible *every* time you attempt the POST with that particular browser? If you could attach the logs at debug level, that might show something.
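For reference, the suggested change amounts to a one-line edit to the vhost configuration posted above; the cache size in parentheses is an assumed value for illustration, not something taken from the original report:

# Switch the SSL session cache from shm to shmcb (size in bytes is illustrative)
SSLSessionCache shmcb:/logfiles/apache/rd.transfer.mgic.com/ssl_scache(512000)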
Just the tail end of the debug info -- yes, that would be fine!
What, you don't want all ~30 megs? :) Here are the last few lines, without ssl_io_data_dump:

[Mon Apr 04 10:13:51 2005] [debug] ssl_engine_io.c(1511): OpenSSL: read 1196/8208 bytes from BIO#186f10 [mem: 1ba9f5] (BIO dump follows)
[Mon Apr 04 10:13:51 2005] [debug] ssl_engine_io.c(1511): OpenSSL: read 565/7012 bytes from BIO#186f10 [mem: 1baea1] (BIO dump follows)
[Mon Apr 04 10:13:51 2005] [debug] ssl_engine_io.c(1511): OpenSSL: read 6447/6447 bytes from BIO#186f10 [mem: 1bb0d6] (BIO dump follows)
[Mon Apr 04 10:13:51 2005] [debug] ssl_engine_io.c(1511): OpenSSL: read 5/5 bytes from BIO#186f10 [mem: 1ba9f0] (BIO dump follows)
[Mon Apr 04 10:13:51 2005] [debug] ssl_engine_io.c(1511): OpenSSL: read 1548/4627 bytes from BIO#186f10 [mem: 1ba9f5] (BIO dump follows)
[Mon Apr 04 10:13:51 2005] [debug] ssl_engine_io.c(1511): OpenSSL: read 376/3079 bytes from BIO#186f10 [mem: 1bb001] (BIO dump follows)
[Mon Apr 04 10:13:51 2005] [debug] ssl_engine_io.c(1511): OpenSSL: read 2703/2703 bytes from BIO#186f10 [mem: 1bb179] (BIO dump follows)
[Mon Apr 04 10:13:51 2005] [info] Connection to child 3 closed with unclean shutdown (server rd.transfer.mgic.com:443, client 10.2.20.92)

This is reproducible about 9 times out of 10. Changed to shmcb and still no change. Oddly enough, when requests are made to this site through an Apache proxy running on another host, the transfer completes without incident (but it's very slow). Also, running with ssl_io_data_dump uncommented, I do not get a failure (and it is also very slow, because it's writing every byte out to the log).
This is an issue with Solaris 8 TCP stack settings, not Apache. I was able to resolve it by applying the following settings:

ip_ignore_redirect 1
tcp_conn_grace_period 0
tcp_conn_req_max_q 1024
tcp_conn_req_max_q0 4096
tcp_conn_req_min 1
tcp_cwnd_max 1048576
tcp_fin_wait_2_flush_interval 675000
tcp_ip_abort_cinterval 180000
tcp_ip_abort_interval 480000
tcp_keepalive_interval 7200000
tcp_recv_hiwat 49152
tcp_rexmit_interval_initial 3000
tcp_rexmit_interval_max 60000
tcp_rexmit_interval_min 400
tcp_slow_start_initial 4
tcp_time_wait_interval 60000
tcp_xmit_hiwat 49152

These are, incidentally, the default settings in Solaris 9.
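For anyone wanting to try the same change, these values can be applied to a running Solaris 8 box with ndd(1M). This is a sketch of the commands rather than the exact procedure used above; note that ip_ignore_redirect lives under /dev/ip while the tcp_* parameters live under /dev/tcp, and that ndd settings do not persist across a reboot unless added to a startup script:

# Apply the Solaris 9 defaults on Solaris 8 (run as root; not persistent)
ndd -set /dev/ip  ip_ignore_redirect 1
ndd -set /dev/tcp tcp_conn_grace_period 0
ndd -set /dev/tcp tcp_conn_req_max_q 1024
ndd -set /dev/tcp tcp_conn_req_max_q0 4096
ndd -set /dev/tcp tcp_conn_req_min 1
ndd -set /dev/tcp tcp_cwnd_max 1048576
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 675000
ndd -set /dev/tcp tcp_ip_abort_cinterval 180000
ndd -set /dev/tcp tcp_ip_abort_interval 480000
ndd -set /dev/tcp tcp_keepalive_interval 7200000
ndd -set /dev/tcp tcp_recv_hiwat 49152
ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
ndd -set /dev/tcp tcp_rexmit_interval_max 60000
ndd -set /dev/tcp tcp_rexmit_interval_min 400
ndd -set /dev/tcp tcp_slow_start_initial 4
ndd -set /dev/tcp tcp_time_wait_interval 60000
ndd -set /dev/tcp tcp_xmit_hiwat 49152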
Further research yields more data. The TCP settings help mask the problem somewhat, but failures still occur when POSTing > 40 MB files from IE over an SSL connection.

The failure mode is interesting. A script which reads the input stream from <STDIN> by simply reading bytes until $ENV{CONTENT_LENGTH} is reached will eventually be killed by mod_cgi: the data stream is cut off before the full size arrives, and the script happily continues to read 0-byte blocks for a few loops until it receives SIGTERM. A script which reads until read() returns 0 bytes will "complete" when it sees the 0-byte block, but the resulting file is smaller than $ENV{CONTENT_LENGTH}, because at some point in the transfer one endpoint closes the connection before the file is completely transferred. These failures, at least in our test environment, occur on about 1 out of every 10 transfers.

Given the relatively infrequent failure rate, I began looking for likely sources of variability in the environment. Our "test" server is a Sun E250 (UltraSPARC-II, 400 MHz), which experiences the failures. Our "qa" server is a Sun E250 (UltraSPARC-II, 2x 300 MHz), which also experiences failures. Our production server is an E6500 (UltraSPARC-II, 14x 400 MHz) and still experiences failures. On a whim, I set up our test environment on a Sun 280R (UltraSPARC-III, 2x 750 MHz), a faster, newer server with a much better pair of CPUs. Over about 80 tests I did not experience one failure.

I then reviewed the other steps we had taken to troubleshoot this problem:

1. Put the site behind an Apache proxy (mod_proxy), which halved the transfer speed but masked the problem -- no failures observed in this configuration.
2. Put the site behind a load balancer appliance (BigIP), which also somewhat slowed the transfer and masked the problem -- again no failures observed.
3. Turned on LogLevel debug in httpd.conf, which seriously slowed down the transfer but masked the problem, as every packet is being written to the error_log.

Because we are looking at only one thread handling the SSL decryption, I have considered that the old processors simply cannot keep up with the data stream and that some buffer, somewhere in the SSL code either inside Apache or in OpenSSL (0.9.7g), is filling up and not being emptied correctly. If the transfer is bottlenecked somewhere else, that buffer isn't being filled.

The limit of my troubleshooting capability has been exceeded at this point, so someone who knows more about this process will need to take over. I can provide as much information as is requested -- except an Apache debug session, as no failures are observed when that is enabled.
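For clarity: the script in the original report is the "read until read() returns 0" variant. Below is a minimal sketch of the other variant described above, the one that reads until $ENV{CONTENT_LENGTH} is reached and ends up SIGTERMed on a truncated stream. This is an illustrative test script, not the exact production code; the output file name is made up.

#!/usr/bin/perl
# Read exactly $ENV{CONTENT_LENGTH} bytes of POST body from STDIN.
# When the SSL stream is cut off early, read() keeps returning 0 and this
# loop spins on empty reads until mod_cgi terminates the script (SIGTERM).
my $expected = $ENV{CONTENT_LENGTH};
my ($buf, $total) = ('', 0);
open(OUTPUT, ">/tmp/upload.by_length.$$") or die "cannot open output file: $!";
while ($total < $expected) {
    my $n = read(STDIN, $buf, 1024);
    last unless defined $n;        # genuine read error
    print OUTPUT $buf if $n > 0;
    $total += $n;
}
close(OUTPUT);
print "Content-type: text/plain\n\n";
print "read $total of $expected bytes\n";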
Are you still seeing this problem, David? It would be very useful if you could try 2.0.54 plus these two patches: http://people.apache.org/~jorton/httpd-2.0.54-sslnbio.patch http://people.apache.org/~jorton/httpd-2.0.52-sslbuff.patch
Just a confirmation: I am seeing this problem on FreeBSD 6.0 with 2.0.55.

ns.0gino.jp - - [06/Feb/2006:08:52:38 -0800] "POST /blog/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
ns.0gino.jp - - [06/Feb/2006:08:52:39 -0800] "POST /blog/xmlsrv/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
ns.0gino.jp - - [06/Feb/2006:08:52:40 -0800] "POST /blogs/xmlsrv/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
ns.0gino.jp - - [06/Feb/2006:08:52:41 -0800] "POST /drupal/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
ns.0gino.jp - - [06/Feb/2006:08:52:42 -0800] "POST /phpgroupware/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
ns.0gino.jp - - [06/Feb/2006:08:52:43 -0800] "POST /wordpress/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
ns.0gino.jp - - [06/Feb/2006:08:52:46 -0800] "POST /xmlrpc/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
ns.0gino.jp - - [06/Feb/2006:08:52:47 -0800] "POST /xmlsrv/xmlrpc.php HTTP/1.1" 70014 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"

I could try the patches, but I want to make sure you weren't trying to solve a Solaris-specific problem...
FYI, here are the error logs for the same period. Notice that these are simple file-does-not-exist errors.

[Fri Feb 06 08:52:38 2006] [error] [client 219.117.208.209] File does not exist: /u/default_site/www/server_common/blog
[Fri Feb 06 08:52:39 2006] [error] [client 219.117.208.209] File does not exist: /u/default_site/www/server_common/blog
[Fri Feb 06 08:52:40 2006] [error] [client 219.117.208.209] File does not exist: /u/default_site/www/server_common/blogs
[Fri Feb 06 08:52:41 2006] [error] [client 219.117.208.209] File does not exist: /u/default_site/www/server_common/drupal
[Fri Feb 06 08:52:42 2006] [error] [client 219.117.208.209] File does not exist: /u/default_site/www/server_common/phpgroupware
In a B2B setup where Apache acts as a reverse HTTPS proxy, we have the same problem with small POST files:

172.31.48.154 - - [03/Mar/2006:09:30:13 +0100] "POST /b2b/test.eip HTTP/1.1" 70014 548 "-" "-"

Apache version: 2.0.53
OpenSSL version: 0.9.7e
This is a fairly major problem, leaving many hits uncounted. We're seeing an average of 1k "corrupted" entries per night, per webserver. All of these so-called "corrupted" entries are invalid status codes.
What do you see if you insert mod_diagnostics in the input filter chain, for a successful and a failed request?
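For anyone unfamiliar with the module: mod_diagnostics is a third-party debugging module that can be inserted as a filter to show what passes through the filter chain. A rough sketch of wiring it in for the upload location is below; the module and filter names are assumptions (check the module's own documentation for the names it actually registers), and only SetInputFilter and Location are stock httpd directives:

# Hypothetical wiring for mod_diagnostics -- names unverified
LoadModule diagnostics_module modules/mod_diagnostics.so
<Location /upload_error>
    # "diagnostics" is an assumed filter name
    SetInputFilter diagnostics
</Location>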
The fact that the bogus "70014" status code is logged to the access log on an SSL timeout is bug 31759; this is not necessarily symptomatic of a behavioural bug, but is merely a cosmetic logging issue. It can occur if an SSL connection times out or is forcibly closed by the client (i.e. under normal operating conditions).

If you are using a reverse proxy, then you may be suffering from bug 34452 -- use http://people.apache.org/~jorton/httpd-2.0.52-sslbuff.patch to fix that.

If you are using SSLVerifyClient in a per-directory or per-location context which is POSTed to, see bug 12335.
(sorry, posted previous comment too quickly) Finally: make sure you are using the shmcb session cache, and have the MSIE-specific BrowserMatch/SetEnvIf statement configured in the correct SSL vhost context:

BrowserMatch ".*MSIE.*" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0
Calling this a "cosmetic logging issue" makes it sound less important. 200k of invalid log lines each night that must be manually mangled for proper log processing is not "cosmetic". In fact, an argument could be made that this is a security problem, because most people aren't doing this thus many log lines are effectively invisible to them. This is a MAJOR problem and it needs to get fixed, rather than focusing on some minor trivia new feature.
I'd also like to point out that downgrading ALL versions of MSIE is wielding a big blunt stick. It may work fine for developers on their home systems, but it slows down users with modern versions of MSIE too much, and generates too many complaints on large production sites.
The situation is more complicated than you might think. Even IE6 behaves differently depending on which patches have been applied, and it is impossible to detect that on the server side. See also http://mail-archives.apache.org/mod_mbox/httpd-dev/200511.mbox/%3c20051122135629.03A2882D02@cmcodec02.st1.spray.net%3e

So if you can provide more concrete information and help on this topic, that is welcome.
What IE does and doesn't do is not relevant to this APACHE bug. IE doesn't write messages to Apache's access log. What help do you need from me? As I understand it, this issue has been known about for over a year now and has been clearly identified... and yet nobody will bother to commit a fix. So all of us running production servers have to pre-process our logfiles, cleaning out these invalid return codes before processing the logs.
My remark was regarding your comment #17. If you would like to see this logfile issue fixed, prepare a patch and attach it. Although there is no guarantee that it gets accepted, it may speed up a change in the codebase, and at least it solves the problem on your servers. That's the nice thing about open source software: if the developers don't fix it, you can do it yourself (or pay someone to do it).
For the record, the problem referred to in Ruediger Pluem's comment only applies to per-directory SSL reconfigurations -- in short, an SSLVerifyClient statement inside a Directory container that is POSTed to. This isn't related to this bug at all, which involves what Apache logs.
> If you would like to see this log file issue fixed prepare a patch and

New to open source, are we? Yes, that's the religion. And like God, it doesn't show its face much in the real world.

I've spent thousands of hours fixing major problems in open source projects. My fixes are fairly rarely implemented, because whoever owns that piece of code was ignoring the problem in the first place. Or they don't care to fix it. Or they have some religious objection to it ("who cares about MSIE problems?", etc.). I stopped submitting unsolicited patches years ago. The very few organizations that do want and accept unsolicited patches are (a) responsive to the bug reports and (b) willing to review them. It's been nearly 10 years since Apache did either.

So the truth is that, with Apache, the only way bugs get fixed is if one of the maintainers cares enough. And given constant reports of this problem and confirmation from the maintainers about exactly where the problem is and what needs to be done to fix it... but no patches, it's clear that they don't care. It's likely another personal war against MSIE. Again. sigh.
Please try to keep this discussion productive.

1) If the only bug you are seeing is the non-MSIE-specific logging issue, then further discussion belongs at bug 31759, which is a problem entirely outside mod_ssl, and nothing to do with this bug (which is frankly rather ill-specified anyway).

2) If you are seeing bugs *other* than the logging issue, such as some issue with POST requests and MSIE, *and* the problems I outlined above have all been eliminated, then those issues need to be diagnosed further, because there is not enough information given thus far to help solve them.

3) There is no vendetta against MSIE. The default BrowserMatch directive is used in the default config exactly to *maximize interoperability with MSIE*. Nothing else. Any data which anyone has which can be used to refine the scope of this BrowserMatch and improve operation with MSIE is certainly very welcome, but we currently don't have such data, so the current default seems to be safe and correct.
(In reply to comment #14)
> If you are using a reverse proxy, then you may be suffering from bug 34452 -- use
> http://people.apache.org/~jorton/httpd-2.0.52-sslbuff.patch to fix that.

I think either the PR number or the patch is wrong :-). For 34452 it should be http://issues.apache.org/bugzilla/attachment.cgi?id=14804. BTW: PR 34452 will be fixed in 2.0.56.
(In reply to comment #21)
> This isn't related to this bug at all, which involves what Apache logs.

Yes, it is not related to this bug, but to your comment in #17, as it is about why we set

BrowserMatch ".*MSIE.*" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0

in the default config.
Oops, yes, thanks Ruediger, the link you give is the correct fix for bug 34452, which is the PR I was talking about there.
(In reply to comment #22)
> So the truth is that, with Apache, the only way bugs get fixed is if one of the
> maintainers cares enough. And given constant reports of this problem and
> confirmation from the maintainers about exactly where the problem is and what
> needs to be done to fix it... but no patches, it's clear that they don't care.

Concerning the logging issue (PR 31759): yes, this was reported about 6 months ago and there is still no fix. If you like, you can say that developers do not care about this; I would say the priorities are different. This problem bothers you, so it has a high priority for you. So my points are:

a) You can fix it yourself, in contrast to a closed-source product where you can do nothing if the vendor does not care.
b) Given the low priority on the developer side regarding this issue, you can lower the bar to actually fixing it in the code base by providing a patch.
c) You can try to help improve the product.
In reply to comment 27:

> If you would like, you can say that developers do not care about this.
> I would say the priorities are different.

There is no difference between these statements. There are no crushing problems with Apache that everyone is busy solving; it's only adding new features to the bloat. When the product fails to do the basics properly (log the request) but you're busy adding cute new features to the bloat, then "developers don't care" is much more correct and apt for the situation.
Joe: what does a reverse proxy situation have to do with this bug? Yes, they might be witnessing a problem in the same piece of code ... but perhaps not. And frankly, I can understand why a problem in the reverse proxy would be ignored. But a major problem in the very basic, simple, core thing that Apache is supposed to do should get attention.
Harassing volunteers seldom gets good results. You might get the opposite effect you were looking for. I understand your need for a fix, but there must be a better way to get there.
> Harassing volunteers seldom gets good results.
> You might get the opposite effect you were looking for.

Sorry, I've just been the recipient of about 20,000 too many "submit a patch" responses. And I've spent the time and done the work, and it still never gets committed. "Submit a patch" in open source land is often synonymous with "go f**k yourself". Even if it isn't meant that way, neither act gains you very much.
CC myself on FreeBSD related bugs