Bug 52476

Summary: SSL failures on windows with AcceptFilter https none
Product: Apache httpd-2 Reporter: Daniel Ruggeri <DRuggeri>
Component: CoreAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED FIXED    
Severity: regression CC: DRuggeri, gls
Priority: P2    
Version: 2.3.15-beta   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: debug SSL I/O reads
"MSG_PEEK" workaround for PR 52476

Description Daniel Ruggeri 2012-01-17 15:31:08 UTC
When setting "AcceptFilter https none" on an SSL installation, some resources are served incomplete or empty.

A simple test instance was configured to duplicate this problem. When testing, openssl s_client to the 127.0.0.1:443 always failed (as did LWP tests). Firefox 8.0.1 and IE 8 did not consistently fail but were able to be duplicated.

The following log entry on trace8 is common across all failures:
[Tue Jan 17 09:03:18.268556 2012] [ssl:trace3] [pid 1088:tid 1232] ssl_engine_kernel.c(1869): [client 127.0.0.1:55354] OpenSSL: Exit: error in SSLv2/v3 read client hello A

It would appear the ClientHello message is malformatted when it reaches mod_ssl.

Logs can be found here:
http://people.apache.org/~druggeri/logs/WinSSL/

The test instance with the dummy configuration is here:
http://people.apache.org/~druggeri/logs/WinSSL/testInstance.zip


A simple perl script in the root of the test instance zip file called testMe.pl is provided.
Comment 1 Rainer Jung 2012-01-17 18:47:26 UTC
Daniel,

thanks for that. Most files have permissions 000 so can not be viewed. Could you please fix the perms?

Thanks!

Rainer
Comment 2 Daniel Ruggeri 2012-01-17 19:49:39 UTC
Sorry about that - I had corrected logs.zip but forgot to make the individual files 755. Should be good now.
Comment 3 Rainer Jung 2012-01-17 21:57:24 UTC
Thanks, using OpenSSL as a client I can now reproduce as well.

I tried to debug into it. As soon as I add some additional debugging statements to our debug callback function ssl_io_data_dump() I can no longer reproduce the problem. The smallest change that makes it no longer reproducible with OpenSSL as a client is adding "Sleep(1);" at the beginning of ssl_io_data_dump().

Any chance you can check, whether that is true for you as well?

If so it seems that there is a timing issue between BIO and the lower input filter.

Regards,

Rainer
Comment 4 Daniel Ruggeri 2012-01-18 01:53:11 UTC
Sounds like a race condition. I can try that, but I don't have a build environment. Can you share the mod_ssl.so file (or maybe the whole build)? I'll be able to verify tomorrow.
Comment 5 William A. Rowe Jr. 2012-01-18 03:35:13 UTC
Perhaps we are misconstruing a WSA_EAGAIN type of error, and the delay gives
just enough time for browsers to transmit that first data packet?
Comment 6 Rainer Jung 2012-01-18 07:24:38 UTC
(In reply to comment #4)
> Sounds like a race condition. I can try that, but I don't have a build
> environment. Can you share the mod_ssl.so file (or maybe the whole build)? I'll
> be able to verify tomorrow.

I could, but I would much prefer a more independent verification. My build was done with VC 10. Where did you get your binary from, is it apachelounge? I guess Steffen could provide a special build with the Sleep(1) included for reproduction tests.

Steffen, do you follow this ticket?
Comment 7 Rainer Jung 2012-01-18 07:25:09 UTC
(In reply to comment #5)
> Perhaps we are misconstruing a WSA_EAGAIN type of error, and the delay gives
> just enough time for browsers to transmit that first data packet?

Where in the code would that be, any idea?
Comment 8 Kaspar Brand 2012-01-18 07:47:54 UTC
I'm not really familiar with the I/O layer in mod_ssl, unfortunately, but from examining the logs provided by Daniel (thanks a lot), there seems to be a pattern which is common to the failure cases.

In opensslRequest\error.log, we have:

[Tue Jan 17 09:01:00.276626 2012] [ssl:info] [pid 10672:tid 1256] [client 127.0.0.1:55343] AH01964: Connection to child 63 established (server www.example.com:443)
[Tue Jan 17 09:01:00.276626 2012] [ssl:trace2] [pid 10672:tid 1256] ssl_engine_rand.c(123): Seeding PRNG with 144 bytes of entropy
[Tue Jan 17 09:01:00.276626 2012] [ssl:trace3] [pid 10672:tid 1256] ssl_engine_kernel.c(1832): [client 127.0.0.1:55343] OpenSSL: Handshake: start
[Tue Jan 17 09:01:00.276626 2012] [ssl:trace3] [pid 10672:tid 1256] ssl_engine_kernel.c(1840): [client 127.0.0.1:55343] OpenSSL: Loop: before/accept initialization
[Tue Jan 17 09:01:00.276626 2012] [core:trace6] [pid 10672:tid 1256] core_filters.c(502): [client 127.0.0.1:55343] core_output_filter: flushing because of FLUSH bucket
[Tue Jan 17 09:01:00.276626 2012] [ssl:trace4] [pid 10672:tid 1256] ssl_engine_io.c(2000): [client 127.0.0.1:55343] OpenSSL: I/O error, 11 bytes expected to read on BIO#2856280 [mem: 28e0888]
[Tue Jan 17 09:01:00.276626 2012] [ssl:trace3] [pid 10672:tid 1256] ssl_engine_kernel.c(1869): [client 127.0.0.1:55343] OpenSSL: Exit: error in SSLv2/v3 read client hello A
[Tue Jan 17 09:01:00.276626 2012] [core:trace6] [pid 10672:tid 1256] core_filters.c(502): [client 127.0.0.1:55343] core_output_filter: flushing because of FLUSH bucket
[Tue Jan 17 09:01:00.276626 2012] [ssl:trace4] [pid 10672:tid 1256] ssl_engine_io.c(2000): [client 127.0.0.1:55343] OpenSSL: I/O error, 11 bytes expected to read on BIO#2856280 [mem: 28e0888]
[Tue Jan 17 09:01:00.276626 2012] [ssl:trace3] [pid 10672:tid 1256] ssl_engine_kernel.c(1869): [client 127.0.0.1:55343] OpenSSL: Exit: error in SSLv2/v3 read client hello A

I'm not sure if there's some artifact in the logging infrastructure (all these have the same µsec timestamp), but what is peculiar is that whenever the request fails, there are two occurrences of "OpenSSL: I/O error, 11 bytes expected" - i.e., it looks like the same thread (tid 1256, in the above case) is trying to read the client hello, but apparently doesn't get anything.

For successful requests, the 11-byte read succeeds in the first attempt (see goodRequest\error.log). Finally, in the firefox8.0.1Request\error.log, there's this "interesting" combination of events:

[Tue Jan 17 09:03:18.268556 2012] [ssl:info] [pid 1088:tid 1232] [client 127.0.0.1:55354] AH01964: Connection to child 43 established (server www.example.com:443)
[Tue Jan 17 09:03:18.268556 2012] [ssl:trace2] [pid 1088:tid 1232] ssl_engine_rand.c(123): Seeding PRNG with 144 bytes of entropy
[Tue Jan 17 09:03:18.268556 2012] [ssl:trace3] [pid 1088:tid 1232] ssl_engine_kernel.c(1832): [client 127.0.0.1:55354] OpenSSL: Handshake: start
[Tue Jan 17 09:03:18.268556 2012] [ssl:trace3] [pid 1088:tid 1232] ssl_engine_kernel.c(1840): [client 127.0.0.1:55354] OpenSSL: Loop: before/accept initialization
[Tue Jan 17 09:03:18.268556 2012] [core:trace6] [pid 1088:tid 1232] core_filters.c(502): [client 127.0.0.1:55354] core_output_filter: flushing because of FLUSH bucket
[Tue Jan 17 09:03:18.268556 2012] [ssl:trace4] [pid 1088:tid 1232] ssl_engine_io.c(2000): [client 127.0.0.1:55354] OpenSSL: I/O error, 11 bytes expected to read on BIO#2756a80 [mem: 27e0890]
[Tue Jan 17 09:03:18.268556 2012] [ssl:trace3] [pid 1088:tid 1232] ssl_engine_kernel.c(1869): [client 127.0.0.1:55354] OpenSSL: Exit: error in SSLv2/v3 read client hello A
[Tue Jan 17 09:03:18.268556 2012] [core:trace6] [pid 1088:tid 1232] core_filters.c(502): [client 127.0.0.1:55354] core_output_filter: flushing because of FLUSH bucket
[Tue Jan 17 09:03:18.268556 2012] [ssl:trace4] [pid 1088:tid 1232] ssl_engine_io.c(1989): [client 127.0.0.1:55354] OpenSSL: read 11/11 bytes from BIO#2756a80 [mem: 27e0890] (BIO dump follows)

I.e., the first read fails, but the second one succeeds (and again, the timestamps are identical, but it's unclear if this reflects reality).
Comment 9 Gregg L. Smith 2012-01-18 22:00:43 UTC
Regarding Comment #3

Adding Sleep(1); to the beginning of ssl_io_data_dump() seems to help, but then not. I used Daniels testme.pl script and that build of 2.3.16 to test. It is Apache Lounge, I can tell by the files in the bin & modules folders. 

My results were as erratic as the behavior is to begin with. Running testme.pl 3 times each, here is what I saw;

####### Unmodified mod_ssl #######
161 attempts.
0  SSL negotiation failed:
33 attempts.
0  SSL negotiation failed:
179 attempts.
0  SSL negotiation failed:

####### mod_ssl w/ Sleep(1) #######
22 attempts.
0  SSL negotiation failed:
3793 attempts.
0  Server closed connection without sending any data back
55 attempts.
0  SSL negotiation failed:

I was also asked off list to try OpenSSL 0.9.8 by someone, to see how it behaved. It did not do much better.

758 attempts.
0  SSL negotiation failed:
291 attempts.
0  SSL negotiation failed:
106 attempts.
0  SSL negotiation failed:

VC9 mod_ssl/OpenSSL 1.0.0e w/ Sleep(1) in ssl_io_data_dump() 

http://people.apache.org/~gsmith/httpd/PR52476/mod_ssl_io_sleep.zip
Comment 10 Gregg L. Smith 2012-01-18 22:03:04 UTC
I'll add that my testing was on Vista x86
Comment 11 Rainer Jung 2012-01-18 23:55:40 UTC
Gregg,

thanks a lot for testing. I must explain, that Sleep(1) was a bit to simple as an explanation. The ssl_io_data_dump() function is registered as a callback with the OpenSSL BIO layer, which does the low level I/O. The callback is normally only used to log data read when debug logging is on. But is is called before and after each BIO operation by OpenSSL.

The first call is before the first read. The first read expects 11 bytes, but when actually trying to read them we get an error code. Bill's expectation is, that the first read happens to early and the error code is not a fatal one, but instead an error code that tells us to try again.

By adding the Sleep(), we give the client a little more time to actually deliver the first bytes. This is obviously not a fix, but an interesting way to harden Bill's EAGAIN explanation.

Now I was choosing Sleep(1) and it helped here, but the "1" would depend on how long we expect the client needs to send the first handshake bytes. The observation, that a "1" helps a bit but some requests are still broken is somewhat expected. If you increase the "1", say to "10" and then "50", the number of broken requests should drop to a very small percentage.

I hope Bill will give us hints, where we do have to look for the missing handling of EAGAIN.

Thanks!

Rainer
Comment 12 Joe Orton 2012-01-19 11:39:20 UTC
Created attachment 28175 [details]
debug SSL I/O reads

When mod_ssl tries to read the client handshake from the input filter stack, it should be a blocking read (APR_BLOCK_READ), so if an EAGAIN or empty brigade is propagating up, that is a bug somewhere else.

Can you try with the attached patch to see what mod_ssl is receiving from the filter stack?
Comment 13 Gregg L. Smith 2012-01-19 23:40:58 UTC
It looks like an empty brigade/s

[Thu Jan 19 15:22:52.389200 2012] [ssl:debug] [pid 4816:tid 1028] ssl_engine_kernel.c(1881): [client 127.0.0.1:50778] AH02041: Connection: Client IP: , Protocol: TLSv1, Cipher: ECDHE-RSA-AES256-SHA (256/256 bits)
[Thu Jan 19 15:22:52.389200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(475): [client 127.0.0.1:50778] read of 5 bytes => 0, mode = blocking, brigade is non-empty
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(1994): [client 127.0.0.1:50778] OpenSSL: read 5/5 bytes from BIO#1d08eb0 [mem: 1d97663] (BIO dump follows)
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1927): +-------------------------------------------------------------------------+
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0000: 17 03 01 00 b0                                   .....            |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1972): +-------------------------------------------------------------------------+
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(475): [client 127.0.0.1:50778] read of 176 bytes => 0, mode = blocking, brigade is non-empty
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(1994): [client 127.0.0.1:50778] OpenSSL: read 176/176 bytes from BIO#1d08eb0 [mem: 1d97668] (BIO dump follows)
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1927): +-------------------------------------------------------------------------+
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0000: 2f 7c ee 2e 4f fc 7d 19-ad dc e2 9f 50 a2 87 b5  /|..O.}.....P... |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0010: 43 49 80 84 da aa 75 0f-c6 b6 3f aa 25 36 d8 96  CI....u...?.%6.. |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0020: 8f 91 0e 45 cb 69 cb f9-e5 dd 40 72 67 1e b4 fb  ...E.i....@rg... |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0030: a0 e2 93 5a 03 e5 43 1c-d7 30 c7 49 58 3f dc b8  ...Z..C..0.IX?.. |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0040: 7c f1 22 1b 83 5e 03 ab-dc 95 a9 6b 23 a9 d1 30  |."..^.....k#..0 |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0050: 9e c5 82 ca 7d d0 ae ea-05 2b 72 4a 16 a1 a4 f7  ....}....+rJ.... |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0060: d2 43 bf ea ee f7 d7 6e-57 94 cf 8f 67 2b 70 52  .C.....nW...g+pR |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0070: b5 bb bc fc 18 23 d6 85-bc 7a 1e 10 a6 ca 2d 98  .....#...z....-. |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0080: 06 5f 3f 25 f9 da 81 b1-27 90 70 1b e1 d1 4f 7d  ._?%....'.p...O} |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0090: 7a 93 d7 dc c0 6a bd 10-09 50 f3 b0 31 40 80 2b  z....j...P..1@.+ |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 00a0: 13 28 fc 13 ef 70 08 d3-bf 87 80 1f ed 88 ac 57  .(...p.........W |
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1972): +-------------------------------------------------------------------------+
[Thu Jan 19 15:22:52.390200 2012] [ssl:debug] [pid 4816:tid 1028] ssl_engine_kernel.c(236): [client 127.0.0.1:50778] AH02034: Initial (No.1) HTTPS request received for child 62 (server www.example.com:443)
[Thu Jan 19 15:22:52.390200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(1444): [client 127.0.0.1:50778] coalesce: have 0 bytes, adding 265 more
[Thu Jan 19 15:22:52.391200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1850): [client 127.0.0.1:50778] OpenSSL: Write: SSL negotiation finished successfully
[Thu Jan 19 15:22:52.391200 2012] [ssl:debug] [pid 4816:tid 1028] ssl_engine_io.c(989): [client 127.0.0.1:50778] AH02001: Connection closed to child 62 with standard shutdown (server www.example.com:443)
[Thu Jan 19 15:22:52.397200 2012] [ssl:info] [pid 4816:tid 1028] [client 127.0.0.1:50779] AH01964: Connection to child 62 established (server www.example.com:443)
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace2] [pid 4816:tid 1028] ssl_engine_rand.c(123): Seeding PRNG with 144 bytes of entropy
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1832): [client 127.0.0.1:50779] OpenSSL: Handshake: start
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1840): [client 127.0.0.1:50779] OpenSSL: Loop: before/accept initialization
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(475): [client 127.0.0.1:50779] read of 11 bytes => 0, mode = blocking, brigade is empty
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(2005): [client 127.0.0.1:50779] OpenSSL: I/O error, 11 bytes expected to read on BIO#1d08eb0 [mem: 1d97660]
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1869): [client 127.0.0.1:50779] OpenSSL: Exit: error in SSLv2/v3 read client hello A
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(475): [client 127.0.0.1:50779] read of 11 bytes => 0, mode = blocking, brigade is empty
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(2005): [client 127.0.0.1:50779] OpenSSL: I/O error, 11 bytes expected to read on BIO#1d08eb0 [mem: 1d97660]
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1869): [client 127.0.0.1:50779] OpenSSL: Exit: error in SSLv2/v3 read client hello A
[Thu Jan 19 15:22:52.400200 2012] [ssl:info] [pid 4816:tid 1028] [client 127.0.0.1:50780] AH01964: Connection to child 62 established (server www.example.com:443)
[Thu Jan 19 15:22:52.400200 2012] [ssl:trace2] [pid 4816:tid 1028] ssl_engine_rand.c(123): Seeding PRNG with 144 bytes of entropy
[Thu Jan 19 15:22:52.400200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1832): [client 127.0.0.1:50780] OpenSSL: Handshake: start
[Thu Jan 19 15:22:52.400200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1840): [client 127.0.0.1:50780] OpenSSL: Loop: before/accept initialization
[Thu Jan 19 15:22:52.400200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(475): [client 127.0.0.1:50780] read of 11 bytes => 0, mode = blocking, brigade is empty
[Thu Jan 19 15:22:52.401200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(2005): [client 127.0.0.1:50780] OpenSSL: I/O error, 11 bytes expected to read on BIO#1d09060 [mem: 1d97660]
[Thu Jan 19 15:22:52.401200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1869): [client 127.0.0.1:50780] OpenSSL: Exit: error in SSLv2/v3 read client hello A
[Thu Jan 19 15:22:52.401200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(475): [client 127.0.0.1:50780] read of 11 bytes => 0, mode = blocking, brigade is empty
[Thu Jan 19 15:22:52.401200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(2005): [client 127.0.0.1:50780] OpenSSL: I/O error, 11 bytes expected to read on BIO#1d09060 [mem: 1d97660]
[Thu Jan 19 15:22:52.401200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1869): [client 127.0.0.1:50780] OpenSSL: Exit: error in SSLv2/v3 read client hello A
[Thu Jan 19 15:22:52.426200 2012] [ssl:info] [pid 4816:tid 1028] [client 127.0.0.1:50781] AH01964: Connection to child 62 established (server www.example.com:443)
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace2] [pid 4816:tid 1028] ssl_engine_rand.c(123): Seeding PRNG with 144 bytes of entropy
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1832): [client 127.0.0.1:50781] OpenSSL: Handshake: start
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1840): [client 127.0.0.1:50781] OpenSSL: Loop: before/accept initialization
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(475): [client 127.0.0.1:50781] read of 11 bytes => 0, mode = blocking, brigade is non-empty
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace4] [pid 4816:tid 1028] ssl_engine_io.c(1994): [client 127.0.0.1:50781] OpenSSL: read 11/11 bytes from BIO#1d08d90 [mem: 1d97660] (BIO dump follows)
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1927): +-------------------------------------------------------------------------+
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1966): | 0000: 80 2e 01 00 02 00 15 00-00 00 10                 ...........      |
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace7] [pid 4816:tid 1028] ssl_engine_io.c(1972): +-------------------------------------------------------------------------+
[Thu Jan 19 15:22:52.426200 2012] [ssl:trace3] [pid 4816:tid 1028] ssl_engine_kernel.c(1869): [client 127.0.0.1:50781] OpenSSL: Exit: error in SSLv2/v3 read client hello A
[Thu Jan 19 15:22:52.426200 2012] [ssl:info] [pid 4816:tid 1028] [client 127.0.0.1:50781] AH02008: SSL library error 1 in handshake (server www.example.com:443)
[Thu Jan 19 15:22:52.426200 2012] [ssl:info] [pid 4816:tid 1028] SSL Library Error: error:140760FC:SSL routines:SSL23_GET_CLIENT_HELLO:unknown protocol -- speaking not SSL to HTTPS port!?
[Thu Jan 19 15:22:52.426200 2012] [ssl:info] [pid 4816:tid 1028] [client 127.0.0.1:50781] AH01998: Connection closed to child 62 with abortive shutdown (server www.example.com:443)
Comment 14 Joe Orton 2012-01-20 10:24:32 UTC
[Thu Jan 19 15:22:52.397200 2012] [ssl:trace7] [pid 4816:tid 1028]
ssl_engine_io.c(475): [client 127.0.0.1:50779] read of 11 bytes => 0, mode =
blocking, brigade is empty

Thanks, Gregg.  This would suggest Bill's intuition about an EAGAIN is correct, but specifically, the only way this could happen is if apr_bucket_read() is returning EAGAIN for a blocking read.  The core input filter should really treat that as an error, because it should never happen.

The obvious step on Unix would be to get "strace" output to see what the system calls are doing, but somebody with Windows domain expertise will have to suggest what to do.
Comment 15 William A. Rowe Jr. 2012-01-20 18:26:22 UTC
I suspect I now know the origin of the bug... give me through this evening (CST) to wrangle it.  We have a 'special case' filter where the first bytes of an AcceptEx + data is handed up the brigade.  Since there are no first bytes, and we don't read, and we aren't blocking (as mod_ssl expects) - this is a very likely possibility.
Comment 16 Rainer Jung 2012-01-20 19:02:02 UTC
Bill: did you notice http://svn.apache.org/viewvc?rev=1233882&view=rev ?
Comment 17 William A. Rowe Jr. 2012-01-20 19:25:09 UTC
Re: comment #16, yes but its irrelevant.  We don't want to hit that errored state.
Comment 18 William A. Rowe Jr. 2012-01-23 04:29:19 UTC
So, there is no special-case filter... I had modified the logic that was
unusable within the 'public', simple-yet-insufficient ap_process_connection,
which would otherwise do;

    ap_update_vhost_given_ip(c);

    rc = ap_run_pre_connection(c, csd);
    if (rc != OK && rc != DONE) {
        c->aborted = 1;
    }

    if (!c->aborted) {
        ap_run_process_connection(c);
    }

Note that the event, simple and winnt mpm's all have custom implementations
of that not-altogether useful sequence of instructions.

So mpm/winnt/child.c worker_main() reimplements this logic for those cases
where we must inject an initial bucket of data, pulled from the accepted
socket by AcceptEx.  Really there is little special about Acceptfilter none;
it bypasses that magic, the initial brigade is empty and obtains one socket
bucket.

Now the only idea I have left is that somewhere in the code path
        else /* (accf = 0)  e.g. 'none' */
where we handle accept(), play some setsockopt games, and call apr_os_sock_make
apr and mpm_winnt disagree on the default state of that new apr_socket_t.

That is about the last explanation which would address these interesting
timing observations, that apr is not blocking, but httpd assumes that it is.
Comment 19 Joe Orton 2012-01-23 10:49:44 UTC
That stuff in winnt/child.c w.r.t. handling the data bucket insertion looks pretty fragile, faking up the internals of the core_input_filter's context structure!

Couldn't we replace the core input filter's call to:

        e = apr_bucket_socket_create(net->client_socket, f->c->bucket_alloc);

with a call out to the MPM, so the winNT MPM can insert its data bucket there?
Comment 20 William A. Rowe Jr. 2012-01-23 20:06:29 UTC
Joe, if you can fix up event.c with a sufficiently generic framework, then
I would guess windows would plug right in.

I had always assumed that the MPM was in part owner of the network mechanics
and core filter, as evidenced by event.mpm, and the additional changes which
will be needed if httpd ever becomes fully async.
Comment 21 Jeff Trawick 2012-08-09 22:45:12 UTC
Created attachment 29206 [details]
"MSG_PEEK" workaround for PR 52476

Derived from a patch posted to dev@httpd by Claudio Caldato
Comment 22 Jeff Trawick 2012-08-09 22:48:00 UTC
Is anyone able to try this MSG_PEEK workaround?  It works for me, but you know how that goes...
Comment 23 Jeff Trawick 2012-08-10 18:22:27 UTC
Fixed in trunk with

http://svn.apache.org/viewvc?view=revision&revision=1371801

The patch I posted yesterday is obsolete, as this is the real fix.

I'll propose this fix for 2.4.next shortly.
Comment 24 Jeff Trawick 2012-08-14 23:44:14 UTC
The real fix is in 2.4.x and will be released with 2.4.3 "soon".