Bug 60671 - "Event handle is invalid" when using mod_cache_disk
Summary: "Event handle is invalid" when using mod_cache_disk
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_cache_disk / mod_disk_cache (show other bugs)
Version: 2.4.25
Hardware: PC All
: P2 normal with 3 votes (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-31 08:29 UTC by dnie
Modified: 2019-03-19 10:52 UTC (History)
2 users (show)



Attachments
ErrorLog (100.13 KB, text/plain)
2017-01-31 08:29 UTC, dnie
Details

Note You need to log in before you can comment on or make changes to this bug.
Description dnie 2017-01-31 08:29:14 UTC
Created attachment 34696 [details]
ErrorLog

We are using Apache 2.4.25 (Windows) as a reverse proxy with caching. 

I am using the following cache configuration:
  
  CacheRoot cache
  CacheDirLevels 5
  CacheDirLength 3
  CacheEnable disk  /

The cache itself is working correctly. But after a period of time (1-x hours), Apache is crashing/restarting.
The log messages in this case are:

[Thu Jan 26 13:58:34.304331 2017] [mpm_winnt:crit] [pid 8824:tid 324] (OS 6)The handle is invalid.  : AH00356: Child: WAIT_FAILED -- shutting down server
[Thu Jan 26 13:58:34.304331 2017] [mpm_winnt:crit] [pid 8824:tid 324] (OS 6)The handle is invalid.  : AH02644: Child: Event handle #0 (8) is invalid
[Thu Jan 26 13:59:06.336811 2017] [mpm_winnt:notice] [pid 8824:tid 324] AH00362: Child: Waiting 90 more seconds for 1 worker threads to finish.
[Thu Jan 26 13:59:36.368238 2017] [mpm_winnt:notice] [pid 8824:tid 324] AH00362: Child: Waiting 60 more seconds for 1 worker threads to finish.
[Thu Jan 26 14:00:06.396427 2017] [mpm_winnt:notice] [pid 8824:tid 324] AH00362: Child: Waiting 30 more seconds for 1 worker threads to finish.
[Thu Jan 26 14:00:36.420727 2017] [mpm_winnt:notice] [pid 8824:tid 324] AH00362: Child: Waiting 0 more seconds for 1 worker threads to finish.
[Thu Jan 26 14:00:36.520808 2017] [mpm_winnt:notice] [pid 8824:tid 324] AH00363: Child: Terminating 1 threads that failed to exit.
[Thu Jan 26 14:00:36.520808 2017] [mpm_winnt:notice] [pid 8824:tid 324] AH00364: Child: All worker threads have exited.
[Thu Jan 26 14:00:36.542289 2017] [mpm_winnt:notice] [pid 3636:tid 384] AH00428: Parent: child process 8824 exited with status 0 -- Restarting.
[Thu Jan 26 14:00:36.831791 2017] [mpm_winnt:notice] [pid 3636:tid 384] AH00455: Apache/2.4.25 (Win64) OpenSSL/1.0.2j configured -- resuming normal operations
[Thu Jan 26 14:00:36.831791 2017] [mpm_winnt:notice] [pid 3636:tid 384] AH00456: Apache Lounge VC14 Server built: Dec 17 2016 11:15:57
[Thu Jan 26 14:00:36.831791 2017] [core:notice] [pid 3636:tid 384] AH00094: Command line: 'E:\\d3\\d.ecs http gateway\\webserver\\bin\\httpd.exe -d E:/d3/d.ecs http gateway/webserver -d E:\\d3\\d.ecs http gateway\\webserver'
[Thu Jan 26 14:00:36.832768 2017] [mpm_winnt:notice] [pid 3636:tid 384] AH00418: Parent: Created child process 2012
[Thu Jan 26 14:00:37.525523 2017] [mpm_winnt:notice] [pid 2012:tid 324] AH00354: Child: Starting 250 worker threads.

(complete logfile attached)

When deactivating the cache, everything runs perfectly for weeks.

We made the following additional tests:

- When changing from disk cache to socache/shmcb ==> no crash.

- Since we use Windows, we tried 3 diffrent compilations from 2 distrubutors (ApacheHaus, ApacheLounge VC11/VC14) ==> This makes no diffrence (crash)

- We tried the following config without success (config is recommended on https://www.apachelounge.com/viewtopic.php?p=34683)
  AcceptFilter http none
  AcceptFilter https none
  EnableSendfile off
  EnableMMAP off


Let me know, if you need more information or log files. I always can reproduce this issue in my test environment.
Comment 1 abrighton 2017-11-15 04:17:55 UTC
We are seeing similar behavior on a number of Apache installations we just upgraded from 2.4.23 to 2.4.27.  We enabled the cache feature during the upgrade.

As these are production systems, we've disabled the cache feature for now, but look forward to a possible fix.
Comment 2 Michael Schlenker 2018-11-22 17:14:44 UTC
I had a look at some crash dumps (via procdump) of this issue.

In all cases i saw the error seems to be a CloseHandle() on an invalid apr_thread_mutex_t -> handle inside the apr_file object.

The code can only create a Critical Section Mutex if APR_HAS_UNICODE_FS is active, but it tries to close a Handle, which is always uninitialized memory for Critial Section Mutexes. 

The Mutex seems to get destroyed twice, the type is always 0xffffffff = -1, which is the sentinel set by apr_thread_mutex_t when a critical section gets deleted the first time.


Microsoft (R) Windows Debugger Version 10.0.10586.567 X86
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [D:\logs\httpd.exe_181120_160002\httpd.exe_181120_161836.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment:
*** Unhandled exception: C0000008.INVALID_HANDLE'

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(d00.fa8): Invalid handle - code c0000008 (first/second chance not available)
eax=00000000 ebx=1736d3c0 ecx=00000000 edx=00000000 esi=64697254 edi=17379108
eip=777ac3ec esp=1599f9f0 ebp=1599fa00 iopl=0         nv up ei pl nz ac pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000216
ntdll!NtClose+0xc:
777ac3ec c20400          ret     4
0:255> kb
 # ChildEBP RetAddr  Args to Child              
00 1599f9ec 757def8a 64697254 6d018d80 1736c110 ntdll!NtClose+0xc
01 1599fa00 6d018dab 64697254 1599fa20 6d019d99 KERNELBASE!CloseHandle+0x16
02 1599fa0c 6d019d99 17379108 6d016b10 17379108 libapr_1!thread_mutex_cleanup+0x2b [c:\sdk\src\apr-1.6.3\locks\win32\thread_mutex.c @ 35]
03 1599fa20 6d018f94 1736c0d8 17379108 6d018d80 libapr_1!apr_pool_cleanup_run+0x19 [c:\sdk\src\apr-1.6.3\memory\unix\apr_pools.c @ 2620]
04 1599fa34 6d016b3f 17379108 1736d450 1599fa64 libapr_1!apr_thread_mutex_destroy+0x14 [c:\sdk\src\apr-1.6.3\locks\win32\thread_mutex.c @ 133]
05 1599fa44 6cbd1721 1736b258 1736d430 6cbd1756 libapr_1!apr_file_close+0x2f [c:\sdk\src\apr-1.6.3\file_io\win32\open.c @ 503]
06 1599fa50 6cbd1756 00000000 1736c118 1736d2d8 mod_cache_disk!close_disk_cache_fd+0x11 [c:\sdk\src\httpd-2.4.33\modules\cache\mod_cache_disk.c @ 605]
07 1599fa64 6cbe37b6 1736d3c0 009e1048 17d7dc50 mod_cache_disk!remove_entity+0x16 [c:\sdk\src\httpd-2.4.33\modules\cache\mod_cache_disk.c @ 618]
08 1599faf4 6d071eba 17379e00 17d7dc50 1736be38 mod_cache!cache_save_filter+0x9b6 [c:\sdk\src\httpd-2.4.33\modules\cache\mod_cache.c @ 1326]
09 1599fb08 6cbf21fc 17379e00 17d7dc50 171be130 libhttpd!ap_pass_brigade+0x5a [c:\sdk\src\httpd-2.4.33\server\util_filter.c @ 590]
0a 1599fb20 6d071eba 1736be38 17d7dc50 17d7dbc0 mod_headers!ap_headers_output_filter+0x7c [c:\sdk\src\httpd-2.4.33\modules\metadata\mod_headers.c @ 885]
0b 1599fb34 6cc31f20 1736be38 17d7dc50 1736c118 libhttpd!ap_pass_brigade+0x5a [c:\sdk\src\httpd-2.4.33\server\util_filter.c @ 590]
0c 1599fb6c 6d071eba 1736be20 17d7dba8 170381a0 mod_deflate!deflate_out_filter+0x760 [c:\sdk\src\httpd-2.4.33\modules\filters\mod_deflate.c @ 913]
0d 1599fb80 6cb53567 1736be20 17d7dba8 18720020 libhttpd!ap_pass_brigade+0x5a [c:\sdk\src\httpd-2.4.33\server\util_filter.c @ 590]
...

0:255> .frame 02
02 1599fa0c 6d019d99 libapr_1!thread_mutex_cleanup+0x2b [c:\sdk\src\apr-1.6.3\locks\win32\thread_mutex.c @ 35]
0:255> dt data
Local var @ 0x1599fa14 Type void*
0x17379108 
Void
0:255> dt libapr_1!apr_thread_mutex_t 0x17379108
   +0x000 pool             : 0x1736c0d8 apr_pool_t
   +0x004 type             : 0xffffffff (No matching name)
   +0x008 handle           : 0x64697254 Void
   +0x00c section          : _RTL_CRITICAL_SECTION
Comment 3 William A. Rowe Jr. 2018-11-23 06:47:48 UTC
This sounds very familiar, addressed within the past 18 mos by the APR project. Would you retest with APR release 1.6.5 and report back? FWIW httpd recently released 2.4.35 and 2.4.37, either of these should be used over the much older 2.4.27.

This specific bug, IIRC, was specifically a quick in apr and not specific to httpd. And your report indicates nothing whatsoever to do with Unix that I can tell.
Comment 4 Michael Schlenker 2018-11-23 12:22:25 UTC
Thanks, will retry with APR-1.6.5.

Probably you mean https://svn.apache.org/r1808457 ?

That change should fix the mutex issue (by not using a mutex) unless Sendfile is enabled, which is the case in one place in the mod_cache code.
Comment 5 Daniel Cheng 2019-03-06 08:44:43 UTC
I am getting the same "Handle is Invalid" problem with 2.4.37 build from apachelounge, which is using APR-1.6.5.

But the exception stack track is not the same:




RetAddr           : Args to Child                                                           : Call Site
000007fe`fc841873 : 00000000`010d0230 00000000`0000034d 00000000`010d06d8 00000000`010d0274 : ntdll!KiRaiseUserExceptionDispatcher+0x3a
00000000`76ad1951 : 00000000`00000058 00000000`001e5b80 00000000`0410cf28 00000000`00000012 : KERNELBASE!CloseHandle+0x13
00000000`7440ab2c : 00000000`00000000 00000000`7448571b 00000000`18874d00 00000000`74462947 : kernel32!CloseHandle+0x41
00000000`74407215 : 00000000`188782e8 00000000`1887f010 00057d69`ecd56780 0000002a`00000000 : libapr_1!apr_thread_mutex_unlock+0x8c
000007fe`f7e8102a : 00000000`00000076 00000000`00000000 00000000`00000000 0000935d`446c5428 : libapr_1!apr_file_close+0x35
000007fe`f7e97462 : 00000000`00000000 00000000`18876cb0 00000000`18876cb0 000007fe`f8504750 : mod_cache_disk+0x102a
000007fe`f85d36ba : 00000000`188856b8 000007fe`f85d5968 00000000`1887d398 00000000`18862528 : mod_cache!cache_run_cache_status+0xc22
000007fe`f8511bdd : 00000000`00000000 00000000`188f4488 00000000`18706dd0 00000000`18874d00 : mod_deflate+0x36ba
000007fe`f84e72c4 : 00000000`1887d448 00000000`1887f290 00000000`1887f2d0 00000000`18862528 : mod_filter+0x1bdd
000007fe`f7fe2106 : 00000000`1887d3d0 00000000`188f4488 00000000`0026ad88 00000000`18874d00 : mod_include+0x72c4
000007fe`f7fe3920 : 00000000`00000000 00000000`00000000 00000000`1887e550 00000000`017319a0 : mod_proxy_http+0x2106
000007fe`f8405ca8 : 00000000`00000000 00000000`18874d00 00000000`00000000 000007fe`f84140c8 : mod_proxy_http+0x3920
000007fe`f8404fb6 : 00000000`00000000 00000000`0410f6b9 00000000`00000000 00000000`18874d00 : mod_proxy!proxy_run_scheme_handler+0x78
000007fe`f8c8edc5 : 00000000`00000000 000007fe`00000000 00000000`00000000 00000000`00000000 : mod_proxy!ap_proxy_trans_match+0x16c6
000007fe`f8c8db2f : 00000000`00000000 00000000`18874d00 00000000`18874d00 00000000`0175ed80 : libhttpd!ap_run_handler+0x35
000007fe`f8c8a6ca : 00000000`18874d00 00000000`00000000 00000000`00000000 00000000`18874d00 : libhttpd!ap_invoke_handler+0x10f
000007fe`f8c8a74f : 00000000`1881cf50 000007fe`f8cbddf5 00000000`00000000 00000000`00000000 : libhttpd!ap_internal_redirect_handler+0x29a
000007fe`f8c83c01 : 00000000`18874c88 00000000`11e1a300 00000000`00000001 00000000`1881cf50 : libhttpd!ap_process_request+0xf
000007fe`f8c91a55 : 000007fe`f8cd0808 00000000`00000000 00000000`00000000 00000000`00000002 : libhttpd!ap_byterange_filter+0x1581
000007fe`f8c91845 : 00000000`1881cd40 00000000`1881cf50 00000000`00000000 00000000`00000000 : libhttpd!ap_run_process_connection+0x35
000007fe`f8ca1bd0 : 00000000`18736158 00000000`0000000a 00000000`187e2050 00000000`0000000a : libhttpd!ap_process_connection+0x45
00000000`76ac59cd : 00000000`00000000 00000000`00000000 00000000`1881d338 00000000`00000001 : libhttpd!ap_run_generate_log_id+0x3da0
00000000`76d2385d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d
Comment 6 Michael Schlenker 2019-03-19 10:52:06 UTC
This seems to be a bug in apr, not initializing some memory in the lock.