Different sites on my server use different TCL-modules (result of consolidation) and the two do not coexist well causing frequent crashes in during Apache's cleanups. Apparently, both attempt to free certain bits and pieces inside Tcl Here is the typical output from valgrind. To reproduce, run "valgrind httpd -X" and then kill (not with -9!) the httpd process to trigger the cleanups: ==51407== Invalid read of size 8 ==51407== at 0x6744544: TclFreeObj (tclObj.c:1461) ==51407== by 0x677DC5D: FreeVarEntry (tclVar.c:5578) ==51407== by 0x6712919: Tcl_DeleteHashEntry (tclHash.c:463) ==51407== by 0x677C131: TclDeleteNamespaceVars (tclVar.c:4465) ==51407== by 0x67398B5: TclTeardownNamespace (tclNamesp.c:1070) ==51407== by 0x6693772: DeleteInterpProc (tclBasic.c:1261) ==51407== by 0x675586A: Tcl_EventuallyFree (tclPreserve.c:298) ==51407== by 0x6693699: Tcl_DeleteInterp (tclBasic.c:1183) ==51407== by 0x6432001: destroyPool (interpool.c:691) ==51407== by 0x6432666: exit_websh_pool (mod_websh.c:128) ==51407== by 0x1538A9C: run_cleanups (apr_pools.c:2346) ==51407== by 0x153951D: apr_pool_destroy (apr_pools.c:809) ==51407== Address 0x6ba23e0 is 0 bytes inside a block of size 8 free'd ==51407== at 0x259373: free (in /opt/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so) ==51407== by 0x6692219: TclpFree (tclAlloc.c:723) ==51407== by 0x669EA40: Tcl_Free (tclCkalloc.c:1217) ==51407== by 0x6769493: TclFinalizeSynchronization (tclThread.c:398) ==51407== by 0x66F9A2D: Tcl_Finalize (tclEvent.c:1154) ==51407== by 0x68CD169: Rivet_ChildExit (mod_rivet.c:1352) ==51407== by 0x1538A9C: run_cleanups (apr_pools.c:2346) ==51407== by 0x153951D: apr_pool_destroy (apr_pools.c:809) ==51407== by 0x153950B: apr_pool_destroy (apr_pools.c:806) ==51407== by 0x153950B: apr_pool_destroy (apr_pools.c:806) ==51407== by 0x422D26: destroy_and_exit_process (main.c:270) ==51407== by 0x4237E9: main (main.c:760)
If I well understand the problem is triggered by the release of an APR pool. In Rivet_ChildInit we register the module cleanup exploiting the pool cleanup. apr_pool_cleanup_register (pChild, s, Rivet_ChildExit, Rivet_ChildExit); I guess Websh is doing the same and pChild is a pool passed by the webserver framework. Separating these pools my work out a solution?
Yes, Websh does the same. But then, that's the whole idea of registering with a pool. But what I'm not sure about is: Rivet_ChildExit calls Tcl_Finalize and then Websh tries to Tcl_DeleteInterp and stuff. Is that supposed to work? Does Tcl_Finalize more than calling the registered (thread) exit handlers? Can Websh safely still use Tcl after you called Tcl_Finalize? I also noticed that Websh doesn't properly use Tcl_Preserve and Tcl_Release on interps retrieved from data structures. Could that be an issue? (like trying to Tcl_DeleteInterp an already deleted interp? ...)
Another question is why even bother with the clean-up, if the current httpd process is exiting? Granted, this fact may not be easy to detect, but it may be worth trying...
Created attachment 29923 [details] Delete our own interpreter instead of calling Tcl_Finalize Maybe, instead of calling Tcl_Finalize -- which makes it impossible to libtcl in the process again, rivet should limit itself to deleting just the interpreter it created? Then everyone should be happy...
(In reply to comment #4) > Created attachment 29923 [details] > Delete our own interpreter instead of calling Tcl_Finalize > > Maybe, instead of calling Tcl_Finalize -- which makes it impossible to > libtcl in the process again, rivet should limit itself to deleting just the > interpreter it created? > > Then everyone should be happy... It's probably so, separating the pools wouldn't make any difference because Tcl is not allocating from it and moreover you're corrent in saying that if the child process is exiting where is the point of calling Tcl_Finalize as log as we are using prefork as MPM. Mikhail, please, would you test on your installation if this should work out a solution? After all your set up seems to be the de-facto standard for the reproducibility of the bug.
I reiterate my request for cooperation to Mikhail: please, are you willing to test a modified rivet code where the call to Tcl_Finalized is removed to see if it cures this fault? If you could help me doing it I could close the bug and include the fix in the next patchlevel release of Rivet, thanks
Created attachment 29944 [details] Do not call Tcl_Finalize in Rivet_ChildExit Interpreter-deletion is, actually, already done by Rivet_ChildHandlers(). Thus, all that's needed is to simply stop calling Tcl_Finalize. Tested (under valgrind) on FreeBSD-9.1/amd64.
fixed in trunk