Bug 54162 - Rivet and websh can not coexist peacefully
Summary: Rivet and websh can not coexist peacefully
Status: RESOLVED FIXED
Alias: None
Product: Rivet
Classification: Unclassified
Component: mod_rivet
Version: 2.1.0
Hardware: All
OS: All
Importance: P2 normal
Target Milestone: mod_rivet
Assignee: Apache Rivet Mailing list account
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-18 08:18 UTC by Mikhail T.
Modified: 2013-02-12 21:54 UTC
CC List: 1 user



Attachments
Delete our own interpreter instead of calling Tcl_Finalize (431 bytes, patch)
2013-02-05 23:01 UTC, Mikhail T.
Details | Diff
Do not call Tcl_Finalize in Rivet_ChildExit (265 bytes, patch)
2013-02-12 17:59 UTC, Mikhail T.
Details | Diff

Description Mikhail T. 2012-11-18 08:18:49 UTC
Different sites on my server use different Tcl modules (a result of consolidation), and the two do not coexist well, causing frequent crashes during Apache's cleanups. Apparently, both attempt to free certain bits and pieces inside Tcl.

Here is the typical output from valgrind. To reproduce, run "valgrind httpd -X" and then kill (not with -9!) the httpd process to trigger the cleanups:

==51407== Invalid read of size 8
==51407==    at 0x6744544: TclFreeObj (tclObj.c:1461)
==51407==    by 0x677DC5D: FreeVarEntry (tclVar.c:5578)
==51407==    by 0x6712919: Tcl_DeleteHashEntry (tclHash.c:463)
==51407==    by 0x677C131: TclDeleteNamespaceVars (tclVar.c:4465)
==51407==    by 0x67398B5: TclTeardownNamespace (tclNamesp.c:1070)
==51407==    by 0x6693772: DeleteInterpProc (tclBasic.c:1261)
==51407==    by 0x675586A: Tcl_EventuallyFree (tclPreserve.c:298)
==51407==    by 0x6693699: Tcl_DeleteInterp (tclBasic.c:1183)
==51407==    by 0x6432001: destroyPool (interpool.c:691)
==51407==    by 0x6432666: exit_websh_pool (mod_websh.c:128)
==51407==    by 0x1538A9C: run_cleanups (apr_pools.c:2346)
==51407==    by 0x153951D: apr_pool_destroy (apr_pools.c:809)
==51407==  Address 0x6ba23e0 is 0 bytes inside a block of size 8 free'd
==51407==    at 0x259373: free (in /opt/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==51407==    by 0x6692219: TclpFree (tclAlloc.c:723)
==51407==    by 0x669EA40: Tcl_Free (tclCkalloc.c:1217)
==51407==    by 0x6769493: TclFinalizeSynchronization (tclThread.c:398)
==51407==    by 0x66F9A2D: Tcl_Finalize (tclEvent.c:1154)
==51407==    by 0x68CD169: Rivet_ChildExit (mod_rivet.c:1352)
==51407==    by 0x1538A9C: run_cleanups (apr_pools.c:2346)
==51407==    by 0x153951D: apr_pool_destroy (apr_pools.c:809)
==51407==    by 0x153950B: apr_pool_destroy (apr_pools.c:806)
==51407==    by 0x153950B: apr_pool_destroy (apr_pools.c:806)
==51407==    by 0x422D26: destroy_and_exit_process (main.c:270)
==51407==    by 0x4237E9: main (main.c:760)
Comment 1 Massimo Manghi 2013-01-30 23:41:59 UTC
If I understand correctly, the problem is triggered by the release of an APR pool. In Rivet_ChildInit we register the module cleanup by exploiting the pool cleanup mechanism.

apr_pool_cleanup_register (pChild, s, Rivet_ChildExit, Rivet_ChildExit);

I guess Websh is doing the same, and pChild is a pool passed by the webserver framework. Could separating these pools work out a solution?
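
For context, the wiring looks roughly like this; a minimal sketch of the pattern, not the actual mod_rivet source, with placeholder function bodies:

    #include "httpd.h"
    #include "apr_pools.h"

    /* Sketch only: the real Rivet_ChildExit lives in mod_rivet.c. */
    static apr_status_t Rivet_ChildExit (void *data)
    {
        server_rec *s = (server_rec *) data;
        (void) s;
        /* ... per-child teardown, currently ending in Tcl_Finalize() ... */
        return APR_SUCCESS;
    }

    static void Rivet_ChildInit (apr_pool_t *pChild, server_rec *s)
    {
        /* Run Rivet_ChildExit when pChild is destroyed; the same function is
           registered as both the plain and the child cleanup. */
        apr_pool_cleanup_register (pChild, s, Rivet_ChildExit, Rivet_ChildExit);
    }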
Comment 2 Ronnie Brunner 2013-02-05 20:26:18 UTC
Yes, Websh does the same. But then, that's the whole idea of registering with a pool.

But what I'm not sure about is: Rivet_ChildExit calls Tcl_Finalize, and then Websh tries to Tcl_DeleteInterp and so on. Is that supposed to work? Does Tcl_Finalize do more than call the registered (thread) exit handlers? Can Websh still safely use Tcl after you have called Tcl_Finalize?

I also noticed that Websh doesn't properly use Tcl_Preserve and Tcl_Release on interps retrieved from data structures. Could that be an issue? (like trying to Tcl_DeleteInterp an already deleted interp? ...)
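
For reference, the usual idiom for an interp taken out of a shared data structure looks roughly like this; a sketch of the Tcl_Preserve/Tcl_Release pattern, not actual Websh code:

    #include <tcl.h>

    /* Sketch of the Tcl_Preserve/Tcl_Release idiom for an interp held in a
       shared structure; not actual Websh code. */
    static void use_shared_interp (Tcl_Interp *interp)
    {
        Tcl_Preserve ((ClientData) interp);
        if (!Tcl_InterpDeleted (interp)) {
            Tcl_Eval (interp, "puts {still alive}");   /* any use of the interp */
        }
        Tcl_Release ((ClientData) interp);   /* may actually free it if deletion was pending */
    }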
Comment 3 Mikhail T. 2013-02-05 20:41:08 UTC
Another question is why even bother with the clean-up, if the current httpd process is exiting? Granted, this fact may not be easy to detect, but it may be worth trying...
Comment 4 Mikhail T. 2013-02-05 23:01:07 UTC
Created attachment 29923 [details]
Delete our own interpreter instead of calling Tcl_Finalize

Maybe, instead of calling Tcl_Finalize -- which makes it impossible to use libtcl in the process again -- Rivet should limit itself to deleting just the interpreter it created?

Then everyone should be happy...
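
The 431-byte patch itself is in the attachment; as a rough sketch of the idea, where Rivet_GetChildInterp() is a hypothetical accessor standing in for however mod_rivet reaches the interpreter it created in Rivet_ChildInit:

    #include <tcl.h>
    #include "httpd.h"
    #include "apr_pools.h"

    /* Hypothetical accessor, not a real mod_rivet function. */
    extern Tcl_Interp *Rivet_GetChildInterp (server_rec *s);

    static apr_status_t Rivet_ChildExit (void *data)
    {
        server_rec *s = (server_rec *) data;
        Tcl_Interp *interp = Rivet_GetChildInterp (s);

        if (interp != NULL && !Tcl_InterpDeleted (interp)) {
            Tcl_DeleteInterp (interp);   /* free only the interpreter Rivet created */
        }
        /* No Tcl_Finalize() here: other modules (e.g. Websh) may still need
           libtcl when their own pool cleanups run later. */
        return APR_SUCCESS;
    }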
Comment 5 Massimo Manghi 2013-02-06 23:42:18 UTC
(In reply to comment #4)
> Created attachment 29923 [details]
> Delete our own interpreter instead of calling Tcl_Finalize
> 
> Maybe, instead of calling Tcl_Finalize -- which makes it impossible to use
> libtcl in the process again -- Rivet should limit itself to deleting just
> the interpreter it created?
> 
> Then everyone should be happy...

It's probably so: separating the pools wouldn't make any difference because Tcl is not allocating from them, and moreover you're correct in saying that, if the child process is exiting, there is no point in calling Tcl_Finalize as long as we are using prefork as the MPM.

Mikhail, please, would you test on your installation whether this works out as a solution? After all, your setup seems to be the de facto standard for reproducing this bug.
Comment 6 Massimo Manghi 2013-02-12 17:04:28 UTC
I reiterate my request for cooperation to Mikhail: please, are you willing to test a modified Rivet build where the call to Tcl_Finalize is removed, to see if it cures this fault? If you could help me do it, I could close the bug and include the fix in the next patchlevel release of Rivet. Thanks.
Comment 7 Mikhail T. 2013-02-12 17:59:25 UTC
Created attachment 29944 [details]
Do not call Tcl_Finalize in Rivet_ChildExit

Interpreter deletion is actually already done by Rivet_ChildHandlers(). Thus, all that's needed is to simply stop calling Tcl_Finalize.

Tested (under valgrind) on FreeBSD-9.1/amd64.
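
The attached 265-byte patch is the authoritative change; roughly, the resulting cleanup has the following shape, where the declaration of Rivet_ChildHandlers() and its second argument are assumptions rather than the committed code:

    #include "httpd.h"
    #include "apr_pools.h"

    extern void Rivet_ChildHandlers (server_rec *s, int init);   /* signature assumed */

    /* Sketch of Rivet_ChildExit after the fix; body details are assumptions. */
    static apr_status_t Rivet_ChildExit (void *data)
    {
        server_rec *s = (server_rec *) data;

        /* Rivet_ChildHandlers() already walks the configured interpreters and
           deletes the ones Rivet created (0 is assumed here to mean "destroy"). */
        Rivet_ChildHandlers (s, 0);

        /* The process-wide Tcl_Finalize() call is simply dropped, so libtcl
           remains usable when Websh's own pool cleanup runs afterwards. */
        return APR_SUCCESS;
    }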
Comment 8 Massimo Manghi 2013-02-12 21:54:10 UTC
Fixed in trunk.