SA Bugzilla – Bug 4852
[review] child_handler isn't catching children that exit on their own before a ping discovers the child has exited
Last modified: 2006-04-15 16:34:25 UTC
http://marc.theaimsgroup.com/?l=spamassassin-users&m=114382841527773&w=2 When a child process calls exit(), the child is still pinged (unsuccessfully) later. A new child isn't spawned until this happens. Looking briefly, it appears that child_handler() should be preventing this, but apparently isn't.
It appears that, under load, child_handler() is being called but isn't being completed before main_server_poll() ends up doing a ping. Mar 30 01:51:57 quark spamd[98553]: __alarm__ Mar 30 01:51:57 quark spamd[98553]: __alarm__ Mar 30 01:51:57 quark spamd[95999]: spamd: result: Y 11 - BAYES_99,DATE_IN_FUTURE_12_24,DCC_CHECK,MSGID_FROM_MTA_ID,SARE_SPEC_XXGEOCITIES2 scantime=147.4,size =1350,user=mithin,uid=58,required_score=5.0,rhost=prime.gushi.org,raddr=65.125.228.130,rport=4909,mid=<200603300633.k2U6XDgD057359@prime.gushi.org>,bayes=0.99 999022835033,autolearn=no Mar 30 01:51:57 quark spamd[18992]: spamd: copy_config timeout (with empty $@), respawning child process after 3 messages at /usr/local/bin/spamd line 982. Mar 30 01:51:57 quark spamd[76128]: prefork: child states: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB Mar 30 01:51:57 quark spamd[76128]: prefork: server reached --max-children setting, consider raising it Mar 30 01:52:05 quark spamd[19026]: __alarm__ Mar 30 01:52:07 quark spamd[19026]: __alarm__ Mar 30 01:52:07 quark spamd[19026]: spamd: copy_config timeout (with empty $@), respawning child process after 1 messages at /usr/local/bin/spamd line 982. Mar 30 01:52:13 quark spamd[76128]: syswrite() on closed filehandle GEN118 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451. Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF orkScaling.pm line 330. Mar 30 01:52:13 quark spamd[76128]: prefork: write of ping failed to 98553 fd=: at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330. Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF orkScaling.pm line 127. Mar 30 01:52:13 quark spamd[76128]: prefork: killing failed child 98553 fd= at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127. Mar 30 01:52:13 quark spamd[76128]: prefork: killed child 98553 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141. Mar 30 01:52:13 quark spamd[76128]: syswrite() on closed filehandle GEN400 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451. Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF orkScaling.pm line 330. Mar 30 01:52:13 quark spamd[76128]: prefork: write of ping failed to 18935 fd=: at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330. Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF orkScaling.pm line 127. Mar 30 01:52:13 quark spamd[76128]: prefork: killing failed child 18935 fd= at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127. Mar 30 01:52:13 quark spamd[76128]: prefork: killed child 18935 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141. Mar 30 01:52:13 quark spamd[76128]: syswrite() on closed filehandle GEN406 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451. Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF orkScaling.pm line 330. Mar 30 01:52:13 quark spamd[76128]: prefork: write of ping failed to 18992 fd=: at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330. Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF orkScaling.pm line 127. Mar 30 01:52:13 quark spamd[76128]: prefork: killing failed child 18992 fd= at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127. Mar 30 01:52:13 quark spamd[76128]: prefork: killed child 18992 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141. Mar 30 01:52:14 quark spamd[76128]: spamd: handled cleanup of child pid 18992 due to SIGCHLD Mar 30 01:52:14 quark spamd[76128]: spamd: handled cleanup of child pid 18935 due to SIGCHLD Mar 30 01:52:14 quark spamd[76128]: spamd: handled cleanup of child pid 98553 due to SIGCHLD So far it looks like it's only cosmetic...
Created attachment 3451 [details] patch [dos@silver trunk]$ svn ci -m "bug 4852: race condition in spamd pre-fork causes harmless warnings in logs, fixed" Sending lib/Mail/SpamAssassin/SpamdForkScaling.pm Transmitting file data . Committed revision 391755. [dos@silver trunk]$
Normally I wouldn't bother with getting this in 3.1, but we've been trying to silence all the harmless spamd pre-fork warnings so we can actually track down the real bugs in it. Please review for 3.1.2.
hold on -- does it actually fix the bug? ;) +1 if so
Created attachment 3452 [details] patch Check to see if $sock is defined too... as much fun as "floating point numbers with Matt" on the users@ list this morning was, insomnia sucks. I can't reproduce the bug... it only sometimes happens under high load. I'm going by what I see in Dan's log extract. As far as I can tell, the bug is purely cosmetic. The problem is that under load, Perl takes it's sweat ass time to process the child's SIGCHLD handler after the child kills itself. It appears that the socket file handle is immediately closed, but main_server_poll() can end up trying a ping before the SIGCHLD handler removes the child's PID info from the hash. So, it should fix the problem (crap in the logs). It probably doesn't even need to try and kill the child again, but doing so didn't hurt Dan's spamd process, so I'm inclined to leave it in just incase $sock gets screwed up for some other reason.
+1 It appears worth doing, even though I can't reproduce the bug to verify the positive reasons to do this.
Justin, does your vote still count for the latest patch, or do we still need another vote?
ok, +1
Sending lib/Mail/SpamAssassin/SpamdForkScaling.pm Transmitting file data . Committed revision 394400. [dos@silver 3.1]$