Bug 4852 - [review] child_handler isn't catching children that exit on their own before a ping discovers the child has exited
Summary: [review] child_handler isn't catching children that exit on their own before ...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
: P5 normal
Target Milestone: 3.1.2
Assignee: Daryl C. W. O'Shea
URL:
Whiteboard: go
Keywords:
Depends on:
Blocks:
 
Reported: 2006-04-01 10:40 UTC by Daryl C. W. O'Shea
Modified: 2006-04-15 16:34 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
patch patch None Daryl C. W. O'Shea [HasCLA]
patch patch None Daryl C. W. O'Shea [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Daryl C. W. O'Shea 2006-04-01 10:40:24 UTC
http://marc.theaimsgroup.com/?l=spamassassin-users&m=114382841527773&w=2

When a child process calls exit(), the child is still pinged (unsuccessfully)
later.  A new child isn't spawned until this happens.

Looking briefly, it appears that child_handler() should be preventing this, but
apparently isn't.
Comment 1 Daryl C. W. O'Shea 2006-04-02 05:03:58 UTC
It appears that, under load, child_handler() is being called but isn't being
completed before main_server_poll() ends up doing a ping.


Mar 30 01:51:57 quark spamd[98553]: __alarm__
Mar 30 01:51:57 quark spamd[98553]: __alarm__
Mar 30 01:51:57 quark spamd[95999]: spamd: result: Y 11 -
BAYES_99,DATE_IN_FUTURE_12_24,DCC_CHECK,MSGID_FROM_MTA_ID,SARE_SPEC_XXGEOCITIES2
scantime=147.4,size
=1350,user=mithin,uid=58,required_score=5.0,rhost=prime.gushi.org,raddr=65.125.228.130,rport=4909,mid=<200603300633.k2U6XDgD057359@prime.gushi.org>,bayes=0.99
999022835033,autolearn=no
Mar 30 01:51:57 quark spamd[18992]: spamd: copy_config timeout (with empty $@),
respawning child process after 3 messages at /usr/local/bin/spamd line 982.
Mar 30 01:51:57 quark spamd[76128]: prefork: child states:
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Mar 30 01:51:57 quark spamd[76128]: prefork: server reached --max-children
setting, consider raising it
Mar 30 01:52:05 quark spamd[19026]: __alarm__
Mar 30 01:52:07 quark spamd[19026]: __alarm__
Mar 30 01:52:07 quark spamd[19026]: spamd: copy_config timeout (with empty $@),
respawning child process after 1 messages at /usr/local/bin/spamd line 982.
Mar 30 01:52:13 quark spamd[76128]: syswrite() on closed filehandle GEN118 at
/usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation
(.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF
orkScaling.pm line 330.
Mar 30 01:52:13 quark spamd[76128]: prefork: write of ping failed to 98553 fd=:
 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
 line 330.
Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation
(.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF
orkScaling.pm line 127.
Mar 30 01:52:13 quark spamd[76128]: prefork: killing failed child 98553 fd= at
/usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line
 127.
Mar 30 01:52:13 quark spamd[76128]: prefork: killed child 98553 at
/usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141.
Mar 30 01:52:13 quark spamd[76128]: syswrite() on closed filehandle GEN400 at
/usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation
(.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF
orkScaling.pm line 330.
Mar 30 01:52:13 quark spamd[76128]: prefork: write of ping failed to 18935 fd=:
 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
 line 330.
Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation
(.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF
orkScaling.pm line 127.
Mar 30 01:52:13 quark spamd[76128]: prefork: killing failed child 18935 fd= at
/usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line
 127.
Mar 30 01:52:13 quark spamd[76128]: prefork: killed child 18935 at
/usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141.
Mar 30 01:52:13 quark spamd[76128]: syswrite() on closed filehandle GEN406 at
/usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation
(.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF
orkScaling.pm line 330.
Mar 30 01:52:13 quark spamd[76128]: prefork: write of ping failed to 18992 fd=:
 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
 line 330.
Mar 30 01:52:13 quark spamd[76128]: Use of uninitialized value in concatenation
(.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdF
orkScaling.pm line 127.
Mar 30 01:52:13 quark spamd[76128]: prefork: killing failed child 18992 fd= at
/usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line
 127.
Mar 30 01:52:13 quark spamd[76128]: prefork: killed child 18992 at
/usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141.
Mar 30 01:52:14 quark spamd[76128]: spamd: handled cleanup of child pid 18992
due to SIGCHLD
Mar 30 01:52:14 quark spamd[76128]: spamd: handled cleanup of child pid 18935
due to SIGCHLD
Mar 30 01:52:14 quark spamd[76128]: spamd: handled cleanup of child pid 98553
due to SIGCHLD


So far it looks like it's only cosmetic...
Comment 2 Daryl C. W. O'Shea 2006-04-05 20:46:38 UTC
Created attachment 3451 [details]
patch

[dos@silver trunk]$ svn ci -m "bug 4852: race condition in spamd pre-fork
causes harmless warnings in logs, fixed"
Sending        lib/Mail/SpamAssassin/SpamdForkScaling.pm
Transmitting file data .
Committed revision 391755.
[dos@silver trunk]$
Comment 3 Daryl C. W. O'Shea 2006-04-05 20:48:39 UTC
Normally I wouldn't bother with getting this in 3.1, but we've been trying to
silence all the harmless spamd pre-fork warnings so we can actually track down
the real bugs in it.

Please review for 3.1.2.
Comment 4 Justin Mason 2006-04-05 21:03:00 UTC
hold on -- does it actually fix the bug? ;)

+1 if so
Comment 5 Daryl C. W. O'Shea 2006-04-05 21:34:04 UTC
Created attachment 3452 [details]
patch

Check to see if $sock is defined too... as much fun as "floating point numbers
with Matt" on the users@ list this morning was, insomnia sucks.


I can't reproduce the bug... it only sometimes happens under high load.  I'm
going by what I see in Dan's log extract.

As far as I can tell, the bug is purely cosmetic.  The problem is that under
load, Perl takes it's sweat ass time to process the child's SIGCHLD handler
after the child kills itself.

It appears that the socket file handle is immediately closed, but
main_server_poll() can end up trying a ping before the SIGCHLD handler removes
the child's PID info from the hash.

So, it should fix the problem (crap in the logs).  It probably doesn't even
need to try and kill the child again, but doing so didn't hurt Dan's spamd
process, so I'm inclined to leave it in just incase $sock gets screwed up for
some other reason.
Comment 6 Sidney Markowitz 2006-04-08 01:40:13 UTC
+1

It appears worth doing, even though I can't reproduce the bug to verify the
positive reasons to do this.
Comment 7 Sidney Markowitz 2006-04-08 23:33:38 UTC
Justin, does your vote still count for the latest patch, or do we still need
another vote?
Comment 8 Justin Mason 2006-04-15 15:06:06 UTC
ok, +1
Comment 9 Daryl C. W. O'Shea 2006-04-15 23:34:25 UTC
Sending        lib/Mail/SpamAssassin/SpamdForkScaling.pm
Transmitting file data .
Committed revision 394400.
[dos@silver 3.1]$