4650 – child timeout processing eventually causes spamd to hang

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	spamc/spamd
Version:	3.1.0
Hardware:	HP Linux

Importance:	P5 normal
Target Milestone:	3.3.0
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2005-10-28 16:12 UTC by James E.J. Bottomley
Modified:	2008-04-21 02:03 UTC
CC List:	2 users

Description James E.J. Bottomley 2005-10-28 16:12:29 UTC

This problem is seen with the debian version: 3.1.0a-1

I run spamassassin on a very slow machine (HP B180), which means that operations
take much longer than on a modern ix86 system.  The symptom I see is that
eventually spamd fails to fork.  When I do a ps, there are 5 (from the
--max-children 5 argument) spamd processes, all owned by my users, all idle but
none returning to the spamd fork pool.  For each of them there's a message in
the log saying

spamd[26351]: bayes: expire_old_tokens: child processing timeout at /usr/sbin/spamd line 1088.

(However, there are more of these lines than the five children.)  It looks
like there's some error path after the timeout where the child fails to
return to the prefork pool.
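
One way to compare the number of timeout messages against the number of live children is to count the messages per spamd PID. The following is only a minimal sketch: the function name count_timeouts and the log path in the usage comment are hypothetical, and it assumes log lines in the format quoted above.

```shell
# Count "child processing timeout" messages per spamd PID in a log file.
# count_timeouts is a hypothetical helper name; $1 is the log file to scan.
count_timeouts() {
    grep 'child processing timeout' "$1" |
        sed -n 's/.*spamd\[\([0-9]*\)\].*/\1/p' |
        sort | uniq -c
}

# Usage (the path is an assumption; use wherever syslog writes spamd output):
#   count_timeouts /var/log/mail.log
```

Each output line is a count followed by a PID, so a PID that no longer appears in ps but shows up here belongs to a child that timed out earlier and was replaced.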

On an unmodified spamd installation, this causes spamd to become unusable after
about a day of processing emails.

I've worked around this problem on my system by adding the option 
--timeout-child 3600 and now spamd has been running properly for the last week.
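
For reference, the workaround amounts to one extra flag at spamd startup. The fragment below is a sketch that assumes the Debian package reads its daemon flags from an OPTIONS variable in /etc/default/spamassassin; other installations may pass the same flags directly on the spamd command line.

```shell
# /etc/default/spamassassin (assumed location on Debian)
# --max-children 5 matches the setup described above;
# --timeout-child 3600 raises the per-child timeout to one hour.
OPTIONS="--max-children 5 --timeout-child 3600"
```

spamd has to be restarted for the new value to take effect.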

Comment 1 Dallas Engelken 2005-10-28 17:08:39 UTC

See bug 3828 for my comments regarding child timeouts and bayes expiry:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3828#c38

and Theo's response in c39:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3828#c39

I guess the secondary helper call never made it into 3.1.

d

Comment 2 Dallas Engelken 2005-10-28 17:12:38 UTC

Also, to avoid relying on auto-expiry by spamd, you can disable auto-expiry
and run sa-learn --force-expire (as the user you run spamd as) via cron on a
semi-regular basis.  This will mitigate the issue until a better solution is in
place for auto-expiry via spamd.
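
A rough sketch of that setup, under some assumptions: the nightly 03:15 schedule is an arbitrary placeholder, and the configuration line goes in whichever file (site-wide local.cf or the user's user_prefs) governs the bayes database in question.

```shell
# 1. In the SpamAssassin configuration (local.cf or user_prefs),
#    turn off opportunistic expiry during scanning:
bayes_auto_expire 0

# 2. In the crontab of the user spamd runs as (edit with: crontab -e),
#    force an expiry run once a night; the time is a placeholder:
15 3 * * * sa-learn --force-expire
```

These are two separate files, shown together only for brevity.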

Comment 3 Dallas Engelken 2005-10-28 17:22:19 UTC

Okay, I hate to keep writing follow-ups, but I keep thinking of things. ;)

Have you converted to SQL-driven bayes?  My token expiration runs usually take
less than 10 seconds now that I have bayes in SQL.  This may be a more elegant
solution to the problem.

# time sa-learn --force-expire

real    0m3.894s
user    0m1.220s
sys     0m0.060s

Comment 4 James E.J. Bottomley 2005-10-28 17:47:36 UTC

Actually, I'm perfectly happy with the solution I'm currently employing.  The
reason for reporting isn't the time it takes (I have CPU time in abundance on
the network gateway); it's the bug where the child processes don't return to the
prefork pool, causing spamd to hang eventually.  That one needs looking at before
it trips someone else up.

Comment 5 Justin Mason 2006-12-11 04:06:24 UTC

If you can ever capture -D debug logs, or strace logs, for this situation, that
would be very helpful.

Comment 6 Justin Mason 2006-12-12 12:40:18 UTC

Moving RFEs and low-priority stuff to the 3.3.0 target.

Comment 7 Justin Mason 2008-04-21 02:03:36 UTC

This is a very old bug and hasn't been touched in several years; it's probably not an issue with current versions.  If this is not the case, feel free to reopen.