Bug 898 - Memory & CPU hogging on network outage
Summary: Memory & CPU hogging on network outage
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd
Version: 2.41
Hardware: PC Linux
Importance: P5 normal
Target Milestone: 2.50
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords: dns
Depends on:
Blocks:
 
Reported: 2002-09-16 02:07 UTC by Matthias Andree
Modified: 2003-02-07 23:04 UTC
CC List: 1 user

Description Matthias Andree 2002-09-16 02:07:29 UTC
Yesterday, my DSL link was down. I saw some process hogging the CPU, but did not
pay attention, and some hours later, my 320 MB RAM machine was 350 MB deep into
swap (it's normally well below 20 MB swap usage). 

I figured that some spamd child process was looping and allocating ever more
memory. I could not figure out where this happened; I tried strace, to no avail.
kill -15 (SIGTERM) did not help; kill -9 (SIGKILL), of course, did.

Please send further instructions on how to track this down. I do know Perl, but
I am totally unacquainted with SpamAssassin's and spamd's internal workings.
Comment 1 Duncan Findlay 2002-09-20 20:57:13 UTC
Ummm... I just checked in a possible fix for this. Could you check the latest
CVS (HEAD branch) and try to reproduce?
Comment 2 Matthias Andree 2002-09-23 06:59:39 UTC
This does NOT fix the problem for me, unfortunately.
Comment 3 Jamin W. Collins 2002-09-23 20:34:30 UTC
I am seeing a similar problem with SA on Debian.  I'm running version 2.41 on
Debian sarge.  SA is invoked through maildrop's xfilter command.  Normally,
things work just fine.  However, from time to time, an SA process hangs around
and grows to a little over 200 megs in memory usage (as reported by top) and
~100% CPU usage.
Comment 4 Theo Van Dinter 2002-12-21 21:34:29 UTC
Might this perchance be related to bug 1151
(http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1151)?  Do you have
Razor1 or Razor2 enabled?  Does the problem go away if you disable one or both?
Comment 5 Matthias Andree 2003-01-04 09:01:47 UTC
I don't have either Razor version activated, so there is nothing to disable here. 
Comment 6 Jeremy Lin 2003-01-09 15:39:37 UTC
I'm also seeing a problem like this on Solaris. According to my procmail log,
this morning at about 6:30 am, procmail started reporting

procmail: Timeout, terminating "/opt/really-local/bin/spamc"

and by 6:50 am, it was saying

procmail: Program failure (74) of "/opt/really-local/bin/spamc"

for most incoming messages. By 3 pm or so, when I first checked mail today, the
various unterminated spamd processes were using 800 MB. This isn't the first
time this has happened; I think it has happened about 3 other times. I had
thought it might have to do with the auto-whitelist database getting pretty
large (~5 MB), but I restarted spamd on January 5 without -a, and the problem
showed up just 4 days later (it usually seemed to happen about once a month).
Comment 7 Allen Smith 2003-01-09 19:06:43 UTC
> I don't have either Razor version activated, so there is nothing to disable here.

Huh. Same for you, Jeremy? Other than Razor, the main remote tests that have had
problems recently are the blocklists orbs.dorkslayers.com (RCVD_IN_ORBS) and
relays.osirusoft.com (used by a number of rules). And this is just with
spamc/spamd? I'm wondering whether spamd is running into socket problems.

     -Allen
Comment 8 Jeremy Lin 2003-01-10 00:01:20 UTC
Actually, no. I do have remote tests, and spamd just stopped working for several
hours today. Using -L to use only local tests seems to fix the problem, so it
might be that what I experienced doesn't apply to this bug after all. Will have
to see if something like this happens again with the -L.
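The observation that -L (local tests only) makes the hang go away fits a remote lookup (RBL/DNS or Razor) that blocks with no deadline while the network is down. As an illustration only, here is a minimal Python sketch of the general pattern of giving such a lookup a hard deadline. It is not SpamAssassin's code (SpamAssassin is Perl), and the helper names `dnsbl_query_name` and `lookup_with_deadline` are hypothetical. The reversed-octet query name follows the usual DNSBL convention. The sketch addresses stalls only; it does not explain the memory growth reported in this bug.

```python
import concurrent.futures
import socket
from typing import Callable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Hypothetical helper: build a DNSBL query name by reversing the
    IPv4 octets and appending the blocklist zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


# One shared pool. A lookup that never returns keeps its worker thread busy,
# but the caller stops waiting once the deadline passes.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def lookup_with_deadline(
    name: str,
    timeout: float,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Hypothetical helper: return the resolved address, or None if the
    lookup fails or does not finish within `timeout` seconds."""
    future = _POOL.submit(resolver, name)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None  # stalled lookup: treat as "not listed" and move on
    except OSError:
        return None  # NXDOMAIN or resolver error (socket.gaierror): "not listed"
```

For example, `lookup_with_deadline(dnsbl_query_name("1.2.3.4", "relays.osirusoft.com"), 5.0)` queries `4.3.2.1.relays.osirusoft.com` and gives up after five seconds. One trade-off of this design: Python threads cannot be killed, so if all four workers stall at once, later lookups queue behind them and also time out until the stalled calls return.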
Comment 9 Allen Smith 2003-01-10 16:27:18 UTC
Jeremy:

> Actually, no. I do have remote tests,

I'm sorry, I evidently wasn't clear enough. Do you have Razor/Razor2 going or not,
as opposed to other remote tests (DNS/RBL, DCC, Pyzor...)?

     Thanks,

     -Allen
Comment 10 Jeremy Lin 2003-01-10 16:29:28 UTC
Allen, yes, I have Razor1.
Comment 11 Allen Smith 2003-01-11 09:06:24 UTC
Jeremy:

> Allen, yes, I have Razor1.

Hmm... that makes it uncertain whether you're having Razor (Bug 1151) or DNS
problems causing this, unfortunately. Any error messages about Razor (or DNS/RBL)?

    -Allen
Comment 12 Theo Van Dinter 2003-02-03 21:52:43 UTC
Any updates on this?  It looks as if the issue was caused by Razor1 going 
haywire.  Thanks.
Comment 13 Justin Mason 2003-02-08 08:04:22 UTC
Assuming that disabling Razor1 has fixed this. If not, the reporter
can reopen the bug ;)