Bug 5801 - spamd collecting segfaults in log
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd
Version: 3.2.4
Hardware: Other other
Importance: P5 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL: https://bugzilla.redhat.com/show_bug....
Reported: 2008-01-29 16:32 UTC by Michal Jaegermann
Modified: 2008-01-30 10:50 UTC
Description Michal Jaegermann 2008-01-29 16:32:50 UTC
Relatively recently I started to see messages like these in my logs:

Jan 27 06:39:51 ... spamd[27361]: segfault at 0000008d0369c600 rip
0000003542af0c25 rsp 00007fff5d241a80 error 4
Jan 27 10:42:03 ... spamd[6959]: segfault at 0000000006d80398 rip
000000352e8708ab rsp 00007fff5d241a10 error 4
Jan 27 11:57:32 ... spamd[9533]: segfault at 00000052432d54e1 rip
0000003542a56e69 rsp 00007fff5d241c40 error 4
Jan 27 13:15:28 ... spamd[10396]: segfault at 00000000059620b8 rip
000000352e8708ab rsp 00007fff5d241a10 error 4
Jan 27 15:46:17 ... spamd[11386]: segfault at 0000004100000030 rip
0000003542af260a rsp 00007fff5d241b70 error 4
Jan 28 08:37:46 ... spamd[21679]: segfault at 000000182d51aa48 rip
0000003542a2861d rsp 00007fff5d241b40 error 4
Jan 28 14:56:03 ... spamd[4674] general protection rip:3542af27da
rsp:7fff07917200 error:0
Jan 28 18:06:51 ... spamd[5879]: segfault at 0000000004b65000 rip
000000352e87b8ab rsp 00007fff079172f8 error 4
Jan 28 19:25:46 ... spamd[6415]: segfault at 000000000581fe98 rip
0000003542af27da rsp 00007fff07917200 error 4
Jan 28 20:41:57 ... spamd[9396]: segfault at 0000000126710768 rip
0000003542a2861d rsp 00007fff07917360 error 4

This is on a Fedora 8 installation.  The last four entries are from
a freshly installed 3.2.4; earlier ones are from 3.2.3.  So far there
is nothing from today, and the first samples I noticed are from
January 14th.  No obvious candidates in other, possibly related,
software changed on that date or shortly before it.
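
To see whether the faults cluster at particular instruction addresses,
the kernel lines can be grouped by their rip value.  The following is
only a minimal sketch: the names SEGV, decode_error and summarize are
made up for illustration, the pattern assumes the x86_64 kernel format
shown in the excerpt above, and it does not match the differently
formatted "general protection" line.  decode_error reads the low three
bits of the x86 page-fault error code, so "error 4" means a user-mode
read of a page that was not present.

```python
import re
from collections import Counter

# Assumed format: the x86_64 kernel lines quoted above. \s+ also
# tolerates the line wrapping seen in the excerpt.
SEGV = re.compile(
    r"spamd\[(?P<pid>\d+)\]:?\s+segfault at (?P<addr>[0-9a-f]+)\s+"
    r"rip\s+(?P<rip>[0-9a-f]+)\s+rsp\s+(?P<rsp>[0-9a-f]+)\s+"
    r"error\s+(?P<err>\d+)"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def summarize(log_text):
    """Return (list of faults, Counter of rip values) for the given log text."""
    faults = [
        {
            "pid": int(m.group("pid")),
            "addr": int(m.group("addr"), 16),
            "rip": int(m.group("rip"), 16),
            "error": int(m.group("err")),
        }
        for m in SEGV.finditer(log_text)
    ]
    return faults, Counter(hex(f["rip"]) for f in faults)


# Two entries copied from the excerpt above.
sample = """\
Jan 27 10:42:03 ... spamd[6959]: segfault at 0000000006d80398 rip
000000352e8708ab rsp 00007fff5d241a10 error 4
Jan 27 13:15:28 ... spamd[10396]: segfault at 00000000059620b8 rip
000000352e8708ab rsp 00007fff5d241a10 error 4
"""
faults, by_rip = summarize(sample)
```

In the excerpt above several rip values occur more than once
(352e8708ab, 3542af27da and 3542a2861d), so a grouping like this may
help to narrow down where the faults happen, although by itself it
does not identify the cause.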

In mail logs I see corresponding lines like these:
... spamd[9396]: spamd: processing message <479EA06A.9020300@verizon.net> for ...
... spamd[4671]: prefork: child states: BI
... spamd[4671]: spamd: handled cleanup of child pid 9396 due to SIGCHLD
... spamd[4671]: spamd: server successfully spawned child process, pid 11150

Segfaults are always in children forked to handle specific messages,
so general operation is not affected and the parent spamd continues
to run.
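
The pids from the kernel log can be matched against the mail log to
find the message each crashed child was working on.  A minimal sketch,
assuming the "processing message" format quoted above; the function
name last_message_per_pid is hypothetical:

```python
import re

# Assumed format: the "processing message" mail log line quoted above.
PROCESSING = re.compile(
    r"spamd\[(\d+)\]: spamd: processing message (<[^>]*>)"
)


def last_message_per_pid(mail_log_lines, crashed_pids):
    """Map each crashed child pid to the last Message-ID it logged."""
    last = {}
    for line in mail_log_lines:
        m = PROCESSING.search(line)
        if m and int(m.group(1)) in crashed_pids:
            # Later lines overwrite earlier ones, so the last one wins.
            last[int(m.group(1))] = m.group(2)
    return last


lines = [
    "... spamd[9396]: spamd: processing message "
    "<479EA06A.9020300@verizon.net> for ...",
    "... spamd[4671]: prefork: child states: BI",
    "... spamd[4671]: spamd: handled cleanup of child pid 9396 due to SIGCHLD",
]
result = last_message_per_pid(lines, {9396})
```

Pids are reused over time, so on a real log the lines should first be
restricted to a time window around each crash.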

Looking closer through the logs, all these incidents happen with mail
to one account.  A .procmailrc is present on that account, and nearly
all mail (there are some other, rare, cases) is handled like this:

:0:
* ^X-Spam-Flag:.*YES
/dev/null

:0
! send@somewhere.else

That .procmailrc has not been changed for over two years.

This is the only x86_64 mail server which I can watch in such detail
and which carries a significant amount of mail traffic.  I have another
server which handles even more mail, and it has spamassassin-3.2.4
in operation, but that one is an i386 machine.  I did not observe
anything of a similar sort there.

This was initially reported for 3.2.3 as
https://bugzilla.redhat.com/show_bug.cgi?id=430167
Comment 1 Justin Mason 2008-01-30 01:35:59 UTC
that sounds pretty unusual.  bear in mind that the spamd children are the
processes that do all the heavy CPU and RAM work -- have you run memtest
recently?  any chance you could capture strace output for one of the crashing
processes?
Comment 2 Mark Martinec 2008-01-30 04:10:54 UTC
I'm aware of two sources of crashes within SpamAssassin:

One is a runaway regexp on large paragraphs (typically HTML long tables),
which is already described in bug 5795 and bug 5717.  I'm seeing about
one or two per week recently, these are more frequent now than last year.

The other source of crashes due to memory exhaustion (that I'm aware of)
comes from third-party plugin ImageCheck, or better, its underlying
perl module Image::Info. It was reported to the author, but is not easy
to fix. I stopped using Plugin::ImageCheck for this reason.

Try to obtain the message that caused a crash, and see if it falls
into one or the other category.
Comment 3 Michal Jaegermann 2008-01-30 10:18:05 UTC
> that sounds pretty unusual.

Agreed.

> bear in mind that the spamd children are the
> processes that do all the heavy CPU and RAM work

Yes, I am aware of that.

> have you run memtest recently?

That machine performs various duties, and a number of those are pretty
memory and CPU intensive.  It would be quite surprising if memory
faults were so selective that only spamd was hit without affecting
any other process.

Looking through the logs, there appears to be a common factor in
those incidents, although I cannot be sure that it always shows up:
the "from=<...>" lines are pretty long, i.e. over 90 characters.
OTOH many more messages where these strings are of a similar length
are processed without any trouble.

> any chance you could capture strace output for one of the crashing
> processes?

I am not sure how.  These are spawned children of a long-running
process.  Moreover, so far "Jan 28 20:41:57" happens to be, for some
reason, the last recorded crash of that sort.  I rather hoped that
maybe somebody had already seen something similar.  I really would love
to have a test case but so far I could not find one.
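
One possible way to trace short-lived children is to attach strace to
the long-running parent and let it follow forks.  The sketch below
only builds the command line; the helper name strace_attach_command
and the output prefix /tmp/spamd-trace are assumptions, and 4671 is
the parent pid seen in the mail log above.  As far as I know, strace
follows only children forked after it attaches, it normally needs to
run as root to attach to spamd, and it slows the traced processes
down.

```python
import shlex


def strace_attach_command(parent_pid, out_prefix="/tmp/spamd-trace"):
    """Build an strace invocation that attaches to a running process
    and follows the children it forks afterwards."""
    return [
        "strace",
        "-ff",  # follow forks; with -o, one output file per process
        "-tt",  # prefix each line with a microsecond timestamp
        "-o", out_prefix,  # files are named <out_prefix>.<pid>
        "-p", str(parent_pid),  # attach to the already running parent
    ]


cmd = strace_attach_command(4671)
print(shlex.join(cmd))
```

After the next crash, the file whose suffix matches the pid in the
kernel log line should hold the last system calls of that child.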
Comment 4 Michal Jaegermann 2008-01-30 10:50:12 UTC
> One is a runaway regexp on large paragraphs (typically HTML long tables),

That one could be a reasonable candidate here, although I cannot
really tell.

> The other source of crashes due to memory exhaustion (that I'm aware of)
> comes from third-party plugin ImageCheck

I do not have that plugin installed.