|Summary:||spamd collecting segfaults in log|
|Product:||Spamassassin||Reporter:||Michal Jaegermann <michal>|
|Component:||spamc/spamd||Assignee:||SpamAssassin Developer Mailing List <dev>|
Description Michal Jaegermann 2008-01-29 16:32:50 UTC
Relatively recently I started to collect messages like these in my logs:

Jan 27 06:39:51 ... spamd: segfault at 0000008d0369c600 rip 0000003542af0c25 rsp 00007fff5d241a80 error 4
Jan 27 10:42:03 ... spamd: segfault at 0000000006d80398 rip 000000352e8708ab rsp 00007fff5d241a10 error 4
Jan 27 11:57:32 ... spamd: segfault at 00000052432d54e1 rip 0000003542a56e69 rsp 00007fff5d241c40 error 4
Jan 27 13:15:28 ... spamd: segfault at 00000000059620b8 rip 000000352e8708ab rsp 00007fff5d241a10 error 4
Jan 27 15:46:17 ... spamd: segfault at 0000004100000030 rip 0000003542af260a rsp 00007fff5d241b70 error 4
Jan 28 08:37:46 ... spamd: segfault at 000000182d51aa48 rip 0000003542a2861d rsp 00007fff5d241b40 error 4
Jan 28 14:56:03 ... spamd general protection rip:3542af27da rsp:7fff07917200 error:0
Jan 28 18:06:51 ... spamd: segfault at 0000000004b65000 rip 000000352e87b8ab rsp 00007fff079172f8 error 4
Jan 28 19:25:46 ... spamd: segfault at 000000000581fe98 rip 0000003542af27da rsp 00007fff07917200 error 4
Jan 28 20:41:57 ... spamd: segfault at 0000000126710768 rip 0000003542a2861d rsp 00007fff07917360 error 4

This is on a Fedora 8 installation. The last four entries are from a freshly installed 3.2.4; the earlier ones are from 3.2.3. So far there is nothing from today, and the first samples I noticed are from January 14th. No obvious candidates in other, possibly related, software changed on that date or shortly before it.

In the mail logs I see corresponding lines like these:

... spamd: spamd: processing message <479EA06A.email@example.com> for ...
... spamd: prefork: child states: BI
... spamd: spamd: handled cleanup of child pid 9396 due to SIGCHLD
... spamd: spamd: server successfully spawned child process, pid 11150

The segfaults are always in children forked to handle specific messages, so general operation is not affected and the parent spamd continues to run.
Looking more closely through the logs, all of these incidents happen with mail to one account. That account has a .procmailrc, and nearly all of its mail (there are some other, rare, cases) is handled like this:

:0:
* ^X-Spam-Flag:.*YES
/dev/null

:0
! firstname.lastname@example.org

That .procmailrc has not been changed for over two years.

This is the only x86_64 mail server with a significant amount of mail traffic that I can watch in such detail. I have another server which handles even more mail and also runs spamassassin-3.2.4, but that one is an i386 machine. I have not observed anything of a similar sort there.

This was initially reported for 3.2.3 as https://bugzilla.redhat.com/show_bug.cgi?id=430167
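To see which crash sites recur in log lines like the ones above, the entries can be grouped by instruction pointer, and the "error 4" field can be decoded. The following is a minimal sketch, not part of the original report. The function names are hypothetical, the regular expression assumes the exact "segfault at ... rip ... rsp ... error N" layout quoted here, and the error field is assumed to be hexadecimal. The low three bits of an x86 page-fault error code indicate not-present vs. protection violation, read vs. write, and kernel vs. user mode.

```python
import re
from collections import Counter

# Assumes the kernel log layout quoted in this report:
#   "segfault at ADDR rip RIP rsp RSP error N"
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) rip (?P<rip>[0-9a-f]+) "
    r"rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_page_fault(err):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


def count_by_rip(lines):
    """Count segfault entries per instruction pointer (rip)."""
    counts = Counter()
    for line in lines:
        match = SEGV_RE.search(line)
        if match:
            counts[int(match.group("rip"), 16)] += 1
    return counts


sample = [
    "Jan 27 10:42:03 ... spamd: segfault at 0000000006d80398 rip 000000352e8708ab rsp 00007fff5d241a10 error 4",
    "Jan 27 13:15:28 ... spamd: segfault at 00000000059620b8 rip 000000352e8708ab rsp 00007fff5d241a10 error 4",
    "Jan 28 08:37:46 ... spamd: segfault at 000000182d51aa48 rip 0000003542a2861d rsp 00007fff5d241b40 error 4",
]

for rip, n in count_by_rip(sample).most_common():
    print(f"{rip:#x}: {n}")
print(decode_page_fault(4))
```

Under this decoding, error 4 is a user-mode read of an unmapped page. A rip value that repeats across crashes points at the same instruction in a shared library each time, which would be easier to explain by a software fault than by random memory corruption.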
Comment 1 Justin Mason 2008-01-30 01:35:59 UTC
that sounds pretty unusual. bear in mind that the spamd children are the processes that do all the heavy CPU and RAM work -- have you run memtest recently? any chance you could capture strace output for one of the crashing processes?
Comment 2 Mark Martinec 2008-01-30 04:10:54 UTC
I'm aware of two sources of crashes within SpamAssassin:

One is a runaway regexp on large paragraphs (typically long HTML tables), which is already described in bug 5795 and bug 5717. I'm seeing about one or two per week recently; these are more frequent now than last year.

The other source of crashes due to memory exhaustion (that I'm aware of) comes from the third-party plugin ImageCheck, or rather its underlying perl module Image::Info. It was reported to the author, but is not easy to fix. I stopped using Plugin::ImageCheck for this reason.

Try to obtain the message that caused a crash, and see if it falls into one or the other category.
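Once a crashing message has been captured, a quick way to judge whether it might fall into the "large paragraph" category is to measure its longest paragraph. This is a rough sketch only: the helper name is hypothetical, and splitting on blank lines is an assumed proxy that does not reproduce how SpamAssassin itself segments a message body.

```python
import email
import re
from email import policy


def longest_paragraph(raw_bytes):
    """Length of the longest blank-line-delimited paragraph in any text part."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    longest = 0
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            body = part.get_content()
        except (LookupError, UnicodeError):
            # Unknown or broken charset: skip this part rather than fail.
            continue
        for paragraph in re.split(r"\n\s*\n", body):
            longest = max(longest, len(paragraph))
    return longest


raw = (
    b"Subject: example\n"
    b"Content-Type: text/html\n"
    b"\n"
    b"<table>" + b"<tr><td>x</td></tr>" * 1000 + b"</table>\n"
    b"\n"
    b"short\n"
)
print(longest_paragraph(raw))
```

A message whose longest paragraph runs to tens of kilobytes would be a plausible candidate for the runaway regexp case; a message with only short paragraphs would point elsewhere.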
Comment 3 Michal Jaegermann 2008-01-30 10:18:05 UTC
> that sounds pretty unusual.

Agreed.

> bear in mind that the spamd children are the
> processes that do all the heavy CPU and RAM work

Yes, I am aware of that.

> have you run memtest recently?

That machine performs various duties, a number of them pretty memory and CPU intensive. It would be quite surprising if memory faults were so selective that only spamd was hit, without affecting any other process.

Looking through the logs, there appears to be a common factor in those incidents, although I cannot be sure that it always shows up: the "from=<...>" lines are pretty long, i.e. over 90 characters. OTOH many more messages where these strings are of a similar length are processed without any trouble.

> any chance you could capture strace output for one of the crashing
> processes?

I am not sure how. These are spawned children of a long-running process. Moreover, so far "Jan 28 20:41:57" happens to be, for some reason, the last recorded crash of that sort.

I rather hoped that maybe somebody had already seen something similar. I would really love to have a test case, but so far I have not been able to find one.
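On the question of tracing spawned children of a long-running process: strace can attach to running processes with -p and follow children forked after the attach with -f, and with -ff together with -o it writes one output file per process. The sketch below only builds such a command line. The helper name, the PID, and the output prefix are hypothetical; running the command needs sufficient privileges (typically root), and children that already exist at attach time have to be listed explicitly.

```python
def strace_command(pids, out_prefix="/tmp/spamd-trace"):
    """Build an strace invocation that attaches to running processes
    and follows any children they fork afterwards."""
    cmd = [
        "strace",
        "-f",             # follow children created by fork/clone
        "-ff",            # with -o: one file per process, <prefix>.<pid>
        "-o", out_prefix,
    ]
    for pid in pids:
        cmd += ["-p", str(pid)]  # attach to an already running process
    return cmd


# 1234 stands in for the parent spamd PID (hypothetical value).
print(" ".join(strace_command([1234])))
```

After a crash, the file whose suffix matches the child pid reported in the "handled cleanup of child pid ..." log line would hold the system calls leading up to the segfault. Tracing slows the traced processes down, so on a busy server this is best left running only until the next incident.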
Comment 4 Michal Jaegermann 2008-01-30 10:50:12 UTC
> One is a runaway regexp on large paragraphs (typically HTML long tables),

That one could be not a bad candidate here, although I could not really tell.

> The other source of crashes due to memory exhaustion (that I'm aware of)
> comes from third-party plugin ImageCheck

I do not have that plugin installed.