SA Bugzilla – Bug 4594
[review] spamd dies unexpectedly: prefork: ordered child to accept, but child reported state '1'
Last modified: 2006-12-11 11:53:40 UTC
Upgraded from 3.0.4 to 3.1.0. No further configuration change. etc/mail/spamassassin/local.cf contains just "report_safe 0". Starting spamd as /usr/local/bin/spamd -c -d -r /var/run/spamd/spamd.pid -u spamd -H /home/spamd - x -i merak.origin-it.net -A 194.8.96.226,194.8.96.234,194.8.96.250 -m 15 Syslog before crash: Sep 19 21:28:57 merak spamd[52026]: prefork: child states: IBBI Sep 19 21:28:58 merak spamd[52026]: prefork: child states: BIBI Sep 19 21:29:03 merak spamd[52026]: prefork: child states: IIII Sep 19 21:29:03 merak spamd[52026]: spamd: handled cleanup of child pid 60800 due to SIGCHLD Sep 19 21:29:03 merak spamd[52026]: prefork: child states: III Sep 19 21:29:03 merak spamd[52026]: spamd: handled cleanup of child pid 60797 due to SIGCHLD Sep 19 21:29:03 merak spamd[52026]: prefork: child states: II Sep 19 21:29:03 merak spamd[52026]: prefork: child states: II Sep 19 21:29:40 merak spamd[52026]: prefork: child states: II Sep 19 21:29:44 merak spamd[52026]: prefork: child states: II Sep 19 21:29:48 merak spamd[52026]: prefork: child states: II Sep 19 21:29:54 merak spamd[52026]: prefork: child states: BB Sep 19 21:29:54 merak spamd[52026]: spamd: server successfully spawned child process, pid 60801 Sep 19 21:29:54 merak spamd[52026]: prefork: child states: BBI Sep 19 21:29:54 merak spamd[52026]: prefork: child states: III Sep 19 21:29:54 merak spamd[52026]: spamd: handled cleanup of child pid 60801 due to SIGCHLD Sep 19 21:29:54 merak spamd[52026]: prefork: child states: II Sep 19 21:30:01 merak spamd[52026]: prefork: ordered child to accept, but child reported state '1' at /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/ SpamdForkScaling.pm line 390.
I also upgraded from 3.0.4 to 3.1.0 Spamd is dieing every few hours. I tested limiting the children, and tweaking those settings however death still occurs. Sep 21 04:36:43 spamd6 spamd[22770]: prefork: child states: BBBBBBBBI Sep 21 04:36:43 spamd6 spamd[22770]: prefork: ordered child to accept, but child reported state '1' at /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 390. Sep 21 07:18:08 spamd6 spamd[58292]: prefork: ordered child to accept, but child reported state '1' at /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 390.
I have recreated the exact same problem on 5 different mailservers. We were upgrading from 3.0.3 to 3.1.0 on Linux 2.4.22. spamd dies every 4-12 hours. Sep 21 04:05:56 jsmx3 spamd[969]: prefork: child states: III Sep 21 04:05:56 jsmx3 spamd[969]: spamd: handled cleanup of child pid 12623 due to SIGCHLD Sep 21 04:05:56 jsmx3 spamc[12816]: connect(AF_INET) to spamd at 127.0.0.1 failed, retrying (#1 of 3): Connection refused Sep 21 04:05:57 jsmx3 spamc[12816]: connect(AF_INET) to spamd at 127.0.0.1 failed, retrying (#2 of 3): Connection refused Sep 21 04:05:58 jsmx3 spamc[12816]: connect(AF_INET) to spamd at 127.0.0.1 failed, retrying (#3 of 3): Connection refused Sep 21 04:05:59 jsmx3 spamc[12816]: connection attempt to spamd aborted after 3 retries
1. Simon, are you sure it's the same bug? the 'prefork: ordered child to accept, but child reported state '1' at /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 390.' line does not appear in that syslog output you posted. If that error does not appear, it's not the same bug. 2. Helge, Ryan, can you provide ktrace output for the spamd processes? we'll need dumps from the master spamd process at least. 3. What exactly happens after that message appears in the syslog? in other words what does "death" mean?
ad 2) will do. ad 3) "Death" means that spamd is gone. The process just terminates. Note that said line 390 is a "die" statement, and that's just what happens. Of course there is nothing in the syslog after the spamd process has terminated.
I will gather a ktrace for ya. Just like Helge states, the spamd process terminates. With debugging on you still see the other children process(s) finish up. With debugging off all you see is 'line 390.' (end of log) Sep 21 09:29:37 spamd4 spamd[58422]: prefork: ordered child to accept, but child reported state '1' at /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 390. Sep 21 09:29:37 spamd4 spamd[58554]: spamd: connection from mx5.oct.nac.net [209.123.233.205] at port 64776 Sep 21 09:29:37 spamd4 spamd[58554]: spamd: processing message (...):0 Sep 21 09:29:37 spamd4 spamd[58801]: spamd: identified spam (28.5/5.0) for (...):0 in 5.5 seconds, 1653 bytes. Sep 21 09:29:37 spamd4 spamd[58801]: spamd: result: Y 28 - BAYES_99,DATE_IN_FUTURE_03_06,DCC_CHECK,DIGEST_MULTIPLE,HELO_DYNAMIC_IPADDR2,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,RCVD_IN_XBL,URIBL_JP_SURBL,URIBL_SBL scantime=5.5,size=1653,user=(...),uid=0,required_score=5.0,rhost=mx5.oct.nac.net,raddr=209.123.233.205,rport=61787,mid=(...),bayes=1,autolearn=spam Sep 21 09:29:38 spamd4 spamd[58775]: spamd: identified spam (32.4/4.8) for (...):0 in 5.4 seconds, 1231 bytes. Sep 21 09:29:38 spamd4 spamd[58775]: spamd: result: Y 32 - BAYES_99,DCC_CHECK,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,REMOVE_BEFORE_LINK,REPLICA_WATCH,UNPARSEABLE_RELAY,URIBL_AB_SURBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL scantime=5.4,size=1231,user=(...),uid=0,required_score=4.8,rhost=mx5.oct.nac.net,raddr=209.123.233.205,rport=56482,mid=(...),bayes=1,autolearn=spam Sep 21 09:29:38 spamd4 spamd[58793]: spamd: identified spam (22.6/6.0) for (...):0 in 6.6 seconds, 3290 bytes. Sep 21 09:29:38 spamd4 spamd[58793]: spamd: result: Y 22 - BAYES_50,HTML_40_50,HTML_MESSAGE,MIME_HTML_MOSTLY,RCVD_DOUBLE_IP_LOOSE,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_NJABL_DUL,RCVD_IN_NJABL_PROXY,RCVD_IN_SORBS_DUL,RCVD_IN_XBL,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_WS_SURBL scantime=6.6,size=3290,user=(...),uid=0,required_score=6.0,rhost=mx2.oct.nac.net,raddr=209.123.233.202,rport=4633,mid=(...),bayes=0.499973650187907,autolearn=spam Sep 21 09:29:39 spamd4 spamd[58803]: spamd: identified spam (15.8/4.0) for (...):0 in 5.3 seconds, 2180 bytes. Sep 21 09:29:39 spamd4 spamd[58803]: spamd: result: Y 15 - BAYES_99,FUZZY_REFINANCE,HTML_MESSAGE,MIME_BOUND_NEXTPART,RCVD_IN_SORBS_DUL,URIBL_JP_SURBL,URIBL_SBL,URIBL_WS_SURBL scantime=5.3,size=2180,user=(...),uid=0,required_score=4.0,rhost=mx3.oct.nac.net,raddr=209.123.233.203,rport=4585,mid=(...),bayes=0.999999748904047,autolearn=no Sep 21 09:29:39 spamd4 spamd[58783]: spamd: identified spam (9.7/4.0) for (...)m:0 in 6.0 seconds, 3629 bytes. Sep 21 09:29:39 spamd4 spamd[58783]: spamd: result: Y 9 - BAYES_50,BODY_ENHANCEMENT2,INFO_TLD,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,URIBL_OB_SURBL,URIBL_WS_SURBL scantime=6.0,size=3629,user=(...),uid=0,required_score=4.0,rhost=mx5.oct.nac.net,raddr=209.123.233.205,rport=50425,mid=(...),bayes=0.499999389778673,autolearn=no Sep 21 09:29:42 spamd4 spamd[58554]: spamd: identified spam (45.9/6.0) for (...):0 in 4.8 seconds, 1401 bytes. Sep 21 09:29:42 spamd4 spamd[58554]: spamd: result: Y 45 - BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,DK_POLICY_SIGNSOME,DK_POLICY_TESTING,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,HELO_DYNAMIC_IPADDR2,INFO_TLD,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_XBL,SUBJECT_FUZZY_MEDS,SUBJ_BUY,URIBL_AB_SURBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL scantime=4.8,size=1401,user=(...),uid=0,required_score=6.0,rhost=mx5.oct.nac.net,raddr=209.123.233.205,rport=64776,mid=(...),bayes=1,autolearn=spam Sep 21 09:29:43 spamd4 spamd[58794]: spamd: identified spam (33.5/4.0) for (...):0 in 10.3 seconds, 1247 bytes. Sep 21 09:29:43 spamd4 spamd[58794]: spamd: result: Y 33 - BAYES_99,DCC_CHECK,RCVD_IN_WHOIS_INVALID,REMOVE_BEFORE_LINK,REPLICA_WATCH,SUBJ_YOUR_OWN,UNPARSEABLE_RELAY,URIBL_AB_SURBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL scantime=10.3,size=1247,user=(...),uid=0,required_score=4.0,rhost=mx5.oct.nac.net,raddr=209.123.233.205,rport=62609,mid=(...),bayes=1,autolearn=spam Sep 21 10:55:43 spamd4 spamd[62536]: logger: removing stderr method Sep 21 10:55:50 spamd4 spamd[62539]: spamd: server started on port 783/tcp (running version 3.1.0) Sep 21 10:55:50 spamd4 spamd[62539]: spamd: server pid: 62539 Just pre-fork info Sep 21 09:29:32 spamd4 spamd[58422]: prefork: child states: BBBBBBBBBBB Sep 21 09:29:32 spamd4 spamd[58422]: prefork: child states: BBBBBBBBBBBI Sep 21 09:29:32 spamd4 spamd[58422]: prefork: child states: BBBBBBBBBIBI Sep 21 09:29:32 spamd4 spamd[58422]: prefork: child states: BBBBBBBIBIBI Sep 21 09:29:32 spamd4 spamd[58422]: prefork: child states: BBBBBBBIBIBK Sep 21 09:29:33 spamd4 spamd[58422]: prefork: child states: BBBIBBBBBBB Sep 21 09:29:33 spamd4 spamd[58422]: prefork: child states: BBBBBBBBBBB Sep 21 09:29:33 spamd4 spamd[58422]: prefork: child states: BBBBBBBBBBBB Sep 21 09:29:33 spamd4 spamd[58422]: prefork: child states: BBBBBBBBBBBBI Sep 21 09:29:34 spamd4 spamd[58422]: prefork: child states: BBBBBBBBIBBBI Sep 21 09:29:37 spamd4 spamd[58422]: prefork: ordered child to accept, but child reported state '1' at /usr/local/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 390. Sep 21 10:55:50 spamd4 spamd[62539]: prefork: child states: BI Sep 21 10:55:55 spamd4 spamd[62539]: prefork: child states: BB Sep 21 10:55:55 spamd4 spamd[62539]: prefork: child states: BB Sep 21 10:55:55 spamd4 spamd[62539]: prefork: child states: BBB Sep 21 10:55:55 spamd4 spamd[62539]: prefork: child states: BBBB Sep 21 10:55:56 spamd4 spamd[62539]: prefork: child states: BBBBB Sep 21 10:55:56 spamd4 spamd[62539]: prefork: child states: BBBBBB Sep 21 10:55:56 spamd4 spamd[62539]: prefork: child states: BBBBBBB Sep 21 10:55:56 spamd4 spamd[62539]: prefork: child states: BBBBBBBB Sep 21 10:55:56 spamd4 spamd[62539]: prefork: child states: BBBBBBBBB
Created attachment 3146 [details] ktrace from spamd4.oct.nac.net (last 1000 lines)
Created attachment 3148 [details] ktrace from merak.origin-it.net (last 1000 lines) Original ktrace.out is 98 MB
Adding a me too. This is happening on Linux 2.6 and not FreeBSD for me though. 2.63 - 3.0.4 worked fine for me on the same server. "perl --version" output. This is perl, v5.8.7 built for i686-linux-thread-multi Let me know if you need anything else.
is there any way we can get a strace/ktrace of *both* the child and the parent, preferably either as two separate files with timestamps, or as one file with the lines interleaved? both traces so far have provided only half the picture, unfortunately; I'm still not sure what's happening.
oh btw I presume everyone knows that --round-robin is a workaround to avoid this issue?
Jason, can you provide the command I need to run to get the proper strace for you? Unfortunately mine hasn't crashed in about 24 hours so I don't know if it'l help. Should lowering/raising my children make it happen more often?
if you strace -fp ${pid} -o log, where ${pid} is the PID of the master process, then kill the kids so that it restarts new ones, that should get the strace covering all processes. however, I don't know of a way to cause this to happen sooner :(
What would be the equivalent for ktrace? It appears that ktrace -d does *not* catch the new children.
warning: strace seems to die when the log file hits 2gb and takes spamd with it. 2gb lasts me a couple hours and with spamd dying less often than that, it's hard to catch with strace.
*** Bug 4618 has been marked as a duplicate of this bug. ***
anyone else think it's odd that this is reproducable on FreeBSD, Linux, and Solaris at least? Is everyone else running Perl 5.8.x also?
Using Perl 5.8.5 and Solaris 8 on an Ultra 10 with no problems.
> Is everyone else running Perl 5.8.x also? No, other reports of this were uing 5.6.1 I assume that you have no problem when you run --round-robin as a spamd option?
yes, --round-robin works as a work around for me.
I'm running Perl 5.8.7 on FreeBsd 4.8 (2 machines), 5.3 (4 machines).
This is a me too with this same issue. I am seeing this on Solaris 8 and Solaris 10 with spamassassin 3.1.0, perl 5.8.5. I did not have this issue with previous spamassassin releases - 3.0.4, 3.0.3, 3.0.2, 3.0.1, and 3.0.0. I have just now restarted spamd with --round-robin and will see how this goes. Some more on my environment: Main "mail servers" (2) Sun Ultra 80's w/4x450, Solaris 8. These 2 handle around 20k messages a day between the 2 of them. The rest of my setup are mainly Solaris 10 computers - Ultra 30's, Ultra 60's, Blade 1000's, and Blade 2500's that handle only the email for the user that "owns" the workstation. I have had 1 or 2 (total) instances of spamd dieing on these personal workstations. I have have 3 or 4 instances of spamd dieing on the 2 main mail servers.
It would seem to not require a high load condition to happen. This is very strange as we have never had this problem on an Ultra 10, Solaris 8 and perl 5.8.5. What compiler did you use to build your perl? Did you use the default optimization? I used Sun's Forte Developer 7 C 5.4 and I think that I accepted the default optimization. Could this problem happen only for perl compiled with gcc?
I got to wondering if I might have something in my configuration that would prevent this problem, and perhaps I do. I have the following in my configuration: bayes_auto_expire 0 bayes_journal_max_size 0 bayes_learn_to_journal 1 To take care of the housekeeping, I have a cron job every 10 minutes that does a 'sa-learn --sync' and a cron job that runs once a day that does a 'sa-learn --force-expire'. That way, a child never has to do any bayes maintenance. The cron jobs run scripts that have logic to make sure that sa-learn is not run if another is still active. Perhaps this bug is triggered by a child doing maintenance.
(In reply to comment #23) Not in my case Tom. I actually have all the Bayes features disabled and the error still happened on my installation.
Apologies for commenting back so late however right after I rec'd the comment notification email our T1 line to the internet went down. I am replying to comment #22 from Tom Schulz... The perl in use is from sunfreeware.com. I belive that they use Sun's Forte compiler. The spamassassin (and other required) perl modules were "compiled" on Solaris 8 using gcc 3.4.2.
There are some major bugfixes for the apache-style preforking subsystem in bug 4696. If you are seeing this issue, please try applying the patch in that bug, and see if it helps.
Did apply the patch from bug 4696, but spamd still dies every couple of hours at random intervals.
The latest patches right (id 3366 or 3368)? Has the logging output changed?
Yes, latest patches "aggregated patch, against the 3.1.0 release tarball, take 2" dated 2006.02.11 02:08. No, I still get: Feb 15 23:02:07 merak spamd[99234]: prefork: ordered child to accept, but child reported state '1' at /usr/local/lib/perl5/site_perl/5.8.7/ Mail/SpamAssassin/SpamdForkScaling.pm line 433. However, the "tcp timeout" messages have disappeared.
Triage time. This may be a severity critical bug, but does anyone think it should block the 3.1.1 release now that bug 4696 has been closed? I would like to retarget this to 3.1.2 unless anyone objects or has a fix. To clarify the state of the problem: It seems that Helge still sees it with the latest patches from bug 4696, is that still true? The other people who have reported problems, Randy, Luke, Tom, and Ryan, do you still see it with the latest patches, or is it now fixed for you? Is there anyone for whome the --round-robin workaround not work? Especially if --round-robin is a workaround that works for everyone, I want to retarget this to 3.1.2. I'll do that if no objections surface soon
Definitely not a blocker.
Does the latest round of patches address spamd not releasing the port before existing when using --roundrobin? When killing the spamd pid port 783 remains in use until each of the spamd children die because spamd exits immedietly. This causes trouble when restarting spamd for rules_du_jour. I only ask because without --roundrobin this same problem doesn't exist.
That's an issue with the included redhat-rc-script.sh init script which was fixed (by using a PID file) about a month or so ago.
(In reply to comment #33) > That's an issue with the included redhat-rc-script.sh init script which was > fixed (by using a PID file) about a month or so ago. A little help on this? Where's the bug report with the solution or patched init script?
bug 4655 The patched init script would be in svn since it's been fixed.
"critical" is used for release blockers; pretty sure this isn't one of those
got the same problem here on Debian Linux perl 6.8.4
(In reply to comment #37) > got the same problem here on Debian Linux perl 6.8.4 Please attach a copy of your spamd log, complete with time stamps, around the time it happens. Also, can you do anything to reproduce this? Any observations about when it happens? Does it only happen under heavy load? How heavy?
(In reply to comment #38) > Please attach a copy of your spamd log, complete with time stamps, around the > time it happens. Sorry, I didn't read the bug thoroughly enough. --round-robin fixed the issue. By the way, we get significantly better performance with --round-robin too. With roughly the same load on the spamd servers, we can feed in a lot more from the MX before they start refusing connections due to high load. > Also, can you do anything to reproduce this? It were pretty much exactly the same symptoms as described above. also the log entry was the same. > Any observations about when it happens? Does it only happen under heavy load? > How heavy? Well, having the servers run at a load of about 4-6 you can pretty much wait for it to happen. On 15 servers you wouldn't wait much longer than 15 mins for one to crash. Chances are, within 4 hours 50% of them are gone. With --round-robin it didn't happen once :-) Thanks and greetings... Stephan
On 3 pretty loaded servers during prime spam crunching time, you could anticipate this to happen within an hour on one of them. After the first one goes, the second two are always close behind since they get more requests. --round-robin fixes it for me also (spamd uptimes typical of amount of time between SA releases).
lowering pri; --round-robin is a functioning workaround, sounds like
Joe Schaefer reports this in the ASF machine hermes' syslog, and spamd crashing "twice this week": > Jul 13 15:34:48 hermes spamd[28964]: prefork: ordered child to accept, > but child reported state '1' at > /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm > line 440.
Can someone provide me with the following (email me a link to download): * their spamd parameters * the complete (since spamd startup) spamd contents of a maillog demoing this (I'd like to see back to the initial launch of spamd or all of the children being killed off due to --max-conn-per-child once) If you can provide the same info for more than one occurrence that'd be great.
Using Debian with Perl v5.8.4 built for i386-linux-thread-multi. Same problem here on 3 machines. OPTIONS="--create-prefs --max-children 15 --helper-home-dir"
how frequently? can you provide logs?
(In reply to comment #45) > how frequently? can you provide logs? Same problem here: after upgrading from 3.0.2 to 3.1.5 (on Sep 22 2006) we had following two "die": ----------------------- Sep 30 04:47:17 noc2 spamd[15802]: prefork: ordered child to accept, but child reported state '1' at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 441. [...] Oct 4 00:32:14 noc spamd[21247]: prefork: ordered child to accept, but child reported state '1' at /usr/lib/perl5/site_perl/5.8 .5/Mail/SpamAssassin/SpamdForkScaling.pm line 441. ------------------------ Since the last restart (on Oct 6 12:13:32), SA is still running. 3.1.5 is running with "perl, v5.8.5 built for i386-linux-thread-multi" under linux "2.6.8.1-12mdksmp" With previous version (3.0.2) SA run fine with _no_ problem, since Mar 18 2005. Now, I'm trying the "--round-robin" workaround.
I've got an occurance of this today. Upgraded to 3.1.6 (Via FreeBSD ports around 09:20 am (CDT)), and spamd apparently died around 12:54:42 with this error. Maillog including both these timestamps is at: http://lists.acesthetics.com/~ler/maillog.spamd.gz -rw-r--r-- 1 ler ler 2551573 Oct 9 23:52 maillog.spamd.gz Let me know if you need anything else.
I had another occurance today. I have the log since midnight, and can supply the logs since the last one if you want. Seeing that nobody's downloaded the previous log, I'll hold off till I hear from someone.
damn, busted ;) Larry, if you can get a full log with --debug, that would be really useful...
Created attachment 3713 [details] attempted patch here's a patch to SpamdForkScaling.pm that may help. Basically, it'll catch the failure condition (I think!) and recover from it... I haven't been able to repro it, but based on simulation, this seems to recover quite well. if you try it out and spamd successfully recovers from the error condition (it'll still be logged), please comment here.
I really would hate to do that on this server as it's a client server. This server is rather busy. What could we supply in place of that? (It's been a 2 occurance thing since the upgrade to 3.1.6). It's been on a random time scale as well, which sux(tm). What else can I provide on a post-mortem basis?
I'm interested in another log, with debug enabled if you can. BTW, what version did you upgrade from?
I have, however, applied the patch, and will check over the next few days to see whether it handles it gracefully. Thanks, JM..... Daryl: I upgraded from 3.1.4 on this box.
Weird, you should have seen this happen with 3.1.4 too. Was anything else upgraded?
(In reply to comment #54) > Weird, you should have seen this happen with 3.1.4 too. Was anything else upgraded? Nope. Just adding load to it. What's also strange is, I have a number of other servers of similar ilk, and hadn't seen this insect till my first report on the list a couple of days ago.
Log from midnight (US/Central) till just after I did the restart: http://lists.acesthetics.com/~ler/maillog.spamd.20061011.gz -rw-r--r-- 1 ler ler 6027293 Oct 11 13:54 maillog.spamd.20061011.gz this is PRE-Patch. It also is WITHOUT --debug. Let me know what else I can do.
As an update, since applying the patch, I've not had an incident of this. I've also not seen the log message, so I don't know if something else changed (there have been no system changes), or what. Just an update.
> I've also not seen the log message... You probably already know this, but just in case you missed it, the log message has been slightly changed. Are you looking for the changed message?
I've been looking for "prefork: order", and that seems to still be the text.
Just wanted to add that I also saw this bug on one of my mailservers the other day. FreeBSD 6.0, SpamAssassin 3.1.0_3. I Tracked it down to this same error message in the log. Any official word on the proposed patch?
I still have NOT seen the message since applying the patch. So far spamd has stayed up just fine. So, I'm not sure what to say here :(
looks like the patch works: /var/log/maillog:Oct 30 02:37:44 lists spamd[15552]: prefork: ordered child 7268 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 02:42:54 lists spamd[15552]: prefork: ordered child 8306 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 02:43:01 lists spamd[15552]: prefork: ordered child 8319 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 06:31:57 lists spamd[15552]: prefork: ordered child 65448 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 07:33:39 lists spamd[15552]: prefork: ordered child 68949 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 07:33:54 lists spamd[15552]: prefork: ordered child 69086 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 07:39:08 lists spamd[15552]: prefork: ordered child 69246 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 08:46:38 lists spamd[15552]: prefork: ordered child 7350 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 08:51:48 lists spamd[15552]: prefork: ordered child 25608 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 08:51:56 lists spamd[15552]: prefork: ordered child 25656 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 09:43:18 lists spamd[15552]: prefork: ordered child 33718 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 10:16:11 lists spamd[15552]: prefork: ordered child 4479 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 11:32:56 lists spamd[15552]: prefork: ordered child 5382 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 11:38:05 lists spamd[15552]: prefork: ordered child 10114 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 11:38:11 lists spamd[15552]: prefork: ordered child 10159 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 12:17:26 lists spamd[15552]: prefork: ordered child 39870 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 12:22:19 lists spamd[15552]: prefork: ordered child 48921 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 12:22:29 lists spamd[15552]: prefork: ordered child 51038 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 12:22:35 lists spamd[15552]: prefork: ordered child 51144 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 13:00:45 lists spamd[15552]: prefork: ordered child 53163 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 13:05:56 lists spamd[15552]: prefork: ordered child 75100 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. /var/log/maillog:Oct 30 13:10:59 lists spamd[15552]: prefork: ordered child 83117 to accept, but they reported state '1', killing rogue at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 457. ^C$
that looks good alright.
FYI, I sent a PR in to the FreeBSD folks, and it's been committed with the fix as of last night (US/Central).
thanks Larry; setting to [review] status
I suspect that bug 5149 is caused by this fix. This one started appearing just after this commit. Please crosscheck.
+1 on 3713 (In reply to comment #66) > I suspect that bug 5149 is caused by this fix. This one started appearing just > after this commit. Please crosscheck. I doubt it, 3.1.4 (the version reported with the problem in bug 5149) was released 2.5 months before the patch was supplied.
sure. +1
[dos@FC5-VPC 3.1]$ svn ci -m "bug 4594: add workaround for the rare 'ordered child to accept, but they reported state '1'' bug" Sending lib/Mail/SpamAssassin/SpamdForkScaling.pm Transmitting file data . Committed revision 485847.