|
SA Bugzilla – Full Text Bug Listing |
Summary: | copy_config failure caused disk full -> "Attempt to free unreferenced scalar" | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Dallas Engelken <dallas> |
Component: | spamc/spamd | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED WORKSFORME | ||
Severity: | major | CC: | apache, spamassassin |
Priority: | P5 | ||
Version: | 3.1.0 | ||
Target Milestone: | Undefined | ||
Hardware: | Other | ||
OS: | other | ||
Whiteboard: |
Description
Dallas Engelken
2005-11-18 21:56:33 UTC
another one bits the dust after 'copy_config timeout' occurred..... 2005-11-23 11:16:03.973510500 [31352] info: spamd: processing message <OFA8CC8528.B5E84CFE-ON862570C2.005EC146-862570C2.005ED63A@worleywarehousing.com> for bgombert@DOMAIN.TLD:200 2005-11-23 11:16:17.895227500 [31478] error: child processing timeout at /usr/bin/spamd line 1085, <GEN2733> line 4032. 2005-11-23 11:16:18.743445500 [31478] warn: spamd: timeout: (60 second timeout while trying to PROCESS) at /usr/bin/spamd line 1678, <GEN2733> line 4032. 2005-11-23 11:16:30.069835500 [31478] error: __alarm__ 2005-11-23 11:16:31.661297500 [31478] warn: spamd: copy_config timeout, respawning child process after 23 messages at /usr/bin/spamd line 964. 2005-11-23 11:16:33.374891500 [31311] info: spamd: connection from localhost.localdomain [127.0.0.1] at port 56464 2005-11-23 11:16:34.073753500 [31648] error: child processing timeout at /usr/bin/spamd line 1085, <GEN2735> line 4034. 2005-11-23 11:16:34.074582500 [31648] warn: spamd: timeout: (60 second timeout while trying to PROCESS) at /usr/bin/spamd line 1678, <GEN2735> line 4034. 2005-11-23 11:16:41.301635500 [31352] error: __alarm__ 2005-11-23 11:16:44.275401500 [31311] error: __alarm__ 2005-11-23 11:16:44.321221500 [31352] error: __alarm__ 2005-11-23 11:16:44.569257500 [31311] warn: spamd: copy_config timeout, respawning child process after 25 messages at /usr/bin/spamd line 964. 2005-11-23 11:16:48.123209500 [31352] info: spamd: clean message (0.6/5.0) for bgombert@DOMAIN.TLD:200 in 49.2 seconds, 15092 bytes. 2005-11-23 11:16:48.123665500 [31352] info: spamd: result: . 0 - AWL,HTML_MESSAGE,NO_REAL_NAME scantime=49.2,size=15092,user=bgombert@DOMAIN.TLD,uid=200,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=56461,mid=<OFA8CC8528.B5E84CFE-ON862570C2.005EC146-862570C2.005ED63A@worleywarehousing.com>,autolearn=no 2005-11-23 11:16:48.448167500 [31648] info: spamd: connection from localhost.localdomain [127.0.0.1] at port 56473 2005-11-23 11:16:48.450129500 [31648] info: spamd: handle_user unable to find user: bgombert@DOMAIN.TLD 2005-11-23 11:16:48.511387500 [31478] warn: Attempt to free unreferenced scalar during global destruction. 2005-11-23 11:16:48.511391500 [31478] warn: Attempt to free unreferenced scalar during global destruction. 2005-11-23 11:16:48.511394500 [31478] warn: Attempt to free unreferenced scalar during global destruction. 2005-11-23 11:16:48.511396500 [31478] warn: Attempt to free unreferenced scalar during global destruction. 2005-11-23 11:16:48.511399500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511436500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511439500 [31478] warn: Attempt to free unreferenced scalar during global destruction. 2005-11-23 11:16:48.511441500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511444500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511497500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511499500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511502500 [31478] warn: Attempt to free unreferenced scalar during global destruction. 2005-11-23 11:16:48.511505500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511527500 [31478] warn: Attempt to free non-existent shared string during global destruction. 2005-11-23 11:16:48.511530500 [31478] warn: Attempt to free unreferenced scalar during global destruction. here is a warn that actually puts a line number to it.... first time i had seen that. @40000000438b36b405207a24 [3205] warn: Attempt to free unreferenced scalar at /usr/bin/spamd line 1085, <GEN136> line 2. okay.. this is semi-reproducable. starting spamd up like this # spamd -p 777 --timeout-child=1 -D and passing a message to it via # cat msg.txt | spamc -p 777 gets me... [27182] dbg: prefork: ordered 27204 to accept [27182] dbg: prefork: sysread(7) not ready, wait max 300 secs [27204] info: spamd: connection from localhost.localdomain [127.0.0.1] at port 37675 [27182] dbg: prefork: child 27204: entering state 2 [27182] dbg: prefork: new lowest idle kid: 27205 [27204] info: spamd: setuid to dallase succeeded [27204] error: child processing timeout at /usr/bin/spamd line 1085, <GEN7> line 2. [27204] warn: spamd: timeout: (1 second timeout while trying to PROCESS) at /usr/bin/spamd line 1678, <GEN7> line 2. [27204] dbg: config: copying current conf from backup [27182] dbg: prefork: child 27204: entering state 1 [27182] dbg: prefork: new lowest idle kid: 27204 [27182] dbg: prefork: child reports idle [27182] info: prefork: child states: II then, if i let the spamd engine sit a bit before i issue a SIGINT to it, i get an error on the last line, which talks about "during global destruction". [27204] dbg: prefork: sysread(8) not ready, wait max 300 secs [27205] dbg: prefork: periodic ping from spamd parent [27205] dbg: prefork: sysread(9) not ready, wait max 300 secs [27204] dbg: prefork: periodic ping from spamd parent [27204] dbg: prefork: sysread(8) not ready, wait max 300 secs [27204] dbg: prefork: periodic ping from spamd parent [27204] dbg: prefork: sysread(8) not ready, wait max 300 secs [27205] dbg: prefork: periodic ping from spamd parent [27205] dbg: prefork: sysread(9) not ready, wait max 300 secs [27205] dbg: prefork: periodic ping from spamd parent [27205] dbg: prefork: sysread(9) not ready, wait max 300 secs [27182] dbg: prefork: new lowest idle kid: 27204 [27182] info: spamd: handled cleanup of child pid 27205 due to SIGCHLD [27182] dbg: prefork: new lowest idle kid: none [27182] info: spamd: handled cleanup of child pid 27204 due to SIGCHLD [27182] info: spamd: server killed by SIGINT, shutting down [27182] error: tcp timeout at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/SpamdForkScaling.pm line 195 during global destruction. tcp timeout at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/SpamdForkScaling.pm line 195 during global destruction. i'm not sure if these are related, but i'd wager a guess that they are... seeing that in Comment #1, a --timeout-child=60 was reached prior to the system spinning out of control from the global destruction log entries. I hate to say it, but 'Attempt to free unreferenced scalar' and 'Attempt to free non-existent shared string during global destruction' errors are extremely bad news. They typically indicate bugs in either the perl interpreter itself, or in an XS module (ie. compiled C code). I'd suggest trying another perl interpreter version to avoid these, if possible. hmm.. strange. this was just a stock perl 5.6.1 on redhat 7.3, which meets the requirements. that said, i cant seem to reproduce that behavior on a 5.8.5 or 5.8.6 perl. hmm -- this sounds like the message on the users list today... (In reply to comment #6) > hmm -- this sounds like the message on the users list today... thread at http://www.nabble.com/strange-bug%2C-filling-syslog-t843486.html bug 4696 comment 31 From Dallas Engelken: 'i havent had the problem outlined in bug 4693 since applying this patch.. FWIW.' I'm not entirely sure this is closable just yet though -- what do you think Dallas? well, it probably wouldnt hurt to leave open for a bit. all my production boxes are currently set to override the $copy_config_p=1 setting in /usr/bin/spamd to just disable config copies since i load prefs via sql. on my svn installs that i test with i cant seem to reproduce this anymore with the patch from bug 4696 applied. "all my production boxes are currently set to override the $copy_config_p=1 setting in /usr/bin/spamd to just disable config copies since i load prefs via sql" Unless I'm forgetting something, it makes no difference if you're using SQL prefs or not. You'll still end up appending config options to multi-value config options and replace the current value of single value config options. Most noticable would be the leaking of white/black-list entries across users. well i guess leaking white/black entries is a heck of a lot better than DoSing the system due to a full disk. FWIW, running 3.1.1 or latest svn with $copy_config_p set back to 1 has caused this problem to resurface on rh7.3 boxes with perl 5.6.1. i've seen a handful of boxes in the last 2 days with full disks due to this. i've never seen it occur on fc3 with perl 5.8.5. i thought the recent updates related to bug 4696 may have helped, but it doesnt appear so. Patrick von der Hagen is the only other person i have seen report this issue... somehow i wish it was more widespread so i could stop banging my head on my desk and yelling WHY ME! ;) so, i'd say leave this open for a while until i debug it down far enough to pinpoint the issue. "Other than" the nasty warnings causing your log volume to go down in flames, does spamd continue to work OK? I'm assuming that copy_config doesn't time out on every message, right? You could avoid logging the errors. Obviously it wouldn't fix the problem, but at least you wouldn't get logged to death. Adding something like this at line 142 of Logger.pm would do it: return if ($level eq "warn" && $message[0] =~ /Attempt to free .* during global destruction/); well, i suppose it does, bug i'm just guessing because i dont hear anyone complain until their disk is full and their mail stops coming in. i've seen cases where daily logrotate will save it for a while.... in this case, it was happending daily for the last 8 days. -rw-rw-r-- 1 doublech doublech 1098107980 Mar 13 04:02 spamd.log.3.gz -rw-rw-r-- 1 doublech doublech 1142570815 Mar 12 04:01 spamd.log.4.gz -rw-rw-r-- 1 doublech doublech 1188325192 Mar 11 04:02 spamd.log.5.gz -rw-rw-r-- 1 doublech doublech 1231933772 Mar 10 04:03 spamd.log.6.gz -rw-rw-r-- 1 doublech doublech 1287468111 Mar 9 04:02 spamd.log.7.gz -rw-rw-r-- 1 doublech doublech 1327049798 Mar 8 04:02 spamd.log.8.gz -rw-rw-r-- 1 doublech doublech 1904957 Mar 7 03:11 spamd.log.9.gz -rw-r--r-- 1 doublech doublech 1294305 Mar 6 01:07 spamd.log.10.gz -rwxr-xr-x 1 doublech doublech 1151532 Mar 3 19:19 spamd.log.11.gz -rwxr-xr-x 1 doublech doublech 864265 Mar 1 22:15 spamd.log.12.gz -rwxr-xr-x 1 doublech doublech 863982 Feb 28 22:03 spamd.log.13.gz -rwxr-xr-x 1 doublech doublech 985953 Feb 27 21:00 spamd.log.14.gz shouldnt be hard to identify when this this an issue if i monitor disk usage. When its in this high speed logging loop, the CPU is also hitting pretty hard... so i'm not sure how much of that CPU is due to the actual log write, and how much is due to it being stuck in a loop. i can try adding that line to Logging.pm and see what happens. i'll post more if i can catch it in progress. ok, i caught one. looks like child pid 23618 is the one that went nuts.. child 6613 answered my spamc request just fine. i've done a little grep magic to weed out the info lines and show how the warns from 23618 intertwine with them... otherwise you'd be looking at literally hundreds of warn lines between each info line. [root@dcem spamassassin]# tail -f current | grep -A1 -B1 info | tai64nlocal 2006-03-15 12:53:17.467291500 [23618] warn: Attempt to free unreferenced scalar, <GEN2263> line 255 during global destruction. 2006-03-15 12:53:17.471020500 [6613] info: spamd: connection from localhost.localdomain [127.0.0.1] at port 53583 2006-03-15 12:53:17.496962500 [23618] warn: Attempt to free non-existent shared string, <GEN2263> line 255 during global destruction. 2006-03-15 12:53:17.497196500 [23618] warn: Attempt to free unreferenced scalar, <GEN2263> line 255 during global destruction. 2006-03-15 12:53:17.503950500 [6613] info: spamd: processing message (unknown) for root:200 2006-03-15 12:53:17.577813500 [23618] warn: Attempt to free non-existent shared string, <GEN2263> line 255 during global destruction. -- 2006-03-15 12:53:18.631218500 [6613] warn: pyzor: check failed: internal error 2006-03-15 12:53:18.658736500 [6613] info: spamd: clean message (1.8/5.0) for root:200 in 1.2 seconds, 7 bytes. 2006-03-15 12:53:18.659210500 [6613] info: spamd: result: . 1 - BAYES_40,MISSING_DATE,MISSING_SUBJECT,NO_RECEIVED,NO_RELAYS,TO_CC_NONE scantime=1.2,size=7,user=root,uid=200,required_score=5.0,rhost=localhost.localdo main,raddr=127.0.0.1,rport=53583,mid= (unknown),bayes=0.362223941701955,autolearn=no,urihits=none 2006-03-15 12:53:18.713748500 [23618] warn: Attempt to free unreferenced scalar, <GEN2263> line 255 during global destruction. Closing ancient stale bug. |