SA Bugzilla – Bug 6303
CLOSE_WAIT and defunct process problems
Last modified: 2021-05-27 11:03:46 UTC
Hi, I have the same bug as described in #6117. The only solution is to reboot the server (openSUSE 11.0 with all patches applied).

Note: typically (2-3 days a week) the problem is present in the morning after the backup (a tar command, with a CPU load average of about 2).

Please help, and thanks (sorry for my English).

+++ This bug was initially created as a clone of Bug #6117 +++

Hi, I recently started using spamassassin 3.2.5-26-7 on openSUSE 11.0 with postfix 2.5.5-6.7 and amavisd 2.6.1. If my mail server receives a high number of mails for several minutes, I afterwards see the following situation on the server:

1) high CPU load (see the %id CPU figure in top)
2) very many sockets in the CLOSE_WAIT state (with netstat -tpan)
3) two or more defunct (??) processes (see below). These processes do not go away when sent kill -9 <pid>, not even after stopping the spamd daemon.

The only solution (today) is to reboot the server.

Example (defunct processes):

# ps aux | grep spam
nobody 735 0.0 2.4 53680 49776 ? D 01:12 0:07 spamd child
nobody 5136 0.0 2.4 53428 49496 ? D 04:22 0:07 spamd child

Help me! Thanks.
> I have identical bug describe into #6117. Only solution: reboot server.
> (Use opensuse 11.0 with all patch).

Identical? If so, first try answering some of the questions asked in Bug 6117, and look at the solutions posted there. In particular, for a defunct process there is nothing to kill; the process no longer exists anyway. To get rid of a defunct process entry, kill its parent, or find out why this parent process did not reclaim its child's exit status. There is no need to reboot the machine.

> Note: Tipically (2-3 days on week) the problem is present in the morning after
> backup (with tar command and load CPU with 2 average).

You'd need to present some more evidence of what is happening, and make up your mind on whether you are running spamd or amavisd.
(In reply to comment #1)
> > I have identical bug describe into #6117. Only solution: reboot server.
> > (Use opensuse 11.0 with all patch).
>
> Identical? If so, try answering first some questions asked in Bug 6117,
> and find some solutions posted there. In particular, for a defunct process
> there is nothing to kill, the process no longer exists anyway. To get rid
> of a defunc process entry, kill its parent, or find out why this parent
> process did not reclaim its child process exit status - no need to reboot
> the machine.

Hi, yes, identical! My server runs the following programs: postfix, spamassassin and amavisd (max 4 servers). See the postfix master.cf at the bottom. I found no parent process for the defunct processes (or rather, I could not find one :-( ).

> > Note: Tipically (2-3 days on week) the problem is present in the morning after
> > backup (with tar command and load CPU with 2 average).
>
> You'd need to present some more evidence on what is happening, and make up
> your mind on whether you are running spamd or amavisd.

Here is the situation this morning.
===================================================
CPU (first lines of the top output; note the %id percentage). No significant program is running (the tar backup finished 2 hours ago):

top - 07:08:37 up 23:41, 1 user, load average: 4.15, 4.11, 4.06
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.4%us, 1.6%sy, 0.0%ni, 86.8%id, 3.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 2049296k total, 1486568k used, 562728k free, 382076k buffers
Swap: 2096472k total, 116k used, 2096356k free, 490372k cached
===================================================
netstat -tpan | grep CLOSE_WAIT

tcp 1335 0 localhost:783 localhost:46687 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:59465 CLOSE_WAIT 32045/spamd child
tcp 18298 0 localhost:783 localhost:54335 CLOSE_WAIT -
tcp 21079 0 localhost:783 localhost:40478 CLOSE_WAIT -
tcp 18290 0 localhost:783 localhost:40475 CLOSE_WAIT -
tcp 24523 0 localhost:783 localhost:40639 CLOSE_WAIT -
tcp 13871 0 localhost:783 localhost:44688 CLOSE_WAIT -
tcp 6196 0 localhost:783 localhost:44681 CLOSE_WAIT -
tcp 21892 0 localhost:783 localhost:44711 CLOSE_WAIT -
tcp 26659 0 localhost:783 localhost:52179 CLOSE_WAIT -
tcp 1334 0 localhost:783 localhost:42391 CLOSE_WAIT -
tcp 23908 0 localhost:783 localhost:40524 CLOSE_WAIT -
tcp 1329 0 localhost:783 localhost:56698 CLOSE_WAIT -
tcp 23502 0 localhost:783 localhost:56488 CLOSE_WAIT -
tcp 4701 0 localhost:783 localhost:56590 CLOSE_WAIT -
tcp 4492 0 localhost:783 localhost:56677 CLOSE_WAIT -
tcp 24528 0 localhost:783 localhost:52178 CLOSE_WAIT -
tcp 1334 0 localhost:783 localhost:41660 CLOSE_WAIT -
tcp 24222 0 localhost:783 localhost:40481 CLOSE_WAIT -
tcp 1335 0 localhost:783 localhost:60240 CLOSE_WAIT -
tcp 1545 0 localhost:783 localhost:52213 CLOSE_WAIT -
tcp 18821 0 localhost:783 localhost:57365 CLOSE_WAIT -
tcp 1256 0 localhost:783 localhost:44719 CLOSE_WAIT -
tcp 1256 0 localhost:783 localhost:45545 CLOSE_WAIT -
tcp 19503 0 localhost:783 localhost:56674 CLOSE_WAIT -
tcp 5496 0 localhost:783 localhost:60247 CLOSE_WAIT -
tcp 24677 0 localhost:783 localhost:40649 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:44699 CLOSE_WAIT 22982/spamd child
tcp 17906 0 localhost:783 localhost:40472 CLOSE_WAIT -
tcp 17982 0 localhost:783 localhost:40650 CLOSE_WAIT -
tcp 20655 0 localhost:783 localhost:40525 CLOSE_WAIT -
tcp 15550 0 localhost:783 localhost:38514 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:42201 CLOSE_WAIT 32149/spamd child
tcp 2355 0 localhost:783 localhost:41553 CLOSE_WAIT -
tcp 13134 0 localhost:783 localhost:50605 CLOSE_WAIT -
tcp 4369 0 localhost:783 localhost:39285 CLOSE_WAIT -
tcp 60092 0 localhost:783 localhost:44706 CLOSE_WAIT -
tcp 1335 0 localhost:783 localhost:57437 CLOSE_WAIT -
tcp 3774 0 localhost:783 localhost:56687 CLOSE_WAIT -
tcp 1167 0 localhost:783 localhost:52656 CLOSE_WAIT -
tcp 4450 0 localhost:783 localhost:56680 CLOSE_WAIT -
tcp 2877 0 localhost:783 localhost:54997 CLOSE_WAIT -
tcp 1335 0 localhost:783 localhost:55007 CLOSE_WAIT -
tcp 5689 0 localhost:783 localhost:50606 CLOSE_WAIT -
tcp 1332 0 localhost:783 localhost:45548 CLOSE_WAIT -
tcp 60111 0 localhost:783 localhost:44705 CLOSE_WAIT -
tcp 17737 0 localhost:783 localhost:46682 CLOSE_WAIT -
tcp 6113 0 localhost:783 localhost:54200 CLOSE_WAIT -
tcp 6172 0 localhost:783 localhost:55004 CLOSE_WAIT -
tcp 1334 0 localhost:783 localhost:52212 CLOSE_WAIT -
tcp 3519 0 localhost:783 localhost:41603 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:52339 CLOSE_WAIT 31515/spamd child
tcp 3680 0 localhost:783 localhost:50609 CLOSE_WAIT -
tcp 13904 0 localhost:783 localhost:52334 CLOSE_WAIT -
tcp 32098 0 localhost:783 localhost:44716 CLOSE_WAIT -
tcp 27052 0 localhost:783 localhost:40638 CLOSE_WAIT -
tcp 17828 0 localhost:783 localhost:40470 CLOSE_WAIT -
tcp 4065 0 localhost:783 localhost:44695 CLOSE_WAIT -
tcp 7501 0 localhost:783 localhost:56705 CLOSE_WAIT -
tcp 6182 0 localhost:783 localhost:57358 CLOSE_WAIT -
tcp 5781 0 localhost:783 localhost:60229 CLOSE_WAIT -
===================================================
ps aux | grep spam (email addresses are masked as xxxx@xxxx.it; the last lines show the defunct processes):

postfix 3697 0.0 0.0 6332 1848 ? S 06:37 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
postfix 3992 0.0 0.0 6332 1844 ? S 06:45 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
postfix 4091 0.0 0.0 6332 1844 ? S 06:50 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
postfix 4192 0.0 0.1 9888 3648 ? S 06:55 0:00 smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter spamassassin -o local_recipient_maps -o smtpd_client_restrictions -o smtpd_helo_restrictions -o smtpd_sender_restrictions -o smtpd_recipient_restrictions permit_mynetworks,reject -o mynetworks 127.0.0.0/8 -o strict_rfc821_envelopes yes -o smtpd_error_sleep_time 0 -o smtpd_soft_error_limit 1001 -o smtpd_hard_error_limit 1000
postfix 4195 0.0 0.0 6332 1816 ? S 06:55 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4196 0.0 0.0 3056 724 ? Ss 06:55 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f delivery@xxxx.xxx.it
nobody 4298 0.0 0.0 3056 724 ? Ss 06:55 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx@xxxx.it gxxxx@xxxx.it xxxx@xxxx.it xxxx@xxxx.it xxxx@xxxx.it
postfix 4362 0.0 0.0 6332 1816 ? S 06:56 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4363 0.0 0.0 3056 728 ? Ss 06:56 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx@yyyyy.it xxxx@xxxxx.it
postfix 4386 0.0 0.0 6332 1820 ? S 06:58 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4387 0.0 0.0 3056 740 ? Ss 06:58 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx@yyyyy.it xxxx.xxxxx@yyyyy.it
postfix 4406 0.0 0.0 6332 1820 ? S 06:59 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4407 0.0 0.0 3056 728 ? Ss 06:59 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx@yyyyy.it xxxx.xxxxx@yyyyy.it
postfix 4480 0.0 0.1 9892 3632 ? S 07:00 0:00 smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter spamassassin -o local_recipient_maps -o smtpd_client_restrictions -o smtpd_helo_restrictions -o smtpd_sender_restrictions -o smtpd_recipient_restrictions permit_mynetworks,reject -o mynetworks 127.0.0.0/8 -o strict_rfc821_envelopes yes -o smtpd_error_sleep_time 0 -o smtpd_soft_error_limit 1001 -o smtpd_hard_error_limit 1000
nobody 4483 0.0 0.0 3056 776 ? Ss 07:00 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ester.tattoli=xxxx.xxxxx@yyyyy.it xxxx@xxxx.it
postfix 4484 0.0 0.0 6332 1820 ? S 07:00 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4485 0.0 0.0 3056 776 ? Ss 07:00 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ester.tattoli=xxxx.xxxxx@yyyyy.it xxxx.xxxxx@yyyyy.it
nobody 4497 0.0 0.0 3056 744 ? Ss 07:00 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f sentto-10929333-26372-1264485643-ik1odo=sxxxx@xxxx.it xxxx@xxxx.it
postfix 4527 0.0 0.0 6332 1816 ? S 07:01 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4528 0.0 0.0 3056 756 ? Ss 07:01 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx@yyyyy.it xxxx.xxxxx@yyyyy.it
postfix 4530 0.0 0.0 6332 1820 ? S 07:01 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4531 0.0 0.0 3056 728 ? Ss 07:01 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx@yyyyy.it bxxxx.xxxxx@yyyyy.it
postfix 4593 0.0 0.0 6332 1816 ? S 07:02 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4594 0.0 0.0 3056 724 ? Ss 07:02 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx@yyyyy.it xxxx.xxxxx@yyyyy.it
postfix 4599 0.0 0.0 6332 1820 ? S 07:02 0:00 pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
nobody 4600 0.0 0.0 3056 728 ? Ss 07:02 0:00 /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx@yyyyy.it xxxx.xxxxx@yyyyy.it
root 4625 0.0 0.0 3232 724 pts/0 R+ 07:03 0:00 grep spam
root 6654 0.0 2.2 49456 45396 ? Ss Jan25 0:20 /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid
nobody 22982 0.0 2.4 54096 50236 ? D Jan25 0:13 spamd child
nobody 31515 0.0 2.4 53748 49880 ? D 04:10 0:06 spamd child
nobody 32045 0.0 2.3 52124 48140 ? D 04:27 0:05 spamd child
nobody 32149 0.0 2.3 52924 48968 ? D 04:31 0:02 spamd child
=================================
master.cf (only the lines related to spam and amavis):

#smtp inet n - n - - smtpd
127.0.0.1:smtp inet n - n - - smtpd
::1:smtp inet n - n - - smtpd
151.8.133.126:smtp inet n - n - - smtpd
  -o content_filter=smtp-amavis:[127.0.0.1]:10024
151.8.133.122:smtp inet n - n - - smtpd
  -o content_filter=smtp-amavis:[127.0.0.1]:10024
# remove standard line postfix
smtp-amavis unix - - y - 2 smtp
  -o smtp_data_done_timeout=1200
  -o disable_dns_lookups=yes
127.0.0.1:10025 inet n - n - - smtpd
  -o content_filter=spamassassin
  -o local_recipient_maps=
  -o smtpd_client_restrictions=
  -o smtpd_helo_restrictions=
  -o smtpd_sender_restrictions=
  -o smtpd_recipient_restrictions=permit_mynetworks,reject
  -o mynetworks=127.0.0.0/8
  -o strict_rfc821_envelopes=yes
  -o smtpd_error_sleep_time=0
  -o smtpd_soft_error_limit=1001
  -o smtpd_hard_error_limit=1000
spamassassin unix - n n - - pipe
  user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}

Thanks, thanks, thanks!
So you are indeed running both amavisd and spamc/spamd.

Your postfix feeds mail first to amavisd, which returns it to postfix on port 10025, which then spawns spamc and feeds mail to it through a pipe, which in turn transfers it to spamd on port 783, and then spamc (based on the result) pipes the message to a mail submission program 'sendmail', which stores the message into a maildrop queue, to be picked up by a postfix pickup daemon for further delivery. Ugh, doable, but quite complicated and not very efficient.

Since you are already running amavisd-new, why do you not let it call SpamAssassin directly, and save yourself and your mailer the trouble of dealing with two content filters?

Anyway, back to the reported problem. There are no defunct (zombie) processes on your system according to the output of top(1).

There are indeed lots of open TCP sessions to localhost port 783 in a CLOSE_WAIT state. Unfortunately you have not provided a full list of processes as reported by ps(1). Going by the port number and the CLOSE_WAIT state, I assume each of these corresponds to an existing spamd child process whose spamc client has long gone, but for some reason spamd failed to close its end of the socket.

This can be confirmed with the lsof utility and ps. It would be interesting to know what ps reports as the state of these spamd child processes. The next step would be to run spamd with debugging enabled and, when the situation reoccurs, see what the last logged entries of each of the hung processes were, and to what event on the system these correspond (nfs trouble? disk down? network outage? backup? running out of swap space?).
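The CLOSE_WAIT listing from comment #2 can be summarized with a short script. The following is only a sketch, not part of SpamAssassin: it assumes the seven-column `netstat -tpan` layout shown in that comment (proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program), and the function name `summarize_close_wait` is made up for illustration. It counts how many CLOSE_WAIT sockets still hold unread data in Recv-Q and groups them by the owner that netstat reports.

```python
from collections import Counter


def summarize_close_wait(netstat_text):
    """Summarize CLOSE_WAIT rows from `netstat -tpan` style output.

    Assumed column layout (as in the listing in comment #2):
    proto recv-q send-q local foreign state pid/program
    """
    owners = Counter()
    unread = 0
    total = 0
    for line in netstat_text.splitlines():
        # Split into at most 7 fields so "32045/spamd child" stays together.
        fields = line.split(None, 6)
        if len(fields) < 7 or fields[5] != "CLOSE_WAIT":
            continue
        total += 1
        if int(fields[1]) > 0:  # Recv-Q: bytes received but not yet read
            unread += 1
        owners[fields[6].strip()] += 1
    return {"total": total, "unread": unread, "owners": dict(owners)}


# Three rows copied from the listing in comment #2.
SAMPLE = """\
tcp 1335 0 localhost:783 localhost:46687 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:59465 CLOSE_WAIT 32045/spamd child
tcp 18298 0 localhost:783 localhost:54335 CLOSE_WAIT -
"""

if __name__ == "__main__":
    print(summarize_close_wait(SAMPLE))
```

On a live system the text would come from `netstat -tpan` itself. Rows with "-" as owner and a non-zero Recv-Q may point to connections whose data was never read by any spamd child, but that is an interpretation to be checked with lsof, not something this script proves.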
(In reply to comment #3)
> So you are indeed running both amavisd as well as spamc/spamd.
>
> Your postfix feeds mail first to amavisd, which returns it to postfix
> on port 10025, which then spawns spamc and feeds mail to it through
> a pipe, which in turn transfers it to spamd on port 783, and then spamc
> (based on the result) pipes the message to a mail submission program
> 'sendmail', which stores the message into a maildrop queue,
> to be picked up by a postfix pickup daemon for further delivery.
> Ugh, doable, but quite complicated and not very efficient.
>
> Since you are already running amavisd-new, why do you not let it call
> SpamAssassin directly, and save yourself and your mailer the trouble
> of dealing with two content filters?

Oops... if you know a better configuration, you are welcome to share it! Can you give me one or two examples (or a link to an example)?

> Anyway, back to the reported problem. There are no defunct (zombie)
> processes on your system according to the output of top(1).

True! But I am sure the CPU load is caused by this problem (I don't know whether it comes from spamd, amavis or something else...).

This morning, after the same problem occurred, I looked up the parent (PPID) of the defunct spamd processes. The parent was /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid. I stopped the daemon, and afterwards the same defunct processes had PPID 1 (init) as their parent. Is that correct?

> There are indeed lots of open TCP sessions to localhost port 783
> in a CLOSE_WAIT state. Unfortunately you have not provided a full
> list of processes as reported by ps(1). According to the port number
> and the CLOSE_WAIT state I can assume each of these correspond to an
> existing spamd child process, where its spamc client has long gone,
> but for some reason spamd failed to close its end of the socket.

I'm sorry, I had already restarted the server when I read this part. In one or two days (at the next occurrence) I will check with lsof which processes own these CLOSE_WAIT sockets.

> This can be confirmed by a lsof utility and ps. It would be interesting
> to know what ps reports on a state of these spamd child processes.
> The next step would be to run spamd with debugging enabled, and when
> the situation reoccurs, see what were the last logged entries of
> each of the hung processes, and to what event on the system these
> correspond (nfs trouble? disk down? network outage? backup? running
> out of swap space?).

The system is openSUSE 11.1 (i586) with the standard amavis, spamassassin etc. packages (installed with the yast utility, NOT manually forced versions!). The volume resides on a DRBD disk for fault tolerance.

The only event that coincides with the problem is the backup, but I am not sure! The backup is a tar command plus a mysql dump, and the CPU load goes to 2 or 3.

Thanks.
(In reply to comment #4)

Any news on this problem? Thanks.
> > Since you are already running amavisd-new, why do you not let it call
> > SpamAssassin directly, and save yourself and your mailer the trouble
> > of dealing with two content filters?
>
> Ops... if you know the best configuration, welcome! Can you get me
> one or two example (or link for this example)??

Unless you explicitly disable spam checking in amavisd.conf (by @bypass_spam_checks_maps or its derivatives), amavisd calls SpamAssassin by default. You may need to adjust score thresholds and what should happen to spam, but that is basically all.

Something like:

  @bypass_spam_checks_maps = ();
  $sa_tag2_level_deflt = 5.0; # labels passed mail as spam
  $sa_kill_level_deflt = 8.5; # blocks & quarantines at this level
  $sa_spam_subject_tag = '***SPAM*** ';
  $final_spam_destiny = D_DISCARD;
  $spam_quarantine_to = 'spam-quarantine';
  $spam_quarantine_method = 'local:spam-%m.gz';

or, to just label spam but deliver it anyway:

  $final_spam_destiny = D_PASS;

> > Anyway, back to the reported problem. There are no defunct (zombie)
> > processes on your system according to the output of top(1).

86.8%id: idle is good, the host is only lightly loaded.

> this morning after same problems, I have search parent (PPID) of spamd defunct
> process. The parent was /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid .
> I stop the deamon and after, the same defunct process after change parent in
> the PPID 1 (init). Is correct??

Yes, this is correct behaviour. It is a function of the init process to garbage collect all orphaned process entries and clear them. After a short while the reparented defunct entries should be removed by the init process.

So the spamd parent process which you had to kill was the culprit. It failed to collect the exit statuses of (i.e. to reap) its child processes, which had already terminated some time ago.

I don't know why this would happen. It was stuck for some reason, or lost track of its child processes. Was this process dormant when you killed it, or was it spinning CPU? Was it still able to accept connections and process them? Running it with debugging enabled and examining the log from such a process when the problem reoccurs might shed some light.

> > There are indeed lots of open TCP sessions to localhost port 783
> > in a CLOSE_WAIT state. Unfortunately you have not provided a full
> > list of processes as reported by ps(1). According to the port number
> > and the CLOSE_WAIT state I can assume each of these correspond to an
> > existing spamd child process, where its spamc client has long gone,
> > but for some reason spamd failed to close its end of the socket.
>
> I'm sorry, i have restart server after read this part. one or two day (next
> problems) and I test with lsof the process join this close_wait

> The only events join this problems is backup but are not sure! The backup is a
> tar command and mysql-dump and the cpu go tu 2 or 3.

For some reason the parent spamd process which you had to kill got stuck while you were running a backup. Now that you mention a mysql dump, perhaps this causes bayes/awl tables to be locked during a backup, thus blocking spamd operations.
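A note on terminology that may help when reading the ps output in this thread: in Linux ps(1), a STAT of Z marks a defunct (zombie) process, while D marks uninterruptible sleep, usually waiting on I/O, and a process in state D does not act on signals (including kill -9) until that wait ends. The spamd children listed in comment #2 show D, not Z. The sketch below groups `ps aux` rows by the first letter of the STAT column; it assumes the usual eleven-column `ps aux` layout, and the name `group_by_state` is hypothetical.

```python
# First-letter process state codes as documented in Linux ps(1).
STATE_NAMES = {
    "D": "uninterruptible sleep (usually I/O)",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped",
    "Z": 'defunct ("zombie")',
}


def group_by_state(ps_aux_text):
    """Map the first STAT letter to the list of PIDs in that state.

    Assumes `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    groups = {}
    for line in ps_aux_text.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and malformed rows
        groups.setdefault(fields[7][0], []).append(int(fields[1]))
    return groups


# Three rows copied from the ps listing in comment #2.
SAMPLE = """\
root 6654 0.0 2.2 49456 45396 ? Ss Jan25 0:20 /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid
nobody 22982 0.0 2.4 54096 50236 ? D Jan25 0:13 spamd child
nobody 31515 0.0 2.4 53748 49880 ? D 04:10 0:06 spamd child
"""

if __name__ == "__main__":
    for state, pids in sorted(group_by_state(SAMPLE).items()):
        print(state, STATE_NAMES.get(state, "other"), pids)
```

If the stuck children turn out to be in state D rather than Z, what they are blocked on (disk, DRBD, NFS, ...) would be the thing to look at; the script only sorts the rows and does not answer that.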
(In reply to comment #6)
> > > Since you are already running amavisd-new, why do you not let it call
> > > SpamAssassin directly, and save yourself and your mailer the trouble
> > > of dealing with two content filters?
> >
> > Ops... if you know the best configuration, welcome! Can you get me
> > one or two example (or link for this example)??
>
> Unless you explicitly disable spam checking in amavisd.conf (by
> @bypass_spam_checks_maps or its derivatives), amavisd is calling
> SpamAssassin by default. You may need to adjust score thresholds
> and what should happen to spam, but that is basically all.
>
> Something like:
> @bypass_spam_checks_maps = ();
> $sa_tag2_level_deflt = 5.0; # labels passed mail as spam
> $sa_kill_level_deflt = 8.5; # blocks & quarantines at this level
> $sa_spam_subject_tag = '***SPAM*** ';
> $final_spam_destiny = D_DISCARD;
> $spam_quarantine_to = 'spam-quarantine';
> $spam_quarantine_method = 'local:spam-%m.gz';
>
> or to just label spam but deliver anyway:
> $final_spam_destiny = D_PASS;

It is already like this. My amavis.conf contains this row:

@bypass_spam_checks_maps = (1);

and when I restart amavis, /var/log/mail shows:

Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM code NOT loaded
Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM-SA code NOT loaded

> > > Anyway, back to the reported problem. There are no defunct (zombie)
> > > processes on your system according to the output of top(1).
>
> 86.8%id - idle is good, host is only lightly loaded.

Yes, but all mail is delivered after many minutes (not seconds, as usual). I think this is caused by this problem (CPU or defunct spamd... I don't know).

> > this morning after same problems, I have search parent (PPID) of spamd defunct
> > process. The parent was /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid .
> > I stop the deamon and after, the same defunct process after change parent in
> > the PPID 1 (init). Is correct??
>
> Yes, this is correct behaviour. It is a function of the init process to
> garbage collect all orphaned process entries and clear them. After a short
> while the reparented defunct entries should be removed by the init process.
>
> So that spamd parent process which you had to kill was the culprit.
> It failed to collect exit statuses of (i.e. to ripe) its child processes
> which had already terminated some time ago.
>
> Don't know why this would happen. It was stuck for some reason, or lost track
> of its child processes. Was this process dormant when you killed it, or was
> it spinning CPU? Was it still able to accept connections and process them?
> Running it with debugging enabled and examining log from such process
> when a problem reoccurs might shed some light.

At the next incident I will check whether the defunct processes are idle. But after killing it, the CPU load average did not change.

I don't know whether it was still accepting connections. How can I verify that? :-( ... Please help me enable, read and send you this debugging output. Excuse me, but I don't know the debug procedure (or give me a link to a sample howto).

> > > There are indeed lots of open TCP sessions to localhost port 783
> > > in a CLOSE_WAIT state. Unfortunately you have not provided a full
> > > list of processes as reported by ps(1). According to the port number
> > > and the CLOSE_WAIT state I can assume each of these correspond to an
> > > existing spamd child process, where its spamc client has long gone,
> > > but for some reason spamd failed to close its end of the socket.
> >
> > I'm sorry, i have restart server after read this part. one or two day (next
> > problems) and I test with lsof the process join this close_wait
>
> > The only events join this problems is backup but are not sure! The backup is a
> > tar command and mysql-dump and the cpu go tu 2 or 3.
>
> For some reason that parent spamd process which you had to kill got stuck
> when you are running a backup. Now that you mention a mysql dump, perhaps
> this causes bayes/awl tables to be locked during a backup, thus blocking
> spamd operations.

No. Postfix uses MySQL tables, but SpamAssassin does not, if that is the question.

For now, thanks for these comments.
> > @bypass_spam_checks_maps = ();
> it's also it is already thus
> in my amavis.conf it is present this row:
> @bypass_spam_checks_maps = (1);
>
> and, if i restart amavis, into /var/log/mail it is present this log:
> Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM code NOT loaded
> Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM-SA code NOT loaded

Yes, that's what I'm saying: you have explicitly disabled spam scanning for all recipients with your setting. Remove that 1 from the list if this is not desired, and just use:

  @bypass_spam_checks_maps = ();

> > Don't know why this would happen. It was stuck for some reason, or lost track
> > of its child processes. Was this process dormant when you killed it, or was
> > it spinning CPU? Was it still able to accept connections and process them?
> > Running it with debugging enabled and examining log from such process
> > when a problem reoccurs might shed some light.
>
> The next incident, verify if idle process are on defunct process. But after
> killed, the CPU averange not change.
>
> I don't know if it still also accept connection. How I verify?

  $ telnet localhost 783

or use spamc.
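The telnet check can also be scripted. The sketch below only tests whether a TCP connection to the spamd port can be opened within a timeout. It does not speak the spamd protocol, so a successful connect shows that the kernel accepted the connection on the listening socket, not that a spamd child actually processed anything. The function name `can_connect` and the 5-second default timeout are assumptions for illustration.

```python
import socket


def can_connect(host="localhost", port=783, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt; the socket is closed on leaving `with`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False


if __name__ == "__main__":
    print("spamd port reachable:", can_connect())
```

For a check that exercises a full scan, piping a test message through spamc, as suggested above, is the more meaningful test.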
Hi, this morning same incident. Response your question and adding (I hope) other information. Yes, telnet response (but response the defunct process?? on :783 Listen are only 4 defunct process!). One defunct process have follow resource open (with lsof -p <defunct spamd process>. you it is useful? COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME spamd 18202 nobody cwd DIR 8,1 568 2 / spamd 18202 nobody rtd DIR 8,1 568 2 / spamd 18202 nobody txt REG 8,1 2469696 27221 /usr/bin/perl spamd 18202 nobody DEL REG 8,1 226286 /var/run/nscd/dbxTyl6i spamd 18202 nobody mem REG 8,1 1264076 10574 /usr/lib/libdb-4.5.so spamd 18202 nobody mem REG 8,1 128676 8467 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/DB_File/DB_File.so spamd 18202 nobody mem REG 8,1 26168 9740 /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Digest/SHA1/SHA1.so spamd 18202 nobody mem REG 8,1 38688 668 /lib/libnss_nis-2.9.so spamd 18202 nobody mem REG 8,1 30676 664 /lib/libnss_compat-2.9.so spamd 18202 nobody mem REG 8,1 217016 223321 /var/run/nscd/passwd spamd 18202 nobody mem REG 8,1 42748 666 /lib/libnss_files-2.9.so spamd 18202 nobody mem REG 8,1 26192 8554 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Sys/Syslog/Syslog.so spamd 18202 nobody mem REG 8,1 50908 8286 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/List/Util/Util.so spamd 18202 nobody mem REG 8,1 75744 9767 /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/HTML/Parser/Parser.so spamd 18202 nobody mem REG 8,1 34908 672 /lib/librt-2.9.so spamd 18202 nobody mem REG 8,1 17956 8268 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Cwd/Cwd.so spamd 18202 nobody mem REG 8,1 13804 66838 /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Net/DNS/DNS.so spamd 18202 nobody mem REG 8,1 30328 8558 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Time/HiRes/HiRes.so spamd 18202 nobody mem REG 8,1 22124 8282 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/File/Glob/Glob.so spamd 18202 nobody mem 
REG 8,1 26096 8506 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/MIME/Base64/Base64.so spamd 18202 nobody mem REG 8,1 174200 8289 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/POSIX/POSIX.so spamd 18202 nobody mem REG 8,1 17964 8280 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Fcntl/Fcntl.so spamd 18202 nobody mem REG 8,1 9704 8552 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Sys/Hostname/Hostname.so spamd 18202 nobody mem REG 8,1 26188 8284 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/IO/IO.so spamd 18202 nobody mem REG 8,1 1419604 658 /lib/libc-2.9.so spamd 18202 nobody mem REG 8,1 119873 7577 /lib/libpthread-2.9.so spamd 18202 nobody mem REG 8,1 9928 674 /lib/libutil-2.9.so spamd 18202 nobody mem REG 8,1 59148 660 /lib/libcrypt-2.9.so spamd 18202 nobody mem REG 8,1 161824 662 /lib/libm-2.9.so spamd 18202 nobody mem REG 8,1 14012 661 /lib/libdl-2.9.so spamd 18202 nobody mem REG 8,1 88044 7569 /lib/libnsl-2.9.so spamd 18202 nobody mem REG 8,1 26220 8458 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Socket/Socket.so spamd 18202 nobody mem REG 8,1 125888 6788 /lib/ld-2.9.so spamd 18202 nobody 0r CHR 1,3 0t0 1665 /dev/null spamd 18202 nobody 1w CHR 1,3 0t0 1665 /dev/null spamd 18202 nobody 2w CHR 1,3 0t0 1665 /dev/null spamd 18202 nobody 3r REG 8,1 102279 68141 /usr/sbin/spamd spamd 18202 nobody 4u unix 0xf51a3780 0t0 4006047 socket spamd 18202 nobody 5u IPv4 9837 0t0 TCP localhost:783 (LISTEN) spamd 18202 nobody 6r REG 8,1 4374 67704 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin/VBounce.pm spamd 18202 nobody 7u unix 0xf4420700 0t0 3849425 socket spamd 18202 nobody 8u unix 0xf51a3380 0t0 4006048 socket spamd 18202 nobody 9u IPv4 4100385 0t0 TCP localhost:783->localhost:49099 (CLOSE_WAIT) spamd 18202 nobody 10u unix 0xf4421680 0t0 4009730 socket spamd 18202 nobody 11u REG 8,1 21491712 70249 /var/lib/nobody/.spamassassin/auto-whitelist spamd 18202 nobody 13u IPv4 4014177 0t0 UDP myserver.it:35363->ns.interbusiness.it:domain 
Afterwards, if I run "lsof | grep :783", I see two kinds of processes. First, in state LISTEN there are ONLY defunct processes. (Is it correct that only defunct processes are listening, and that they still respond to a telnet connection?) Second, in state FIN_WAIT1, every entry is the following command: /usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f MAILER-DAEMON xxxx@yyyy.it zzzzzz@kkkkk.it Question: /usr/bin/spamc is invoked by the Postfix entry "pipe -n spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}", correct? Other information: yes, the defunct spamd processes are among the top five processes by CPU usage (the first is MySQL, which is expected because it is doing other work). Finally: if I remove the "1" from @bypass_spam_checks_maps = (1); then after an amavis restart the ANTI-SPAM CODE is loaded. That is not correct, is it? Good luck with the investigation!
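The check described above (grouping the sockets on port 783 by TCP state) can be sketched in Python. This is only an illustration, not part of SpamAssassin: the function name tcp_states_for_port is hypothetical, it assumes lsof output with one row per line, and the sample rows are taken from the lsof listing posted earlier in this report.

```python
import re
from collections import Counter

# Matches the NAME column of a TCP row in lsof output, for example
# "TCP localhost:783->localhost:49099 (CLOSE_WAIT)" or "TCP localhost:783 (LISTEN)".
TCP_ROW = re.compile(r"TCP\s+(\S+)\s+\((\w+)\)")


def tcp_states_for_port(lsof_text, port):
    """Count TCP states of sockets that use `port` on either endpoint.

    Hypothetical helper: expects plain lsof text, one row per line.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        match = TCP_ROW.search(line)
        if not match:
            continue  # not a TCP row with a state (e.g. UDP, unix, REG)
        endpoints = match.group(1).split("->")
        if any(ep.rsplit(":", 1)[-1] == str(port) for ep in endpoints):
            counts[match.group(2)] += 1
    return counts


# Rows copied from the lsof output of spamd pid 18202 in this report.
SAMPLE = """\
spamd 18202 nobody 5u IPv4 9837 0t0 TCP localhost:783 (LISTEN)
spamd 18202 nobody 9u IPv4 4100385 0t0 TCP localhost:783->localhost:49099 (CLOSE_WAIT)
spamd 18202 nobody 13u IPv4 4014177 0t0 UDP myserver.it:35363->ns.interbusiness.it:domain
"""

print(tcp_states_for_port(SAMPLE, 783))
```

Run against the output of "lsof -n -P -i TCP:783", a growing CLOSE_WAIT count on the spamd side would suggest that spamd children are not closing connections that spamc has already shut down.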
Do you have any suggestions for me? Thanks!
(In reply to comment #10) > Have you suggestions for me? > Thank's! Hi! The problem is still present. New information: I changed the configuration, removing the content_filter from the Postfix master.cf: ... 127.0.0.1:10025 inet n - n - - smtpd -o content_filter=spamassassin <===== REMOVE THIS -o local_recipient_maps= -o smtpd_client_restrictions= ... and enabled SpamAssassin in amavis.conf: @bypass_spam_checks_maps = (1); <==== REMOVE THIS. Result: two days OK, and after that the whole mail server stopped working because defunct amavis processes were present. I have now returned to the old configuration. Any suggestions?
moving all open 3.3.1 bugs to 3.3.2
Moving back off of Security, which got changed by accident during the mass Target Milestone move.
Hi, I have a similar problem. I am running SA 3.3.1 on FreeBSD 8 amd64. Some of the child processes use 100% CPU. When I check the process activity with truss -p PID, it shows absolutely nothing. lsof displays connections in CLOSED or CLOSE_WAIT status. Here is an example lsof output from a hung spamd child. If I restart spamd or kill this process, the CPU load decreases. There are no errors in the log files. # lsof -p 43021 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME perl 43021 mail rtd VDIR 0,90 512 2 / perl 43021 mail txt VREG 0,92 7152 213112 /usr/local/bin/perl5.10.1 perl 43021 mail txt VREG 0,90 246776 800769 /libexec/ld-elf.so.1 perl 43021 mail txt VREG 0,92 1636236 237075 /usr/local/lib/perl5/5.10.1/mach/CORE/libperl.so perl 43021 mail txt VREG 0,90 154320 471045 /lib/libm.so.5 perl 43021 mail txt VREG 0,90 33792 471043 /lib/libcrypt.so.5 perl 43021 mail txt VREG 0,90 64856 471050 /lib/libutil.so.8 perl 43021 mail txt VREG 0,90 1295416 471042 /lib/libc.so.7 perl 43021 mail txt VREG 0,92 29907 237545 /usr/local/lib/perl5/5.10.1/mach/auto/Socket/Socket.so perl 43021 mail txt VREG 0,92 24660 237345 /usr/local/lib/perl5/5.10.1/mach/auto/IO/IO.so perl 43021 mail txt VREG 0,92 28857 285378 /usr/local/lib/perl5/site_perl/5.10.1/mach/auto/Socket6/Socket6.so perl 43021 mail txt VREG 0,92 21204 237327 /usr/local/lib/perl5/5.10.1/mach/auto/Fcntl/Fcntl.so perl 43021 mail txt VREG 0,92 122478 237364 /usr/local/lib/perl5/5.10.1/mach/auto/POSIX/POSIX.so perl 43021 mail txt VREG 0,92 29249 354309 /usr/local/lib/perl5/site_perl/5.10.1/mach/auto/Time/HiRes/HiRes.so perl 43021 mail txt VREG 0,92 11255 237575 /usr/local/lib/perl5/5.10.1/mach/auto/Sys/Hostname/Hostname.so perl 43021 mail txt VREG 0,92 19446 237354 /usr/local/lib/perl5/5.10.1/mach/auto/MIME/Base64/Base64.so perl 43021 mail txt VREG 0,92 28745 237330 /usr/local/lib/perl5/5.10.1/mach/auto/File/Glob/Glob.so perl 43021 mail txt VREG 0,92 34634 401718 
/usr/local/lib/perl5/site_perl/5.10.1/mach/auto/NetAddr/IP/Util/Util.so perl 43021 mail txt VREG 0,92 69083 401705 /usr/local/lib/perl5/site_perl/5.10.1/mach/auto/HTML/Parser/Parser.so perl 43021 mail txt VREG 0,92 12408 285467 /usr/local/lib/perl5/site_perl/5.10.1/mach/auto/Net/DNS/DNS.so perl 43021 mail txt VREG 0,92 42618 237290 /usr/local/lib/perl5/5.10.1/mach/auto/Data/Dumper/Dumper.so perl 43021 mail txt VREG 0,92 15190 237284 /usr/local/lib/perl5/5.10.1/mach/auto/Cwd/Cwd.so perl 43021 mail txt VREG 0,92 35205 237351 /usr/local/lib/perl5/5.10.1/mach/auto/List/Util/Util.so perl 43021 mail txt VREG 0,92 56047 237302 /usr/local/lib/perl5/5.10.1/mach/auto/Digest/SHA/SHA.so perl 43021 mail txt VREG 0,92 51901 237286 /usr/local/lib/perl5/5.10.1/mach/auto/DB_File/DB_File.so perl 43021 mail txt VREG 0,92 28186 285392 /usr/local/lib/perl5/site_perl/5.10.1/mach/auto/Digest/SHA1/SHA1.so perl 43021 mail txt VREG 0,92 22327 237578 /usr/local/lib/perl5/5.10.1/mach/auto/Sys/Syslog/Syslog.so perl 43021 mail txt VREG 0,92 25741 285502 /usr/local/lib/perl5/site_perl/5.10.1/mach/auto/Razor2/Preproc/deHTMLxs/deHTMLxs.so perl 43021 mail txt VREG 0,90 2498211 424313 /var/db/spamassassin/compiled/5.010/3.003001/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so perl 43021 mail txt VREG 0,90 11641 424585 /var/db/spamassassin/compiled/5.010/3.003001/auto/Mail/SpamAssassin/CompiledRegexps/body_500/body_500.so perl 43021 mail txt VREG 0,92 39291 237543 /usr/local/lib/perl5/5.10.1/mach/auto/SDBM_File/SDBM_File.so perl 43021 mail 0r VCHR 0,29 0t0 29 /dev/null perl 43021 mail 1u PIPE 0xffffff000e53f9e0 0 ->0xffffff000e53f888 perl 43021 mail 2u PIPE 0xffffff000e53f9e0 0 ->0xffffff000e53f888 perl 43021 mail 3r VREG 0,92 108282 216586 /usr/local/bin/spamd perl 43021 mail 4u PIPE 0xffffff000e53f9e0 0 ->0xffffff000e53f888 perl 43021 mail 5u IPv4 0xffffff01672cc000 0t0 TCP localhost.localdomain:783 (LISTEN) perl 43021 mail 6u unix 0xffffff01677ab000 0t0 ->(none) perl 43021 mail 7u unix 
0xffffff000ec58d48 0t0 ->(none) perl 43021 mail 8u unix 0xffffff01b6f5e550 0t0 ->(none) perl 43021 mail 9u unix 0xffffff033f787550 0t0 ->(none) perl 43021 mail 10u unix 0xffffff02a97afaa0 0t0 ->0xffffff00644d9aa0 perl 43021 mail 11u unix 0xffffff03f2f43aa0 0t0 ->0xffffff04112f6550 perl 43021 mail 12u unix 0xffffff0050680550 0t0 ->(none) perl 43021 mail 13u unix 0xffffff0130a8f550 0t0 ->(none) perl 43021 mail 14u unix 0xffffff0355bd9550 0t0 ->(none) perl 43021 mail 15u unix 0xffffff032dbf82a8 0t0 ->(none) perl 43021 mail 16u unix 0xffffff02997b67f8 0t0 ->(none) perl 43021 mail 17u unix 0xffffff0079b8c7f8 0t0 ->(none) perl 43021 mail 18u unix 0xffffff03cb9f82a8 0t0 ->0xffffff0429b127f8 perl 43021 mail 19u unix 0xffffff0156bb8aa0 0t0 ->0xffffff0355f3d7f8 perl 43021 mail 20u unix 0xffffff034cd487f8 0t0 ->0xffffff02a97ac000 perl 43021 mail 21u unix 0xffffff02b5d00d48 0t0 ->0xffffff02b2641d48 perl 43021 mail 22u IPv4 0xffffff02b2bec000 0t0 TCP localhost.localdomain:783->localhost.localdomain:52662 (CLOSED) perl 43021 mail 23u unix 0xffffff03bf97e550 0t0 ->(none) perl 43021 mail 24u unix 0xffffff0219707d48 0t0 ->(none) perl 43021 mail 25u unix 0xffffff0187208000 0t0 ->(none) perl 43021 mail 26u unix 0xffffff03140c87f8 0t0 ->(none) perl 43021 mail 27u unix 0xffffff03bfdfb2a8 0t0 ->(none) perl 43021 mail 28u unix 0xffffff03a45e7000 0t0 ->(none) perl 43021 mail 29u unix 0xffffff02202a2d48 0t0 ->0xffffff03bf4727f8 perl 43021 mail 30u unix 0xffffff0050680d48 0t0 ->0xffffff024286e550 perl 43021 mail 31u unix 0xffffff006ec9caa0 0t0 ->0xffffff0130d02d48 perl 43021 mail 32u unix 0xffffff03ca9b0d48 0t0 ->0xffffff0355f4f550 perl 43021 mail 33u unix 0xffffff03140ee7f8 0t0 ->0xffffff034cd49d48 perl 43021 mail 34u unix 0xffffff015cada000 0t0 ->(none) perl 43021 mail 35u unix 0xffffff0220686550 0t0 ->(none) perl 43021 mail 36u unix 0xffffff0109e0c7f8 0t0 ->0xffffff0220209aa0 perl 43021 mail 42u IPv4 0xffffff000e085140 0t0 UDP mail.mydomain.com:30598->dnsserver.mydomain.com:domain
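Spotting spamd children that are stuck at high CPU, as reported above, could be sketched like this in Python. It is a rough illustration under stated assumptions: the helper busy_spamd_children is hypothetical, the input is assumed to be the text of "ps -axo pid,stat,pcpu,command" (one header line, then one process per line), and the sample rows are made up for the example rather than taken from a real system.

```python
def busy_spamd_children(ps_text, min_cpu=90.0):
    """Return (pid, stat, cpu) for spamd rows at or above `min_cpu` percent CPU.

    Hypothetical helper: assumes columns PID, STAT, %CPU, COMMAND in that order.
    """
    hits = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)  # keep the command (with spaces) as one field
        if len(parts) < 4:
            continue
        pid, stat, pcpu, command = parts
        try:
            cpu = float(pcpu)
        except ValueError:
            continue  # unexpected format, ignore the row
        if "spamd" in command and cpu >= min_cpu:
            hits.append((int(pid), stat, cpu))
    return hits


# Made-up sample in the assumed column layout; only the pid 43021 and the
# "100% CPU" symptom come from the report, the other values are illustrative.
SAMPLE_PS = """\
  PID STAT  %CPU COMMAND
43021 R    100.0 spamd child
43022 I      0.0 spamd child
  900 S     95.0 mysqld
"""

for pid, stat, cpu in busy_spamd_children(SAMPLE_PS):
    print(pid, stat, cpu)
```

A pid flagged this way can then be inspected with truss -p and lsof -p, as in the comment above, before deciding whether to kill it or restart spamd.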
Moving all open bugs where target is defined and 3.4.0 or lower to 3.4.1 target
This issue is being closed as worksforme: there are no recent reports, and the issue was reported against the very old version 3.2.5, except for the final comment, which is more of a hijack with a similar issue.
127.0.0.1:10025 inet n - n - - smtpd -o content_filter=spamassassin -o local_recipient_maps= -o smtpd_client_restrictions= -o smtpd_helo_restrictions= -o smtpd_sender_restrictions= -o smtpd_recipient_restrictions=permit_mynetworks,reject -o mynetworks=127.0.0.0/8 -o strict_rfc821_envelopes=yes -o smtpd_error_sleep_time=0 -o smtpd_soft_error_limit=1001 -o smtpd_hard_error_limit=1000
CVE-2018-11780[0]: potential remote code execution bug with the PDFInfo plugin It is fixed in new upstream version 3.4.2. If you fix the vulnerability please also make sure to include the CVE (Common Vulnerabilities & Exposures) id in your changelog entry. For further information see: [0] https://security-tracker.debian.org/tracker/CVE-2018-11780 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11780 [1] https://www.openwall.com/lists/oss-security/2018/09/16/1 [2] http://bit.ly/2J3erCO Please adjust the affected versions in the BTS as needed.