SA Bugzilla – Bug 2578
Spamd closes connection too early
Last modified: 2003-10-10 23:05:21 UTC
I'm running ||/ Name Version Description +++-==============-==============-============================================ ii spamassassin 2.60-1 Perl-based spam filter using text analysis on Debian 3.0. We do not use spamc, but our own software to access spamd using the spamc protocol. Unfortunately, spamd seems to close the connection too early. About one out of 3-5 connection is terminated too early. I did an tcpdump of the connections, that I will attach to this bug. It shows that the 'normal' connections answer with: SPAMD/1.1 0 EX_OK Spam: True ; 15.0 / 5.0 While the failing connections only do SPAMD/1.1 0 EX_OK and then close the connection. Both lines are sent in separate TCP/IP-packages, probably due to 'client->autoflush(1)'. I had a look at the source, but did not find where the connection is closed. IMHO, this looks like a race condition where our code is simply to slow to read fast enough, so that the close is done before the rest of the buffer has been sent.
Created attachment 1477 [details] TCPDUMP dump of 2 correct and one failed connection (the last) Open this with ethereal, select the first package and do 'Tools - Follow TCP Stream'. This is what it should look like (scroll to the end to see the response of spamd). Now close the window with the stream and click on 'Reset' in the status bar of ethereal to see all packages again. Then select the last package (time 1548.xx) and follow TCP Stream again. This connection failed, closing the connection before sending the rating.
I just changed the socket-close code to: if (defined $client) { logmsg "socket was open"; $client->close; } # paranoia vs fd leaks Which gives this log: Oct 10 16:02:48 mail spamd[10733]: connection from mail.rk4.huwig.de [212.88.142.131] at port 51491 Oct 10 16:02:48 mail spamd[10733]: socket was open Oct 10 16:02:48 mail spamd[11626]: Still running as root: user not specified with -u, not found, or set to root. Fall back to nobody. Oct 10 16:02:48 mail spamd[11626]: checking message <1065793730.2516.15.camel@localhost.localdomain> for (unknown):65534. Oct 10 16:02:48 mail spamd[11626]: clean message (0.0/5.0) for (unknown):65534 in 0.0 seconds, 3158 bytes. As you can see, $client->close() is called before the 'clean message' has been sent. I just moved the $client->close() from there to the 'check' method just after loggin to syslog. Maybe this helps.
Could this be a problem with the server running spamD? I've re-created a program using the spamD protocol and I have never seen any issues like this. It might also be an issue with your program? My Source code here: http://sf.net/projects/winspamc Can you reproduce this issue with a certian message or only while under a heavy load? Do you set maxchildren of SpamD? Which type of reporting are you using, ie: CHECK SYMBOLS REPORT REPORT_IFSPAM PROCESS Have you run spamD in debug mode? Do you see anything strange while doing that? I'm sure the real dev team will have more to ask, but this information would be helpful.
I'm not sure if this is related or not, but ever since we moved our spamd server to a remote machine, spamc occasionally logs this to syslog: spamc[10677]: failed sanity check, 5504 bytes claimed, -1 bytes seen Reviewing the spamc code, this could only happen if spamc received the "SPAMD" header (including Content-Length, etc) but was unable to retrieve the scanned message (using "process" method, not "check"). I'm guessing that read() called on a socket that was closed by the remote end would return -1, correct? I looked over the logs (a years worth) of running spamd on the same machine as spamc and I never saw the "failed sanity check" error. The instant we moved spamd to a different server, we started getting these errors. spamc machine is RH 7.2, spamd is RH 9.0.
I guess this is a timing issue, so it may be hardware dependent, as well as client dependent. The problem appears on light load. There is about one message every 5-10 minutes. The machine does other things, but 17:16:30 up 4 days, 21 min, 1 user, load average: 0.04, 0.04, 0.00 most of the time it is idle. I am using CHECK, as you can see from the tcpdump output. I just activated debug mode and wait for another occurence. > spamc[10677]: failed sanity check, 5504 bytes claimed, -1 bytes seen -1 means (man 2 read): "On error, -1 is returned" This looks like the error I have: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.net.SocketInputStream.read(SocketInputStream.java:182) The tcpdump shows that there is a rst indeed.
OK, I just found a bug in my software: I did not read the output from spamd till the end, therefore the RST of the connection. I changed this to something that reads till the end. Moving the $client->close() has proven to be fatal for this change, so I switched back to original 2.60. If this was the reason for this, I would like to ask why the code fails if I do not read all bytes sent to me. It seems like the RST of on connection affects the other connection. If this is true, this might be used for some DoS.
I think I found it. My original code did not send enough bytes some time ago. So I "fixed" it by adding a CRLF to each mail. After correcting this bug, the two bytes were still there. This is what I think had happened: I send 'Content-length' + 2 bytes to spamd. The 2 addition bytes came in a separate tcp-package. Spamd receives 'Content-length' bytes and tries to send the answer. The answer consists of two lines, each sent within a separate tcp-package. That happened was, that sometimes, the 2 additional bytes were received, after the '$client->close()' happened, but before the second line has been sent. That means that the socket was in the state 'I do not want any additional data' but the second line was still in the send buffer. The two bytes then causes a RST, taking down the connection and apparently discarding the second line. Lesson learned: do not send more than content-length bytes! I'll let the new version run for some time. If everything works fine, I'll close this bug. Thanks for your help and the good product!
It seems to work now.