Bug 6421

Summary: Spamc (3.2.5 & 3.3.2) w/SSL seems to cut all messages over 163770 (+/- 2) bytes
Product: Spamassassin Reporter: guerrero
Component: LibrariesAssignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED WORKSFORME    
Severity: major CC: apache, bjorn, guerrero, kmcgrail, wolfsplat
Priority: P2    
Version: 3.3.2   
Target Milestone: 4.0.0   
Hardware: PC   
OS: Linux   
Whiteboard:
Attachments: Tcpdump log as requested
Tcdump log filtered
Test mail used to trigger the described bug
Logs showing the bug triggered with --ssl and not triggered without it

Description guerrero 2010-04-23 04:41:25 UTC
I'm using spamassassin in two Debian lenny computers, one a virtual openvz server and the other a physical one. I have spamc installed in the first and spamassassin in the second. The first one receives mail through postfix, and then uses spamc to forward it, through an SSL connection, to the second one and check it.Both spamc and spamassassin have been installed using the Debian package system (so they are version 3.2.5) and are kept updated.

I've found that some spam messages were not even being checked. When investigating this, I discovered that for those messages spamc were reporting errors like this:

Apr 22 15:33:41 correu spamc[23878]: failed sanity check, 240674 bytes claimed, 163771 bytes seen
Apr 22 15:34:50 correu spamc[24028]: failed sanity check, 189634 bytes claimed, 163770 bytes seen
Apr 22 15:37:09 correu spamc[24135]: failed sanity check, 240674 bytes claimed, 163771 bytes seen
Apr 22 15:37:13 correu spamc[24136]: failed sanity check, 312445 bytes claimed, 163770 bytes seen
Apr 22 15:37:26 correu spamc[24148]: failed sanity check, 240674 bytes claimed, 163771 bytes seen
Apr 22 15:38:40 correu spamc[24199]: failed sanity check, 240674 bytes claimed, 163771 bytes seen
Apr 22 15:40:35 correu spamc[24293]: failed sanity check, 311545 bytes claimed, 163770 bytes seen
Apr 22 15:41:31 correu spamc[24345]: failed sanity check, 240674 bytes claimed, 163771 bytes seen
Apr 22 15:43:02 correu spamc[24436]: failed sanity check, 240674 bytes claimed, 163771 bytes seen
Apr 22 15:48:15 correu spamc[24677]: failed sanity check, 240674 bytes claimed, 163771 bytes seen

then discarded the returned message and instead gave back the original one to postfix.

I modified and compiled a version of spamc to not register a failure when this error happened and tried it manually, to see what was spamc receiving. I found that it really got back the message from spamd cut at the length it reported. I checked the logs in the spamd machine, and there were not errors, but there was an entry identifying as spam the message I had just tried (which was). Moreover, using the option -R with those messages, I always get the correct spam status.

I modified your provided sample-spam.txt and sample-nonspam.txt to be longer than 170 KB and got the same behaviour with both of them: the data returned by spamd was always cut around 163770 bytes (and yet the spam status was correctly identified with the -R option). I tried spamassassin manually in the spamd machine for those files and I obtained, locally, the full correct record.I tried modifying the MTU length for the ethernet interface of the spamd computer, but it made no difference. I googled for this error and I found some similar cases; it seems in one case it was solved by setting LANG=C but in my case that changed nothing. There was another case which seemed to involve using spamassassin in a cygwin environment, which aparently could be worked around adding a sleep instruction somewhere in the spamd code, but it was an old version (not 3.2.5) and was not very clear on where to put this modification, so I have not been able to try that.

It may be I had inadvertently misconfigured something, but as far as I can tell, I'm using standard options everywhere (except, maybe, for the SSL connection) and other network connections (ssh, scp) between both computers are working flawlessly.

I would be grateful for any help in that problem. Thanks in advance!

Josep Guerrero
Comment 1 Mark Martinec 2010-04-23 06:09:46 UTC
So the question is: did spamd fail to send more than 160k of data over SSL,
or did spamc fail to receive it past the indicated size.

Could you repeat the experiment by using a large message and capture the
traffic with 'tcpdump -s 0 -w 0.log ...'. Even though the traffic is encrypted,
the size of transfered data should roughly correspond to the size of a
nonencrypted data. This could provide the answer to the above question.

Glancing over spamd's use of syswrite, I see that it never checks or
reports I/O errors, which is really unforgivable.

- some calls are to syswrite() directly, other through syswrite_full_buffer;
  the later should be used throughout as it caters for partial writes and
  does retries;
- even though syswrite_full_buffer() does check for errors, its caller
  never checks the return value; either the syswrite_full_buffer should be
  modified to log a warning when all retries fails, or the caller should
  do so.
Comment 2 guerrero 2010-04-26 06:18:22 UTC
Created attachment 4754 [details]
Tcpdump log as requested

I wanted to attach the result of the tcpdump log, but it looks as it is too big. I did all I could to remove other connections while executing tcdump, but I had to leave on (at least) the ssh connection to the spamd computer and a NFS mount. I've been able to open the log in wireshark, but I have no idea how to extract the total size of the packets passed from spamd to spamc from there (I would be grateful for pointers on that. I understand your time is limited and that is something I should be able to do myself).

I have put the attachment at this address:

http://www.ice.csic.es/files/guerrero/tcdump.log.bz2

Please, tell me when you download it, so I can remove it. The attached file is just a text file with the same address.

On Friday I sent the same attachment directly answering to your message request. I guess that was not the correct procedure, since the message is not appearing in the bug report. I apologize for any inconveniences I may have caused.
Comment 3 guerrero 2010-04-29 09:37:45 UTC
Created attachment 4755 [details]
Tcdump log filtered

Playing with wireshark, I've managed to filter out all the communications not originating from spamd on the spamd computer. The result is the attached file, whose size roughly matches the size of the shortened mail, so it looks that spamd is the one not sending all the data. I hope this may help to identify the problem.
Comment 4 guerrero 2010-07-06 05:21:54 UTC
Since it is a bit more than two months since the last comment (where I attached the filtered the tcpdumb log), I wanted to ask if there has been any progress on this bug. Has someone been able to reproduce it or/and find a workaround?

Just in case more information is required, the bug (or at least, the unwanted behaviour) continues to be there: the system continues to return cut every message longer than 163770+/-2 bytes. The computer running spamassassin (a 64 bit Debian 5) has been kept updated the whole time and has been rebooted at least twice (so it doesn't look like the problem is related to some incompatibility between, say, library versions loaded in memory and versions on disk) and no other errors (in other protocols or programs) have been detected.

Thaks for your time and attention,

Josep Guerrero
Comment 5 Kevin A. McGrail 2011-10-29 03:02:23 UTC
This still continues to baffle me.  I've never seen the sanity check error and check emails all the time larger than ~150KB

Can you list the command line for your spamd?

Also, do you perhaps have anything enforcing memory restraints on Spamd that might be causing issues?

I'm close to considering this a WORKSFORME bug but want to get more information if you are still having the issue.
Comment 6 Bjørn Mork 2011-10-29 07:38:50 UTC
Hello, I am seeing the exact same issue as the original reporter and am watching this bug for that reason.  I just didn't think I had any more useful information to offer.  A few of my most recent log messages:

Oct 27 13:04:54 canardo spamc[7031]: failed sanity check, 701458 bytes claimed, 163770 bytes seen
Oct 27 13:57:12 canardo spamc[9586]: failed sanity check, 4526382 bytes claimed, 163769 bytes seen
Oct 27 20:14:26 canardo spamc[27097]: failed sanity check, 195214 bytes claimed, 163769 bytes seen
Oct 28 09:15:23 canardo spamc[29946]: failed sanity check, 241547 bytes claimed, 163769 bytes seen
Oct 28 10:17:39 canardo spamc[32734]: failed sanity check, 426548 bytes claimed, 163769 bytes seen
Oct 28 10:27:31 canardo spamc[755]: failed sanity check, 521802 bytes claimed, 163769 bytes seen
Oct 28 11:10:52 canardo spamc[2872]: failed sanity check, 279988 bytes claimed, 163770 bytes seen
Oct 28 18:09:29 canardo spamc[23848]: failed sanity check, 1537235 bytes claimed, 163768 bytes seen

As the original reporter, I am also using SSL between spamc and spamd, and my gut feeling tells me that this may be related to the problem.  As the OR says:  That is probably the only  non-standard setting we use.  



My spamd command line is (with somewhat modified addresses)
/usr/sbin/spamd --username=spamd --max-children 15 --helper-home-dir --allow-tell --nouser-config --sql-config -i -A 127.0.0.1,172.16.185.158,10.138.100.138 --ssl -d --pidfile=/var/run/spamd/spamd.pid


Thanks,
Bjørn
Comment 7 Kevin A. McGrail 2011-10-29 13:23:47 UTC
(In reply to comment #6)
> Hello, I am seeing the exact same issue as the original reporter and am
> watching this bug for that reason.  I just didn't think I had any more useful
> information to offer.  A few of my most recent log messages:

Thanks Bjorn.  Knowing others have the issue allows me to justify escalating it.

Can you A) trap a message that exhibits this behavior, B) attach it and C) try it with and without SSL?  I've got a test bed with SSL so I can do the same but I'd like to remove some variables.

Regards,
KAM
Comment 8 guerrero 2011-10-29 14:16:47 UTC
It's good to see some activity on this!

Since I reported the bug, the server which runs spamd has been upgraded to 
Debian Squeeze  and now its spamassassin version is 3.3.1(but not the computer 
that runs spamc, which is being kept updated but has not been upgraded) . The 
problem still persists. As far as I can tell, it happens with every message 
over 163770+/-2 bytes (by the way, since most spam is shorter than that, it's 
very easy to overlook the problem). I think it doesn't really matter which 
message it is, only its length, but as soon as I can, I'll send some sample.

I'll see too about trying without ssl, but it may take some time.

My command line is

/usr/sbin/spamd --ssl --create-prefs --max-children 5 -i 172.16.73.20 -A 
172.16.22.20 --helper-home-dir /var/lib/spamassassin/ -s 
/var/lib/spamassassin/spamd.log -d --pidfile=/var/lib/spamassassin/spamd.pid

Thanks again for looking into this.

Josep Guerrero
Comment 9 Bjørn Mork 2011-10-29 14:28:10 UTC
Created attachment 4999 [details]
Test mail used to trigger the described bug
Comment 10 Bjørn Mork 2011-10-29 14:29:22 UTC
Created attachment 5000 [details]
Logs showing the bug triggered with --ssl and not triggered without it

I should have done this before, but better late than never...

And I obviously forgot to mention an important part of my setup: I am using
spamass-milter.  The MTA is sendmail.

It seems that spamass-milter is the missing part of the puzzle.  I am not able
to recreate the problem by merely calling spamc directly, even if I give it
input which has previously failed (when spamc was called via spamass-milter).

But resending the same mail via sendmail, having it call spamc through
spamass-milter, will reliably trigger the problem.

In fact, I can reliably trigger it using *any* sufficienly large enough mail as
long as it goes through spamass-milter.  But --ssl also affects this.  The
problem does not show up unless I use --ssl between spamass-milter and spamd.

I am attaching a completely dummy "test.mail" which does trigger the problem
for me, as long as it is passed via ssl to spamd by spamass-milter. The mail
contains nothing but a large enough file full of null bytes.  My tests
indicates that the actual contents doesn't matter, just the size.

I am also attaching "spamd-spamc-sendmail-log.txt" with logs showing
"test.mail" received twice: first using --ssl and failing, and then without
--ssl and working.

Note that I sent it to an external address which is forwarded back to me, as my
setup won't normally scan anything received from a host on the local network.
This was just done so because it was simpler than modifying the setup to make
spamass-milter scan local mail.



Bjørn
Comment 11 guerrero 2011-10-29 17:11:22 UTC
Hello!

Just to add more data, after reading Bjørn messages:

* I'm not using spamass-milter. It's not even installed on the servers.
* I'm using postfix as mail server
* This is the spamc command I use from within postfix:

spamassassin unix -     n       n       -       -       pipe
        user=nobody argv=/usr/bin/spamc -u ${user} -S -d 172.16.73.20 -s 500000 -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}

Josep Guerrero
Comment 12 Mark Martinec 2012-05-15 18:47:04 UTC
I tried this on our system (openssl-1.0.1_2) and a 4 MB message was
successfully passed from spamc to spamd over ssl.

Even though the reason for a spamc cutoff threshold of about 160 kB
on reporter's systems remains unexplained, looking as the libspamc.c
code I can see that calls to SSL_write (and other SSL routines) lack
any status checking, so anything is possible and any error reported
by openssl remains stashed under a carpet. Not good.
Comment 13 Kevin A. McGrail 2015-04-13 21:48:25 UTC
In general the SSL code in spamc is in need of modernization and deprecation of things like SSLv3.

However, the +/- bytes issue is not repeatable for me.  I've checked 60 days of logs and the only failed sanity checks that I have are caused by service interruptions on the spamd side.

If you still have the issue with 3.4+, let us know.
Comment 14 guerrero 2015-04-14 07:11:23 UTC
I'm now using Debian Wheezy, with spamassassin 3.3.2 both in the spamc client and the spamd daemon, and the issue still happens. Since the new Debian version, Jessie, is scheduled to appear in a few weeks, I may be able to try soon a newer version of spamassassin. On the other hand, a few weeks ago I decided to remove the ssl communication and use only unencrypted communication between both computers, and the issue disappeared, so at least I can confirm that, whatever happens, it is ssl related.
Comment 15 Kevin A. McGrail 2015-04-14 12:51:09 UTC
Reopening as a likely bug in SSL.
Comment 16 Kevin A. McGrail 2018-09-04 15:23:15 UTC
Pushing to 3.4.3
Comment 17 Henrik Krohns 2022-03-06 13:49:05 UTC
Closing stale bug