Bug 1468 - [RFC] A new spamd protocol
Summary: [RFC] A new spamd protocol
Status: RESOLVED LATER
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P5 enhancement
Target Milestone: 3.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on: 1452
Blocks:
 
Reported: 2003-02-10 17:03 UTC by Malte S. Stretz
Modified: 2003-08-22 07:26 UTC
CC List: 2 users




Description Malte S. Stretz 2003-02-10 17:03:17 UTC
In bug 1452 there was some discussion about a new spamd protocol because the 
current one is too inflexible and hard to extend. I have been thinking about 
this issue for the last few weeks, too. But first, the discussion from the other bug: 
 
--- Justin wrote: 
Backwards-compatibility with old spamc's is not an issue BTW. 
Basically, we can assume that if spamd is upgraded, spamc must be as 
well.  But if we modify the protocol we should bump the version 
number, so that it's very clear to third-party clients that they 
may also need modifications. 
 
--- Eugene Miretsky wrote: 
That's why I actually want to modify the spamd/spamc protocol so that 
it can easily be extended in the future (since backwards compatibility is not 
an issue).  I do not want to change the way things work right now, 
just the way request/response headers are generated and handled. 
 
[...] Pretty much, I will design the protocol 
request/response headers as binary structures that can carry the spam 
level, threshold, etc., for example.  
 
--- Justin wrote: 
Hmm, I don't know about that :( I would strongly prefer keeping it similar 
to HTTP/1.0 in design... I think the other SpamAssassin developers would 
probably agree. 
 
--- Eugene Miretsky wrote: 
Using binary structures has very strong benefits.  For example, all 
strings sent in the protocol will be prefixed by their length 
(encoded in 3 bytes), so that exactly the right amount of memory 
can be allocated to read the response.  No more searching for "\r\n", no 
more sscanf() -- just read in the N bytes announced by the length prefix. 
There are other benefits as well, but length-prefixed strings are the 
single most important one, since they would eliminate the need 
for parsing, and, in my experience, the best parsers are the ones 
that don't need to parse :)... 
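A minimal sketch of the reading side described above, written in Python for illustration (spamc itself is C). It assumes the 3-byte prefix is big-endian, matching the byte1/byte2/byte3 layout given later in the thread; `read_exact` and `read_prefixed_string` are hypothetical names. The reader consumes exactly the announced number of bytes, so there is no search for "\r\n", but note that it still trusts whatever length the peer declares.

```python
import io


def read_exact(stream, n):
    """Read exactly n bytes from a binary stream, or raise EOFError."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_prefixed_string(stream):
    """Read one string framed by a 3-byte big-endian length prefix (assumed layout)."""
    header = read_exact(stream, 3)
    length = (header[0] << 16) | (header[1] << 8) | header[2]
    return read_exact(stream, length)


# Two framed strings back to back: "hello" (length 5) and "ok" (length 2).
stream = io.BytesIO(b"\x00\x00\x05hello\x00\x00\x02ok")
first = read_prefixed_string(stream)
second = read_prefixed_string(stream)
```

The loop in `read_exact` matters for sockets, where a single read may return fewer bytes than requested.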
 
--- Antony Mawer wrote: 
Well, I wrote several parsers for the CORBA IIOP protocol in a previous 
job, which is binary and therefore had those attributes as well.  However 
my problem with it was: 
 
- harder to parse visually, which makes debugging and implementation 
  harder IMO 
 
- a length-word prefix means a DoS is trivial -- send 0xffffffff as the 
  length, and it's DoS'd.  (Yes, that's trivial to avoid.  Someone should 
  have told the ORB vendors though ;) 
 
- binary-format incompatibilities mean you have to also create a set of 
  marshalling rules for data -- ie. agree on endianness, size of types, 
  padding etc.  (this was IIOP's biggest problem.) 
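Antony's DoS and marshalling points can both be handled in the reader, as in this hedged Python sketch. The length word is decoded in an explicitly agreed byte order (network order, via the `"!"` prefix of Python's `struct` module) and checked against a limit before any buffer is allocated. The 4-byte width, the 10 MiB `MAX_MESSAGE_BYTES` limit, and the name `read_length_word` are illustrative assumptions, not part of any spamd protocol.

```python
import io
import struct

MAX_MESSAGE_BYTES = 10 * 1024 * 1024  # hypothetical server-side limit (10 MiB)


def read_length_word(stream, limit=MAX_MESSAGE_BYTES):
    """Read a 4-byte unsigned length in network (big-endian) byte order and
    reject it before any buffer is allocated if it exceeds `limit`.
    Assumes a buffered binary stream, where read(4) returns fewer than
    4 bytes only at end of stream."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("truncated length word")
    (length,) = struct.unpack("!I", raw)  # "!" = network byte order, no padding
    if length > limit:
        raise ValueError(f"declared length {length} exceeds limit {limit}")
    return length
```

With this check, a peer that sends `0xffffffff` gets an error instead of forcing a 4 GB allocation.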
 
I think we might have to "agree to disagree" on this one, sorry ;)  I'm 
really a big fan of the ASCII-protocol-with-\r\n-line-endings approach. 
 
--- Eugene Miretsky wrote: 
[ad 1] 
I always go for efficiency and for avoiding the need to parse. 
The protocol will become much simpler, and that will simplify debugging 
a lot.  It will make it simpler to return more information 
in a single response.  It will greatly simplify parsing. 
 
[ad 2] 
A 3-byte length prefix can store at most 2^24 - 1 (0xFFFFFF) -- fits in an int -- no 
DoS. 
 
[ad 3] 
No need to complicate things.  All I need is to encode length: 
  byte1 = (len >> 16) 
  byte2 = (len >> 8) & 0xFF 
  byte3 = len & 0xFF 
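Eugene's three lines, made runnable as a Python sketch with an explicit range check. Three bytes hold at most 2^24 - 1 = 0xFFFFFF (16,777,215), which is slightly more than 255^3 = 0xFD02FF. The range check, the extra `& 0xFF` on the first byte, and the names `encode_len3`/`decode_len3` are additions for illustration.

```python
MAX_3BYTE = (1 << 24) - 1  # 0xFFFFFF = 16,777,215, the largest 3-byte value


def encode_len3(length):
    """Encode a length as three big-endian bytes, following the pseudocode above."""
    if not 0 <= length <= MAX_3BYTE:
        raise ValueError(f"length {length} does not fit in 3 bytes")
    byte1 = (length >> 16) & 0xFF
    byte2 = (length >> 8) & 0xFF
    byte3 = length & 0xFF
    return bytes([byte1, byte2, byte3])


def decode_len3(prefix):
    """Inverse of encode_len3."""
    if len(prefix) != 3:
        raise ValueError("expected exactly 3 bytes")
    return (prefix[0] << 16) | (prefix[1] << 8) | prefix[2]
```

For example, `encode_len3(1234)` returns `b'\x00\x04\xd2'`, and any length of 2^24 or more is rejected rather than silently truncated.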
 
--- Theo wrote: 
I'm all for simplicity, which I don't consider binary protocols to be. 
I haven't looked at the current protocol, but I'd like to keep the simple 
layout it has now.
Comment 1 Malte S. Stretz 2003-02-10 17:03:56 UTC
--- My thoughts: 
 
We handle mail, so why don't we base spamp/2.0 on (E)SMTP? It would make 
things much easier. (I must admit that I wanted to study the RFCs a bit further 
before making this proposal. But then came Eugene... :o) 
 
A sample handshake (spamc connects to spamd): 
 
01: spamd: 220 spamd.example.com (spamd/3.0) at your service 
02: spamc: EHLO localhost 
03: spamd: 250-spamd.example.com greets you 
04: spamd: 250-HELP 
05: spamd: 250-8BITMIME 
06: spamd: 250 XSPAMD OK 
07: spamc: DATA 
08: spamd: 354 Here we go... 
09: spamc: <data> 
10: spamc: <data> 
11: spamc: . 
12: spamd: 354 And now? 
13: spamc: XCHECK 
14: spamd: 220 SPAM 15 5 
15: spamc: XREQUEST 
16: spamd: 250-1234 bytes 
17: spamd: 250-<data> 
18: spamd: 250 <data> 
19: spamc: QUIT 
20: spamd: 220 Goodbye 
 
Some explanation: 
* In line 6 the server answers that it supports the SPAMD extension. The OK 
means that the transaction is already initiated and we can skip the MAIL and 
RCPT part. If the client doesn't like the answer, it just QUITs here. It should be OK 
to require the 8BITMIME extension, too; that makes life easier for 
us. 
* After we have sent our data, the server wants to know what to do next. The CHECK 
command initiates the check. The server answers with a 220 code and SPAM or 
HAM, followed by the score and the required score (threshold). 
* In line 15 spamc REQUESTs the (modified) mail back. It's sent as a continued 
(multi-line) answer. Alternative commands are REPORT or SYMBOLS (same meaning as in the 
current protocol) or just QUIT. 
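The handshake relies on the usual SMTP reply convention: "nnn-" continues a reply and "nnn " ends it. Below is a hedged Python sketch of a client that reads such replies and then decodes the proposed XCHECK result. The "SPAM/HAM score threshold" text format comes from the sample above, and `read_reply` and `parse_check` are hypothetical names.

```python
def read_reply(lines):
    """Collect one SMTP-style reply from an iterator of lines.
    "nnn-text" lines continue the reply; the first "nnn text" (or bare
    "nnn") line ends it. Pass the same iterator to read successive replies."""
    code, texts = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        this_code, sep, text = line[:3], line[3:4], line[4:]
        if len(this_code) != 3 or not this_code.isdigit() or sep not in ("-", " ", ""):
            raise ValueError(f"malformed reply line: {line!r}")
        if code is not None and this_code != code:
            raise ValueError("reply code changed in the middle of a reply")
        code = this_code
        texts.append(text)
        if sep != "-":
            return int(code), texts
    raise EOFError("connection closed in the middle of a reply")


def parse_check(text):
    """Parse the proposed XCHECK result text, e.g. "SPAM 15 5"."""
    verdict, score, threshold = text.split()
    if verdict not in ("SPAM", "HAM"):
        raise ValueError(f"unknown verdict {verdict!r}")
    return verdict == "SPAM", float(score), float(threshold)
```

Given the EHLO reply from lines 3-6 of the sample, `read_reply` returns code 250 and the four text parts. `parse_check("SPAM 15 5")` returns `(True, 15.0, 5.0)`.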
 
Why is this so fancy? Imagine that somebody could plug spamd or spamc directly 
into an MTA. Here's another sample: 
01: spamd: 220 spamd.example.com ESMTP (Exim with spamd/3.0) 
02: spamc: EHLO spamc.example.com 
03: spamd: 250-spamd.example.com greets you 
04: spamd: 250-PIPELINING 
05: spamd: 250-8BITMIME 
06: spamd: 250-XSPAMD 
07: spamd: 250 ETRN 
08: spamc: MAIL FROM:<foo@example.com> 
09: spamd: 250 OK 
10: spamc: RCPT TO:<bar@example.com> 
11: spamd: 250 OK 
12: spamc: DATA 
13: spamd: 354 Here we go... 
14: spamc: <data> 
15: spamc: <data> 
16: spamc: . 
17: spamd: 354 And now? 
18: spamc: XCHECK 
19: spamd: 220 SPAM 15 5 
20: spamc: XDELIVER 
21: spamd: 220 OK 
22: spamc: QUIT 
23: spamd: 220 Goodbye 
 
Here we've got the DELIVER command, which initiates delivery to the given 
mailbox. Note also that the OK in line 6 is missing, meaning that we need a MAIL 
FROM and RCPT TO. Of course our spamc and spamd do not have to understand 
all the possible commands. They just QUIT or give an error if they encounter 
something they don't know. But I think this looks like a nice clean protocol we 
could define for others to use (too). We could even write an RFC to make it a 
standard extension for ESMTP (and get rid of all the Xes). I'm not too sure 
about the details (like error codes) yet, but that shouldn't be a problem. 
 
The SMTP protocol also has the advantage over HTTP that it is a 
session-oriented protocol. You can play the question-and-answer game quite nicely :o) 
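Here is a sketch of how a client might choose its next commands from the EHLO keywords in the two samples. "XSPAMD OK" skips MAIL/RCPT, a bare "XSPAMD" requires them, and 8BITMIME is treated as mandatory, as suggested above. This illustrates the proposal under those assumptions and is not an existing spamc feature. `plan_commands` is a hypothetical name.

```python
def plan_commands(ehlo_keywords, sender, recipient):
    """Choose the commands to send after EHLO, following the proposal."""
    keywords = {k.strip().upper() for k in ehlo_keywords}
    if "8BITMIME" not in keywords:
        return ["QUIT"]  # the proposal makes 8BITMIME mandatory
    if "XSPAMD OK" in keywords:
        # transaction already initiated: skip MAIL FROM / RCPT TO
        return ["DATA"]
    if "XSPAMD" in keywords:
        # extension present but no transaction yet (MTA-style server)
        return [f"MAIL FROM:<{sender}>", f"RCPT TO:<{recipient}>", "DATA"]
    return ["QUIT"]  # no spamd extension: give up
```

With the first sample's keywords this yields just `DATA`. With the Exim-style sample it yields the MAIL FROM/RCPT TO/DATA sequence from lines 8-12.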
 
Ok, I'm tired now (it's 3 o'clock in the morning here). I probably forgot half of the 
ideas I had, but now it's time to take a nap. Go ahead and discuss that stuff. 
 
Cheers, 
Malte 
Comment 2 Justin Mason 2003-02-11 05:21:43 UTC
I don't know about this.

In the abstract, I like the idea of an SMTP-like protocol, which uses the nnn
status codes, a "-" to indicate continuing replies, and so on.  (I actually
prefer the HTTP model, but SMTP/FTP are quite close ;).   However, I can't see
any point in looking a lot like SMTP unless we *are* SMTP. And the fundamental
way messages are passed with spamc, ie. "DATA", get marked-up message as a
reply, is totally different from SMTP: "DATA", get status line saying "OK
accepted for delivery".

I think the best idea is to fix the current spamd protocol to ignore extra
headers (a la HTTP), fix bug 1452 that way, and that's it.
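Justin's "ignore extra headers (a la HTTP)" approach can be sketched like this in Python. The client keeps the headers it understands and silently skips the rest, so a newer spamd can add headers without breaking older clients. The header names used here (`Spam`, `Content-length`) follow the style of the 1.x protocol. The function `parse_headers` and its `known` parameter are hypothetical illustrations.

```python
def parse_headers(lines, known=("content-length", "spam")):
    """Parse "Name: value" lines up to the first blank line.
    Names are matched case-insensitively; unknown headers are skipped
    rather than treated as errors, so newer servers can add headers."""
    headers = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            break  # blank line ends the header block
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        name = name.strip().lower()
        if name in known:
            headers[name] = value.strip()
        # any other header is ignored
    return headers
```

A reply carrying an unexpected `X-Future-Header:` line is still accepted, and only the known headers are returned.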

Don't forget third-party clients to the spamd protocol.  IMO, it would be good
to keep the protocol similar to what it is now so they don't have to totally
rewrite their clients.
Comment 3 Malte S. Stretz 2003-02-11 06:26:32 UTC
I made a mistake in the sample above: there has to be an initialization step that 
makes the SMTP server defer delivery before the DATA part; otherwise it would 
break normal SMTP communication. As I said before, these are just the thoughts 
I had over the last few weeks, far from perfect ;-)  
 
The problem with our current stateless protocol is that it's too inflexible: 
We now have REPORT and (undocumented, btw) REPORT_IFSPAM. Should we extend this 
to SYMBOLS_IFSPAM or whatever in the future, too? The SMTP way does the 
CHECK first, and then the _client_ decides if it wants/needs more data. As I said 
above, I forgot half of what I thought of in the proposal above. But I have to 
scribble down a draft first. 
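The point here is that one client-side decision after CHECK can replace a growing family of server-side variants (REPORT_IFSPAM, SYMBOLS_IFSPAM, and so on). The following is a minimal Python sketch. It assumes the CHECK reply has already been parsed into a spam flag, and `follow_up` is a hypothetical name.

```python
def follow_up(is_spam, wanted, only_if_spam=True):
    """Return the commands to send after the CHECK reply.

    `wanted` lists the extra data the client wants (e.g. ["REPORT"] or
    ["SYMBOLS"]); `only_if_spam=True` reproduces the effect that a
    server-side *_IFSPAM command variant would have.
    """
    if is_spam or not only_if_spam:
        return list(wanted) + ["QUIT"]
    return ["QUIT"]
```

Adding a new kind of data (say, SYMBOLS) then needs no new "_IFSPAM" command on the server. The same client-side condition applies to it.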
 
Justin, you said it doesn't make sense to look like SMTP unless we are SMTP. 
That's what I like about my idea (doesn't everybody like his own ideas?): It's 
possible to do the spam check and transfer the message to another server at the 
same time. I imagine something like this:  
  
  +--> World 
  |  
  v                                                  receiving 
+----+     deliver &      +---------------------+          ~~~~~~~~ 
| MX |------{SMTP}------->|  Local Mail Server  |-{POP3}->| Client | 
+----+     spamcheck      | (running spamd/3.0) |          ~~~~~~~~ 
  ^                       +---------------------+              | 
  |                                                    sending | 
  +-----------------------{SMTP}-------------------------------+ 
 
 
spamd/3.0 would of course still support the old protocol, so nobody has to 
write new clients if they don't want to. New features would go into the new one 
only. Everybody using libspamc wouldn't have any problem at all (at least if 
they don't rely on new features): The library will use the correct protocol 
automagically (if possible; I don't know the lib's interface yet ;-). So we 
would provide a library which understands most of the new protocol. To avoid 
bloat in spamc, which doesn't use all the features, spamc would link against a 
basic libspamctiny. 
  
Note that this is a proposal for 3.0. I think I made it clear that if we ever 
have a new protocol, I am strictly against a binary one ;-) And of course 
bug 1452 (and thus the current protocol) has to be fixed first. 
Comment 4 Duncan Findlay 2003-02-11 16:32:59 UTC
I'd say that we can have 2 goals when deciding on a protocol:

1. Flexibility and backwards compatibility.
2. Performance.

What we have now is in the middle of the road. An SMTP-like protocol would be
essentially all #1 and no #2, while a binary protocol would be #2 entirely.

I'd like to see us worry a little more about performance. I'm not opposed to a
binary protocol; however, I do see an advantage in backwards compatibility.
Comment 5 Justin Mason 2003-02-12 14:57:40 UTC
FYI -- see bug 1452: I've just added support for multiple headers in the reply
message and called it protocol version 1.3.
Comment 6 Tore Anderson 2003-02-25 01:38:17 UTC
Just to make sure my wishlist from bug #1452 doesn't fall into
oblivion (now that that bug is rightfully closed):

I would very much like to see a way of supplying the (virtual) user's
home directory directly when speaking to spamd (through libspamc), instead
of supplying the username and letting spamd figure it out. You can
very well have a specialized setup (I do, at least :-) where I would have to
add code to the handle_virtual_user() routine in spamd just so that spamd could
work out the location of the configuration file itself (and I doubt that patch
will ever make it into your CVS, effectively "forcing" me to maintain my own
branch of spamd).

Someone mentioned that this would have security implications - I fail to
see how that could be, and I never got an explanation as to what exactly
his concerns were, either.

See the log of bug #1452 for more background.

Thanks,
Tore Anderson :)
Comment 7 Malte S. Stretz 2003-08-22 15:26:53 UTC
Closing as LATER/NEVER ;-)