SA Bugzilla – Bug 1468
[RFC] A new spamd protocol
Last modified: 2003-08-22 07:26:53 UTC
In bug 1452 there was some discussion about a new spamd protocol, because the current one is too inflexible and hard to extend. I have been thinking about this issue for the last few weeks, too. But first, the discussion from the other bug:

--- Justin wrote:
Backwards-compatibility with old spamc's is not an issue BTW. Basically, we can assume that if spamd is upgraded, spamc must be as well. But if we modify the protocol we should bump the version number, so that it's very clear to third-party clients that they may also need modifications.

--- Eugene Miretsky wrote:
That's why I actually want to modify spamd/spamc protocol so that it can easily be extended in the future (since backwards compat is not an issue). I do not want to change the way things work right now, just the way request/response headers are generated and handled. [...] Pretty much I will design protocol request/response headers to be binary structures that can carry spam level, threshold, etc for example.

--- Justin wrote:
Hmm, I don't know about that :( I would strongly prefer keeping it similar to HTTP/1.0 in design... I think the other SpamAssassin developers would probably agree.

--- Eugene Miretsky wrote:
Using binary structures has very strong benefits. For example, all strings sent in the protocol will be prefixed by their length (encoded in 3 bytes), so that exactly the right amount of memory can be allocated to read response. No more searching for "\r\n", no more sscanf() -- just read in N bytes as reported by length prefixed string. There are other benefits as well, but length prefixed strings is the single, most important benefit since it would eliminate the need for parsing, and, in my experience, the best parsers are the ones that don't need to parse :)...

--- Antony Mawer wrote:
Well, I wrote several parsers for the CORBA IIOP protocol in a previous job, which is binary and therefore had those attributes as well.
However my problem with it was:
- harder to parse visually, which makes debugging and implementation harder IMO
- a length-word prefix means a DoS is trivial -- send 0xffffffff as the length, and it's DoS'd. (Yes, that's trivial to avoid. Someone should have told the ORB vendors though ;)
- binary-format incompatibilities mean you have to also create a set of marshalling rules for data -- ie. agree on endianness, size of types, padding etc. (this was IIOP's biggest problem.)

I think we might have to "agree to disagree" on this one, sorry ;) I'm really a big fan of the ASCII-protocol-with-\r\n-line-endings approach.

--- Eugene Miretsky wrote:
[ad 1] I always go for efficiency and avoidance of parsing need. Protocol will become much simpler, and will simplify debugging by a lot. It will make it simpler to return more information in a single response. It will greatly simplify parsing.
[ad 2] 3 byte length prefix can store at most 255^3 (FD02FF) -- enough for int -- no DoS.
[ad 3] No need to complicate things. All I need is to encode length:
    byte1 = (len >> 16)
    byte2 = (len >> 8) & 0xFF
    byte3 = len & 0xFF

--- Theo wrote:
I'm all for simplicity, which I don't consider binary protocols to be. I haven't looked at the current protocol, but I'd like to keep the simple layout it has now.
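To make the quoted length-prefix idea and the DoS objection concrete, here is a minimal Python sketch. It is not code from spamd or spamc; the names `encode_string`, `decode_string`, `read_exact` and the 64 KiB default limit are assumptions chosen for illustration. Note that three bytes can represent lengths up to 2^24 - 1 (0xFFFFFF), so the receiver still has to cap the length it is willing to allocate.

```python
import io

MAX_PREFIX = (1 << 24) - 1  # largest length a 3-byte prefix can carry (0xFFFFFF)


def encode_string(payload: bytes) -> bytes:
    """Prefix payload with its length as three big-endian bytes."""
    n = len(payload)
    if n > MAX_PREFIX:
        raise ValueError("payload too long for a 3-byte length prefix")
    return bytes([(n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF]) + payload


def read_exact(stream, count: int) -> bytes:
    """Read exactly count bytes, looping because read() may return fewer."""
    chunks = []
    remaining = count
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended before the announced length")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def decode_string(stream, limit: int = 64 * 1024) -> bytes:
    """Read one length-prefixed string, refusing lengths above limit.

    The limit (a hypothetical policy value) is what prevents a peer from
    forcing a large allocation just by announcing a huge length.
    """
    b1, b2, b3 = read_exact(stream, 3)
    n = (b1 << 16) | (b2 << 8) | b3
    if n > limit:
        raise ValueError(f"announced length {n} exceeds limit {limit}")
    return read_exact(stream, n)


if __name__ == "__main__":
    wire = encode_string(b"Spam: True")
    print(wire[:3].hex(), decode_string(io.BytesIO(wire)))
```

The sketch also shows that the marshalling question from the list above does not go away: sender and receiver both have to agree that the prefix is big-endian.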
--- My thoughts:
We handle mail, so why don't we base spamp/2.0 on (E)SMTP? It would make things much easier. (I must admit that I wanted to study the RFCs a bit further before making this proposal. But then along came Eugene... :o)

A sample handshake (spamc connects to spamd):

01: spamd: 220 spamd.example.com (spamd/3.0) at your service
02: spamc: EHLO localhost
03: spamd: 250-spamd.example.com greets you
04: spamd: 250-HELP
05: spamd: 250-8BITMIME
06: spamd: 250 XSPAMD OK
07: spamc: DATA
08: spamd: 354 Here we go...
09: spamc: <data>
10: spamc: <data>
11: spamc: .
12: spamd: 354 And now?
13: spamc: XCHECK
14: spamd: 220 SPAM 15 5
15: spamc: XREQUEST
16: spamd: 250-1234 bytes
17: spamd: 250-<data>
18: spamd: 250 <data>
19: spamc: QUIT
20: spamd: 220 Goodbye

Some explanation:
* In line 6 the server answers that it supports the SPAMD extension. The OK means that the transaction is already initiated and we can skip the MAIL and RCPT part. If the client doesn't like the answer, it just QUITs here. It should be ok to demand that spamc needs the 8BITMIME extension, too; makes life easier for us.
* After we sent our data, the server wants to know what to do now. The CHECK command initiates the process. The server answers with a 220 code and SPAM or HAM, followed by the points scored and the points needed.
* In line 15 spamc REQUESTs the (modified) mail back. It's sent as a continued answer. Alternative commands are: REPORT or SYMBOLS (same meaning as in the current protocol) or just QUIT.

Why is this so fancy? Imagine that somebody could plug spamd or spamc directly into an MTA. Here's another sample:

01: spamd: 220 spamd.example.com ESMTP (Exim with spamd/3.0)
02: spamc: EHLO spamc.example.com
03: spamd: 250-spamd.example.com greets you
04: spamd: 250-PIPELINING
05: spamd: 250-8BITMIME
06: spamd: 250-XSPAMD
07: spamd: 250 ETRN
08: spamc: MAIL FROM:<foo@example.com>
09: spamd: 250 OK
10: spamc: RCPT TO:<bar@example.com>
11: spamd: 250 OK
12: spamc: DATA
13: spamd: 354 Here we go...
14: spamc: <data>
15: spamc: <data>
16: spamc: .
17: spamd: 354 And now?
18: spamc: XCHECK
19: spamd: 220 SPAM 15 5
20: spamc: XDELIVER
21: spamd: 220 OK
22: spamc: QUIT
23: spamd: 220 Goodbye

Here we've got the DELIVER command, initiating the delivery to the given mailbox. Note also that the OK in line 6 is missing, meaning that we need a MAIL FROM and RCPT TO.

Of course our spamc and spamd do not have to understand all the possible commands. They just QUIT or give an error if they encounter something they don't know. But I think this looks like a nice clean protocol we could define for others to use (too). We could even write an RFC to make it a standard extension for ESMTP (and get rid of all the Xes). I'm not too sure about the details (like error codes) yet, but that shouldn't be any problem.

The SMTP protocol also has the advantage over HTTP that it is a session-oriented protocol. You can play the question-and-answer game quite nicely :o)

Ok, I'm tired now (it's 3 o'clock in the morning here). I probably forgot half of the ideas I had, but now it's time to take a nap. Go and discuss that stuff.

Cheers,
Malte
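As an illustration of how a client might read the replies in the sample sessions above, here is a small Python sketch. It only covers what the samples show: a three-digit code, "-" for a continued reply, a space on the final line, and a verdict of the form "SPAM 15 5". The function names `read_reply` and `parse_verdict` are made up for this example, and the proposal itself leaves details such as error codes open.

```python
import re

# Three-digit code, then "-" (more lines follow) or " " (last line), then text.
REPLY_LINE = re.compile(r"^(\d{3})([ -])(.*)$")


def read_reply(lines):
    """Consume one complete reply from an iterator of lines.

    Returns (code, texts), where texts holds the text of every line of the
    reply. lines must be an iterator (e.g. a socket file object), so that
    the next call continues where this one stopped.
    """
    texts = []
    for raw in lines:
        match = REPLY_LINE.match(raw.rstrip("\r\n"))
        if match is None:
            raise ValueError(f"malformed reply line: {raw!r}")
        code, separator, text = match.groups()
        texts.append(text)
        if separator == " ":  # a space marks the final line of the reply
            return int(code), texts
    raise EOFError("connection closed in the middle of a reply")


def parse_verdict(text):
    """Split 'SPAM 15 5' or 'HAM 2 5' into (is_spam, points, needed)."""
    label, points, needed = text.split()
    if label not in ("SPAM", "HAM"):
        raise ValueError(f"unexpected verdict: {label!r}")
    return label == "SPAM", float(points), float(needed)


if __name__ == "__main__":
    session = iter([
        "250-spamd.example.com greets you\r\n",
        "250-HELP\r\n",
        "250-8BITMIME\r\n",
        "250 XSPAMD OK\r\n",
        "220 SPAM 15 5\r\n",
    ])
    print(read_reply(session))
    print(parse_verdict(read_reply(session)[1][0]))
```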
I don't know about this. In the abstract, I like the idea of an SMTP-like protocol, which uses the nnn status codes, a "-" to indicate continuing replies, and so on. (I actually prefer the HTTP model, but SMTP/FTP are quite close ;). However, I can't see any point in looking a lot like SMTP unless we *are* SMTP. And the fundamental way messages are passed with spamc, ie. "DATA", get marked-up message as a reply, is totally different from SMTP: "DATA", get status line saying "OK accepted for delivery". I think the best idea is to fix the current spamd protocol to ignore extra headers (a la HTTP), fix bug 1452 that way, and that's it. Don't forget third-party clients to the spamd protocol. IMO, it would be good to keep the protocol similar to what it is now so they don't have to totally rewrite their clients.
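A rough sketch of what "ignore extra headers (a la HTTP)" could look like on the receiving side, in Python rather than the actual spamd code. The set of known header names and the `X-Future` header are assumptions made for the example; the point is only that an unrecognized header is skipped instead of being treated as a protocol error.

```python
# Header names this hypothetical receiver understands (assumed for the sketch).
KNOWN_HEADERS = {"content-length", "user"}


def parse_headers(block: str):
    """Parse 'Name: value' lines up to the first blank line.

    Returns (known, ignored): a dict of recognized headers and a list of
    the names that were skipped. Unknown headers are tolerated so that a
    newer peer can send additional ones without breaking this receiver.
    """
    known = {}
    ignored = []
    for line in block.split("\r\n"):
        if not line:
            break  # a blank line ends the header section
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a header line: {line!r}")
        key = name.strip().lower()
        if key in KNOWN_HEADERS:
            known[key] = value.strip()
        else:
            ignored.append(key)
    return known, ignored


if __name__ == "__main__":
    request = "Content-length: 42\r\nUser: malte\r\nX-Future: yes\r\n\r\n"
    print(parse_headers(request))
```

With this approach, adding a header in a later protocol version does not force existing third-party clients to be rewritten, which is the compatibility concern raised above.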
I made a mistake in the sample above: there has to be an initialization before the DATA part which makes the SMTP protocol defer delivery, else it would break normal SMTP communication. As I said before: those are just the thoughts I had over the last few weeks, far from perfect ;-)

The problem with our current stateless protocol is that it's too inflexible: we now have REPORT and (undocumented btw) REPORT_IFSPAM. Should we extend this to SYMBOLS_IFSPAM or whatever in future, too? The SMTP way does the CHECK first, and then the _client_ decides if it wants/needs more data. As I said above, I forgot half of what I thought of in the proposal above. But I have to scribble down a draft first.

Justin, you said it doesn't make sense to look like SMTP unless we are SMTP. That's what I like about my idea (doesn't everybody like his own ideas?): it's possible to do the spam check and transfer the message to another server at the same time. I imagine something like this:

            +--> World
            |
            v
receiving +----+ deliver &          +---------------------+          ~~~~~~~~
          | MX |------{SMTP}------->| Local Mail Server   |-{POP3}->| Client |
          +----+ spamcheck          | (running spamd/3.0) |          ~~~~~~~~
            ^                       +---------------------+              |
            |  sending                                                   |
            +-----------------------{SMTP}-------------------------------+

The spamd/3.0 would of course still support the old protocol, so nobody has to write new clients if he doesn't want to. New features would go to the new one only. Everybody using libspamc wouldn't have any problem at all (at least if he doesn't rely on new features): the library will use the correct protocol automagically (if possible; I don't know the lib's interface yet ;-). So we would provide a library which understands most of the new protocol. To avoid bloat in spamc, which doesn't use all the features, it would link against a basic libspamctiny.

Note that this is a proposal for 3.0.
I think I made it clear that if we ever have a new protocol, I am strictly against a binary one ;-) And of course bug 1452 (and thus the current protocol) has to be fixed first.
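The "client decides" argument from this comment can be sketched as follows, in Python for illustration only. In a session-oriented protocol, the behaviour of REPORT_IFSPAM (or a future SYMBOLS_IFSPAM) becomes a decision in the client after the check, not a separate server command. The function `next_command` and the X-prefixed names `XREPORT` and `XSYMBOLS` are hypothetical, chosen by analogy with XCHECK and XREQUEST in the samples.

```python
def next_command(is_spam: bool, wanted: str = "XREPORT", only_if_spam: bool = True) -> str:
    """Pick the command to send after the verdict of a check arrived.

    wanted is the follow-up the client is interested in (hypothetical
    names such as "XREPORT" or "XSYMBOLS"). With only_if_spam set, the
    client asks for it only when the message was classified as spam, and
    otherwise ends the session with QUIT.
    """
    if is_spam or not only_if_spam:
        return wanted
    return "QUIT"


if __name__ == "__main__":
    print(next_command(True))                       # report for spam
    print(next_command(False))                      # nothing more needed for ham
    print(next_command(False, "XSYMBOLS", False))   # symbols regardless of verdict
```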
I'd say that we can have 2 goals when deciding on a protocol: 1. Flexibility and backwards compatibility. 2. Performance. What we have now is in the middle of the road. An SMTP-like protocol would be essentially all #1 and no #2, while a binary protocol would be #2 entirely. I'd like to see us worry a little more about performance. I'm not opposed to a binary protocol; however, I do see an advantage in backwards compatibility.
FYI -- see bug 1452: I've just added support for multiple headers in the reply message and called it protocol version 1.3.
Just to be sure my wishlist as stated in bug #1452 doesn't go into oblivion (as that bug's rightfully closed): I would very much like to see a way of supplying the (virtual) user's home directory directly when speaking to spamd (through libspamc), instead of supplying the username and letting spamd figure it out. You can very well have (I do, at least :-) a specialized setup, where I would have to add code to the handle_virtual_user() routine in spamd just to figure out the location of the configuration file from spamd itself (and I doubt that patch will ever make it into your CVS, effectively "forcing" me to maintain my own branch of spamd). Someone mentioned that this would have security implications - I fail to see how that could be, and I never got an explanation as to what exactly his concerns were, either. See the log of bug #1452 for more background. Thanks, Tore Anderson :)
Closing as LATER/NEVER ;-)