SA Bugzilla – Bug 4546
[review] spamc/spamd learning has potentially dangerous side-effects
Last modified: 2005-08-24 12:40:43 UTC
Now that spamd supports learning, but still trusts spamc's Username: option, it is trivial for users to posion (accidentally or intentionally) other users' Bayes DBs. For SQL, this isn't a big deal. Users can access the database directly and screw with it all they want, because the username and password are in the config file (usually local.cf). Or, they can use the SpamAssassin internals to do the job, or they can use sa-learn -u. For DBM/SDBM, this is a big deal. Before spamd/spamc learning the only way to write to another users' database is to be root (or have the db permissions wide open). There are ways to limit users from learning as others in both of the above scenarios (I think). In the SQL case, make local.cf or sql.cf or something readable only by root AND use spamd's --auth-ident option. Users won't be able to directly access the database, or use sa-learn, they will only be able to learn via spamc/spamd. In the DBM case, spamd's --auth-ident option should prevent this problem, as it prevents use of spamc -u <random user>. Unfortunately, it's my understanding that --auth-ident doesn't work very well, and not at all with some identd servers. See for example, Debian bug 278030 (URL above). It would be far better IMHO, to be able to disable (via command line option?) spamd learning. There of course still exist the above problems with respect to autolearning -- people can craft messages to be autolearned and poison a DB, but this can be solved by disabling autolearning for the truly (even more) paranoid. At the very least the above should be documented in README.spamd so sysadmins can decide for themselves. Set as blocker -- we need to talk about this before we release. At the very least we should document it, but I think it'd be better to add the command-line options to spamd to prevent it.
Subject: Re: spamc/spamd learning has potentially dangerous side-effects I brought this up when first adding the functionality, and no one cared then. I find it odd that suddenly this is SO important and a release blocker. Michael
Subject: Re: spamc/spamd learning has potentially dangerous side-effects > I brought this up when first adding the functionality, and no one cared > then. I find it odd that suddenly this is SO important and a release > blocker. Either I missed it, or it was primarily in the context of SQL. In SQL, adding this functionality makes NO difference, because of sa-learn -u or direct database edits. I don't think anyone brought up that it would affect DBM like this.
I know I didn't notice this until now. As a matter of fact, only a few days ago I set up spamd on my home server and figured out a way to pipe email through it from procmail on my ISP's server so that I can run the latest SA and use Bayes. It wasn't until then that I realized that there was no authentication of the user name when I set up a system for piping mistakes through spamc -L. I was going to look to see if perhaps I just missed some docuentation on how to do it before bringing it up. I agree that this is a serious matter.
Client/server user authentication is difficult, or at least it will take a lot of thought and work to get right. Unless someone comes up with some standard way of doing it that can be just dropped in to our code, it isn't something to add for the 3.1 time frame. auth-ident is not meant to be a security measure except under specialised circumstances. We could think about whether adding the ability to use client side SSL certificates would make it possible to configure spamc/spamd with secure user authentication without much change to the code, but I think even that small change is too much for 3.1 and is not a sufficiently complete solution. On the other hand, as Duncan points out in the bug description, we have ended up with something that has a vulnerability in any configuration that allows untrusted people to have personal Bayes databases. I don't see why the problem is any different with SQL vs DBM, as in both anyone can call spamc -L with any -u argument that they want. If we only document this, we would have to say that spamd can be used with individual Bayes databases only in configurations in which identd authentication can be trusted, or in which something like a VPN or ssh tunnel is used along with accepting connections only from localhost. That seems to me to be too restrictive. I agree with Duncan that adding a spamd command line option that disables spamc -L, plus adding documentation, is a sufficient step to take that is simple enough to add in for the 3.1 ship. That would allow a sysadmin to run spamd on a server, maintain individual Bayes databases for users, not have to count on --auth-ident, and set up something ad hoc for users to train on errors such as special IMAP folders or special email address to forward mistakes to. We can then open an enhancement bugzilla ticket to consider how to get secure spamc/spamd user authentication for 3.2.
I agree that a command-line flag is the way to fix this -- however, I'd prefer it to be the other way around -- IOW, the flag is required to *enable* remote learning. otherwise we'll expose sysadmins to this danger once they upgrade, on existing systems, which I don't think is good. having said that, I don't think this is a 3.1.0 blocker. IMO it could wait for 3.1.1, if nobody produces a patch in time. (Duncan?)
Subject: Re: spamc/spamd learning has potentially dangerous side-effects > I agree that a command-line flag is the way to fix this -- however, > I'd prefer it to be the other way around -- IOW, the flag is > required to *enable* remote learning. otherwise we'll expose > sysadmins to this danger once they upgrade, on existing systems, > which I don't think is good. That was my preference also. I'll see if I can code something up.
I also agree that learning disabled by default is correct. I've been thinking about how secure authentication can be done. The biggest problems I see: 1) The spamd protocol does not have any notion of session. Authentication that does not pass a password in the clear requires a back and forth protocol and the establishment of a session. A session based protocol is a huge change. 2) I really don't like the idea of rolling our own secure authentication protocol. Security protocols never come out right when you just make them up. 3) Some ideas I thought about don't scale well in, for example, an ISP environment. For example, let's say that we store a password in each user's bayes database and require that the password be somehow passed in the spamc call. Ignoring for the moment how to do that without the password being in the clear on the network (a solvable problem), it seems like it would be difficult for an ISP to set up for all users. Similarly for setting up a client-side SSL certificate for each user, which would have to go both on the client and server side. Here is a simple solution. Add a password configuration option to both the spamc configuration file and the regular (spamd) per-user configuration file. Spamc only uses it when -S is specified. The protocol gets an new field for password. If spamd sees a password configured in the specified user's preferences, it only will process a message if SSL is being used and the correct password is sent by spamc. The sysadmin already has to provide users with the ability to set their own local preferences and the user configuration files or database entries already have to be kept secure from other people. The user can enable learning via spamc -L by setting matching password options in the two config files. By requiring SSL, there is no security hole in passing the password in the clear from spamc to spamd. Note that this is not necessary if auth-identd can be trusted. That will be the case if spamd is configured to only accept connections from a trusted IP address (such as localhost) on a system that supports identd. In that case, the password is not necessary, --auth-ident takes care of the authentication, and everything just works. If SSL adds too much overhead for a system with many users and high load, then the sysadmin just has to set things up so that the spamd server can trust the spamc client machines' identd. Comments?
I've opened bug 4550 as an RFE to continue the discussion of my comment 7 since clearly we are going to go with the simple command line option change that Duncan is coding up for 3.1rc2 in the near term.
Created attachment 3092 [details] Patch to add --allow-tell / -l option to spamd This patch adds an --allow-tell or -l option to spamd to enable TELLs. If disabled, EX_UNAVAILABLE is returned to spamc (which promptly ignores it, see bug 4551). I picked -l because -t and -T were taken, and -l is the first letter of learn and the last letter of tell, and because it wasnt in use already.
Fixed in trunk BTW
I don't get the following in the POD comments: "This is not ususally a concern with an SQL Bayes store as users will typically have read-write access directly to the database, and can also use C<sa-learn> with the B<-u> option to achieve the same result" If the user is able to run sa-learn, it doesn't matter if Bayes is using SQL or something else, as sa-learn will only work if the user's userid has permission to access the database (or to access the password that grants access to the database). If the user is not able to run sa-learn, then spamc -L will have this problem whether the database is SQL or anything else.
Subject: Re: [review] spamc/spamd learning has potentially dangerous side-effects > "This is not ususally a concern with an SQL Bayes store as users > will typically have read-write access directly to the database, and > can also use C<sa-learn> with the B<-u> option to achieve the same > result" I'm not sure I understand your comment. I think the word "This" above is a bit vague. What I'm trying to say is that enabling TELLs is not ususally a problem for systems with an SQL Bayes Store as users could previously use sa-learn -u. Maybe what I should say is: If the Bayes database backend is a DBM or SDBM file, B<--allow-tell> allows any user to write to any other user's database, which is not possible using just C<sa-learn>. If the Bayes database is being stored in SQL, any user can already to write to any other user's database using the B<-u> option to C<sa-learn>, so B<--allow-tell> is not less secure. In addition, B<--allow-tell> alleviates the need to give all users direct database access. Is this clearer? I don't think we need another patch for this -- feel free to vote on this one, the doc can be changed after this is committed, without the need for another vote. Also, I'm not going to be around this week, so waiting for another patch from me is a bad idea!
Subject: Re: [review] spamc/spamd learning has potentially dangerous side-effects > If the Bayes database backend is a DBM or SDBM file, B<--allow-tell> > allows any user to write to any other user's database, which is not > possible using just C<sa-learn>. Maybe I'm missing some fine point of DB semantics here, but from the viewpoint of the English language, the concept of something called "allow-tell" allowing a tool to have world write access seems more than a little strange. I would more expect an "allow-tell" option to grant world *read* access, if anything of the sort. Would it maybe make more sense to name the option --allow-mallicious-data-corruption or some such, that might give a person looking at a system config file a hint that this option is related to granting write access to the database? Loren
Duncan, Ok, I think I understand. With DBM or SDBM users each have their own database file, so running sa-learn can give them no access to another user's database. With SQL, there is only one shared database, so access via sa-learn implies access to all users' data. With DBM and SDBM you can get security by setting things up so users can run sa-learn and either not allow spamc -L or set up --auth-ident properly. With SQL you can't get security if they can run sa-learn. But you still would have to either not allow spamc -L or else set up --auth-ident properly. In any case, I agree that doc can be dealt with later and the voting for the patch can proceed without it being perfect. I'll vote on this as is after I get a chance to build and run with the patch. Loren, --allow-tell is not an option that means "--allow-malicious-data-corruption". It means allow spamc to tell spamd the spam/ham status of the message being sent for the purpose of bayes learning. It just so happens that there is a side effect that allows any user to affect the Bayes database of any other user. The name reflects what it is for, not the side effect. In any case, I think that getting this fix in is what is important. If we figure out a way to get real user authentication for version 3.2, this option could change anyway.
+1
+1. looks good
Committed revision 239986.