579 – RFE: user configs should be read by spamc and sent to spamd

Status:	NEW

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Libraries
Version:	2.31
Hardware:	All All

Importance:	P5 enhancement
Target Milestone:	Future
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Duplicates (1):	4262
Depends on:
Blocks:

Reported:	2002-07-19 13:00 UTC by Ben Rosengart
Modified:	2010-01-27 03:16 UTC
CC List:	4 users

Attachment	Type	Actions	Submitter/CLA Status
This patch implements the proposed change.	patch	None	Ben Rosengart
This is documentation for the patch.	text/plain	None	Ben Rosengart
Patch to the lib/ subdirectory	patch	None	Brad "anomie" Jorsch
Patches to spamc	patch	None	Brad "anomie" Jorsch

Description Ben Rosengart 2002-07-19 13:00:47 UTC

For security and ease of configuration, I think it would be better if
user configuration files were read by spamc and sent to spamd instead of
being read by spamd as they are now.  Has anyone raised this question yet?
What is the general feeling?

Comment 1 joshua 2002-07-21 22:20:22 UTC

I would be all for this idea, having spamc send the user_prefs file 
to spamd would be great. 

The current way of having spamd find the
per-user config info assumes that the machine running spamd has
easy access to user information and the user's home space, and that it
can easily get to that home space, pick up the config, and read it.

That's not the case on large sites where many different machines comprise the
mail farm.  Ideally I'd like to run spamd on a machine all by itself without
needing nfs to get to home spaces or mysql, without it having to know
anything about users, etc.  

If spamc just sent along the user_prefs file first, then spamd would be
self-contained, I could use it from any machine on our network where users 
are getting mail without worrying about how I'm going to get those user prefs
over to the spamd machine.  

This would be a great improvement.  I would have made these changes already
myself, but I'm reluctant to do local hacking as it prevents easy upgrades.
:-(

If I made these changes to spamd/spamc, and did it such that it was an 
option in the protocol and thus also backwards compatible, could it be 
included? Or would the people who wrote these be willing to do it?

BTW, someone on the devel list wrote this:
>Every change in the user preferences will need a new revision of the
>spamc/spamd protocol.

But I don't think that's the case.
I would just have spamc send along the whole user_prefs file as is.
Then let spamd parse it normally just as if it had gone out and picked
it up from the user's home directory itself. 
The only thing we'd be changing is how spamd got the user_prefs file. 

Right now it goes out and gets it itself, but why not just send it along
in the first place?
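
The idea in this comment (spamc ships the prefs text verbatim and spamd parses it as if it had read the file itself) can be sketched roughly as below. This is only an illustration: the `Prefs-length` header and both function names are hypothetical, and the framing is loosely modelled on a spamc-style request rather than taken from the real spamc/spamd protocol.

```python
from typing import Tuple

# Hypothetical framing, NOT the real spamc/spamd wire format.


def build_request(prefs_text: str, message: bytes) -> bytes:
    """Client side: send the raw user_prefs text ahead of the message."""
    prefs = prefs_text.encode("utf-8")
    header = (
        "PROCESS SPAMC/1.2\r\n"
        f"Prefs-length: {len(prefs)}\r\n"      # hypothetical header
        f"Content-length: {len(message)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + prefs + message


def parse_request(raw: bytes) -> Tuple[str, bytes]:
    """Server side: split the request back into prefs text and message."""
    head, _, body = raw.partition(b"\r\n\r\n")
    fields = {}
    # Skip the request line, then read "Name: value" header lines.
    for line in head.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = int(value.strip())
    n_prefs = fields.get("prefs-length", 0)
    n_msg = fields["content-length"]
    prefs = body[:n_prefs].decode("utf-8")
    message = body[n_prefs:n_prefs + n_msg]
    return prefs, message
```

The server would then hand the returned prefs text to whatever parser it already uses for a user_prefs file, so only the transport changes.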

Comment 2 Craig Hughes 2002-08-23 03:32:51 UTC

Discussed fairly extensively on the list.  If someone implements it, feel free
to add the patch here.  Highly unlikely to get included in the standard distro
because I think the architecture is just entirely wrong.  But others might find
your patch useful.

Comment 3 Justin Mason 2002-08-23 04:57:53 UTC

Craig -- I really disagree.  I think there are no security implications here,
as we read user_prefs files in a paranoid way anyway (rules cannot be
defined from them, etc.)
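
As a rough picture of what "paranoid" reading could mean, here is a minimal sketch that keeps only allow-listed preference directives and drops everything else, including rule definitions. The allow-list and the function name are assumptions made for illustration; the set of directives spamd actually honours from user prefs may differ.

```python
from typing import List

# Illustrative allow-list only; not the real set enforced by SpamAssassin.
ALLOWED_DIRECTIVES = {"required_hits", "whitelist_from", "blacklist_from"}


def filter_user_prefs(prefs_text: str) -> List[str]:
    """Return only the lines whose directive is on the allow-list."""
    kept = []
    for raw_line in prefs_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue                               # skip blank lines
        directive = line.split(None, 1)[0]
        if directive in ALLOWED_DIRECTIVES:
            kept.append(line)
    return kept
```

With this kind of filter, a line such as `header MY_RULE Subject =~ /foo/` sent by a client is simply ignored, while `required_hits 4.0` is kept.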

Reopening for later arguing once 2.40 is released ;)

Comment 4 Ben Rosengart 2002-08-23 09:45:26 UTC

We have implemented this, as promised, and you should expect to see patches
from us within a week or two.

Comment 5 Craig Hughes 2002-08-23 11:41:13 UTC

I don't see it as a security thing Justin.  I see it as an architectural issue.  Clients just don't 
send servers config information, it's wack.  The server should know when it needs to get 
config info, and should have some way of getting it.  If it's some weird callback-to-client 
thing, that's fine.  If it's fetch-config-by-http so that you don't have to install a few KB worth of 
database client libs on your mail server and prefer to install a few KB of HTTP client libs, 
then fine.  But ultimately, anything like this should be implemented in the style of 
ConfSourceSQL.pm -- that way spamd can be smart about loading prefs, maybe cache 
them, maybe pass them over the network compressed or encrypted (easy option with 
some DBs), selectively fetch them, etc, etc.  This whole notion of passing the config files 
just really seems badly wrong to me.

Comment 6 joshua 2002-08-24 19:17:57 UTC

I'll add to this again.  

The more I thought about the current architecture of spamc/spamd, the more I
think the current setup is wacky, and this change would make it much more
normal as far as client/server relationships go.  Thinking about it as
sending 'config' info to the server is probably why it still sounds strange to
some.

But really, the user_prefs info is not telling spamd how to 'configure' itself,
but simply telling it what to do.

If you fill out a form on a web page, you naturally send the contents of that 
form to the httpd server and tell it to do something with the information you 
send it.  That is not configuration information.  You don't expect the httpd 
server to connect back to you somehow and get all the information it needs.  

If you did expect that then the server would require all sorts of things like a 
way to connect to you (a server on your machine), authentication information, 
which user you are, where your information is kept, and a secure way to get at 
it.  Isn't that nutty? :-) But that's what spamc/spamd does.  

Why not just tell the server what you want it to do in the first place?  Why 
require all that extra overhead, IO, complications, and security risks?

Seems to me the initial design was thought out only on a local scale, assuming 
that spamd would be running on the same machine as spamc, and the rest was an 
afterthought.  For example, I have to specify every possible other IP I want to 
connect to spamd in the startup, with no wildcards or hostnames or anything 
like that possible.  What if I have lots of machines whose IPs change? (and I 
have this problem too). 

It's also the current design that creates all the security risks.  Changing as 
suggested here would remove them **all**.  Once you ask the server to stop 
trying to get at user information, whether on the same machine or, (worse) on 
some other machine, all the security problems disappear.

That is, since the server is not dependent on knowledge of a user base, it 
would not have to know anything about users or be in any way connected to them, 
no need to be root, no need to make config files readable to the spamd user 
(root or not), no chance of users reading or changing other users' settings, 
etc.  Not to mention security risks involved in letting spamd connect back to 
you and retrieve arbitrary user information.

That would make these servers completely independent, allowing you to set up a 
farm of them to serve whomever you want without all the currently necessary 
mess of trying to figure out user config information, you could even out-source 
such servers since they are user-independent. 

And then we could also allow spamd to read user created checks from each 
user_conf, removing that annoying limitation as well.

It's hard to believe that larger ISPs haven't already suggested this sort of 
architecture.  Is this really the first ticket on this?  Or do they come and 
see this problem and immediately leave the idea behind for something that 
scales better and is more secure?  I would like to use spamd/spamc instead of 
home-growing everything, but these problems are considerable drawbacks.  Maybe 
I represent more people who didn't even bother to stick around and contribute? 

There's also the argument of keeping spamc lightweight.  
To this I say that the change would be a couple lines of code, and the function 
could be invoked with a command line flag,  (so as to not burden those who 
don't want this).  

That's not much more, it seems, considering the considerable benefits reaped. 
And it would actually decrease the total IO involved.  spamc would just send 
along everything needed at once, removing all the IO overhead involved in 
asking the server to connect back to find the information that spamc could have 
sent along in the first place.

Maybe I'm missing something here, but the evidence sure seems compelling to me. 
Isn't it time for this to expand its functionality for the times? 

please? :-)

Comment 7 Matt Sergeant 2002-08-27 05:11:54 UTC

Subject: Re: [SAdev]  user configs should be read by spamc and sent
 to spamd

> If you fill out a form on a web page, you naturally send the contents of that 
> form to the httpd server and tell it to do something with the information you 
> send it.  That is not configuration information.  You don't expect the httpd 
> server to connect back to you somehow and get all the information it needs.  

Bad analogy. With web servers you tend to use sessions and/or user 
authentication. The config is still all stored on the web server, not on 
the client end.

I'm with Craig on this one.

Matt.

Comment 8 Malte S. Stretz 2002-08-27 05:32:16 UTC

The analogy is quite good in this case; spamd throws everything away which is   
not privileged. So it's not really a "config" anymore because it doesn't   
change spamd's core behaviour but only the way it displays its results. If you  
submit your webpage to validator.w3.org, the config is _not_ saved on the  
server.  
  
Having a server which polls the config from the client via some Conf:: handler
is IMO braindead (don't take this personally, ok?). Let's assume you've got a
company with ~100 workstations, most of those roadwarriors with their own
laptop connected via VPN (I administer such a company). Employees are coming
and going. Now you set up a company-"public" spamd server and everybody who
wants may connect to this server. Shall I tell everybody "Hey, you've got to
set up your own Apache/NFS/whatever if you want to customize the way
SpamAssassin works. Oh and please don't forget to update it regularly and
secure it very well when you're out there."?
 
I'm with Joshua on this one :o)

Comment 9 joshua 2002-08-27 05:53:47 UTC

Subject: Re:  user configs should be read by spamc and sent to spamd



  There is no perfect analogy, but the idea is the same.  It still
makes no sense to make the spamd server connect back to the client
for information. Call it whatever you want, it still makes no sense
and is a real limitation in the design, not to mention the other
problems it causes as I mentioned before.  
  How can I entice those on the other side of the fence to engage in
discussion here?  :-)  
  I haven't been able to get anyone to dispute any of the arguments
I've given for fixing these design problems, and it would be nice to
have something to tell my boss.  :->
  Any help appreciated! thanks.

Comment 10 Justin Mason 2002-08-27 06:07:23 UTC

I can see the point, too.  

IMO, user_prefs is *not* SA configuration -- it's more like browser
configuration, it's just the required_hits threshold, maybe some whitelisted
addrs.  So it's not really configuration, just *user preferences*.   
  
I would think a good analogy is the font setting in your web browser -- this is
a user preference.  Or the validator.w3.org example.  
  
Or cookies -- contrary to what Matt posted, not all sites store user configs in
a db; there are a few that let the browser store a few simple settings in the
cookie.


Basically, I don't think there's a need for spamd to require that all user
prefs info is kept on the spamd server machine, or loaded from SQL or over NFS.
*Let* the client send it, if that fits into the sysadmin's network design
more cleanly.

Comment 11 Matt Sergeant 2002-08-27 06:08:32 UTC

Subject: Re: [SAdev]  user configs should be read by spamc and sent
 to spamd

I'd be curious to know if these problems just plain don't exist if you 
use PPerl instead of spamc/spamd for persistence.

Get PPerl from CPAN.

Matt.

Comment 12 Ben Rosengart 2002-08-27 18:24:14 UTC

Created attachment 283
This patch implements the proposed change.

Comment 13 Ben Rosengart 2002-08-27 18:35:37 UTC

Created attachment 284
This is documentation for the patch.

Comment 14 Ben Rosengart 2002-08-27 18:37:34 UTC

The patch was written by Brian Marcotte of Panix.  He has joined the
mailing list in order to be available to answer questions.  In addition
to the brief documentation provided, he asked me to mention that the
changes to the server are very small.  (I count 12 lines.)

Comment 15 Ben Rosengart 2002-09-09 12:52:09 UTC

So, anyone?

Comment 16 Ben Rosengart 2002-09-12 13:51:18 UTC

So, can we get some movement on this?  Is there a committer in the house?

Comment 17 Duncan Findlay 2002-09-12 17:24:11 UTC

Any reason for -j and -n?

Comment 18 Brian Marcotte 2002-09-13 15:58:54 UTC

The -n flag is for the server. This option tells the server to accept user
preferences from the client. This flag isn't strictly necessary, but not
including it would mean that spamd would always accept preferences from spamc.
We thought some admins wouldn't like that.

The -j flag is for the client. This option allows you to specify the path of
the user preferences file you want sent to the server. I could have done it
so that the option always sends ~/.spamassassin/user_prefs, but I figured that
some people may want the option of using alternate user prefs  files.

Comment 19 Brian Marcotte 2002-09-13 16:05:39 UTC

Sorry, if you meant why I chose the particular letters "n" and "j", well,
"all the good ones were taken". The "n" was for "network" as in accept prefs
from the network.

Feel free to change them or suggest something else.

Comment 20 Duncan Findlay 2002-09-13 21:12:02 UTC

Subject: Re: [SAdev]  user configs should be read by spamc and sent to spamd

> Sorry, if you meant why I chose the particular letters "n" and "j", well,
> "all the good ones were taken". The "n" was for "network" as in accept prefs
> from the network.

That was my question. I'd recommend this be a --longopt only, as it
doesn't seem (to me at least) that this'll be a heavily used option.
Furthermore, that's why we implemented longopts.

Comment 21 Brad "anomie" Jorsch 2002-10-01 12:15:06 UTC

For some unknown reason I decided to hack on this. I didn't really like the
Panix patch, since it sends the entire prefs file over the network for every
single email (among other reasons). So I set about writing code so that the
prefs could be cached, and so spamd could reject the offer if the length and
checksum match what it has cached.
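
The caching handshake described here, where spamc offers a length and checksum and spamd declines the upload when both match its cached copy, might look roughly like this. It is a sketch under assumptions: CRC-32 via `zlib` stands in for whatever checksum the patch uses, and `PrefsCache`, `offer`, `needs_upload`, and `store` are hypothetical names.

```python
import zlib
from typing import Dict, Tuple


def offer(prefs: bytes) -> Tuple[int, int]:
    """Client side: describe the prefs file without sending it."""
    return len(prefs), zlib.crc32(prefs)


class PrefsCache:
    """Server side: per-user cache (in memory, for the sake of the sketch)."""

    def __init__(self) -> None:
        # user -> (length, crc, prefs bytes)
        self._entries: Dict[str, Tuple[int, int, bytes]] = {}

    def needs_upload(self, user: str, length: int, crc: int) -> bool:
        """True when the offered (length, crc) differs from the cached copy."""
        cached = self._entries.get(user)
        return cached is None or cached[:2] != (length, crc)

    def store(self, user: str, prefs: bytes) -> None:
        """Remember the prefs that were just uploaded."""
        self._entries[user] = (len(prefs), zlib.crc32(prefs), prefs)
```

On a cache hit only the small offer crosses the network; the full file is sent only when the user is new to the server or the file has changed.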

In the process, I realized there's not much difference between "read from spamc
and store to X" and "store to X". And the proliferation of "read from Y" was
getting on my nerves as well. So why not create an interface "read/write to X",
so the only difference between "read/write to a directory", "read/write to
~/.spamassassin/user_prefs", "read/write to SQL", and so on is the backing store
that's being used.

So, I wrote ConfSourceGeneric.pm to define an interface for reading from an
arbitrary source. I also wrote ConfStoreGeneric.pm to extend that interface for
writing to the source. And I wrote a number of modules implementing these
interfaces:

ConfStoreDirectory.pm - Store user prefs in a directory, like the spamd
-V/--virtual-config option.

ConfStoreHomedir.pm - Store user prefs in the user's home directory, like the
current default behavior.

ConfStoreSQL.pm - Store user prefs in an SQL database. It improves on the
current ConfSourceSQL by imposing an order on the directives (since no
particular order is guaranteed by the SELECT), and by allowing saving to the DB
as well as reading from it.

ConfStoreVPopmail.pm - Much like the current spamd -v/--vpopmail option. I
haven't been able to really test this one, but it's basically ConfStoreHomedir
that uses vpopinfo instead of getpwent so it *should* work...

ConfStoreMemory.pm - Store user prefs in memory. It even handles communication
from child processes, so it should work with spamd.

ConfStoreNull.pm - Dummy module, which doesn't actually store anything.

ConfSourceSpamc.pm - Has a method to handle a simple protocol for spamc to send
the user prefs to spamd, and then store these prefs into any of the
ConfStoreGeneric subclasses. It implements the Source interface so these prefs
can be read back easily. I also have a patch for libspamc that will support
this. The idea is that spamd will recognize OFFER_PREFS like it does PROCESS or
CHECK now, and hand the socket to this module to do the actual reading of the
prefs (if necessary). The ConfSourceGeneric interface and the protocol are
designed so that the prefs don't need to be sent across the wire every time.

ConfStoreSimple.pm - Stores a single pref set at a time. Only really useful as
the store for ConfSourceSpamc, when you want the prefs sent over the wire every
time.

ConfSourceAlt.pm - Reads prefs from the first of several possible sources. This
way, we could do e.g. "look in $HOME first, then the virtual directory if that
fails".

ConfSourceCat.pm - Reads prefs from all of several possible sources.
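
To make the shape of this design easier to picture, here is a minimal Python sketch of a read interface, a read/write extension, an in-memory store, and the "first match" and "concatenate" combinators. It is not the Perl code from the attached patches; the class and method names are hypothetical and only echo the module names listed above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence


class ConfSource(ABC):
    """Read-only source of per-user prefs (cf. ConfSourceGeneric)."""

    @abstractmethod
    def read(self, user: str) -> Optional[List[str]]:
        """Return the user's pref lines, or None if this source has none."""


class ConfStore(ConfSource):
    """A source that can also be written to (cf. ConfStoreGeneric)."""

    @abstractmethod
    def write(self, user: str, lines: List[str]) -> None:
        """Replace the stored prefs for this user."""


class MemoryStore(ConfStore):
    """Keeps prefs in a dict (cf. ConfStoreMemory, minus child-process IPC)."""

    def __init__(self) -> None:
        self._prefs: Dict[str, List[str]] = {}

    def read(self, user: str) -> Optional[List[str]]:
        lines = self._prefs.get(user)
        return None if lines is None else list(lines)

    def write(self, user: str, lines: List[str]) -> None:
        self._prefs[user] = list(lines)


class AltSource(ConfSource):
    """The first source that has prefs wins (cf. ConfSourceAlt)."""

    def __init__(self, sources: Sequence[ConfSource]) -> None:
        self._sources = list(sources)

    def read(self, user: str) -> Optional[List[str]]:
        for source in self._sources:
            lines = source.read(user)
            if lines is not None:
                return lines
        return None


class CatSource(ConfSource):
    """Concatenates prefs from every source (cf. ConfSourceCat)."""

    def __init__(self, sources: Sequence[ConfSource]) -> None:
        self._sources = list(sources)

    def read(self, user: str) -> Optional[List[str]]:
        found = [s.read(user) for s in self._sources]
        present = [lines for lines in found if lines is not None]
        if not present:
            return None
        return [line for lines in present for line in lines]
```

With two stores named, say, `home` and `virtual`, `AltSource([home, virtual])` gives the "look in $HOME first, then the virtual directory" behaviour, while `CatSource([home, virtual])` merges both.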


I've done enough unit testing on these that I doubt they have too many
bugs left. I'm not sure what would be the best way to integrate these in
with spamd or spamassassin, though, since most of the current
config-reading methods aren't necessary anymore. Do we want to maintain
backward compatibility, or do we want to eliminate the useless methods?

Anyway, I'll attach the patches to this bug.

Comment 22 Brad "anomie" Jorsch 2002-10-01 12:17:05 UTC

Created attachment 366
Patch to the lib/ subdirectory

Comment 23 Brad "anomie" Jorsch 2002-10-01 12:17:28 UTC

Created attachment 367
Patches to spamc

Comment 24 Ben Rosengart 2002-10-01 13:16:38 UTC

Why don't you like sending the prefs every time?  Performance?  I suggest
that you benchmark the different systems.  My guess is that the extra
round-trip across the network will consume the time you save by not sending the
user_prefs over.  You have to read the prefs file to compute the checksum, so
you don't save I/O, and you use extra CPU.
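
The trade-off in this paragraph can be written down as a very rough cost model. It is not a benchmark: it assumes that sending the prefs costs only transmission time, that the checksum offer costs exactly one extra round trip, and that `hit_rate` (the fraction of offers the server's cache already satisfies) is greater than zero. All function and parameter names are made up for the sketch.

```python
def always_send_cost(prefs_bytes: int, bandwidth_bps: float) -> float:
    """Seconds spent pushing the prefs file along with every message."""
    return prefs_bytes * 8 / bandwidth_bps


def offer_first_cost(prefs_bytes: int, bandwidth_bps: float,
                     rtt_s: float, hit_rate: float) -> float:
    """Expected seconds: one extra round trip, plus the upload on a miss."""
    upload = prefs_bytes * 8 / bandwidth_bps
    return rtt_s + (1.0 - hit_rate) * upload


def break_even_bytes(bandwidth_bps: float, rtt_s: float,
                     hit_rate: float) -> float:
    """Prefs size above which offering first beats always sending."""
    # rtt + (1 - h) * 8S/B < 8S/B   <=>   S > rtt * B / (8 * h)
    return rtt_s * bandwidth_bps / (8 * hit_rate)
```

Under this model the offer only pays off when the prefs file is larger than roughly the round-trip time multiplied by the bandwidth, which is the intuition behind the suggestion to measure before choosing.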

However, as long as the functionality that we need is implemented, I don't
particularly care whether it's your patches or Brian's that are used.  And it
seems that you are cleaning up the code in general and making it more
extensible, which is laudable.

Comment 25 Brad "anomie" Jorsch 2002-10-02 10:05:17 UTC

Subject: Re: [SAdev]  user configs should be read by spamc and sent to spamd

On Tue, Oct 01, 2002 at 01:16:38PM -0700, br+spamassassin@panix.com
commented:
> 
> Why don't you like sending the prefs every time?  Performance?

Bandwidth mostly. If you use a persistent cache (e.g.
ConfStoreDirectory, ConfStoreSQL) behind ConfSourceSpamc, this could be
a cheapo way for users to update their prefs on the server without a
login. Of course, for that to work we might want authentication of some
sort in spamd...

> You have to read the prefs file to compute the checksum, so you don't
> save I/O, and you use extra CPU.

Certainly not performance for spamc, considering I used the slower,
non-table-driven crc implementation ;)
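
For readers unfamiliar with the distinction, the sketch below contrasts a bit-at-a-time CRC with a table-driven one. It assumes the common reflected CRC-32 (polynomial 0xEDB88320); the thread does not say which CRC variant the patch actually uses, so treat this purely as an illustration of the two styles.

```python
from typing import List

POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed variant)


def crc32_bitwise(data: bytes) -> int:
    """Slower form: process each input byte one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def _make_table() -> List[int]:
    """Precompute the CRC of every possible byte value."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32_table(data: bytes) -> int:
    """Faster form: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

Both functions compute the same value; the table version trades 256 precomputed entries for eight fewer inner-loop steps per byte.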

> However, as long as the functionality that we need is implemented, I don't
> particularly care whether it's your patches or Brian's that are used.  And it
> seems that you are cleaning up the code in general and making it more
> extensible, which is laudable.

I do try. This time, it all fell out of the problem of how to cache the
prefs. In memory, on disk, why not abstract it so people can choose what
they like?

Comment 26 Matt Sergeant 2002-10-15 03:56:58 UTC

Subject: Re: [SAdev]  user configs should be read by spamc and sent
 to spamd

FWIW, I've checked this code into the SA3 CVS because I generally like 
the idea here. However I'd also like to talk to the author about further 
re-designs of the whole conf system. The other thing is that I didn't 
really know how to actually integrate these changes, so I could do with 
a few pointers or possibly some coding help. Thanks.

(PS: See the README.txt file in SA3 CVS for details on how I'm thinking 
the Conf structure should probably work).

Comment 27 Duncan Findlay 2002-12-13 19:29:13 UTC

This is apparently fixed in SA3. Closing LATER. As in it will be fixed later,
when SA3 is released. FIXED/LATER? What's the dif?

Comment 28 Matt Sergeant 2002-12-16 09:39:12 UTC

Subject: Re: [SAdev]  user configs should be read by spamc and sent
	to spamd

> This is apparently fixed in SA3. Closing LATER. As in it will be fixed later,
> when SA3 is released. FIXED/LATER? What's the dif?

The diff is that it's not fixed in SA3 (yet?) but it might be later ;-)

Comment 29 Justin Mason 2003-05-01 18:07:12 UTC

reopening for 2.60, since 3.0 is far-off... I think I'll try integrating
anomie's code.

Comment 30 Justin Mason 2003-05-08 10:54:57 UTC

taking bug

Comment 31 Justin Mason 2003-05-27 12:12:16 UTC

damn, don't think this will make it into 2.60. :(

Comment 32 Duncan Findlay 2004-01-03 12:32:21 UTC

This looks promising... at least as far as the Conf backend goes. Since SA3 is
never going to happen, it might be worth discussing this.

Comment 33 joshua 2004-01-03 19:24:57 UTC

Subject: Re:  user configs should be read by spamc and sent to spamd


> This looks promising... at least as far as the Conf backend goes. Since SA3 is
> never going to happen, it might be worth discussing this.

Would be great to re-open this.  It's still a major pain to set up
everything so that the spamd server on another machine has to be able to
find user configs for each user on another machine, when the info could
just be sent along with the initial connection in the first place.
It would greatly simplify a spamd server setup.

Comment 34 Justin Mason 2004-02-28 13:23:21 UTC

lowering pri on RFEs

Comment 35 Theo Van Dinter 2004-03-11 15:14:33 UTC

punting to 3.1

Comment 36 Daniel Quinlan 2004-03-11 15:21:15 UTC

no reason given for reassigning to 3.1, no agreed-to plan for 3.1

Comment 37 Daniel Quinlan 2005-03-30 01:08:30 UTC

move bug to Future milestone (previously set to Future -- I hope)

Comment 38 John Madden 2005-04-16 10:50:21 UTC

*** Bug 4262 has been marked as a duplicate of this bug. ***

Comment 39 John Madden 2005-04-16 11:03:45 UTC

Instead of blindly sending the contents of user_prefs to spamd, should spamc do
some minimal processing beforehand? I think any very user specific stuff that
spamc can do quickly should be done by spamc. For example, could spamc check the
From: header against the whitelist and blacklist? If the address is in either,
there's really no need to send the mail to spamd, since the user set these
values. If the From is blacklisted, score the mail +100 and send it back,
instead of sending it on to spamd. Equally for whitelisted address, score it
-100 and send it back.
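
A minimal sketch of this short-circuit, assuming exact-match address lists and the +100 / -100 scores mentioned above, could look like the following. The function name is hypothetical, and real whitelist entries may use patterns rather than exact addresses, which this sketch does not handle.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Optional, Set


def shortcut_score(message_text: str,
                   whitelist: Set[str],
                   blacklist: Set[str]) -> Optional[float]:
    """Return +100 / -100 for listed senders, or None to defer to spamd."""
    msg = message_from_string(message_text)
    _, addr = parseaddr(msg.get("From", ""))
    addr = addr.lower()
    if addr in blacklist:
        return 100.0
    if addr in whitelist:
        return -100.0
    return None  # not listed: send the mail on to spamd as usual
```

A caller would only contact spamd when the function returns `None`.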

Comment 40 Justin Mason 2005-04-16 11:58:18 UTC

I think it's better to keep spamc as stupid as possible -- "fast and stupid"
should be its motto. ;)

we should be doing shortcircuiting like that in spamd, and in fact all the code
is there to do early-exit -- it just hasn't been implemented yet...

Comment 41 Daryl C. W. O'Shea 2007-07-10 13:40:18 UTC

I'm now doing this on large clusters with Perl milters passing config via
M::SA::Client to spamd.  It saves having to do additional SQL queries for config
when the milters are already doing a config query for numerous filtering
preferences.

Comment 42 Michael Parker 2007-07-10 13:46:42 UTC

I haven't looked at this in a while but I wouldn't mind some sort of plugin hook
in spamd that ran for headers.  Then you could pass data in and let a plugin do
whatever it wanted with the data.

Would be trivial to add a user-config header that contained a frozen data
structure or something like that then have the plugin unfreeze it and fold it
into the config object.

Comment 43 Daryl C. W. O'Shea 2007-07-10 13:53:29 UTC

I don't think the client should be doing any of the parsing (which rules out
freeze/unfreeze)... it should just pass the config text as it would be stored in
a user_prefs file or in SQL/etc.  Maintaining both a Perl and C version of the
config parser would be a pain/buggy, especially seeing as few of us actually like
playing with C.

If the headers could be compressed like we can now compress the message, that'd
be great.
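
One way to picture "pass the config text as-is, but compressed" is a header whose value is the zlib-compressed, base64-encoded prefs text, so the client never parses anything. The `User-Config` header name and both helper names are hypothetical; nothing here reflects an existing spamd header.

```python
import base64
import zlib

HEADER_NAME = "User-Config"  # hypothetical header name


def encode_prefs_header(prefs_text: str) -> str:
    """Client side: pack unparsed prefs text into a single header line."""
    packed = zlib.compress(prefs_text.encode("utf-8"))
    return f"{HEADER_NAME}: {base64.b64encode(packed).decode('ascii')}"


def decode_prefs_header(header_line: str) -> str:
    """Server side: recover the original prefs text for normal parsing."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != HEADER_NAME.lower():
        raise ValueError("not a User-Config header")
    return zlib.decompress(base64.b64decode(value.strip())).decode("utf-8")
```

Because the payload is just the text of a user_prefs file, the existing Perl parser on the server stays the only parser.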

Comment 44 Michael Parker 2007-07-10 14:00:49 UTC

I wasn't suggesting we offer anything that does the user config stuff in spamc.
This would be strictly for folks using their own clients.

Comment 45 Justin Mason 2010-01-27 02:31:38 UTC

moving some 3.3.0-targeted bugs into the vague Future.  feel free to retarget to 3.3.1 if you think you'll be able to work on them

Comment 46 Justin Mason 2010-01-27 03:16:21 UTC

reassigning, too