SA Bugzilla – Bug 5752
Let's get rid of the default rules dir and make sa-update mandatory
Last modified: 2009-06-24 14:55:06 UTC
Per discussion in bug 5751, I wanted to float the idea of getting rid of the default rules dir (aka: rules, /usr/share/spamassassin, etc.) and making sa-update mandatory. There are really several reasons for this, but two main ones come to mind:

a) We have a project goal of having a separate engine and ruleset. Right now, you get both from the tarball, and it leads to the whole local_state_dir override thing which causes many problems (see bug 5751). It also forces people to install our rules even if they don't want them, etc.

b) We have multiple areas that need updating when rules/scores/etc need changing. ie: spamassassin/rules/branches/3.2 and spamassassin/branches/3.2/rules. Do we update both of them? Just the updates area? Do we add new rules to both or just for updates?

I think getting rid of the default rules dir, not distributing rules w/ the tarball, and forcing people to use sa-update to get their rules (and then, optionally, updates) solves these issues, makes things simpler, and helps users avoid common problems.
+1 The current setup of maintaining multiple ruleset versions is a mess and a pain to deal with. I recall, though, that someone was against this when we discussed it for 3.2. Hopefully they've changed their mind.
I'm +1. I think when we originally talked about this there was concern from/for the distribution packagers and how they were going to distribute things to folks who shouldn't have to run sa-update. I don't see why it can't be a separate installable package though, so I see no problem with this approach.
I'm OK with splitting out the rules from the code. HOWEVER, I am definitely NOT OK with forcing people to use sa-update. As someone who packages SpamAssassin, I'd want a way to download the rules as a tarball for packaging purposes. We use the SpamAssassin engine and don't necessarily want our customers running sa-update because we like to examine the rules before we give them to them.
(In reply to comment #3)
> As someone who packages SpamAssassin,
> I'd want a way to download the rules as a tarball for packaging purposes.

There already is a way to get a tarball of the rules... wget, or whatever, the sa-update update tarball. Or you can get the rules from SVN.
(In reply to comment #4)
> (In reply to comment #3)
> > As someone who packages SpamAssassin,
> > I'd want a way to download the rules as a tarball for packaging purposes.
>
> There already is a way to get a tarball of the rules... wget, or whatever, the
> sa-update update tarball. Or you can get the rules from SVN.

Installation's a bit of a bummer though right now. :(

here's a proposal. Currently, sa-update does this for each channel:

- nslookup the TXT record for the current version
- nslookup/HTTP get the mirrors list
- HTTP get the appropriate .tar.gz file for that version under one of the mirrors
- GPG verify the package
- unpack the tar.gz in a tmp dir
- lint check it
- extract into /var/lib/spamassassin, using a good deal of paranoia

In other words, it's doing two main tasks here; the first 3 steps are the downloading of the appropriate package from our server via HTTP, and the latter 4 steps are an appropriately-paranoid extraction step.

Why don't we add a switch to sa-update ("--install"?) which, once given a .tar.gz and its .asc and .sha1 verification stamps, will install that ruleset.

That way, people who don't want to require net use for the initial setup can bundle those 3 files, and run "sa-update --install foo.tar.gz foo.tar.gz.asc foo.tar.gz.sha1" at the end of "make install".

Who knows, maybe it'd be appropriate for *us* to bundle one of these in our release tarballs, too...
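For illustration only, here is a minimal Python sketch of one part of the "paranoid" half of that proposal: checking a bundled tarball against its .sha1 stamp before anything is unpacked. It is not sa-update's code (sa-update is Perl). The function names are hypothetical, and the stamp layout (hex digest as the first whitespace-separated token) is an assumption. GPG verification of the .asc file is deliberately left out.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stamp_matches(tarball: Path, stamp: Path) -> bool:
    """Compare the tarball's digest with the first token of the stamp file.

    Assumption: the .sha1 file begins with the hex digest, optionally
    followed by whitespace and a file name.
    """
    tokens = stamp.read_text().split()
    if not tokens:
        # An empty stamp can never vouch for a tarball.
        return False
    return tokens[0].lower() == sha1_of_file(tarball)
```

A packaging script could call `stamp_matches(Path("foo.tar.gz"), Path("foo.tar.gz.sha1"))` and refuse to continue on `False`; the signature check and the lint step would still have to follow.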
(In reply to comment #5)
> That way, people who don't want to require net use for the initial
> setup can bundle those 3 files, and run "sa-update --install
> foo.tar.gz foo.tar.gz.asc foo.tar.gz.sha1" at the end of "make
> install".

This is a good idea. Right now, we generate "spamassassin" debs and rpms. We really need a way to have a "spamassassin-rules" deb or rpm that can be installed using the normal package manager.

In order to generate such a package, we need a nice way to just populate the rules directories with a bunch of rulesets. In order to keep it repeatable, we really don't want to run "sa-update" each time because who knows which version of the rules we'll get?

Regards,

David.
(In reply to comment #5)
> Who knows, maybe it'd be appropriate for *us* to bundle one of these in our
> release tarballs, too...

devs: what are your opinions on this?
upping priority, since we definitely want to do this for 3.3.0.
working on this. Should there be a warning if the user attempts to use SA with no rules installed? I presume so.
(In reply to comment #9)
> Should there be a warning if the user attempts to use SA with no rules
> installed? I presume so.

Hrm. I want to say yes, but at the same time I know there are uses of SA to process messages and not scan them, so configs aren't necessarily needed. An argument, of course, would be to at least then always pass in something like "config_text => ''" -- if you don't need configs, make sure you don't have any.

How about a warning/error from "spamassassin" and "spamd", but not the modules?
: jm 554...; svn commit -m "bug 5752: add sa-update --install switch, to allow installation of already-downloaded rule update tarballs without performing a download"
Sending sa-update.raw
Transmitting file data .
Committed revision 637451.

next things to do:

- remove the rules files from the dist tarballs (but leave language files? other stuff in /usr/share/spamassassin that isn't in the updates)
- add warning in Mail::SpamAssassin if no rules are found
- fix "make disttest" to get the rules somehow

BTW, regarding (b) from the original comment 1:

'b) We have multiple areas that need updating when rules/scores/etc need changing. ie: spamassassin/rules/branches/3.2 and spamassassin/branches/3.2/rules. Do we update both of them? Just the updates area? Do we add new rules to both or just for updates?'

I suggest we get rid of spamassassin/rules/branches/3.2, and instead start building updates from spamassassin/branches/3.2/rules. that means that dev is simpler, since we have a working tree at all times during development; it's only in the distribution tarball and after that that we use /var/lib.
(In reply to comment #10)
> (In reply to comment #9)
> > Should there be a warning if the user attempts to use SA with no rules
> > installed? I presume so.
>
> Hrm. I want to say yes, but at the same time I know there are uses of SA to
> process messages and not scan them, so configs aren't necessarily needed. An
> argument, of course, would be to at least then always pass in something like
> "config_text => ''" -- if you don't need configs, make sure you don't have any.
>
> How about a warning/error from "spamassassin" and "spamd", but not the modules?

works for me.

note however that we already have something that partially does this -- if you don't have a "Check" plugin registered, lint or check will produce a fatal error. so when using the modules directly you have to add a "miniature" config to load the Check plugin. So adding this to the modules might actually *increase* the usability a little anyway!

people who are using the modules directly (Mark M, David) -- what do you guys think?
> people who are using the modules directly (Mark M, David) -- what do
> you guys think?

If I understand this correctly, these changes won't affect amavisd-new. I don't provide packages, don't provide amavis-specific SA rules or .pre files, or provide a modified SpamAssassin. So people just install SpamAssassin any way they find natural (CPAN, ports, packages, whatever) following its documentation, and if it says sa-update must be executed, or a port/package maintainer does it for them, there is no problem.

If it happens there are no rules, and either a ->check or a Mail::SpamAssassin->new dies or always produces empty results, that is fine by me (I slightly prefer dying, it makes the problem immediately apparent).
r637451 needs thorough review, I don't think anything but the new functionality was tested. For instance, running sa-update without --install is currently broken. :(

There are if statements checking for $opt{'install'} that should really be checking for @{$opt{'install'}}.

I haven't tried it, but I don't see anything (although it is 4:50am) that prevents you from trying to do both an --install and a regular --channel update which, depending on what order you specify stuff, could work (if the non --install --channelS are listed last) or could royally screw up (if you say, and mean, --channel ... --channel ... --install ...) what versions sa-update thinks are installed of each of the channels (not to mention they'd have the wrong data).

Also, if you're going to allow multiple updates to be --install'd at once (suggesting multiple channel source file sources) the requirement for 6 digits in the source files seems rather arbitrary.

I think we're best off in the user-confusion-ability department by only allowing one channel to be --install'ed at a time (or at least not allowing regular channel updates simultaneously).
(In reply to comment #14)
> r637451 needs thorough review, I don't think anything but the new functionality
> was tested. For instance, running sa-update without --install is currently
> broken. :(
>
> There are if statements checking for $opt{'install'} that should really be
> checking for @{$opt{'install'}}.

oops. fixed.

> I haven't tried it, but I don't see anything (although it is 4:50am) that
> prevents you from trying to do both an --install and a regular --channel update
> which, depending on what order you specify stuff, could work (if the non
> --install --channelS are listed last) or could royally screw up (if you say,
> and mean, --channel ... --channel ... --install ...) what versions sa-update
> thinks are installed of each of the channels (not to mention they'd have the
> wrong data).

urgh, that's pretty unpleasant (and possible).

> Also, if you're going to allow multiple updates to be --install'd at once
> (suggesting multiple channel source file sources) the requirement for 6 digits
> in the source files seems rather arbitrary.

yes, it is. ok, let me make that 3...

> I think we're best off in the user-confusion-ability department by only
> allowing one channel to be --install'ed at a time (or at least not allowing
> regular channel updates simultaneously).

what happens currently if you mix --channels?

sa-update --channel foo1.site.com

sa-update --channel foo1.site.com foo2.site2.org

sa-update --channel foo2.site2.org

if that works ok, then fine -- I think that's an acceptable way to do it. I agree the --channel --channel --install --install thing is a little confusing, and imposing a limitation of one --install at a time is fine by me.
(In reply to comment #15)
> what happens currently if you mix --channels?
>
> sa-update --channel foo1.site.com
>
> sa-update --channel foo1.site.com foo2.site2.org

This one will ignore foo2 if you don't insert --channel before it.

> sa-update --channel foo2.site2.org

> if that works ok, then fine -- I think that's an acceptable way to do it.

I'm not sure how this is related. Are you questioning whether you can run sa-update separately for batches of channels? You can do that.
(In reply to comment #16)
> I'm not sure how this is related. Are you questioning whether you can run
> sa-update separately for batches of channels? You can do that.

yeah, I'm worried that if you run sa-update with a single channel after previously using multiple channels, it'll overwrite some state causing the other channels to be ignored. If that's not the case, then we're good.
(In reply to comment #17)
> (In reply to comment #16)
> > I'm not sure how this is related. Are you questioning whether you can run
> > sa-update separately for batches of channels? You can do that.
>
> yeah, I'm worried that if you run sa-update with a single channel after
> previously using multiple channels, it'll overwrite some state causing the
> other channels to be ignored. If that's not the case, then we're good.

We're good. There's no requirement to update all of the channels you are using at once. For instance, I attempt to update my locally generated channels every few minutes, but only attempt to update the default channel every few hours.
: jm 113...; svn commit -m "bug 5752: sa-update --install now can only be used with one --channel switch at a time" sa-update.raw
Sending sa-update.raw
Transmitting file data .
Committed revision 638793.
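As a rough illustration of the restriction this commit describes, the check could look like the Python sketch below. This is not the Perl code in sa-update.raw: the function name and error text are hypothetical, and it assumes both switches are collected into lists. Testing the list's contents rather than the option slot itself mirrors the point made in comment #14, since a Perl array reference is true even when the array it points to is empty.

```python
from typing import List, Optional


def check_install_options(channels: List[str],
                          install_files: List[str]) -> Optional[str]:
    """Return an error message if the option combination is rejected, else None.

    Assumption: --channel and --install values have already been gathered
    into plain lists by the option parser.
    """
    # Look at what is inside the list, not merely whether the option exists.
    if install_files and len(channels) > 1:
        return "--install can only be used with one --channel at a time"
    return None
```

With this sketch, `check_install_options(["foo1.site.com", "foo2.site2.org"], ["foo.tar.gz"])` returns the error string, while a plain multi-channel update with no install files returns `None`.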
(In reply to comment #12)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > Should there be a warning if the user attempts to use SA with no rules
> > > installed? I presume so.
> >
> > Hrm. I want to say yes, but at the same time I know there are uses of SA to
> > process messages and not scan them, so configs aren't necessarily needed. An
> > argument, of course, would be to at least then always pass in something like
> > "config_text => ''" -- if you don't need configs, make sure you don't have any.
> >
> > How about a warning/error from "spamassassin" and "spamd", but not the modules?
>
> works for me.

this is now implemented.

: jm 126...; svn commit -m "bug 5752: add a warning telling the user to run 'sa-update' in Mail::SpamAssassin, if no rules are found in the system config dir, which will be possible since we plan to no longer distribute rules in the basic tarball"
Sending lib/Mail/SpamAssassin/Conf/Parser.pm
Sending lib/Mail/SpamAssassin/Conf.pm
Sending lib/Mail/SpamAssassin.pm
Sending spamassassin.raw
Sending spamd/spamd.raw
Transmitting file data .....
Committed revision 648888.

next step is to fix the "make dist"/"make tardist"/"make disttest" situations.
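To make the idea of that warning concrete, here is a small Python sketch of a "no rules found" check. It is only an illustration, not the Perl logic committed to Mail::SpamAssassin. The function name is hypothetical, and it assumes rule files end in .cf and may sit in subdirectories. The message text is the one spamassassin --lint prints later in this bug.

```python
from pathlib import Path
from typing import Iterable, Optional

NO_RULES_WARNING = "config: no rules were found! Do you need to run 'sa-update'?"


def no_rules_warning(rule_dirs: Iterable[Path]) -> Optional[str]:
    """Return the warning text if none of the directories holds a .cf file.

    Assumption: a rules directory counts as populated as soon as any
    *.cf file exists anywhere beneath it.
    """
    for directory in rule_dirs:
        # rglob searches recursively; any() stops at the first match.
        if directory.is_dir() and any(directory.rglob("*.cf")):
            return None
    return NO_RULES_WARNING
```

A front-end script could print the returned string and carry on, while code using the modules directly could skip the check, which matches the split suggested in comment #10.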
(recording a discussion from the mailing list from back in May 2008:)

In environments requiring strict change control or that are otherwise paranoid in terms of security, (direct) online updates are usually not desired (or not possible, because e.g. the firewall concept forbids mailservers from doing HTTP requests). Thus actually *requiring* online update would make deployment in such environments more complex (it would e.g. require repackaging with the rules prior to deployment). The same environments would also be rather reluctant to allow direct online updates without at least some form of sanity check (maybe not only lint, but also some messages being checked).

A rules package from the same source as the app package is definitely acceptable (since if you trust the code from some repository, you can also apply the same level of trust to rules from the same place). This should apply to distro packagers, CPAN, and other channels.
(In reply to comment #21)
> (recording a discussion from the mailinglist from back in May 2008:)
>
> In environments requiring strict change control or that are otherwise
> paranoid in terms of security, (direct) online updates are usually not
> desired (or not possible, because eg the firewall concept forbids
> mailservers from doing HTTP requests).

Do these establishments also veto the timely application of antivirus pattern updates? Or security updates on core infrastructure servers? A core rule update for spamassassin is akin to an antivirus detection patterns update.

I'm sorry, but I don't think we should be pandering to the cluelessness of bureaucrats here. If they were truly paranoid about security they'd be getting SA updates applied as fast as possible, automatically.
It's already decided -- we're going to distribute a tarball of rules, as described in comment #5, generated when the code tarball is generated. Then, later, there'll also be rule updates generated which can be used instead. The more paranoid can bundle the first tarball and skip the updates if they so desire.
(In reply to comment #23)
> It's already decided -- we're going to distribute a tarball of rules, as
> described in comment #5, generated when the code tarball is generated. Then,
> later, there'll also be rule updates generated which can be used instead. The
> more paranoid can bundle the first tarball and skip the updates if they so
> desire.

With some AV products (and Windows Update) there is the ability to have a staging server. This is how most "Enterprise" solutions work. I wonder if it is possible to have something similar for SA. Of course it would be possible to just use rsync but then you need to notify spamd (or the process using the SA libraries).

The problem with a "home grown" solution is the supportability of the script. I was involved in a recent upgrade (not SA) for a customer and there were two ways of doing it: editing a text file (copying it from another live system) or running the vendor supplied script and manually entering the settings. The change control preferred the slower second option because it was "supported".

I haven't looked at it yet but would it be possible to modify sa-update to talk to a "proxy" daemon running on another system which then returns the updated files once they have been approved by an admin?
(In reply to comment #24)
> I haven't looked at it yet but would it be possible to modify sa-update to
> talk to a "proxy" daemon running on another system which then returns the
> updated files once they had been approved by an admin.

This is an exact use case of publishing your own channels internally.
(In reply to comment #25)
> This is an exact use case of publishing your own channels internally.

Yes - but this is the same issue as the "supported" vs "non-supported" implication. This would be an internally developed solution and not something supported by the "vendor". It is the process of publishing an internal channel based on the external.

I am suggesting that a daemon is developed that runs sa-update as normal and then creates the channel for internal use. This would be the supported update process in exactly the same way as Microsoft SUS works.
with open source code like SA, I think it's reasonable to assume that _some_ level of homegrown automation/scripting will be required to do that. rsyncing the contents of /etc/mail/spamassassin and /var/lib/spamassassin from a staging server certainly makes the most sense to me. IMO there are too many possible aspects to an "approval" workflow which could vary from organisation to organisation, to make any such mechanism a standard part of sa-update's operation...
I did this a while back, then lost the pending commit due to hosed laptop :( not making the same mistake again!

So this checkin is an implementation of what's described in comment 11 -- the rules are no longer part of the main "make dist" tarball. That is created (and can pass its tests) with no rules.

: 92...; svn commit -m "bug 5752: MAJOR CHANGE FOR 3.3.0: we no longer install the rules to /usr/share/spamassassin along with the code; admins must run 'sa-update' immediately after installing"
Sending MANIFEST
Sending MANIFEST.SKIP
Sending Makefile.PL
Sending t/SATest.pm
Sending t/data/01_test_rules.cf
Adding t/data/01_test_rules.pre
Sending t/dcc.t
Transmitting file data .......
Committed revision 730843.

Here's what's left in /usr/share/spamassassin after make install --

: 39...; l /usr/local/share/spamassassin/
languages sa-update-pubkey.txt user_prefs.template

if you run spamassassin:

: 57...; spamassassin --lint
config: no rules were found! Do you need to run 'sa-update'?

We now need to add a way to make a rules-dist tarball; ie. a .tgz containing just the rules, in sa-update --install format. or at least a way to generate one of these at release time, and put it up on www.apache.org/dist/spamassassin/ alongside the code tarball.
(In reply to comment #28)
> We now need to add a way to make a rules-dist tarball; ie. a .tgz containing
> just the rules, in sa-update --install format. or at least a way to generate
> one of these at release time, and put it up on
> www.apache.org/dist/spamassassin/ alongside the code tarball.

finally done!

: 108...; svn commit -m "bug 5752: get rid of rules in the main distribution tarball; we now have a new tarball for rules, alongside the main source tarball, or sa-update can be used to dl the freshest ruleset (more likely)."
Sending build/README
Adding build/update_devel_rules
Sending build/update_stable
Transmitting file data ...
Committed revision 788192.

maybe we could do an alpha soon, now that this is finally out of the way....
closing