Bug 5752 - Let's get rid of the default rules dir and make sa-update mandatory
Summary: Let's get rid of the default rules dir and make sa-update mandatory
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: Other
OS: other
Importance: P1 normal
Target Milestone: 3.3.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-19 14:11 UTC by Theo Van Dinter
Modified: 2009-06-24 14:55 UTC
CC List: 2 users



Description Theo Van Dinter 2007-12-19 14:11:42 UTC
Per discussion in bug 5751, I wanted to float the idea of getting rid of the
default rules dir (aka: rules, /usr/share/spamassassin, etc.) and making
sa-update mandatory.

There are really several reasons for this, but two main ones come to mind:

a) We have a project goal of having a separate engine and ruleset.  Right now,
you get both from the tarball, and it leads to the whole local_state_dir
override thing which causes many problems (see bug 5751).  It also forces people
to install our rules even if they don't want them, etc.

b) We have multiple areas that need updating when rules/scores/etc need
changing.  ie: spamassassin/rules/branches/3.2 and
spamassassin/branches/3.2/rules.  Do we update both of them?  Just the updates
area?  Do we add new rules to both or just for updates?

I think getting rid of the default rules dir, not distributing rules w/ the
tarball, and forcing people to use sa-update to get their rules (and then,
optionally, updates) solves these issues, makes things simpler, and helps users
avoid common problems.
Comment 1 Daryl C. W. O'Shea 2007-12-19 14:16:46 UTC
+1

The current setup of maintaining multiple ruleset versions is a mess and a pain
to deal with.  I recall, though, that someone was against this when we discussed
it for 3.2.  Hopefully they've changed their mind.
Comment 2 Michael Parker 2007-12-19 14:24:58 UTC
I'm +1

I think when we originally talked about this there was concern from/for the
distribution packagers and how they were going to distribute things to folks who
shouldn't have to run sa-update.  I don't see why it can't be a separate
installable package though, so see no problem with this approach.
Comment 3 Dianne Skoll 2007-12-19 15:07:57 UTC
I'm OK with splitting out the rules from the code.  HOWEVER, I am definitely NOT
OK with forcing people to use sa-update.  As someone who packages SpamAssassin,
I'd want a way to download the rules as a tarball for packaging purposes.

We use the SpamAssassin engine and don't necessarily want our customers running
sa-update because we like to examine the rules before we give them to them.
Comment 4 Daryl C. W. O'Shea 2007-12-19 15:15:26 UTC
(In reply to comment #3)
> As someone who packages SpamAssassin,
> I'd want a way to download the rules as a tarball for packaging purposes.

There already is a way to get a tarball of the rules... wget, or whatever, the
sa-update update tarball.  Or you can get the rules from SVN.
Comment 5 Justin Mason 2007-12-20 01:48:59 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > As someone who packages SpamAssassin,
> > I'd want a way to download the rules as a tarball for packaging purposes.
> 
> There already is a way to get a tarball of the rules... wget, or whatever, the
> sa-update update tarball.  Or you can get the rules from SVN.

Installation's a bit of a bummer though right now. :(

here's a proposal.  Currently, sa-update does this for each channel:

- nslookup the TXT record for the current version

- nslookup/HTTP get the mirrors list

- HTTP get the appropriate .tar.gz file for that version under one of the mirrors

- GPG verify the package

- unpack the tar.gz in a tmp dir

- lint check it

- extract into /var/lib/spamassassin, using a good deal of paranoia

In other words, it's doing two main tasks here; the first 3 steps are the
downloading of the appropriate package from our server via HTTP, and the
latter 4 steps are an appropriately-paranoid extraction step.
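A minimal Python sketch of the download half (the first three steps) follows. It is not sa-update's code: the reversed-version DNS name (3.2.4 becoming `4.2.3.updates.spamassassin.org`) and the `<mirror>/<version>.tar.gz` layout with `.sha1` and `.asc` companions reflect sa-update's behaviour as I understand it, and the function names and the example mirror URL are hypothetical. No network access is performed.

```python
def version_query_name(sa_version: str, channel: str) -> str:
    """DNS name whose TXT record is assumed to hold the latest update version
    for this engine version: version components reversed, then the channel."""
    return ".".join(reversed(sa_version.split("."))) + "." + channel


def update_urls(mirror: str, update_version: str) -> dict:
    """The three files the extraction half needs, assuming the
    <mirror>/<version>.tar.gz naming with .sha1 and .asc companions."""
    tarball = f"{mirror.rstrip('/')}/{update_version}.tar.gz"
    return {"tarball": tarball, "sha1": tarball + ".sha1", "asc": tarball + ".asc"}


# Example with a hypothetical mirror and update version:
print(version_query_name("3.2.4", "updates.spamassassin.org"))
# → 4.2.3.updates.spamassassin.org
print(update_urls("http://mirror.example/updates/", "123456")["sha1"])
# → http://mirror.example/updates/123456.tar.gz.sha1
```

Keeping these as pure functions makes the point of the split explicit: nothing after this step needs the network, so it can be fed files obtained any other way.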

Why don't we add a switch to sa-update ("--install"?) which, once given a .tar.gz
and its .asc and .sha1 verification stamps, will install that ruleset.
That way, people who don't want to require net use for the initial setup
can bundle those 3 files, and run "sa-update --install foo.tar.gz foo.tar.gz.asc
foo.tar.gz.sha1" at the end of "make install".
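The "paranoid extraction" half, which is what such an `--install` switch would run on files the packager already has, might look roughly like this. It is a Python sketch under stated assumptions: `install_ruleset` and its helpers are hypothetical names, the `.sha1` file is assumed to start with the hex digest, and the GPG check of the `.asc` file and the lint run are left as marked placeholders rather than implemented.

```python
import hashlib
import os
import shutil
import tarfile
import tempfile


def verify_sha1(tarball: str, sha1_file: str) -> None:
    """Raise if the tarball's SHA-1 differs from the first token in sha1_file."""
    with open(sha1_file, encoding="ascii") as fh:
        expected = fh.read().split()[0].lower()
    digest = hashlib.sha1()
    with open(tarball, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected:
        raise ValueError(f"SHA-1 mismatch for {tarball}")


def checked_members(tar: tarfile.TarFile) -> list:
    """Regular-file members only; absolute paths or '..' components abort."""
    members = []
    for m in tar.getmembers():
        if os.path.isabs(m.name) or ".." in m.name.split("/"):
            raise ValueError(f"refusing unsafe path {m.name!r}")
        if m.isfile():  # silently skip directories, links, devices
            members.append(m)
    return members


def install_ruleset(tarball: str, sha1_file: str, dest: str) -> list:
    """Verify, unpack into a scratch dir, then copy the files into dest."""
    verify_sha1(tarball, sha1_file)
    # Placeholder: GPG-verify the .asc signature here (not implemented).
    scratch = tempfile.mkdtemp()
    try:
        with tarfile.open(tarball, "r:gz") as tar:
            members = checked_members(tar)
            for m in members:
                target = os.path.join(scratch, m.name)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(m) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
        # Placeholder: lint the unpacked rules in `scratch` here.
        for m in members:
            final = os.path.join(dest, m.name)
            os.makedirs(os.path.dirname(final), exist_ok=True)
            shutil.copy2(os.path.join(scratch, m.name), final)
        return sorted(m.name for m in members)
    finally:
        shutil.rmtree(scratch)
```

Reading members through `extractfile` instead of extracting them directly means only vetted regular files are ever written, and nothing touches `dest` until the checks on the scratch copy have passed.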

Who knows, maybe it'd be appropriate for *us* to bundle one of these in our
release tarballs, too...
Comment 6 Dianne Skoll 2007-12-20 04:03:16 UTC
(In reply to comment #5)

> That way, people who don't want to require net use for the initial
> setup can bundle those 3 files, and run "sa-update --install
> foo.tar.gz foo.tar.gz.asc foo.tar.gz.sha1" at the end of "make
> install".

This is a good idea.  Right now, we generate "spamassassin" debs and rpms.
We really need a way to have a "spamassassin-rules" deb or rpm that can
be installed using the normal package manager.  In order to generate such
a package, we need a nice way to just populate the rules directories with
a bunch of rulesets.  In order to keep it repeatable, we really don't want
to run "sa-update" each time because who knows which version of the rules
we'll get?

Regards,

David.
Comment 7 Justin Mason 2008-01-02 15:04:22 UTC
(In reply to comment #5)
> Who knows, maybe it'd be appropriate for *us* to bundle one of these in our
> release tarballs, too...

devs: what are your opinions on this?
Comment 8 Justin Mason 2008-02-04 07:52:18 UTC
upping priority, since we definitely want to do this for 3.3.0.
Comment 9 Justin Mason 2008-03-15 10:57:26 UTC
working on this.

Should there be a warning if the user attempts to use SA with no rules installed?  I presume so.
Comment 10 Theo Van Dinter 2008-03-15 11:40:47 UTC
(In reply to comment #9)
> Should there be a warning if the user attempts to use SA with no rules
> installed?  I presume so.

Hrm.  I want to say yes, but at the same time I know there are uses of SA to process messages and not scan them, so configs aren't necessarily needed.  An argument, of course, would be to at least then always pass in something like "config_text => ''" -- if you don't need configs, make sure you don't have any.

How about a warning/error from "spamassassin" and "spamd", but not the modules?
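Theo's split (warn from the `spamassassin` and `spamd` front ends, stay quiet in the modules) can be sketched as below. This is a Python illustration only; the helper names and the idea of scanning the rules directories for `*.cf` files are assumptions, while the warning text is the one `spamassassin --lint` printed once this bug landed.

```python
import os
from typing import Iterable, Optional

NO_RULES_MSG = "config: no rules were found!  Do you need to run 'sa-update'?"


def has_rules(rules_dirs: Iterable[str]) -> bool:
    """True if any directory (searched recursively) holds a *.cf file."""
    for top in rules_dirs:
        for _root, _dirs, files in os.walk(top):
            if any(name.endswith(".cf") for name in files):
                return True
    return False


def no_rules_warning(rules_dirs: Iterable[str], frontend: bool) -> Optional[str]:
    """Warning text for front ends (spamassassin, spamd) with no rules; None for
    library callers, who may legitimately run without any config."""
    if frontend and not has_rules(rules_dirs):
        return NO_RULES_MSG
    return None
```

A front end would print the returned text to stderr; library callers never see it, so code that only processes messages (or passes an empty `config_text`) is unaffected.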
Comment 11 Justin Mason 2008-03-15 11:41:46 UTC
: jm 554...; svn commit -m "bug 5752: add sa-update --install switch, to allow installation of already-downloaded rule update tarballs without performing a download"
Sending        sa-update.raw
Transmitting file data .
Committed revision 637451.


next things to do: 

- remove the rules files from the dist tarballs (but leave the language files, and other stuff in /usr/share/spamassassin that isn't in the updates?)

- add warning in Mail::SpamAssassin if no rules are found

- fix "make disttest" to get the rules somehow


BTW, regarding (b) from the original description:

'b) We have multiple areas that need updating when rules/scores/etc need
changing.  ie: spamassassin/rules/branches/3.2 and
spamassassin/branches/3.2/rules.  Do we update both of them?  Just the updates
area?  Do we add new rules to both or just for updates?'

I suggest we get rid of spamassassin/rules/branches/3.2, and instead start building updates from spamassassin/branches/3.2/rules.  that means that dev is simpler, since we have a working tree at all times during development; it's only in the distribution tarball and after that that we use /var/lib.
Comment 12 Justin Mason 2008-03-15 11:43:50 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Should there be a warning if the user attempts to use SA with no rules
> > installed?  I presume so.
> 
> Hrm.  I want to say yes, but at the same time I know there are uses of SA to
> process messages and not scan them, so configs aren't necessarily needed.  An
> argument, of course, would be to at least then always pass in something like
> "config_text => ''" -- if you don't need configs, make sure you don't have any.
> 
> How about a warning/error from "spamassassin" and "spamd", but not the modules?

works for me.

note however that we already have something that partially does this -- if you don't have a "Check" plugin registered, lint or check will produce a fatal error.  so when using the modules directly you have to add a "miniature" config to load the Check plugin.  So adding this to the modules might actually *increase* the usability a little anyway!

people who are using the modules directly (Mark M, David) -- what do you guys think?

Comment 13 Mark Martinec 2008-03-15 15:46:24 UTC
> people who are using the modules directly (Mark M, David) -- what do
> you guys think?

If I understand this correctly, these changes won't affect amavisd-new.
I don't provide packages, amavis-specific SA rules or .pre files,
or a modified SpamAssassin. So people just
install SpamAssassin any way they find natural (CPAN, ports, packages,
whatever) following its documentation, and if it says sa-update must
be executed, or a port/package maintainer does it for them, there is
no problem. If it happens there are no rules, and either a ->check or
a Mail::SpamAssassin->new dies or always produces empty results, that
is fine by me (I slightly prefer dying; it makes the problem immediately
apparent).
Comment 14 Daryl C. W. O'Shea 2008-03-18 01:53:26 UTC
r637451 needs thorough review, I don't think anything but the new functionality was tested.  For instance, running sa-update without --install is currently broken. :(

There are if statements checking for $opt{'install'} that should really be checking for @{$opt{'install'}}.

I haven't tried it, but I don't see anything (although it is 4:50am) that prevents you from trying to do both an --install and a regular --channel update which, depending on what order you specify stuff, could work (if the non --install --channelS are listed last) or could royally screw up (if you say, and mean, --channel ... --channel ... --install ...) what versions sa-update thinks are installed of each of the channels (not to mention they'd have the wrong data).

Also, if you're going to allow multiple updates to be --install'd at once (suggesting multiple channel source file sources) the requirement for 6 digits in the source files seems rather arbitrary.


I think we're best off in the user-confusion-ability department by only allowing one channel to be --install'ed at a time (or at least not allowing regular channel updates simultaneously).
Comment 15 Justin Mason 2008-03-18 03:40:50 UTC
(In reply to comment #14)
> r637451 needs thorough review, I don't think anything but the new functionality
> was tested.  For instance, running sa-update without --install is currently
> broken. :(
> 
> There are if statements checking for $opt{'install'} that should really be
> checking for @{$opt{'install'}}.

oops. fixed.

> I haven't tried it, but I don't see anything (although it is 4:50am) that
> prevents you from trying to do both an --install and a regular --channel update
> which, depending on what order you specify stuff, could work (if the non
> --install --channelS are listed last) or could royally screw up (if you say,
> and mean, --channel ... --channel ... --install ...) what versions sa-update
> thinks are installed of each of the channels (not to mention they'd have the
> wrong data).

urgh, that's pretty unpleasant (and possible).

> Also, if you're going to allow multiple updates to be --install'd at once
> (suggesting multiple channel source file sources) the requirement for 6 digits
> in the source files seems rather arbitrary.

yes, it is.  ok, let me make that 3...
 
> I think we're best off in the user-confusion-ability department by only
> allowing one channel to be --install'ed at a time (or at least not allowing
> regular channel updates simultaneously).

what happens currently if you mix --channels?

sa-update --channel foo1.site.com

sa-update --channel foo1.site.com foo2.site2.org

sa-update --channel foo2.site2.org


if that works ok, then fine -- I think that's an acceptable way to do it.  I agree the --channel --channel --install --install thing is a little confusing, and imposing a limitation of one --install at a time is fine by me.
Comment 16 Daryl C. W. O'Shea 2008-03-18 10:14:14 UTC
(In reply to comment #15)
> what happens currently if you mix --channels?
> 
> sa-update --channel foo1.site.com
> 
> sa-update --channel foo1.site.com foo2.site2.org

This one will ignore foo2 if you don't insert --channel before it.

> sa-update --channel foo2.site2.org

> if that works ok, then fine -- I think that's an acceptable way to do it.

I'm not sure how this is related.  Are you questioning whether you can run sa-update separately for batches of channels?  You can do that.
Comment 17 Justin Mason 2008-03-18 10:22:38 UTC
(In reply to comment #16)
> I'm not sure how this is related.  Are you questioning whether you can run
> sa-update separately for batches of channels?  You can do that.

yeah, I'm worried that if you run sa-update with a single channel after previously using multiple channels, it'll overwrite some state causing the other channels to be ignored.  If that's not the case, then we're good.
Comment 18 Daryl C. W. O'Shea 2008-03-18 10:34:10 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > I'm not sure how this is related.  Are you questioning whether you can run
> > sa-update separately for batches of channels?  You can do that.
> 
> yeah, I'm worried that if you run sa-update with a single channel after
> previously using multiple channels, it'll overwrite some state causing the
> other channels to be ignored.  If that's not the case, then we're good.

We're good.  There's no requirement to update all of the channels you are using at once.  For instance, I attempt to update my locally generated channels every few minutes, but only attempt to update the default channel every few hours.
Comment 19 Justin Mason 2008-03-19 04:24:38 UTC
: jm 113...; svn commit -m "bug 5752: sa-update --install now can only be used with one --channel switch at a time" sa-update.raw
Sending        sa-update.raw
Transmitting file data .
Committed revision 638793.
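A minimal sketch of the resulting option rule, written with Python's `argparse` rather than sa-update's actual Perl option handling: `--install` is refused when more than one `--channel` is given, and, following comment 15, only one `--install` is accepted. It also shows the emptiness check comment 14 asked for, testing the collected list's contents rather than merely the presence of the option. The program name is hypothetical.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="sa-update-sketch")
    parser.add_argument("--channel", action="append", default=[])
    parser.add_argument("--install", action="append", default=[], metavar="TARBALL")
    args = parser.parse_args(argv)
    # args.install is always a list, so `args.install is not None` would always
    # be true; test its contents instead (the analogue of the $opt{'install'}
    # vs @{$opt{'install'}} mix-up from comment 14).
    if args.install:
        if len(args.install) > 1:
            parser.error("only one --install may be given at a time")
        if len(args.channel) > 1:
            parser.error("--install can only be used with one --channel")
    return args
```

`parser.error` prints a usage message and exits, so an ambiguous mix of `--install` and several channels never reaches the code that records per-channel versions.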
Comment 20 Justin Mason 2008-04-16 15:38:12 UTC
(In reply to comment #12)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > Should there be a warning if the user attempts to use SA with no rules
> > > installed?  I presume so.
> > 
> > Hrm.  I want to say yes, but at the same time I know there are uses of SA to
> > process messages and not scan them, so configs aren't necessarily needed.  An
> > argument, of course, would be to at least then always pass in something like
> > "config_text => ''" -- if you don't need configs, make sure you don't have any.
> > 
> > How about a warning/error from "spamassassin" and "spamd", but not the modules?
> 
> works for me.

this is now implemented.


: jm 126...; svn commit -m "bug 5752: add a warning telling the user to run 'sa-update' in Mail::SpamAssassin, if no rules are found in the system config dir, which will be possible since we plan to no longer distribute rules in the basic tarball"
Sending        lib/Mail/SpamAssassin/Conf/Parser.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin.pm
Sending        spamassassin.raw
Sending        spamd/spamd.raw
Transmitting file data .....
Committed revision 648888.


next step is to fix the "make dist"/"make tardist"/"make disttest" situations.
Comment 21 Matthias Leisi 2008-07-06 01:55:30 UTC
(recording a discussion from the mailinglist from back in May 2008:)

In environments requiring strict change control, or that are otherwise
paranoid in terms of security, (direct) online updates are usually not
desired (or not possible, because e.g. the firewall policy forbids
mailservers from making HTTP requests).

Thus actually *requiring* online updates would make deployment in such
environments more complex (it would e.g. require repackaging with the
rules prior to deployment).

The same environments would also be rather reluctant to allow direct
online updates without at least some form of sanity check (maybe not only
lint, but also some messages being checked).

A rules package from the same source as the app package is definitely
acceptable (since if you trust the code from some repository, you 
can also apply the same level of trust to rules from the same place).

This should apply to distro packagers, CPAN, and other channels. 
Comment 22 Phil Randal 2008-07-06 02:46:05 UTC
(In reply to comment #21)
> (recording a discussion from the mailinglist from back in May 2008:)
> 
> In environments requiring strict change control or that are otherwise
> paranoid in terms of security, (direct) online updates are usually not
> desired (or not possible, because eg the firewall concept forbids
> mailservers from doing HTTP requests).

Do these establishments also veto the timely application of antivirus pattern updates?  Or security updates on core infrastructure servers?

A core rule update for spamassassin is akin to an antivirus detection patterns update.

I'm sorry, but I don't think we should be pandering to the cluelessness of bureaucrats here.

If they were truly paranoid about security they'd be getting SA updates on as fast as possible, automatically.

Comment 23 Justin Mason 2008-07-06 03:12:54 UTC
It's already decided -- we're going to distribute a tarball of rules, as described in comment #5, generated when the code tarball is generated.  Then, later, there'll also be rule updates generated which can be used instead.  The more paranoid can bundle the first tarball and skip the updates if they so desire.
Comment 24 Matt Hampton 2008-07-06 09:58:16 UTC
(In reply to comment #23)
> It's already decided -- we're going to distribute a tarball of rules, as
> described in comment #5, generated when the code tarball is generated.  Then,
> later, there'll also be rule updates generated which can be used instead.  The
> more paranoid can bundle the first tarball and skip the updates if they so
> desire.
> 

With some AV products (and Windows Update) there is the ability to have a staging server.  

This is how most "Enterprise" solutions work.  I wonder if it is possible to have something similar for SA.  

Of course it would be possible to just use rsync but then you need to notify spamd (or the process using the SA libraries).  

The problem with a "home grown" solution is the supportability of the script.  I was involved in a recent upgrade (not SA) for a customer and there were two ways of doing it: editing a text file (copying it from another live system) or running the vendor-supplied script and manually entering the settings.

The change control preferred the slower second option because it was "supported".

I haven't looked at it yet, but would it be possible to modify sa-update to talk to a "proxy" daemon running on another system, which then returns the updated files once they have been approved by an admin?

 
Comment 25 Theo Van Dinter 2008-07-06 11:02:44 UTC
(In reply to comment #24)
> I haven't looked at it yet but would it be possible to modify sa-update to
> talk to a "proxy" daemon running on another system which then returns the
> updated files once they had been approved by an admin.

This is an exact use case of publishing your own channels internally.
Comment 26 Matt Hampton 2008-07-06 11:38:55 UTC
(In reply to comment #25)
> This is an exact use case of publishing your own channels internally.

Yes - but this is the same issue as the "supported" vs "non-supported" implication.  This would be an internally developed solution and not something supported by the "vendor".

It is the process of publishing an internal channel based on the external.  I am suggesting that a daemon is developed that runs sa-update as normal and then creates the channel for internal use.  This would be the supported update process in exactly the same way as Microsoft SUS works.

 

Comment 27 Justin Mason 2008-07-07 02:45:10 UTC
with open source code like SA, I think it's reasonable to assume that _some_ level of homegrown automation/scripting will be required to do that.  rsyncing the contents of /etc/mail/spamassassin and /var/lib/spamassassin from a staging server certainly makes the most sense to me.

IMO there are too many possible aspects to an "approval" workflow which could vary from organisation to organisation, to make any such mechanism a standard part of sa-update's operation...
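For the staging-server approach, a sketch of the two steps (mirror the directories, then tell spamd to reload) follows. It assumes rsync over SSH from a hypothetical staging host and a spamd started with a pidfile (`-r`); sending SIGHUP is the usual way to make spamd re-read its configuration, but check your version's documentation. Only the command construction is pure; `reload_spamd` acts on a live process.

```python
import os
import signal

# Directories comment 27 suggests keeping in sync with the staging server.
SYNC_DIRS = ["/etc/mail/spamassassin/", "/var/lib/spamassassin/"]


def rsync_commands(staging_host: str, dirs=SYNC_DIRS) -> list:
    """argv lists that mirror each directory's contents from the staging host."""
    # Trailing slash on the source copies the directory's contents; --delete
    # removes files that were dropped on the staging side.
    return [["rsync", "-a", "--delete", f"{staging_host}:{d}", d] for d in dirs]


def reload_spamd(pidfile: str) -> None:
    """Send SIGHUP to the spamd whose pid is recorded in pidfile."""
    with open(pidfile) as fh:
        pid = int(fh.read().strip())
    os.kill(pid, signal.SIGHUP)
```

An admin's script would run each command with `subprocess.run(cmd, check=True)` and then call `reload_spamd` with whatever pidfile path spamd was started with; the approval workflow itself stays outside the sketch, as comment 27 argues it should.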
Comment 28 Justin Mason 2009-01-02 13:14:13 UTC
I did this a while back, then lost the pending commit due to a hosed laptop :(  not making the same mistake again!

So this checkin is an implementation of what's described in comment 11 -- the rules are no longer part of the main "make dist" tarball.  That is created (and can pass its tests) with no rules.

: 92...; svn commit -m "bug 5752: MAJOR CHANGE FOR 3.3.0: we no longer install the rules to /usr/share/spamassassin along with the code; admins must run 'sa-update' immediately after installing"
Sending        MANIFEST
Sending        MANIFEST.SKIP
Sending        Makefile.PL
Sending        t/SATest.pm
Sending        t/data/01_test_rules.cf
Adding         t/data/01_test_rules.pre
Sending        t/dcc.t
Transmitting file data .......
Committed revision 730843.


Here's what's left in /usr/share/spamassassin after make install --

: 39...; l /usr/local/share/spamassassin/
languages             sa-update-pubkey.txt  user_prefs.template

if you run spamassassin:


: 57...; spamassassin --lint
config: no rules were found!  Do you need to run 'sa-update'?


We now need to add a way to make a rules-dist tarball, i.e. a .tgz containing just the rules in sa-update --install format, or at least a way to generate one of these at release time and put it up on www.apache.org/dist/spamassassin/ alongside the code tarball.
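One way to produce such a rules-only tarball at release time is sketched below in Python. It is not the build script that was eventually committed (comment 29 mentions build/update_stable and build/update_devel_rules); which file types to include, and writing only the digest into the `.sha1` file, are assumptions. The detached `.asc` signature would still be made separately, e.g. with `gpg --armor --detach-sign`.

```python
import hashlib
import os
import tarfile


def build_rules_dist(rules_dir: str, out_tarball: str) -> str:
    """Pack rules_dir's top-level .cf/.pre files into a gzipped tarball and
    write its SHA-1 to out_tarball + '.sha1'; return the hex digest."""
    names = sorted(n for n in os.listdir(rules_dir) if n.endswith((".cf", ".pre")))
    with tarfile.open(out_tarball, "w:gz") as tar:
        for name in names:
            # arcname keeps member paths relative (no absolute paths)
            tar.add(os.path.join(rules_dir, name), arcname=name)
    with open(out_tarball, "rb") as fh:
        digest = hashlib.sha1(fh.read()).hexdigest()
    with open(out_tarball + ".sha1", "w", encoding="ascii") as fh:
        fh.write(digest + "\n")
    return digest
```

Sorting the file names keeps the member order stable between runs, which helps packagers who want the repeatable builds described in comment 6.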
Comment 29 Justin Mason 2009-06-24 14:54:46 UTC
(In reply to comment #28)
> We now need to add a way to make a rules-dist tarball; ie. a .tgz containing
> just the rules, in sa-update --install format.  or at least a way to generate
> one of these at release time, and put it up on
> www.apache.org/dist/spamassassin/ alongside the code tarball.


finally done!

: 108...; svn commit -m "bug 5752: get rid of rules in the main distribution tarball; we now have a new tarball for rules, alongside the main source tarball, or sa-update can be used to dl the freshest ruleset (more likely)."
Sending        build/README
Adding         build/update_devel_rules
Sending        build/update_stable
Transmitting file data ...
Committed revision 788192.

maybe we could do an alpha soon, now that this is finally out of the way....
Comment 30 Justin Mason 2009-06-24 14:55:06 UTC
closing