Bug 5751 - sa-update breaks SA installation if GPG validation of the base ruleset fails on first run
Summary: sa-update breaks SA installation if GPG validation of the base ruleset fails ...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-update (show other bugs)
Version: 3.2.3
Hardware: Other other
: P2 major
Target Milestone: 3.4.1
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-17 13:58 UTC by Warren Togami
Modified: 2015-04-13 22:37 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Warren Togami 2007-12-17 13:58:21 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=392851
Several users are reporting that sa-update, with GPG signature check failing,
left an incomplete rules directory.  When spamassassin tries to use that
directory later it fails and passes mail with no checks.

Two bugs here?
1) sa-update should not leave a broken rules directory if it fails.  I have no
idea how people are getting broken directories though, because I have never seen
this happen with Fedora's default packaging and I have never used sa-update
--import myself.
2) spamassassin shouldn't attempt to use rules dir that is missing certain key
files.
Comment 1 Justin Mason 2007-12-17 14:03:17 UTC
(In reply to comment #0)
> 1) sa-update should not leave a broken rules directory if it fails.  I have no
> idea how people are getting broken directories though, because I have never seen
> this happen with Fedora's default packaging and I have never used sa-update
> --import myself.

yeah, it's odd alright.  could it be from deleting the sa-update keyring dir in
/etc/mail/spamassassin/sa-update-keys?
Comment 2 Warren Togami 2007-12-17 14:11:26 UTC
The RPM or scripts are not touching or doing anything special with
/etc/mail/spamassassin/sa-update-keys.
Comment 3 Sidney Markowitz 2007-12-17 15:52:25 UTC
Targeting for 3.2.4
Comment 4 Justin Mason 2007-12-18 05:07:25 UTC
(In reply to comment #2)
> The RPM or scripts are not touching or doing anything special with
> /etc/mail/spamassassin/sa-update-keys.

yeah -- I'm thinking a user might have accidentally deleted it...
Comment 5 bugzilla-spamassassin 2007-12-18 06:21:38 UTC
I have added some more detailed analysis to the original bug report at
https://bugzilla.redhat.com/show_bug.cgi?id=392851

A specific REPRODUCIBLE case of such error is given there. A more general
reproducible error case is as follows:

On a fresh install, run sa-update with two "--channel" inputs with one
corresponding to the base ruleset (e.g., updates.spamassassin.org) but without a
valid gpg key and the other (e.g., saupdates.onprotect.com) corresponding to
some other supplementary ruleset but with a valid gpg key. The order doesn't
seem to matter.

Then, sa-update will happily create the /var/lib/spamassassin/3.002003 directory
(using Fedora locations) and populate it with the rules from the channel with
the valid gpg key while warning of a bad gpg key and failing to add the base
ruleset for updates.spamassassin.org. The non-expert user likely won't
see/understand/recognize the warning and will be left with a spamassassin
configuration without any valid base rules since once the
/var/lib/spamassassin/3.002003 directory is created, it overrides the other
locations for the original non-updated rules (e.g., /usr/share/spamassassin)

Not sure though what the best way to fix this is since technically sa-update is
not doing anything wrong. It is only just creating a non-working situation that
is obscure to most non-experts :)
Comment 6 Justin Mason 2007-12-18 06:49:51 UTC
(In reply to comment #5)
> On a fresh install, run sa-update with two "--channel" inputs with one
> corresponding to the base ruleset (e.g., updates.spamassassin.org) but without a
> valid gpg key and the other (e.g., saupdates.onprotect.com) corresponding to
> some other supplementary ruleset but with a valid gpg key. The order doesn't
> seem to matter.
> 
> Then, sa-update will happily create the /var/lib/spamassassin/3.002003 directory
> (using Fedora locations) and populate it with the rules from the channel with
> the valid gpg key while warning of a bad gpg key and failing to add the base
> ruleset for updates.spamassassin.org. The non-expert user likely won't
> see/understand/recognize the warning and will be left with a spamassassin
> configuration without any valid base rules since once the
> /var/lib/spamassassin/3.002003 directory is created, it overrides the other
> locations for the original non-updated rules (e.g., /usr/share/spamassassin)
> 
> Not sure though what the best way to fix this is since technically sa-update is
> not doing anything wrong. It is only just creating a non-working situation that
> is obscure to most non-experts :)

ah.  I see!

it could probably also be reproduced by doing sa-update with just an
additional ruleset channel, without the base ruleset's channel.  that
way, the /var/lib/spamassassin/3.002003 dir is created without any
valid base ruleset.
Comment 7 Theo Van Dinter 2007-12-18 08:40:01 UTC
(In reply to comment #5)
> Then, sa-update will happily create the /var/lib/spamassassin/3.002003 directory
> (using Fedora locations) and populate it with the rules from the channel with
> the valid gpg key while warning of a bad gpg key and failing to add the base
> ruleset for updates.spamassassin.org. The non-expert user likely won't
> see/understand/recognize the warning and will be left with a spamassassin
> configuration without any valid base rules since once the
> /var/lib/spamassassin/3.002003 directory is created, it overrides the other
> locations for the original non-updated rules (e.g., /usr/share/spamassassin)

Yes, of course.  That's how sa-update is supposed to work.  updates.spamassassin.org isn't a required 
channel, nor should it be (there are use cases where people want to just use their own channels, for 
instance).  I also don't think that the whole update should fail if any of the channels fail.

http://wiki.apache.org/spamassassin/RuleUpdates has this covered as the very top item in the FAQ 
section, btw.  For most people, if they're adding in a third party channel, they'll need to pay attention to 
the fact that they need the project's channel to install as well to get those rules.

IMO, this would be addressed by us not including any rules w/ the standard distro and requiring all 
rules to come from the update system.  It also fixes our issue whereby we have multiple areas for rules 
for the same SA version (ie: rules/branches/3.2 and branches/3.2/rules ...)
Comment 8 Justin Mason 2007-12-19 13:55:13 UTC
(In reply to comment #7)
> Yes, of course.  That's how sa-update is supposed to work. 
updates.spamassassin.org isn't a required 
> channel, nor should it be (there are use cases where people want to just use
their own channels, for 
> instance). 

after seeing the uses of rule updates in the field, I'd say it'd be more
appropriate if updates.spamassassin *was* a required channel, triggering
additional paranoia about leaving empty rules dirs behind.  We could possibly
add a command-line switch to disable this error-checking, but IMO it's better
than allowing PEBKAC errors so easily...

> http://wiki.apache.org/spamassassin/RuleUpdates has this covered as the very
top item in the FAQ 
> section, btw. 

is that linked from the *real* FAQ anywhere?

> For most people, if they're adding in a third party channel, they'll need to
pay attention to 
> the fact that they need the project's channel to install as well to get those
rules.

> IMO, this would be addressed by us not including any rules w/ the standard
distro and requiring all 
> rules to come from the update system.  It also fixes our issue whereby we have
multiple areas for rules 
> for the same SA version (ie: rules/branches/3.2 and branches/3.2/rules ...)

hmm.... might be worth considering this for 3.3.0.  how's about opening a bug to
discuss it?
Comment 9 Theo Van Dinter 2007-12-19 14:11:53 UTC
(In reply to comment #8)
> after seeing the uses of rule updates in the field, I'd say it'd be more
> appropriate if updates.spamassassin *was* a required channel, triggering
> additional paranoia about leaving empty rules dirs behind.  We could possibly
> add a command-line switch to disable this error-checking, but IMO it's better
> than allowing PEBKAC errors so easily...

I'm going to be pretty -1 about that.  The whole goal we were going for was to
separate engine from rules.  Requiring that people get our rules is a full 180
from the original goal.

I also think that the whole default rules dir/local state dir override thing is
a bit of a kluge and leads to problems like this.  Which gets me back to the
"only make rules available via sa-update" idea, which should make everyone
happy.  :)

> > http://wiki.apache.org/spamassassin/RuleUpdates has this covered as the very
> top item in the FAQ 
> > section, btw. 
> 
> is that linked from the *real* FAQ anywhere?

It is the real faq, for sa-update anyway.  It's in the man page. :)

I don't know if it's linked from the general SA FAQ.

> > IMO, this would be addressed by us not including any rules w/ the standard
> distro and requiring all 
> > rules to come from the update system.  It also fixes our issue whereby we have
> multiple areas for rules 
> > for the same SA version (ie: rules/branches/3.2 and branches/3.2/rules ...)
> 
> hmm.... might be worth considering this for 3.3.0.  how's about opening a bug to
> discuss it?

Sure. bug 5752. :)
Comment 10 bugzilla-spamassassin 2007-12-20 22:52:48 UTC
Well, I think part of the problem in my case (and in the likely use case) is
when someone has default rules installed in __def_rules_dir__  (in my case
/usr/share/spamassassin) but then gets only a partial update installed in
__local_state_dir__/spamassassin/__version__ (/var/lib/spamassassin/3.002003 in
my case). In fact, even if just the directory __version__ (3.002003) is created
then, suddenly all the rules in __def_rules_dir__ are overriden. This would
occur even if no rules are added to the directory (or in my case the rules that
were in  3.002003 all had errors because they were missing dependencies),
resulting in effectively NO RULE CHECKING.

Wouldn't a more forgiving approach be to first look in __version__ and then
continue to look in __def_rules_dir__ if the subdirectory
updates_spamassassin_org is not present.

Then, additionally, in order to prevent incomplete updates, the repository
subdirectories would not be added (or later changed) in __version__ if the
repository update is not successful. This would ensure that __def_rules_dir
would never be overriden unless a successful update occurred to
updates_spamassassin_org (or more generally the user-defined default repository).

To preserve separation of rules and engines, one could make a user-defined
variable assign which repository is the primary rule set for updates. The
default value of that variable would likely be updates_spamassassin_org

So, in my case, the failure of updates_spamassassin_org to update would leave
that directory missing from __version__ (as it was) so that spamassassin would
continue to look also at __def_rules_dir__ for the base rule set.

If you truly want to ignore all rules in __def_rules_dir__ then you could create
an empty directory updates_spamassassin_org (or whatever your default repository
is) so that spamassassin would think that the ruleset was updated successfully.

There are many possible variations on the above ideas but I would think that it
would be more robust than the current approach.
Comment 11 Theo Van Dinter 2007-12-21 08:20:39 UTC
(In reply to comment #10)
> Wouldn't a more forgiving approach be to first look in __version__ and then
> continue to look in __def_rules_dir__ if the subdirectory
> updates_spamassassin_org is not present.

Except that not having the SA update channel is a valid situation.  A simple fix
here IMO is to skip the local_state_dir if there are no usable files in it. 
Then it would DTRT.

Then for 3.3, there can be the whole discussion of requiring an update channel, etc.
Comment 12 Justin Mason 2007-12-21 08:30:35 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Wouldn't a more forgiving approach be to first look in __version__ and then
> > continue to look in __def_rules_dir__ if the subdirectory
> > updates_spamassassin_org is not present.
> 
> Except that not having the SA update channel is a valid situation.  A simple fix
> here IMO is to skip the local_state_dir if there are no usable files in it. 
> Then it would DTRT.

I don't think that fixes the issue in
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5751#c5 .
Comment 13 Theo Van Dinter 2007-12-21 09:16:09 UTC
(In reply to comment #12)
> I don't think that fixes the issue in
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5751#c5 .

Well, yes and no, and that's not what we were discussing. ;)

The issues in question are:
- if the local_state_dir is created, but nothing is in it, what do we do?
  Right now, if it exists, we use it.  The current sa-update only creates the
dir if it's about to extract channel files, and if that fails the directories
get removed -- see bug 4941 and sa-update starting around line 726.  So in
short, sa-update should never cause this situation.  If we are worried about
this situation though, the way to work around it, IMO, is to skip it.

- if the local_state_dir doesn't have updates.spamassassin.org in it, but has
other channels in it, what do we do?
  Our design says: using updates.spamassassin.org is optional, so use whatever
is available in the dir.  I'm going to be absolutely -1 on changing this -- we 
have to allow people to not use our rules if they don't want to.  We can highly
recommend they use them, but it's up to the user, and larger organizations will
want to be able to have their own channels available and not use ours.  We have
documented about adding channels and needing to use updates.spamassassin.org if
they expect the standard ruleset.  If we want to improve these docs to make it
more obvious, great.

- if one channel fails while we're updating multiple channels, what do we do?
  Right now, we let that channel fail and continue with the other channels. 
IMO, this is the right thing to do.  It does mean that people need to pay
attention if this is the first time they're running sa-update for a given
channel, and handle failures appropriately.  At the moment, I wouldn't be
opposed, if people wanted to, have sa-update be more verbose if it's trying to
get/install updates.spamassassin.org, and it fails, and other channels already
exist or other channels are in the queue to be updated.  It could probably be
added w/out making too much of a mess (ie: track the pass/fail per channel, then
at the end figure out the situation and warn appropriately -- hey, isn't this
part of bug 5043?). :)


As you may have guessed at this point, I'm not very sympathetic to people
burning themselves when playing with fire.  I think fire is important, and want
to make sure that if people want to, they can play with it.  I don't mind making
sure the warning signs are clear about the dangers of doing so, however.
Comment 14 bugzilla-spamassassin 2007-12-23 09:24:26 UTC
(In reply to comment #13)

> As you may have guessed at this point, I'm not very sympathetic to people
> burning themselves when playing with fire.  I think fire is important, and want
> to make sure that if people want to, they can play with it.  I don't mind making
> sure the warning signs are clear about the dangers of doing so, however.

Yes but I think there is another equally valid point here which is that a failed
(partial) update shouldn't leave you in a worse state. The first priority with
failed updates should be "DO NO HARM"
The problem in my mind is that even the most trivial addition to __version__ as
part of a failed multi-part run of sa-update causes ALL the default rules in
__def_rules_dir__ to be overriden leaving spamassassin in a broken state.

IMHO, this is a rather unnatural result in that one would naively expect that
updates would fail gracefully in that old rules would only be overriden to the
extent that they are replaced (or intentionally not replaced).

I think a better answer would be to have the default case of a partial update
failure be that NONE of the updates occur unless you use a (new) command line
flag --force. A failed partial update would then give a clear warning to the
effect that "No update occurred since update xxx failed. If you want to allow
only a partial update to proceed then use the --force flag"

I think this "default do-no-harm" policy is much better than just adding more
warnings. In my specific case, the sa-update was run from a root-only readable
cron file that I didn't even realize was there (we can't all know everything
about our systems :) and its ouput was buried deep in some log file so I never
even knew a failed update occurred. The result is that without any warning I was
left with an inoperable spamassassin setup. With my suggestion, the default
would have been that no update occurred and at least spamassassin would still work.

Also, in my particular case, even the update portion that did occur didn't help
because those rules had dependencies on rules in updates.spamassassin.org so
allowing a partial update truly had no benefit at all.

IMHO, the desired behavior should be analogous to programs like 'rpm' where if
you specify multiple rpms to install/upgrade then the installation/upgrade only
occurs if ALL of them work unless you use the --nodeps or --force type flags to
override the default behavior if you really know what you are doing.

I'm not sure why anyone would object to this compromise solution since experts
like yourself could still proceed at their own risk and allow partial updates to
occur while less sophisticated users would at worse have a still working but not
updated spamassassin installation.
Comment 15 Tom Schulz 2007-12-26 06:56:41 UTC
> we have to allow people to not use our rules if they don't want to.  We can
> highly recommend they use them, but it's up to the user, and larger
> organizations will want to be able to have their own channels available and
> not use ours.

Let me add my 2 cents worth here.  I can understand the above requirement, but
I would expect that not using your rules would be an unusual situation.
Perhaps an extremely unusual situation.  I think that not using your rules
should require an explicit ok from the site manager.  I propose that there
should be a configuration option that must be set before not using your rules
would be allowed.  Without that option, the distributed rules would be used
if none are found in the update directory.  That way someone who wanted to
use updated additional rules but use your distributed rules without updates
could do so.
Comment 16 Justin Mason 2007-12-28 04:17:32 UTC
(In reply to comment #15)
> > we have to allow people to not use our rules if they don't want to.  We can
> > highly recommend they use them, but it's up to the user, and larger
> > organizations will want to be able to have their own channels available and
> > not use ours.
> 
> Let me add my 2 cents worth here.  I can understand the above requirement, but
> I would expect that not using your rules would be an unusual situation.
> Perhaps an extremely unusual situation.  I think that not using your rules
> should require an explicit ok from the site manager.  I propose that there
> should be a configuration option that must be set before not using your rules
> would be allowed.  Without that option, the distributed rules would be used
> if none are found in the update directory.  That way someone who wanted to
> use updated additional rules but use your distributed rules without updates
> could do so.

I really have to agree.   

I *do* agree that we need to support using SA (and sa-update) without requiring
that the default ruleset be used -- but in this case, using sa-update to
download thirdparty rulesets *without* also downloading updates to the default
ruleset is a very unusual case; I doubt there's anyone in the world yet doing this.

IMO, it'd make most sense if that was not permitted unless explicitly requested
using a new "--no-default-rules" switch or similar.
Comment 17 Theo Van Dinter 2007-12-28 17:40:30 UTC
(In reply to comment #16)
> I *do* agree that we need to support using SA (and sa-update) without requiring
> that the default ruleset be used -- but in this case, using sa-update to
> download thirdparty rulesets *without* also downloading updates to the default
> ruleset is a very unusual case; I doubt there's anyone in the world yet doing this.

A very obvious use case is when people want to download channels w/ different frequency, and so run 
sa-update once for updates.spamassassin.org, and then again at a different time for some other 
channel.

> IMO, it'd make most sense if that was not permitted unless explicitly requested
> using a new "--no-default-rules" switch or similar.

I'm against this idea for previously discussed reasons.
Comment 18 Theo Van Dinter 2007-12-28 17:51:19 UTC
(In reply to comment #14)
> Yes but I think there is another equally valid point here which is that a failed
> (partial) update shouldn't leave you in a worse state. The first priority with
> failed updates should be "DO NO HARM"

And that's what sa-update already does.  The issue here is that when running sa-update for a channel 
*the first time* when using multiple channels, people have to pay attention to whether there was a 
failure and if so, fix the issue and try again.  I believe all other situations are handled automatically, 
because they can be.

> I think a better answer would be to have the default case of a partial update
> failure be that NONE of the updates occur unless you use a (new) command line
> flag --force.

If someone runs sa-update w/ different channels (for example so that updates occur at different 
frequencies for different channels), then you have the same problem and no commandline option or 
sa-update logic will be able to solve the problem.

> I think this "default do-no-harm" policy is much better than just adding more
> warnings. In my specific case, the sa-update was run from a root-only readable
> cron file that I didn't even realize was there (we can't all know everything
> about our systems :) and its ouput was buried deep in some log file so I never
> even knew a failed update occurred.

That's a system administration problem.

> I'm not sure why anyone would object to this compromise solution since experts
> like yourself could still proceed at their own risk and allow partial updates to
> occur while less sophisticated users would at worse have a still working but not
> updated spamassassin installation.

IMO, bug 5193 (making sa-update more "atomic") would allow more for "fail a whole run if any part 
fails", if people want to go that way.  I'm not opposed to that kind of change if people feel it's the better 
way to go.  It does solve this specific situation, though not all possible similar situations, which I 
believe as previously described is dependent on the user.
Comment 19 Loren Wilton 2007-12-28 19:52:02 UTC
Suggestion to maybe get around the impasse here:

When SA is first installed (initially, or as an upgrade) a check is made to see 
if the updates directory exists.  If not, it is created and the rules supplied 
with the package are copied into this directory.  This should be done as part 
of the installation process, NOT as part of any normal SA activity.  

SA will no longer look in the original rules location for rules, only in the 
updates hierarchy. If the user does not use an SA update channel the original 
rules will remain functional, but SA will only have to look in one place for 
rules, not two places with an override on the first.  If the user is using an 
SA update channel, obviously updated rules will override the package rules in 
the normal manner; it will simply be the case of a newer update replacing an 
older update.

For the rare case of packagers of third party software that do not wish to use 
either the native SA rules nor SA rule updates, they only have to add a final 
installation step to remove the default SA rules from the update directory that 
were installed by an earlier installation step.

Note that it might be simpler all around to simply re-home the package rules to 
the updates directory in the first place, and remove the code that copies from 
the current rules location to the updates directory.  The one drawback I can 
see with doing this is that on an upgrade the package rules would/might 
override any current update SA rules in the updates directory.  OTOH rules tend 
to change between releases, so this might be a good thing.
Comment 20 Justin Mason 2007-12-29 03:12:50 UTC
(In reply to comment #19)
> SA will no longer look in the original rules location for rules, only in the 
> updates hierarchy. If the user does not use an SA update channel the original 
> rules will remain functional, but SA will only have to look in one place for 
> rules, not two places with an override on the first.  If the user is using an 
> SA update channel, obviously updated rules will override the package rules in 
> the normal manner; it will simply be the case of a newer update replacing an 
> older update.

yes, this is bug 5752.  I agree we should do that for 3.3.0, but we still have
this bug for 3.2.x.
Comment 21 Justin Mason 2008-01-01 12:42:52 UTC
moving on to 3.2.5, so 3.2.4 can be released
Comment 22 Justin Mason 2008-06-01 03:37:15 UTC
moving to 3.2.6 so that we can release a 3.2.5
Comment 23 tm 2008-09-21 03:31:49 UTC
Just a quick comment: In my SA logs, I sometimes find mentioning of syntax errors after running sa-update, even after restarting SA thereafter (do I need to?). The latest example is a problem with this rule: (in 80_additional.cf):

@4000000048d6216f360ad4cc [15045] warn: syntax error at /var/db/spamassassin/3.002004/updates_spamassassin_org/80_additional.cf, rule __TVD_BODY, line 13, near ";
@4000000048d6216f360d6cdc [15045] warn:  }"
@4000000048d6216f360ffd1c [15045] warn: /var/db/spamassassin/3.002004/updates_spamassassin_org/80_additional.cf, rule __TVD_BODY, has too many errors.


There is only a comment on line 13.


Help about how to debug such stuff would be great...

This is SpamAssassin 3.2.4 on OpenBSD 4.3, the official package.
Comment 24 Darxus 2011-10-05 19:44:24 UTC
Re-targeting from 3.2.6 to 3.4.0 (current trunk).
Comment 25 Kevin A. McGrail 2011-10-28 20:36:33 UTC
Moving to 3.4.1 to allow for a 3.4.0 release.
Comment 26 Kevin A. McGrail 2015-04-13 22:37:08 UTC
Closing.  sa-update being required to run has largely fixed this issue.