Bug 7397 - SA-Update hasn't seen a lot of love lately
Summary: SA-Update hasn't seen a lot of love lately
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-update (show other bugs)
Version: unspecified
Hardware: PC Linux
: P2 enhancement
Target Milestone: Future
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-01 21:47 UTC by Michael P
Modified: 2022-03-06 12:37 UTC (History)
4 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Modified sa-update with enhanced support for channel options application/x-perl None Michael P [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Michael P 2017-04-01 21:47:21 UTC
Created attachment 5439 [details]
Modified sa-update with enhanced support for channel options

While looking at the sa-update methods, we see that the code hasn't had a lot of love lately, but even more to the point, it is constrained in it's ability to use multiple channels, when very specific channel requirements might/should be in place.

And while my Perl code is most likely horrible from lack of use lately, decided to look at updating it to used an enhanced methodology.

* More Flexible Channel mechanisms
* Try for 100% backwards compatibility
* Try not to completely re-write from scratch, to make easier adoption.

What I decided to do is use a commonly used concept..

/etc/spamassassin/channels.d/

Where individual channels could have their own configurations, yet allowing a simple native sa-update process and cron jobs properly retrieve all channels safely.

The reason for this, is that different channels might have different restrictions, eg.. some require, https vs http, some might use http auth, some might have differing GPG requirements, some might want IPv6 access etc.. Some might have to refresh mirrors more often, etc..

The idea is that a config file would look similar to this..

channel=updates.spamassassin.org
use=y
;use-ipv6=no
use-gpg=y
; You MUST specify 'trusted' keys in one of the following two options
gpgkey=26C900A46DD40CD5AD24F6D7DEE01987265FA05B 0C2B1D7175B852C64B3CDC716C55397824F434CE 5E541DC959CB8BAC7C78DFDC4056A61A5244EC45
;gpgkeyfile
# This should default to the channel name
gpghomedir=/etc/spamassassin/sa-update-keys
force-https=n
#use-auth-token=myuser
; watch out for passwords containing '='
#use-auth-password=mypass
refresh-mirrors=n
allow-plugins=y

(of course, allowing if allowing plugins, it behooves the user of the channel to make sure that the channel rules are 'secure' and 'authentic)

I also chose to have all public keys stored in the same directory, eg /usr/share/spamassassin, but with a naming convention that reflects the actual channel name, except of course for the legacy key for updates.spamassassin.org

I also chose to prefer to have gpghomedir based on the channel name, and to be located under /etc/spamassassin/sa-update-keys/

And while sa-update should really be rebuilt from scratch, by someone who is a much better perl programmer than I, I thought I would share my working version of an updated sa-update program, for what it is worth.

Only thing left is to improve the cron jobs to accommodate, so that it can pre-compile other channels rules as well as the native rules.

Please dont' criticize the code too heavily, but take the concept and the working prototype for what it is meant, a contribution, and a step forward to make it easier for those that test/develop rule sets, to use multiple channels much more safely.  And of course, I didn't test it on other platforms (eg, windows)
Comment 1 Henrik Krohns 2019-06-15 19:42:59 UTC
Retargeting for later review
Comment 2 Michael P 2019-06-17 13:52:37 UTC
For the record, our version of sa-update deals with a lot of the outstanding enhancement requests, but is not yet production ready.. eg, help docs, pretty up the code etc, but has been proven to work for the last year in all our production environments.  

However, would be nice to get some comments, to be truthful sa-update tries to do too much.  I would suggest discussion on whether it should be broken out into sa-setup and sa-update.  

While it may seem obvious, and probably did in the past, that an update should happen whether the new dataset of rules are on the local file system, (eg a developer testing or during initial setup or one server pulling updates for a cluster) vs doing an http 'pull' from a remote mirror, I would suggest that given the differences in behaviour that these two actions be separated.

I think that decision should be made before updating the current sa-update code for 4.0
Comment 3 Henrik Krohns 2019-06-17 15:19:54 UTC
Do you have a patch that applies cleanly to latest trunk? Your file is quite old 3.3.2, many small changes after that..

( waiting for KAM to ask signing CLAs.. ;-) )

Not against the idea completely. Some distributions have something similar.

Then again I have to think about how many people use extra channels, and how many of those actually feel the current sa-update is lacking something? Does running separate sa-update per channel have issues? Also I'm not sure what you think "sa-setup" should perform?

Just gauging if it's worth the effort to see this through (there's really not many active maintainers right now, if even that..). But if it's a clean 100% backwards compatible patch then maybe why not. Just a new option that uses the dir contents and nothing else.

We could try to make 4.0.0 easier so perhaps more people would create update channels. One problem is that sa-update requires DNS records to check versions etc, way too much work to create a channel, atleast we need to provide "pure http" option.

I do have some scepticism about channels in general. The few that are out there are pretty localized and nothing but bunch of fixed strings and uris, which could possibly be much more efficiently used as clamav signatures or DNS lists etc. Also the scoring is usually suspect (no masschecks) and sa-update has no option to adjust/limit scores on the fly (we should add this option).

Good bug for some thinking, already there's some options that should be added.
Comment 4 Kevin A. McGrail 2019-06-17 23:58:46 UTC
Michael submitted a CCLA and was planning to work on this a bit this week too.  Question: Does a CCLA cover anyone for the firm and we give them HasCLA flags?
Comment 5 Kevin A. McGrail 2019-06-20 18:58:32 UTC
(In reply to Kevin A. McGrail from comment #4)
> Michael submitted a CCLA and was planning to work on this a bit this week
> too.  Question: Does a CCLA cover anyone for the firm and we give them
> HasCLA flags?

NOTE: I have reached out to the Secretary to confirm this is OK.
Comment 6 Kevin A. McGrail 2019-06-20 19:01:50 UTC
Comment #2, I have marked Michael as HasCLA as that has been the process I know of.
Comment 7 Kevin A. McGrail 2019-06-20 20:58:57 UTC
Note from discussions with Secretary, this is ok but to have commit privileges, they need an ICLA as well.
Comment 8 Michael P 2019-06-24 14:03:27 UTC
Before I set out to complete this, as mentioned I would like a little feedback.

I propose changing sa-update to sa-update-local and sa-update-remote.  There are uses for SA to be able to take local copies of rulesets, and activate them, eg on first installation, for testing/debugging, and in cases where there is a cluster, and only one system downloads the rulesets for other systems to use, but the purpose differs enough from downloading rules from remote mirrors, that they should be different.  This would also ensure that the code is more maintanable going forward.

Package Maintainers can ensure that sa-update-local is run on installation, but that cron jobs run sa-update-remote.

I would also suggest that we deprecate support for older GPG, package maintainers for new versions of SA should ensure that the latest is being used.

I also propose that going forward ONLY the channels.d configuration be used, rather than the older method of accepting just a single channel file structure. 

This way, the 'import GPG key' funtionality could also be removed from sa-update-remote.  Updated GPG keys should really only be provided by package maintainers.  

Please provide comments, if you disagree, as this is the direction I would like to head.  By shipping sa-update-remote as a 'new' script, it will also help ensure a simpler migration path.
Comment 9 Kevin A. McGrail 2019-06-24 14:15:32 UTC
> I propose changing sa-update to sa-update-local and sa-update-remote.  

Big +1.  Love this idea.

> I would also suggest that we deprecate support for older GPG

-1 or at least a 0.  I would say you need to support the version shipped with common OSes likely as far back as at least CentOS 6 and take a look at some of the LTS OSes.  Is there a security issue solved by this that you can point out?

> I also propose that going forward ONLY the channels.d configuration be used, rather than the older method of accepting just a single channel file structure. This way, the 'import GPG key' funtionality could also be removed from sa-update-remote.  

-1.  There are rulesets being run today using this feature where the keys are installed by admins. 

-KAM
Comment 10 Henrik Krohns 2019-06-24 14:27:18 UTC
(In reply to Kevin A. McGrail from comment #9)
> > I propose changing sa-update to sa-update-local and sa-update-remote.  
> 
> Big +1.  Love this idea.

-1 hate it. Probably just results in duplicate code for no reason.
Comment 11 Henrik Krohns 2019-06-24 14:37:14 UTC
(In reply to Michael P from comment #8)
> There
> are uses for SA to be able to take local copies of rulesets, and activate
> them, eg on first installation, for testing/debugging, and in cases where
> there is a cluster, and only one system downloads the rulesets for other
> systems to use, but the purpose differs enough from downloading rules from
> remote mirrors, that they should be different.  This would also ensure that
> the code is more maintanable going forward.

How does this differ from already existing sa-update --install command?

Nothing prevents package maintainers from already doing all this. Or clusters doing this. Or local networks rsyncing locally whatever they need.

Code being more maintanable is non-issue, since it can be always rewritten if someone wants, but I don't see any need to change the semantics how sa-update works right now. Except the new optional --channels.d operation that would use a new directory structure for configuration. There is no reason to break backwards compatibility for something that works quite fine as it is.
Comment 12 RW 2019-06-25 19:56:10 UTC
(In reply to Michael P from comment #8)

> I also propose that going forward ONLY the channels.d configuration be used,
> rather than the older method of accepting just a single channel file
> structure. 

IMO the best way to use sa-update on multiple channels is to run it on one channel at a time. That way the shell script knows which channels have updated and which have failed. For example it may be useful to delete minor channels failing lint automatically rather than have them hold-up updates to core rules. I'd rather not lose this flexibility.

There are also other uses for command-line overrides, such as trouble-shooting, or downloading  a channel just to look at the rules.
Comment 13 Michael P 2019-06-25 22:06:38 UTC
Expect a flurry of comments over the next few days, unless you think there is a better place to discuss these things instead of in the bug ticket itself.

sa-update has code to try and figure out which version of SA is running, and then alter it's behavior based on that.  I don't get it.  Should not it simply have the same version as SA itself defined at make time, and one simple safety.  If the version is not the same as the installed SA version, refuse to run?

The decision should be, is sa-update a part of SA or not? Does it not get updated with each SA version? Eg.. 

# set debug areas, if any specified (only useful for command-line tools)
$SAVersion =~ /^(\d+\.\d+)/;
if ($1+0 > 3.0) {
  $opt{'debug'} ||= 'all' if (defined $opt{'debug'});
} else {
  $opt{'debug'} = defined $opt{'debug'};
}

I would assume that if the sa-update matches the installed and running SA, that it would only have code for the running version, correct?

Can I deprecate things like this, and assume that we are only writing for the current version?
Comment 14 Michael P 2019-06-25 23:29:26 UTC
(In reply to RW from comment #12)
> (In reply to Michael P from comment #8)
> 
> > I also propose that going forward ONLY the channels.d configuration be used,
> > rather than the older method of accepting just a single channel file
> > structure. 
> 
> IMO the best way to use sa-update on multiple channels is to run it on one
> channel at a time. That way the shell script knows which channels have
> updated and which have failed. For example it may be useful to delete minor
> channels failing lint automatically rather than have them hold-up updates to
> core rules. I'd rather not lose this flexibility.
> 
> There are also other uses for command-line overrides, such as
> trouble-shooting, or downloading  a channel just to look at the rules.

The script (eg sa-remote-update) should not preclude the ability to circle through each of the channels, and the ones that fail lint will simply not get applied, and an error message sent to the admin, with the remaining channels being applied.  It should also not preclude some one from deciding to explicitly update channel by channel, should they choose to do so, instead of simply performing by default, do every channel in the channels.d directory, if they are set as active.
Comment 15 Henrik Krohns 2019-06-26 06:31:57 UTC
(In reply to Michael P from comment #13)
> The decision should be, is sa-update a part of SA or not? Does it not get
> updated with each SA version? Eg.. 

Of course.

> # set debug areas, if any specified (only useful for command-line tools)
> $SAVersion =~ /^(\d+\.\d+)/;
> if ($1+0 > 3.0) {
>   $opt{'debug'} ||= 'all' if (defined $opt{'debug'});
> } else {
>   $opt{'debug'} = defined $opt{'debug'};
> }
> 
> I would assume that if the sa-update matches the installed and running SA,
> that it would only have code for the running version, correct?

Yes.

> Can I deprecate things like this, and assume that we are only writing for
> the current version?

Yes.

It should not be a surprise that some 15 year old code snippets can come up from time to time. Things were very much different back then.
Comment 16 Michael P 2019-06-26 15:25:50 UTC
Discussion of Optimization Policy on MIRRORED.BY (deprecate?)

    # ---------------------------------
    # MP: Logic behind local MIRRORED.BY, to find working mirrors
    # 1) Use local file if it exists, and is not older than one week
    #    (Not sure why we have this optimization, given how light the
    #     query is. Reduce the chance of man in middle giving out 
    #     fake information?? Or a hack at a mirror, or malicious mirror?
    #     should make this safer)
    # 2) Use the information presented in a TXT record lookup
    #     (Again, maybe vulnerable to M-M DNS spoofing attack, might
    #      want to consider improving security in this regard. Also make
    #      a security note to channel maintainers, to find the location
    #      of the latest mirrors file, eg.. 
    #      host -t TXT mirrors.updates.spamassassin.org
    #      mirrors.updates.spamassassin.org descriptive text "http://spamassassin.apache.org/updates/MIRRORED.BY"
    # 2a)  Download that list, and store it locally (per channel)
    # 3)  Randomly order the mirror list(s)?? We could do something novel
    #      eg, order the mirror list by GEO proximity, or by a 'weight' 
    # 4)  Walk through each Mirror until we download a valid set of rules
    # Concerns: New Installs would not get updated mirrors for 7 days
    #           If a broken or hacked mirror existed, some people would 
    #           not get list updated for 7 days 

I would think that given how light a DNS query is for the TXT record, and how light the actual GET for the mirrors file, we don't REALLY need this optimization like we might have years ago.  IF we do want this optimization, I highly recommend it be dropped down to one (1) day only.
Comment 17 RW 2019-06-26 15:51:19 UTC
(In reply to Michael P from comment #16)

>     # 3)  Randomly order the mirror list(s)?? We could do something novel
>     #      eg, order the mirror list by GEO proximity, or by a 'weight' 

It's already weighted.


>     # 4)  Walk through each Mirror until we download a valid set of rules
>     # Concerns: New Installs would not get updated mirrors for 7 days
>     #           If a broken or hacked mirror existed, some people would 
>     #           not get list updated for 7 days 
> 
> IF we do want this optimization,
> I highly recommend it be dropped down to one (1) day only.

It's swings and roundabouts, if someone replaces the mirror file everyone gets attacked on day zero rather than just 1 in 7.
Comment 18 Michael P 2019-06-26 16:12:30 UTC
Sorry, yes.. maybe not clear in my notes, but it is already weighted, just this is an opportunity to consider if there is demand to change the behavior..


Should add a couple of notes:

* Channels can override the behavior of using the existing MIRRORED.BY even if less than seven(7) days. Or via the use of the CLI option --refreshmirrors.

* Even on first install, there is no reason for a package maintainer to ship a MIRRORED.BY, as it will always be stale, unless created via a post install etc.. Would recommend that it not be shipped by default.

* Channels will have their own MIRRORED.BY, just to be clear

* Channel maintainers may want different policies on how long to cache that information.

And yes, I considered that maybe there was concern that a MIRRORED.BY file get subverted, and that this was a poor man's method to limit the exposure to that case.. However, in contrast the risk of one 'bad mirror' existing, and people still using it for seven days, would at least offset the advantage of that caching.

I do think that by securing the information in the MIRRORED.BY, (future) eg, DNSSEC on the DNS entry, and forced HTTPS, and checksums, this can be mitigated in other/better ways.
Comment 19 Henrik Krohns 2019-08-27 08:46:58 UTC
(In reply to Henrik Krohns from comment #3)
> sa-update has no option to adjust/limit scores on the fly (we should add
> this option).

Fyi, added --score-multiplier, --score-limit options.

Sending        trunk/UPGRADE
Sending        trunk/sa-update.raw
Transmitting file data ..done
Committing transaction...
Committed revision 1865977.
Comment 20 Michael P 2019-08-27 22:14:44 UTC
Henrik, curious why this belongs in sa-update code base? Isn't it up to the channel to set scores, not the upstream SA? After all, if a channel sets unrealistic scores, it simply would not be something someone wants to use?  Feel free to take this offline, if it is easier to explain. Or reference the discussion that triggered this..
Comment 21 Kevin A. McGrail 2019-08-27 22:49:36 UTC
I think it makes sense.  KAM.cf, for example, is designed for systems with a threshold over 5.  And I might want to lower scores on a rule channel that I want to use but want to lower the weight of the scores.
Comment 22 Henrik Krohns 2019-08-28 04:17:25 UTC
What Kevin said.

It's just an expectation of 3rd parties using bad scoring or poison pill rules. No one is forced to use the few lines of extra code, but it's available if needed.

(In reply to Michael P from comment #20)
> After all, if a channel sets unrealistic scores, it simply would not be something someone wants to use?

With all respect, that's some seriously naive thinking. While official SA updates try hard to maintain reasonable quality and scoring, are you really expecting that 3rd parties do any serious mass-checking or never make mistakes? I've had my fair share of problems with ClamAV/Sanesecurity too..

Not going to deny that KAM.cf was an early inspiration for this. :-D I would most definitely use --score-limit ~2-3 for any external channel.
Comment 23 Henrik Krohns 2022-03-06 12:37:12 UTC
Postponing into future, due to lack of activity