Bug 4982 - automatic generation of 3.1.x rule updates
Summary: automatic generation of 3.1.x rule updates
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-update
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P3 enhancement
Target Milestone: 3.2.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2006-07-13 09:25 UTC by Justin Mason
Modified: 2006-09-04 10:48 UTC



Description Justin Mason 2006-07-13 09:25:08 UTC
Currently, sa-update packages are auto-generated from SVN trunk; once the rule
is checked in to a sandbox, everything proceeds on autopilot from there, which
is great.

However, the same is not true for the 3.1.x maintenance branch; unfortunately,
good rules have to be manually backported to there (generally by Theo).
This is suboptimal.

(It's pretty silly, too, since what's the point of generating sa-update packages
for svn trunk, when everyone who needs sa-update is running a release version
instead? ;)

It should be possible to check in a rule, and have it appear in the sa-update
package for day+1 or day+2, in both 3.1.x and trunk packages.


How's about these options?

- (a) we write a script that takes a selection of rules, based on their
mass-check results in trunk (the nightly mass-checks), and auto-promotes them on
the assumption that they will have similar hitrates.   It will not autopromote
rules that fail a --lint with SA 3.1.x, or rules that require plugins.  It will
also have a hand-maintained exclusion list, so that we can cause rules to be
excluded for other reasons based on regexp patterns in the rule text.

- OR (b) we set up a small-scale nightly mass-check using 3.1.x, on the zone, in
parallel with the existing trunk nightly mass-checks.   We then generate 3.1.x
sa-update packages from that.  It doesn't have to be of the same scale as the
"real" nightly mass-checks... just enough to provide useful hit-rate data for
autopromotion.  (I think I like this idea.)
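
Option (a) might be sketched roughly as below. This is only an illustration in Python, not the actual SpamAssassin tooling: the Rule fields, the promotable() helper and the example exclusion pattern are all hypothetical, and the 3.1.x --lint result, the plugin dependency and the trunk hit-rate verdict are assumed to have been computed elsewhere.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    text: str           # raw rule definition text
    lints_on_31: bool   # assumed: result of a --lint run under SA 3.1.x
    needs_plugin: bool  # assumed: True if the rule requires a plugin


# Hand-maintained exclusion list, matched against the rule text.
# Hypothetical example entry: skip rules whose regexp uses lookbehind.
EXCLUDE_PATTERNS = [re.compile(r"\(\?<[=!]")]


def promotable(rule: Rule, trunk_hit_ok: bool) -> bool:
    """Return True if the rule may be auto-promoted into the 3.1.x update."""
    if not trunk_hit_ok:
        # Trunk nightly mass-check results were not good enough.
        return False
    if not rule.lints_on_31 or rule.needs_plugin:
        # Fails --lint on 3.1.x, or depends on a plugin.
        return False
    # Excluded for other reasons, based on patterns in the rule text.
    return not any(p.search(rule.text) for p in EXCLUDE_PATTERNS)
```

Keeping the exclusion list as plain regexps means a problem rule can be blocked without touching the script itself.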



Any other suggestions?
Comment 1 Loren Wilton 2006-07-13 17:16:22 UTC
Perhaps a combination of these options?

Presumably, in most cases where a rule works in both trunk and 3.1.x, the hit
rates should be pretty much the same.  So you should be getting better
classification data from a larger masscheck.

So I think I'd do a), but to a provisional 3.1 rules base.  This would then run 
through b) to be autopromoted to the real distribution location.  There should 
probably also be some manual method of promoting so that something that hits 
well on the full masscheck but not as well on the small one can still be 
promoted.

This could be overkill, but I like the idea of a real masscheck (even a small
one) over a simple --lint, since it would likely catch any pathological
problems in the regex that only show up on the older version.
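
A minimal sketch of how the combination could fit together, again in Python and purely illustrative: final_promotions() and its arguments are hypothetical names. The provisional set is whatever step a) selected, small_check_ok holds the verdicts of the small 3.1.x masscheck from step b), and manual_overrides is the manual promotion method for rules that did well on the full masscheck but not on the small one.

```python
def final_promotions(provisional, small_check_ok, manual_overrides):
    """Decide which provisional 3.1 rules reach the real distribution location.

    provisional      -- rule names selected by step a) from the trunk masscheck
    small_check_ok   -- dict of rule name -> bool from the small 3.1.x masscheck
    manual_overrides -- rule names promoted by hand despite a weak small-check result
    """
    provisional = set(provisional)
    # Step b): keep only rules that the small 3.1.x masscheck confirms.
    promoted = {name for name in provisional if small_check_ok.get(name, False)}
    # Manual promotion applies only to rules already in the provisional set.
    promoted |= provisional & set(manual_overrides)
    return sorted(promoted)
```

A rule with no result from the small masscheck is treated as unconfirmed here, so it stays out unless someone promotes it by hand.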
Comment 2 Justin Mason 2006-07-13 21:07:04 UTC
Loren -- good point, the combo works pretty well.
Comment 3 Justin Mason 2006-07-31 11:11:09 UTC
ok, I think I'd like to go ahead with the 'combo' idea.

Theo -- since you're the main party working on 3.1.x updates --
have you any comments, before I get hacking?
Comment 4 Justin Mason 2006-08-10 12:30:57 UTC
Theo had some comments (via IM) but hasn't posted them yet.  To summarise,
before I forget -- iirc, he would prefer to see the 3.1.x branch, instead of
trunk, being the main branch against which mass-checks and ruleqa measure
accuracy and from which the updates are built.

This initially seems problematic, but with a bit of thought, maybe not so much.
Why should rule development measure against trunk, after all?  For code
development, that makes sense, but most of the rule development doesn't need to
use the latest code.

so I'm thinking that maybe the easiest thing to do is:

1. set up a 3.1.x-based mass-checks/ruleqa/mkupdates infrastructure, in parallel
to the existing trunk one (creating duplicate infrastructure will be a lot
easier than modifying the existing one to support two outputs instead of one)

2. switch many/most of the mass-checks over to the 3.1.x base

3. add a few extra mass-checks for trunk
Comment 5 Theo Van Dinter 2006-08-12 22:02:38 UTC
This really is an enhancement for what we have today, so marking it as such.  I
also wanted to make clear in the list that this isn't a blocker for the 3.1
series. :)
Comment 6 Theo Van Dinter 2006-08-20 18:30:21 UTC
since the rules backport is really outside the scope of the 3.1 branch, I'm
going to move this ticket to the 3.2 queue.
Comment 7 Justin Mason 2006-08-21 15:17:49 UTC
some progress...

Nightly mass-checks are now running against b3_1_0, sending logs to a new rsync
dir called "stable-corpus", with a rule-QA app up and running at
http://ruleqa-stable.spamassassin.org/ .  Currently, only the bb-* mass-checks,
running on the zone, are uploading log files for that.

The rule-QA features that require rule metadata (select rules by path or mod
date) do not work yet, because "build/mkrules" is required to generate that, and
it's not part of 3.1.x.

Updates are not being generated from this yet.

next step is to convert 3.1.x to use build/mkrules, so that it can read sandbox
rules from rulesrc/sandbox/*; this will require some work, since we currently
have a lot of 3.2.x-specific rule data in "rulesrc/core" I think, and these may
not be 3.1.x-compatible.  We may need to move at least some of that back out of
rulesrc, into the 3.2.x svn trunk tree.

After that, if all goes well, we can start switching other mass-checks over to
running against 3.1.x; it's basically a complete backport from trunk, so it
should be easy to switch from one to the other (or set up a new parallel one).
Comment 8 Justin Mason 2006-08-28 10:29:07 UTC
pasting from dev list mail from Theo, so that it doesn't get lost:

'> - should we switch 3.1.x to using the rules from rulesrc/core, same as
>   trunk is now doing?  This may be too much of a change for
>   mid-maintenance cycle.
> 
> - should rulesrc/core and rulesrc/lang be moved out of the "rulesrc"
>   external, back into the code's SVN repository, so that 3.1.x and
>   trunk can use different "core" rulesets?
> 
> - or, should mkrules be hacked for 3.1.x to ignore files in rulesrc/core
>   and rulesrc/lang, and we have two different means to build the ruleset
>   for 3.1.x and 3.2.0?

Well, my first thought is that I don't want 3.1 to have the same mkrules
business as 3.2.  I still don't really like the way it works, and was hoping
we'd get it changed around before releasing 3.2.

In relation to that, since we're just talking about 3.1 updates and not
changing the 3.1 tarball rules directory (right?), I don't think it really
matters what's in rulesrc.  Let's just make sure the updates look appropriate
before doing anything with them.'
Comment 9 Justin Mason 2006-09-04 13:31:21 UTC
after some hacking around with this, I'm giving up.  I think we're just going to
have to not generate automatic rule updates for 3.1.x, and instead just leave it
to manual cutting and pasting.  Basically, the changes required to bring the
rulesrc external into the 3.1.x branch are just too great for a maintenance
branch -- instead we should concentrate on getting 3.2.0 ready for release, with
*its* auto-update-generation all in order and ready to go.
Comment 10 Justin Mason 2006-09-04 17:48:24 UTC
uh, forgot to close the bug ;)