SA Bugzilla – Bug 4982
automatic generation of 3.1.x rule updates
Last modified: 2006-09-04 10:48:24 UTC
Currently, sa-update packages are auto-generated from SVN trunk; once a rule is checked in to a sandbox, everything proceeds on autopilot from there, which is great.

However, the same is not true for the 3.1.x maintenance branch; unfortunately, good rules have to be manually backported there (generally by Theo). This is suboptimal. (It's pretty silly, too, since what's the point of generating sa-update packages for svn trunk, when everyone who needs sa-update is running a release version instead? ;)

It should be possible to check in a rule, and have it appear in the sa-update package for day+1 or day+2, in both 3.1.x and trunk packages.

How's about these options?

- (a) we write a script that takes a selection of rules, based on their mass-check results in trunk (the nightly mass-checks), and auto-promotes them on the assumption that they will have similar hit rates in 3.1.x. It will not autopromote rules that fail a --lint with SA 3.1.x, or rules that require plugins. It will also have a hand-maintained exclusion list, so that we can cause rules to be excluded for other reasons, based on regexp patterns matched against the rule text.

- OR (b) we set up a small-scale nightly mass-check using 3.1.x, on the zone, in parallel with the existing trunk nightly mass-checks. We then generate 3.1.x sa-update packages from that. It doesn't have to be of the same scale as the "real" nightly mass-checks... just enough to provide useful hit-rate data for autopromotion. (I think I like this idea.)

Any other suggestions?
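For illustration only, here is a rough sketch of what the selection logic in option (a) might look like. It is not an existing SpamAssassin script: the `Candidate` record, its field names, the thresholds and the example exclusion pattern are all hypothetical, and the --lint result and plugin dependency are assumed to have been worked out by some earlier step and passed in as plain booleans.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """One rule plus its trunk mass-check figures (hypothetical record layout)."""
    name: str
    text: str           # raw rule text as checked in
    spam_pct: float     # % of spam hit in the trunk nightly mass-check
    ham_pct: float      # % of ham hit in the trunk nightly mass-check
    lints_on_31x: bool  # outcome of a --lint run against SA 3.1.x, computed elsewhere
    needs_plugin: bool  # True if the rule depends on a plugin


# Hand-maintained exclusion list: regexps matched against the rule text.
# The pattern here is a placeholder, not a real exclusion.
EXCLUDE_PATTERNS = [re.compile(r"\bTEST_ONLY\b")]

# Placeholder thresholds; real values would have to be chosen by the rule QA folks.
MIN_SPAM_PCT = 1.0
MAX_HAM_PCT = 0.1


def promotable(c, exclude=EXCLUDE_PATTERNS):
    """Apply the filters from option (a) to a single candidate rule."""
    if not c.lints_on_31x or c.needs_plugin:
        return False
    if any(p.search(c.text) for p in exclude):
        return False
    return c.spam_pct >= MIN_SPAM_PCT and c.ham_pct <= MAX_HAM_PCT


def select_for_31x(candidates):
    """Return the names of rules that would be auto-promoted to 3.1.x."""
    return [c.name for c in candidates if promotable(c)]
```

The checks are ordered so that the cheap hard failures (lint, plugins, exclusion list) are tested before the hit-rate thresholds, which makes it easy to log why a given rule was skipped.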
Perhaps a combination of these options? Presumably, in most cases where a rule works in both trunk and 3.1.x, the hit rates should be pretty much the same. So you should be getting better classification data from the larger masscheck. So I think I'd do (a), but to a provisional 3.1 rules base. This would then run through (b) to be autopromoted to the real distribution location. There should probably also be some manual method of promoting, so that something that hits well on the full masscheck but not as well on the small one can still be promoted. This could be overkill, but I like the idea of a real masscheck (even a small one) over a simple --lint, since it would likely catch any pathological problems in the regex that only show up on the older version.
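A minimal sketch of the combined flow described above, under some assumptions: the provisional rule list comes from the trunk-based selection, the small 3.1.x mass-check yields a (spam %, ham %) pair per rule, and manual promotion is just a set of rule names. The function name, parameters and thresholds are made up for the example.

```python
def final_promotions(provisional, small_results, manual_overrides,
                     min_spam_pct=1.0, max_ham_pct=0.1):
    """Decide which provisional rules reach the real 3.1.x update.

    provisional      -- rule names picked from the trunk mass-check (step a)
    small_results    -- {rule name: (spam_pct, ham_pct)} from the small
                        3.1.x mass-check (step b)
    manual_overrides -- rule names promoted by hand regardless of the
                        small mass-check figures
    """
    promoted = []
    for name in provisional:
        if name in manual_overrides:
            promoted.append(name)
            continue
        result = small_results.get(name)
        if result is None:
            # No data from the 3.1.x run, e.g. the rule did not run there.
            continue
        spam_pct, ham_pct = result
        if spam_pct >= min_spam_pct and ham_pct <= max_ham_pct:
            promoted.append(name)
    return promoted
```

The manual override is checked first, so a rule that hits well on the full mass-check but poorly on the small one can still go out, as suggested above.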
Loren -- good point, the combo works pretty well.
ok, I think I'd like to go ahead with the 'combo' idea. Theo -- since you're the main party working on 3.1.x updates -- have you any comments, before I get hacking?
Theo had some comments (via IM) but hasn't posted them yet. To summarise, before I forget -- iirc, he would prefer to see the 3.1.x branch being the main branch against which mass-checks and ruleqa measure accuracy and the updates are built, instead of trunk.

This initially seems problematic, but with a bit of thought, maybe not so much. Why should rule development measure against trunk, after all? For code development, that makes sense, but most of the rule development doesn't need to use the latest code.

So I'm thinking that maybe the easiest thing to do is:

1. set up a 3.1.x-based mass-checks/ruleqa/mkupdates infrastructure, in parallel to the existing trunk one (creating duplicate infrastructure will be a lot easier than modifying the existing one to support two outputs instead of one)

2. switch many/most of the mass-checks over to the 3.1.x base

3. add a few extra mass-checks for trunk
This really is an enhancement for what we have today, so marking it as such. I also wanted to make clear in the list that this isn't a blocker for the 3.1 series. :)
since the rules backport is really outside the scope of the 3.1 branch, I'm going to move this ticket to the 3.2 queue.
some progress...

Nightly mass-checks are now running against b3_1_0, sending logs to a new rsync dir called "stable-corpus", with a rule-QA app up and running at http://ruleqa-stable.spamassassin.org/ . Currently, only the bb-* mass-checks, running on the zone, are uploading log files for that.

The rule-QA features that require rule metadata (select rules by path or mod date) do not work yet, because "build/mkrules" is required to generate that, and it's not part of 3.1.x. Updates are not being generated from this yet.

Next step is to convert 3.1.x to use build/mkrules, so that it can read sandbox rules from rulesrc/sandbox/*; this will require some work, since we currently have a lot of 3.2.x-specific rule data in "rulesrc/core" I think, and these may not be 3.1.x-compatible. We may need to move at least some of that back out of rulesrc, into the 3.2.x svn trunk tree.

After that, if all goes well, we can start switching other mass-checks over to running against 3.1.x; it's basically a complete backport from trunk, so it should be easy to switch from one to the other (or set up a new parallel one).
pasting from dev list mail from Theo, so that it doesn't get lost:

'> - should we switch 3.1.x to using the rules from rulesrc/core, same as
> trunk is now doing? This may be too much of a change for
> mid-maintenance cycle.
>
> - should rulesrc/core and rulesrc/lang be moved out of the "rulesrc"
> external, back into the code's SVN repository, so that 3.1.x and
> trunk can use different "core" rulesets?
>
> - or, should mkrules be hacked for 3.1.x to ignore files in rulesrc/core
> and rulesrc/lang, and we have two different means to build the ruleset
> for 3.1.x and 3.2.0?

Well, my first thought is that I don't want 3.1 to have the same mkrules business as 3.2. I still don't really like the way it works, and was hoping we'd get it changed around before releasing 3.2.

In relation to that, since we're just talking about 3.1 updates and not changing the 3.1 tarball rules directory (right?), I don't think it really matters what's in rulesrc. Let's just make sure the updates look appropriate before doing anything with them.'
after some hacking around with this, I'm giving up. I think we're just going to have to not generate automatic rule updates for 3.1.x, and instead just leave it to manual cutting and pasting. Basically, the changes required to bring the rulesrc external into the 3.1.x branch are just too great for a maintenance branch -- instead we should concentrate on getting 3.2.0 ready for release, with *its* auto-update-generation all in order and ready to go.
uh, forgot to close the bug ;)