Bug 6997 - [FR] merge bayes database(s)
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Tools
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2014-02-03 12:39 UTC by ildar.mulyukov
Modified: 2019-08-01 10:53 UTC

Description ildar.mulyukov 2014-02-03 12:39:53 UTC
The initial goal is to be able to merge a backup into the current database.
Unfortunately, running "sa-learn --restore a_backup.file" erases the existing data, even though I didn't use the "--clear" option.
It'd be great to be able to __merge__ the data.
If anyone knows a workaround or a 3rd-party tool for that, please share.
Comment 1 RW 2014-02-03 22:52:59 UTC
It should be trivial to merge two backup files with a bit of scripting. There are three kinds of lines: metadata (v), token (t), and signature (s), e.g.

v       3       db_version # this must be the first line!!!
v       11092   num_spam
v       3968    num_nonspam
...
t       3       0       1259670431      295b15d4b5
t       0       1       1260035824      dabf5b0ede
...
s       s       300feabea1f24e00434235260d05b2d0a5cd143a@sa_generated
s       h       d04d35c817646450e194c13cc3fac9f2f1b82aef@sa_generated


All you have to do is merge the metadata, and merge token lines that share the same token hash into a single line with aggregate counts and the newest timestamp.

The signatures should just be passed through or discarded if you don't want them. 

It's not possible to deal correctly with any messages that have been trained in both files, but that shouldn't be a major problem.
Comment 2 RW 2014-02-03 23:09:35 UTC
I forgot to mention that you should discard any expiry metadata lines. I don't have any in my example because I never use the built-in expiry method.
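
For anyone who wants a starting point, the steps above could be sketched in Python roughly as follows. This is only an untested-against-real-data sketch, not part of SpamAssassin, and it rests on a few assumptions: fields are separated by whitespace as in the example lines; the two numeric columns of a token line are per-class counts that can simply be summed; "merging the metadata" means adding up num_spam and num_nonspam; and every other metadata line (which covers any expiry data) is dropped. The function name merge_backups is made up for illustration.

```python
import sys

# Metadata counters that are summed; any other "v" line is dropped.
KEPT_COUNTERS = ("num_spam", "num_nonspam")


def merge_backups(*texts):
    """Merge the text of several backup files into a single backup text."""
    db_version = None
    counters = dict.fromkeys(KEPT_COUNTERS, 0)
    tokens = {}      # token hash -> [count1, count2, newest timestamp]
    signatures = {}  # signature line -> None (keeps order, drops duplicates)

    for text in texts:
        for line in text.splitlines():
            fields = line.split()
            if not fields:
                continue
            kind = fields[0]
            if kind == "v" and len(fields) >= 3:
                value, name = fields[1], fields[2]
                if name == "db_version":
                    if db_version not in (None, value):
                        raise ValueError("backups have different db_version values")
                    db_version = value
                elif name in counters:
                    counters[name] += int(value)
                # Anything else (e.g. expiry metadata) is deliberately discarded.
            elif kind == "t" and len(fields) == 5:
                entry = tokens.setdefault(fields[4], [0, 0, 0])
                entry[0] += int(fields[1])
                entry[1] += int(fields[2])
                entry[2] = max(entry[2], int(fields[3]))
            elif kind == "s" and len(fields) == 3:
                # Passed through unchanged; only exact duplicates are removed.
                signatures["\t".join(fields)] = None

    if db_version is None:
        raise ValueError("no db_version line found")

    out = [f"v\t{db_version}\tdb_version # this must be the first line!!!"]
    out += [f"v\t{counters[name]}\t{name}" for name in KEPT_COUNTERS]
    out += [f"t\t{c1}\t{c2}\t{ts}\t{tok}" for tok, (c1, c2, ts) in tokens.items()]
    out += list(signatures)
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    contents = []
    for path in sys.argv[1:]:
        with open(path) as handle:
            contents.append(handle.read())
    sys.stdout.write(merge_backups(*contents))
```

If this were saved as, say, merge_bayes.py (a hypothetical name), it would be run on two files produced by "sa-learn --backup", with the output redirected to a new file that is then passed to "sa-learn --restore". As noted in comment 1, a message trained in both databases ends up counted twice.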
Comment 3 Henrik Krohns 2019-08-01 10:53:52 UTC
Closing old stale bug. As it's fairly easy to script, as demonstrated, probably no one will spend time on it.