SA Bugzilla – Bug 6997
[FR] merge bayes database(s)
Last modified: 2019-08-01 10:53:52 UTC
The initial goal is to be able to merge a backup into the current database. Unfortunately, running "sa-learn --restore a_backup.file" erases the existing data even though I didn't use the "--clean" option. It'd be great to be able to __merge__ the data. If anyone knows a workaround or a 3rd-party tool for that, please share.
It should be trivial to merge two backup files with a bit of scripting. There are three kinds of lines: metadata (v), token (t), and signature (s), e.g.

    v 3 db_version # this must be the first line!!!
    v 11092 num_spam
    v 3968 num_nonspam
    ...
    t 3 0 1259670431 295b15d4b5
    t 0 1 1260035824 dabf5b0ede
    ...
    s s 300feabea1f24e00434235260d05b2d0a5cd143a@sa_generated
    s h d04d35c817646450e194c13cc3fac9f2f1b82aef@sa_generated

All you have to do is merge the metadata, and merge token lines with the same token hash into a single line with aggregate counts and the newest time-stamp. The signatures should just be passed through, or discarded if you don't want them. It's not possible to deal correctly with any messages that have been trained in both files, but that shouldn't be a major problem.
I forgot to mention that you should discard any expiry metadata lines. I don't have any in my example because I never use the built-in expiry method.
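For anyone who wants a starting point, the merge described above could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions, not a tested tool: the function name merge_backups is made up for illustration, fields are assumed to be whitespace-separated on input and are written tab-separated on output, both files are assumed to share the same db_version, and every metadata line other than db_version, num_spam and num_nonspam is treated as expiry metadata and dropped.

```python
def merge_backups(lines_a, lines_b):
    """Merge two iterables of backup lines into one list of lines (sketch)."""
    db_version = None
    counts = {"num_spam": 0, "num_nonspam": 0}
    tokens = {}      # token hash -> [first count, second count, newest timestamp]
    signatures = {}  # (flag, message id) -> None; a dict de-duplicates and keeps order

    for line in list(lines_a) + list(lines_b):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind = fields[0]
        if kind == "v":
            value, name = int(fields[1]), fields[2]
            if name == "db_version":
                # Assumption: merging only makes sense for identical versions.
                if db_version not in (None, value):
                    raise ValueError("backups use different db_version values")
                db_version = value
            elif name in counts:
                counts[name] += value
            # Assumption: any other metadata is expiry bookkeeping and is dropped.
        elif kind == "t":
            first, second, stamp = int(fields[1]), int(fields[2]), int(fields[3])
            entry = tokens.setdefault(fields[4], [0, 0, 0])
            entry[0] += first                 # aggregate the two counts positionally
            entry[1] += second
            entry[2] = max(entry[2], stamp)   # keep the newest time-stamp
        elif kind == "s":
            signatures[(fields[1], fields[2])] = None  # pass through once

    if db_version is None:
        raise ValueError("no db_version line found")

    # The db_version line is written first, as the backup format requires.
    merged = [f"v\t{db_version}\tdb_version # this must be the first line!!!"]
    merged.append(f"v\t{counts['num_spam']}\tnum_spam")
    merged.append(f"v\t{counts['num_nonspam']}\tnum_nonspam")
    merged += [f"t\t{a}\t{b}\t{stamp}\t{tok}" for tok, (a, b, stamp) in tokens.items()]
    merged += [f"s\t{flag}\t{msgid}" for flag, msgid in signatures]
    return merged
```

To use it, read the lines of the two backup files, pass them to merge_backups, and write the returned lines (one per line) to a new file, which could then be tried with "sa-learn --restore" against a scratch database first. As noted above, a message trained in both files gets its tokens counted twice; the sketch only de-duplicates the signature lines and does not correct the counts.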
Closing old stale bug. As it's somewhat easy to script, as demonstrated, probably no one will spend time on it.