Bug 6710 - sa-learn --restore with MySQL is horrible slow
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: 3.3.2
Hardware: All All
Importance: P2 normal
Target Milestone: 4.0.0
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2011-11-27 17:01 UTC by Christian Ruppert
Modified: 2019-06-26 12:24 UTC
Description Christian Ruppert 2011-11-27 17:01:23 UTC
Running sa-learn --backup / --restore with DBM and Bayes DB files of about 9 MB and 25 MB is pretty fast (around a minute or so, IIRC).
Restoring a backup of those DBM files into a MySQL DB using the MySQL module is horribly slow.
I don't know whether the basic SQL module behaves similarly.
The restore has now been running for over 2 hours, and after attaching strace to the process I noticed it is inserting every token separately.

Wouldn't it make sense to create an array/buffer to insert N tokens at once?

write(4, "\200\0\0\0\3INSERT INTO bayes_seen (id, msgid, flag)\n             VALUES ('14','63494b63ad4d346fe268b554d485e8a39c8c97e7@sa_generated','s')", 132) = 132
read(4, "\7\0\0\1\0\1\0\1\0\0\0", 16384) = 11
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(4, "\7\0\0\0\3commit", 11)        = 11
read(4, "\7\0\0\1\0\0\0\0\0\0\0", 16384) = 11

Not sure if that is related at all but anyway:
I use 3.3.2 with the patches of bug 6624, bug 6625 and bug 6626.
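
The buffering idea asked about above can be sketched roughly as follows. This is only an illustration, not SpamAssassin code: it is written in Python against an in-memory SQLite database rather than Perl DBI and MySQL, the `bayes_seen` schema is a simplified guess based on the strace output, and `insert_in_batches` and `batch_size` are hypothetical names.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert (id, msgid, flag) rows, committing once per batch.

    Returns the number of commits issued (hypothetical helper).
    """
    it = iter(rows)
    commits = 0
    while True:
        # Pull up to batch_size rows into a buffer.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # One executemany call and one commit per buffer,
        # instead of one INSERT plus one commit per row.
        conn.executemany(
            "INSERT INTO bayes_seen (id, msgid, flag) VALUES (?, ?, ?)",
            batch,
        )
        conn.commit()
        commits += 1
    return commits


# Simplified stand-in for the real table (assumed column types).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_seen (id INTEGER, msgid TEXT, flag TEXT)")

rows = [(14, f"msg{i}@sa_generated", "s") for i in range(2500)]
commits = insert_in_batches(conn, rows, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM bayes_seen").fetchone()[0]
print(commits, row_count)
```

With 2500 rows and a batch size of 1000 this issues three commits instead of 2500. Whether the same pattern fits the restore path in the SQL BayesStore modules would need checking against the actual code.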
Comment 1 Ivan Filippov 2016-02-22 23:45:41 UTC
This affects any component using Mail::SpamAssassin::BayesStore::SQL.  The INSERTs and UPDATEs could be batched in transactions to speed this up.

I'm using 3.4.0 with a MariaDB Galera cluster (3 intercontinental members). I've tested differently sized scan targets via Bayes as well as plain database operations. INSERTing 400 simple records one at a time (with autocommit) takes around 80 seconds; inserting the same 400 records within a single transaction takes less than 1 second.
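
A comparison of this kind can be reproduced in miniature with a sketch like the one below. It is an assumption-laden stand-in: Python with a file-backed SQLite database instead of a Galera cluster, a made-up `records` table, and a hypothetical `timed_insert` helper. Absolute timings will differ greatly from the figures above, since there is no network round trip or cluster certification involved.

```python
import os
import sqlite3
import tempfile
import time


def timed_insert(path, rows, commit_each):
    """Insert rows into a fresh database file and time the inserts.

    commit_each=True commits after every row (autocommit-like);
    commit_each=False wraps all rows in one transaction.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE records (id INTEGER, value TEXT)")
    conn.commit()
    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO records (id, value) VALUES (?, ?)", row)
        if commit_each:
            conn.commit()
    conn.commit()  # final commit; a no-op if nothing is pending
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return elapsed, count


rows = [(i, f"record-{i}") for i in range(400)]
with tempfile.TemporaryDirectory() as tmp:
    per_row_time, per_row_count = timed_insert(
        os.path.join(tmp, "per_row.db"), rows, commit_each=True
    )
    single_time, single_count = timed_insert(
        os.path.join(tmp, "single.db"), rows, commit_each=False
    )

print(f"commit per row:     {per_row_time:.3f}s for {per_row_count} rows")
print(f"single transaction: {single_time:.3f}s for {single_count} rows")
```

Both runs store the same 400 rows; only the number of commits differs, which is the variable the measurement above isolates.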
Comment 2 Henrik Krohns 2019-06-26 08:01:11 UTC
Is this still a problem with the current codebase? From a quick look, I guess so.

It might be hard to get anyone working on it, since Redis works miles better anyway (and it can be clustered too).

Well-tested patches are welcome.
Comment 3 Kevin A. McGrail 2019-06-26 12:24:05 UTC
+1, use Redis for Bayes.