SA Bugzilla – Bug 6710
sa-learn --restore with MySQL is horribly slow
Last modified: 2022-03-24 05:19:08 UTC
Doing sa-learn --backup / --restore with DBM and a bayes DB of about 9 and 25MB is pretty fast (around a minute or so, IIRC). Restoring a backup of those DBM files into a MySQL DB using the MySQL module is horribly slow. I don't know whether the basic SQL module behaves similarly.

The restore has now been running for over 2 hours, and after attaching strace to the process I noticed it is handling every token separately. Wouldn't it make sense to create an array/buffer to insert N tokens at once?

<snip>
write(4, "\200\0\0\0\3INSERT INTO bayes_seen (id, msgid, flag)\n VALUES ('14','63494b63ad4d346fe268b554d485e8a39c8c97e7@sa_generated','s')", 132) = 132
read(4, "\7\0\0\1\0\1\0\1\0\0\0", 16384) = 11
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(4, "\7\0\0\0\3commit", 11) = 11
read(4, "\7\0\0\1\0\0\0\0\0\0\0", 16384) = 11
</snip>

Not sure if that is related at all, but anyway: I use 3.3.2 with the patches from bug 6624, bug 6625 and bug 6626.
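To illustrate the "insert N tokens at once" idea, here is a minimal sketch of a buffered multi-row INSERT. It is not SpamAssassin code: it uses Python's built-in sqlite3 as a stand-in for MySQL, the bayes_seen table is reduced to the three columns visible in the strace output, and the helper names (chunks, insert_multirow) and the batch size of 100 are assumptions made for the example.

```python
import sqlite3


def chunks(rows, size):
    """Yield consecutive slices of `rows`, each with at most `size` entries."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_multirow(conn, rows, batch_size=100):
    """Insert (id, msgid, flag) rows using one multi-row INSERT per batch.

    Returns the number of INSERT statements sent. Hypothetical helper,
    not part of Mail::SpamAssassin::BayesStore::SQL.
    """
    statements = 0
    for batch in chunks(rows, batch_size):
        # One "(?, ?, ?)" group per row, so a single statement carries the batch.
        placeholders = ", ".join(["(?, ?, ?)"] * len(batch))
        sql = f"INSERT INTO bayes_seen (id, msgid, flag) VALUES {placeholders}"
        # Flatten [(id, msgid, flag), ...] into one flat parameter list.
        params = [value for row in batch for value in row]
        conn.execute(sql, params)
        statements += 1
    conn.commit()  # a single commit once all batches are in
    return statements


# Simplified stand-in schema (assumption): only the columns seen in the strace.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_seen (id INTEGER, msgid TEXT, flag TEXT)")

rows = [(14, f"{i:040x}@sa_generated", "s") for i in range(250)]
statements = insert_multirow(conn, rows, batch_size=100)
stored = conn.execute("SELECT COUNT(*) FROM bayes_seen").fetchone()[0]
print(statements, stored)
```

With 250 rows and a batch size of 100, this sends 3 INSERT statements instead of 250. A real patch would also have to respect the server's statement size limit (max_allowed_packet on MySQL) when choosing N.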
This affects any component using Mail::SpamAssassin::BayesStore::SQL. The INSERTs and UPDATEs could be batched in transactions to speed this up. I'm using 3.4.0 with a MariaDB Galera cluster (3 intercontinental members). I've tested differently sized scan targets via bayes as well as plain database operations. INSERTing 400 simple records one at a time (with autocommit) takes around 80 seconds; inserting the same 400 records within a single transaction takes less than 1 second.
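The difference between the two cases comes down to how many commits are issued. The sketch below makes that count explicit. It is only an illustration under stated assumptions: sqlite3 stands in for MariaDB, the records table and the insert_rows helper are hypothetical, and an in-memory SQLite database will not reproduce the 80-second figure, since that cost presumably comes from each commit waiting on the cluster.

```python
import sqlite3


def make_conn():
    """Create an in-memory database with a hypothetical 'records' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, payload TEXT)")
    return conn


def insert_rows(conn, rows, commit_every):
    """Insert rows one statement at a time, committing every `commit_every` rows.

    commit_every=1 mimics autocommit; commit_every=len(rows) wraps the whole
    load in a single transaction. Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO records (id, payload) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit a trailing partial batch
        conn.commit()
        commits += 1
    return commits


rows = [(i, f"record-{i}") for i in range(400)]

per_row = insert_rows(make_conn(), rows, commit_every=1)
batched = insert_rows(make_conn(), rows, commit_every=len(rows))
print(per_row, batched)
```

For the 400-record test described above, the per-row variant issues 400 commits and the batched variant issues 1. On a synchronous multi-site cluster each commit has to be acknowledged by the other members, so the commit count is likely the dominant factor in the timings reported here.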
Is this still a problem with the current codebase? From a quick look, I guess so. It might be hard to get anyone working on it, since Redis works miles better anyway (and it can be clustered too). Well-tested patches welcome.
+1 use redis for bayes
Postpone into future