SA Bugzilla – Bug 4519
[review] Add multi_tok_count_change to BayesStore API
Last modified: 2005-08-10 16:02:17 UTC
Adding a multi_tok_count_change method to the BayesStore API that allows multiple tokens to be passed in a single call to change their ham/spam counts and atime values. This allows a large speedup in the MySQL and PgSQL storage modules.
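To illustrate the idea, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for MySQL/PgSQL. The table name `bayes_token`, its columns, and both function signatures are hypothetical. They are loosely modeled on the description above, not taken from the actual BayesStore code. The point is only the difference in shape: the per-token call issues its statements and a commit once per token, while the batched call sends every token through `executemany` inside one transaction.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema, loosely modeled on a Bayes token table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bayes_token (
    token      TEXT PRIMARY KEY,
    spam_count INTEGER NOT NULL DEFAULT 0,
    ham_count  INTEGER NOT NULL DEFAULT 0,
    atime      INTEGER NOT NULL DEFAULT 0
)
"""

# Create the row if it does not exist yet; otherwise do nothing.
INSERT_SQL = "INSERT OR IGNORE INTO bayes_token (token, atime) VALUES (?, ?)"

# Apply the count deltas and keep the newest access time.
UPDATE_SQL = """
UPDATE bayes_token
   SET spam_count = spam_count + ?,
       ham_count  = ham_count + ?,
       atime      = MAX(atime, ?)
 WHERE token = ?
"""


def tok_count_change(conn: sqlite3.Connection, spam_delta: int,
                     ham_delta: int, token: str, atime: int) -> None:
    """Per-token variant: one transaction (and commit) per token."""
    with conn:  # commits on success, rolls back on error
        conn.execute(INSERT_SQL, (token, atime))
        conn.execute(UPDATE_SQL, (spam_delta, ham_delta, atime, token))


def multi_tok_count_change(conn: sqlite3.Connection, spam_delta: int,
                           ham_delta: int, tokens: Iterable[str],
                           atime: int) -> None:
    """Batched variant: every token is changed inside a single transaction."""
    toks = list(dict.fromkeys(tokens))  # drop duplicates, keep order
    with conn:
        conn.executemany(INSERT_SQL, [(t, atime) for t in toks])
        conn.executemany(UPDATE_SQL,
                         [(spam_delta, ham_delta, atime, t) for t in toks])
```

For example, after `conn = sqlite3.connect(":memory:")` and `conn.execute(SCHEMA)`, learning one spam message would be a single `multi_tok_count_change(conn, 1, 0, tokens, atime)` call rather than one `tok_count_change` call per token.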
Created attachment 3057 (Patch File - Does Not Work)
Here is the current patch. It is broken for the PgSQL case: it gets deadlocks in the _put_tokens and tok_touch_all methods. We need to figure that out before we can do anything else, although the benchmarks look very good.
I tried moving the tok_touch_all code into a stored procedure that loops over the tokens and updates them one at a time, but this did not help.
Fixed it by sorting the tokens before passing them into the stored procedure. I'm running benchmarks now and will upload the patch later.
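The usual reason sorting cures this kind of deadlock is lock ordering: if every transaction acquires its row locks in the same global order, two transactions with overlapping token sets can never each hold a lock the other is waiting for. The Python sketch below models that with one `threading.Lock` per token. `RowLocks` and `touch_tokens` are hypothetical names, and in-process locks are only an analogy for the row locks PostgreSQL takes during `UPDATE`; this is not the actual stored procedure.

```python
import threading
from typing import Dict, Iterable, List


class RowLocks:
    """Hypothetical stand-in for the per-row locks an UPDATE would take."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = {}

    def lock_for(self, token: str) -> threading.Lock:
        # Create the lock for a token on first use, safely across threads.
        with self._guard:
            return self._locks.setdefault(token, threading.Lock())


def touch_tokens(locks: RowLocks, tokens: Iterable[str]) -> List[str]:
    """Lock every token in sorted order, 'update' it, then release all.

    Like a database transaction, all locks are held until the end. Because
    every caller acquires them in the same (sorted) order, callers with
    overlapping token sets cannot deadlock.
    """
    ordered = sorted(set(tokens))
    held: List[threading.Lock] = []
    try:
        for tok in ordered:
            lk = locks.lock_for(tok)
            lk.acquire()
            held.append(lk)
        # The per-row UPDATE would happen here; we just report the order.
        return ordered
    finally:
        for lk in reversed(held):
            lk.release()
```

Two threads calling `touch_tokens` repeatedly with the same tokens in opposite input order both finish, because sorting gives them an identical acquisition order. Without the `sorted(...)` call, each could grab its first token and then wait forever on the other's.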
Created attachment 3058 (Patch File)
I'm very excited about this change. It speeds up BayesSQL substantially: 2x for MySQL and 7x for PostgreSQL. Please review for inclusion in 3.1.
Created attachment 3059 (Patch File for 3.1 Branch)
Committed to HEAD. Here is the 3.1-branch-specific patch. Don't be scared by its size; a large portion of it comes from copying code that has been working very well in the MySQL.pm module over to PgSQL.
I have been working with Michael on this patch. We have both tested it and found it to be much faster when running Bayes against SQL. This patch greatly strengthens Bayes support, because the SQL modules now perform well enough to be practical at large sites.
+1 stored SQL procedures are scary ;)
I have this code running on one of my servers, and it works 10x better than 3.1.0pre4. This patch makes Bayes/SQL fast enough to move away from the BDB locking headache.
+1 I'm not going to claim that any of the SQL stuff works, just that this patch doesn't appear to break anything that wasn't broken before. :-)
Created attachment 3073 (Patch File)
This patch fixes an additional bug: SQL.pm ended up without a _put_tokens method. Now it has one. It is the same as the previous patch, with just a few changes in SQL.pm.
+1
+1 stored procedures kick ass ;)
Sending        lib/Mail/SpamAssassin/Bayes.pm
Sending        lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending        lib/Mail/SpamAssassin/BayesStore/MySQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore.pm
Sending        sql/bayes_pg.sql
Transmitting file data .......
Committed revision 231411.