Bug 4519 - [review] Add multi_tok_count_change to BayesStore API
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 critical
Target Milestone: 3.1.0
Assignee: SpamAssassin Developer Mailing List
Whiteboard: ready to commit
Reported: 2005-08-05 21:23 UTC by Michael Parker
Modified: 2005-08-10 16:02 UTC
Attachments (Attachment / Type / Status / Submitter, CLA Status):
  3057  Patch File - Does Not Work / patch / None / Michael Parker [HasCLA]
  3058  Patch File / patch / None / Michael Parker [HasCLA]
  3059  Patch File for 3.1 Branch / patch / None / Michael Parker [HasCLA]
  3073  Patch File / patch / None / Michael Parker [HasCLA]

Description Michael Parker 2005-08-05 21:23:12 UTC
Add a multi_tok_count_change method to the BayesStore API that accepts multiple
tokens and changes their ham/spam counts and atime values.

This allows for a large speed up in the MySQL and PgSQL storage modules.
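
To illustrate the idea of a batched count change, here is a minimal sketch in Python with SQLite. It is not the SpamAssassin Perl code from the patch: the table layout, the `make_store` helper, and the argument order are assumptions made for this example, and only the method name `multi_tok_count_change` comes from this bug. The point is that all tokens are updated inside one transaction rather than one round trip per token.

```python
import sqlite3

def make_store():
    """Create an in-memory token table (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        " token TEXT PRIMARY KEY,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0,"
        " atime INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def multi_tok_count_change(conn, spam_delta, ham_delta, tokens, atime):
    """Apply one spam/ham delta to every token inside a single transaction."""
    with conn:  # commits once at the end, or rolls back on error
        for tok in sorted(set(tokens)):
            cur = conn.execute(
                "UPDATE bayes_token"
                " SET spam_count = max(spam_count + ?, 0),"
                "     ham_count = max(ham_count + ?, 0),"
                "     atime = max(atime, ?)"
                " WHERE token = ?",
                (spam_delta, ham_delta, atime, tok),
            )
            if cur.rowcount == 0:
                # Token not stored yet: insert it with counts clamped at zero.
                conn.execute(
                    "INSERT INTO bayes_token (token, spam_count, ham_count, atime)"
                    " VALUES (?, ?, ?, ?)",
                    (tok, max(spam_delta, 0), max(ham_delta, 0), atime),
                )
```

A caller would learn a spam message with something like `multi_tok_count_change(conn, 1, 0, tokens, atime)`. Clamping counts at zero and never moving atime backwards are design choices of this sketch, not behavior confirmed by the patch.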
Comment 1 Michael Parker 2005-08-05 21:59:14 UTC
Created attachment 3057
Patch File - Does Not Work

Here is the current patch.  It is broken for the PgSQL case: the _put_tokens
and tok_touch_all methods run into deadlocks.

We need to figure that out before we can do anything else, although the
benchmarks look very good.
Comment 2 Michael Parker 2005-08-05 23:43:58 UTC
I tried pulling out the tok_touch_all code into a stored procedure, looping over
the tokens and updating one at a time, but this did not help.
Comment 3 Michael Parker 2005-08-06 00:49:44 UTC
Fixed it by sorting the tokens before passing them into the stored procedure.
Running benchmarks now; I'll upload the patch later.
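
The comment above does not say why sorting helps. One common explanation is lock ordering: if two sessions lock the same rows in different orders, each can end up waiting on the other, while a shared sort order removes that cycle. The Python sketch below illustrates the general technique with in-process locks standing in for row locks. The `touch_tokens` function and its arguments are hypothetical and are not taken from the patch or from PostgreSQL.

```python
import threading

def touch_tokens(locks, counts, tokens):
    """Update a batch of tokens, taking per-token locks in sorted order.

    `locks` maps token -> threading.Lock (a stand-in for row locks).
    Because every caller acquires locks in the same order, two
    overlapping batches cannot wait on each other in a cycle.
    """
    ordered = sorted(set(tokens))
    for tok in ordered:
        locks[tok].acquire()
    try:
        for tok in ordered:
            counts[tok] = counts.get(tok, 0) + 1
    finally:
        # Release in reverse order of acquisition.
        for tok in reversed(ordered):
            locks[tok].release()
```

Without the `sorted` call, one thread handling `["a", "b"]` and another handling `["b", "a"]` could each hold one lock and block forever on the other.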
Comment 4 Michael Parker 2005-08-06 23:37:38 UTC
Created attachment 3058
Patch File
Comment 5 Michael Parker 2005-08-06 23:38:55 UTC
I'm very excited about this change.  It speeds up BayesSQL by a large factor:
2x for MySQL and 7x for PostgreSQL.

Please review for inclusion in 3.1.
Comment 6 Michael Parker 2005-08-07 13:21:04 UTC
Created attachment 3059
Patch File for 3.1 Branch

Committed to HEAD.

Here is the 3.1 branch specific patch.

Don't be scared of the size; a large portion of it is due to copying code
that has been working very well in the MySQL.pm module over to PgSQL.
Comment 7 Matthew Schumacher 2005-08-07 20:37:47 UTC
I have been working with Michael on this patch.  Both of us have tested it and
found it to be a great deal faster when running bayes against SQL.  This
patch greatly strengthens bayes support because the SQL modules now perform well
enough to be much more useful at big sites.
Comment 8 Justin Mason 2005-08-09 00:05:09 UTC
+1

stored SQL procedures are scary ;)
Comment 9 Matthew Schumacher 2005-08-09 10:27:59 UTC
I have this code running on one of my servers and it works 10x better than
3.1.0pre4.  This patch makes bayes/SQL fast enough to move away from the bdb
locking headache.
Comment 10 Duncan Findlay 2005-08-10 19:24:17 UTC
+1

I'm not going to claim that any of the SQL stuff works, just that this patch
doesn't appear to break anything that wasn't broken before. :-)
Comment 11 Michael Parker 2005-08-10 22:00:47 UTC
Created attachment 3073
Patch File

This patch fixes an additional bug in SQL.pm which ended up without a
_put_tokens method.

Now it has one.  Same as the other patch, with just a few changes in SQL.pm.
Comment 12 Duncan Findlay 2005-08-10 22:03:23 UTC
+1
Comment 13 Daryl C. W. O'Shea 2005-08-10 23:58:27 UTC
+1 stored procedures kick ass ;)
Comment 14 Michael Parker 2005-08-11 00:02:17 UTC
Sending        lib/Mail/SpamAssassin/Bayes.pm
Sending        lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending        lib/Mail/SpamAssassin/BayesStore/MySQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore.pm
Sending        sql/bayes_pg.sql
Transmitting file data .......
Committed revision 231411.