SA Bugzilla – Bug 4642
concurrency problem in the PostgreSQL specific Bayes-module
Last modified: 2016-01-26 17:50:12 UTC
The PostgreSQL-specific Bayes module uses the plpgsql function _put_token(), which takes an array of bytea values and loops through the array, doing either an update or an insert for each element. However, this approach is subject to a race condition when multiple clients (multiple spamds or sa-learns) learn mails with similar tokens: a new token may get inserted (and committed) by another client after the function was called but before it attempts its own insert. This results in the following error message in the PostgreSQL log:

ERROR: duplicate key violates unique constraint "bayes_token_pkey"
CONTEXT: SQL statement "INSERT INTO bayes_token (id, token, spam_count, ham_count, atime) VALUES ( $1 , $2 , $3 , $4 , $5 )"
PL/pgSQL function "put_tokens" line 18 at SQL statement

Beginning with PostgreSQL 8.0, postgresql/plpgsql supports subtransactions/exceptions, which could be used to solve this problem. An example is available at: http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING (example 36-1)
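To illustrate the update-then-insert-with-retry idea from that documentation example, here is a minimal sketch. It is not the SpamAssassin code: it uses Python's stdlib sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, the table is reduced to the columns visible in the error message, the (id, token) primary key is an assumption inferred from the constraint name, and put_token() / make_token_db() are hypothetical helper names. In PL/pgSQL the except branch would correspond to an EXCEPTION WHEN unique_violation block around the INSERT.

```python
import sqlite3


def make_token_db():
    """Create an in-memory stand-in for the bayes_token table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        " id INTEGER NOT NULL, token BLOB NOT NULL,"
        " spam_count INTEGER NOT NULL, ham_count INTEGER NOT NULL,"
        " atime INTEGER NOT NULL,"
        " PRIMARY KEY (id, token))"  # assumption: key is (id, token)
    )
    return conn


def put_token(conn, user_id, token, spam, ham, atime, max_tries=3):
    """Try UPDATE first; if no row matched, INSERT; on a key clash, retry."""
    for _ in range(max_tries):
        cur = conn.execute(
            "UPDATE bayes_token SET spam_count = spam_count + ?,"
            " ham_count = ham_count + ?, atime = ?"
            " WHERE id = ? AND token = ?",
            (spam, ham, atime, user_id, token),
        )
        if cur.rowcount > 0:
            conn.commit()
            return "updated"
        try:
            conn.execute(
                "INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)"
                " VALUES (?, ?, ?, ?, ?)",
                (user_id, token, spam, ham, atime),
            )
            conn.commit()
            return "inserted"
        except sqlite3.IntegrityError:
            # Another client inserted the same key between our UPDATE and
            # INSERT: undo and loop so the UPDATE is attempted again.
            conn.rollback()
    raise RuntimeError("could not store token after %d tries" % max_tries)
```

The point of the loop is that a duplicate-key error is no longer fatal: it only means the row now exists, so the next UPDATE attempt will find it.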
> However the approach used is subject to a race condition in the case of multiple
> clients learning mails with similar tokens (multiple spamds or sa-learns),
> because it might happen that a new token gets inserted (and committed) after the
> function got called but before it tries the insert resulting in the following
> error message in the postgresql log:

Btw, the same problem occurs in the SQL-based AWL backend: process A does a SELECT and does not find any entries; meanwhile process B does the same and INSERTs its record. When process A later tries its own INSERT (instead of an UPDATE), the SQL operation fails due to a key constraint.
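The interleaving described for the AWL backend can be reproduced deterministically. The following is only a sketch under stated assumptions: it uses Python's stdlib sqlite3 module with two connections to a temporary database file in place of two clients talking to a real SQL server, and the awl table layout, its key columns, and the function name demonstrate_awl_race() are simplified, hypothetical stand-ins rather than the actual SpamAssassin schema or code.

```python
import os
import sqlite3
import tempfile


def demonstrate_awl_race():
    """Replay the SELECT/SELECT/INSERT/INSERT interleaving with two connections."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    proc_a = sqlite3.connect(path)  # stands in for process A
    proc_b = sqlite3.connect(path)  # stands in for process B
    try:
        # Simplified, assumed AWL table: one row per (username, email, ip).
        proc_a.execute(
            "CREATE TABLE awl ("
            " username TEXT NOT NULL, email TEXT NOT NULL, ip TEXT NOT NULL,"
            " count INTEGER NOT NULL, totscore REAL NOT NULL,"
            " PRIMARY KEY (username, email, ip))"
        )
        proc_a.commit()

        key = ("alice", "sender@example.com", "10.1")
        find = "SELECT totscore FROM awl WHERE username = ? AND email = ? AND ip = ?"
        insert = (
            "INSERT INTO awl (username, email, ip, count, totscore)"
            " VALUES (?, ?, ?, 1, ?)"
        )

        # Both processes look for the entry and both find nothing.
        a_saw = proc_a.execute(find, key).fetchone()
        b_saw = proc_b.execute(find, key).fetchone()

        # Process B inserts and commits first.
        proc_b.execute(insert, key + (2.5,))
        proc_b.commit()

        # Process A still believes the row is missing and inserts too.
        try:
            proc_a.execute(insert, key + (4.0,))
            proc_a.commit()
            outcome = "inserted"
        except sqlite3.IntegrityError:
            proc_a.rollback()
            outcome = "key violation"
        return a_saw, b_saw, outcome
    finally:
        proc_a.close()
        proc_b.close()
        os.remove(path)
```

Both lookups come back empty, yet only the first INSERT can succeed; the second one hits the key constraint, which is the failure reported above.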
PostgreSQL 9.5 supports upsert (see also bug 7218, which includes a patch for postgres AWL).
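For reference, an upsert folds the update-or-insert decision into a single statement, so there is no window between the two steps. The sketch below is an assumption-laden illustration, not the patch from bug 7218: it runs against Python's stdlib sqlite3 module (SQLite 3.24 or later accepts the same INSERT ... ON CONFLICT ... DO UPDATE form that PostgreSQL 9.5 introduced), the table is reduced to the columns seen in the error message, the (id, token) key is assumed, and upsert_token() / make_upsert_db() are hypothetical names.

```python
import sqlite3

# Written so that the statement text should also be valid on PostgreSQL 9.5+
# (with %s or $n placeholders instead of ?, depending on the driver).
UPSERT_SQL = (
    "INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)"
    " VALUES (?, ?, ?, ?, ?)"
    " ON CONFLICT (id, token) DO UPDATE SET"
    " spam_count = bayes_token.spam_count + excluded.spam_count,"
    " ham_count = bayes_token.ham_count + excluded.ham_count,"
    " atime = excluded.atime"
)


def make_upsert_db():
    """Create an in-memory stand-in for the bayes_token table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        " id INTEGER NOT NULL, token BLOB NOT NULL,"
        " spam_count INTEGER NOT NULL, ham_count INTEGER NOT NULL,"
        " atime INTEGER NOT NULL,"
        " PRIMARY KEY (id, token))"  # assumption: key is (id, token)
    )
    return conn


def upsert_token(conn, user_id, token, spam, ham, atime):
    """Insert the token, or add to the existing counts if the key exists."""
    conn.execute(UPSERT_SQL, (user_id, token, spam, ham, atime))
    conn.commit()
```

Calling upsert_token() twice with the same id and token leaves one row whose counts are the sums of both calls, with no retry loop needed in the caller.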