Bug 4635 - Bayes fails to initialize MySQL and PgSQL database
Summary: Bayes fails to initialize MySQL and PgSQL database
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin (show other bugs)
Version: 3.1.0
Hardware: All All
: P3 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-10-16 20:02 UTC by Aleksi Asikainen
Modified: 2006-08-15 14:40 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Patch to not disconnect unnecessarily patch None Jason Rhinelander [NoCLA]
Updated patch to avoid double-connect patch None Jason Rhinelander [NoCLA]
Previous patch, minus debugging line patch None Jason Rhinelander [NoCLA]
Unofficial patch to Bayes _initialize_db+_tie_db_... patch None Alexander Davydenko [NoCLA]
Unofficial patch to Bayes _initialize_db+MySQL.pm patch None Alexander Davydenko [NoCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Aleksi Asikainen 2005-10-16 20:02:58 UTC
Running Bayes with a MySQL or PgSQL database seems to cause a failure where a 
non-existing user nevers get their database table row in bayes_vars 
initialized ("bayes: unable to initialize database [...]" in 
SpamAssassin/BayesStore/SQL.pm). This seems to be originating from 
SpamAssassin/Bayes.pm line 1122, which tries to establish a read-only 
connection, which does fail if it can't find the user from the database table 
bayes_vars. 

(This same error occurs also at least in spamd and sa-learn and supposedly 
everywhere else where SpamAssassin/Bayes.pm function is_scan_available() is 
used.)

The following is a quick fix for the problem. I must stress that I do not know 
what other effects it has. It creates the user, if they do not exist in the 
database table yet.

Change the line on SpamAssassin/Bayes.pm:1122 (in is_scan_available) as 
follows:

out: return 0 unless $self->{store}->tie_db_readonly();

in: return 0 unless ( $self->{store}->tie_db_readonly() || $self->{store}-
>tie_db_writable() );
Comment 1 Jason Rhinelander 2005-10-18 19:26:37 UTC
I looked into this for another problem (as it turned out, completely unrelated,
and my own fault), and I don't think this is the proper change.  Basically, (as
I understand it) is_scan_available is meant to return 0 at that point to
indicate that no bayes scan is available - because there is no stored bayes data
for that user.  This is The Right Thing - with no bayes data, no bayes scan is
possible.  Even *with* your change, the subroutine will still return 0 once it
gets down to the min_spam/min_ham checks further in the subroutine.

When learning is enabled and SpamAssassin decides to learn from the message, it
doesn't check is_scan_available.  Instead, it will call tie_db_writable(), which
creates the user if needed, then inserts the data.

That said, I believe there *is* a problem here - if $self->_initialize_db(0) (at
Mail/SpamAssassin/BayesStore/SQL.pm, line 140) fails, ->untie_db() will be
called, which proceeds to call $self->{_dbh}->disconnect (the comment right
above this function is entirely misleading, if not completely wrong) - but that
means that if learning happens, BayesStore/SQL.pm has now connected to the
database twice instead of simply reusing the same connection (or potentially
much more than twice, if BayesStore is used multiple times without an ->untie_db()).

That problem can, I believe, be fixed by changing the return value here from 0
to -1 (BayesStore/SQL.pm, lines 1772-1773):

  # Do not create an entry for this user unless we were specifically asked to
  return 0 unless ($create_entry_p);

then, in tie_db_readonly, changing (lines 140-144) from:

  unless ($self->_initialize_db(0)) {
    dbg("bayes: unable to initialize database for ".$self->{_username}." user,
aborting!");
    $self->untie_db();
    return 0;
  }

to:

  unless (my $init = $self->_initialize_db(0)) {
    dbg("bayes: unable to initialize database for ".$self->{_username}." user,
aborting!");
    $self->untie_db();
    return 0;
  }
  elsif ($init == -1) {
    dbg("bayes: $self->{_username} has no data in database, aborting!");
    return 0;
  }

I'm not intimately familiar with all the innards of spam assassin, so hopefully
someone more experienced with it than I can tell me whether my understanding of
how this code works is correct or not.
Comment 2 Jason Rhinelander 2005-10-18 19:35:13 UTC
Created attachment 3195 [details]
Patch to not disconnect unnecessarily

Attached a patch to Mail/SpamAssassin/BayesStore/SQL.pm containing the changes
in comment 1.
Comment 3 Michael Parker 2005-10-18 19:36:42 UTC
Subject: Re:  Bayes fails to initialize MySQL and PgSQL database


>------- Additional Comments From jagerman@jagerman.com  2005-10-18 19:26 -------
>
>That said, I believe there *is* a problem here - if $self->_initialize_db(0) (at
>Mail/SpamAssassin/BayesStore/SQL.pm, line 140) fails, ->untie_db() will be
>called, which proceeds to call $self->{_dbh}->disconnect (the comment right
>above this function is entirely misleading, if not completely wrong) - but that
>means that if learning happens, BayesStore/SQL.pm has now connected to the
>database twice instead of simply reusing the same connection (or potentially
>much more than twice, if BayesStore is used multiple times without an ->untie_db()).
>
>  
>

I believe you are mistaken in how the code works.  That said, it's been
awhile so it's not coming to me immediately.  I'll have a look and see
just to make sure.

Michael
Comment 4 Jason Rhinelander 2005-10-25 06:44:26 UTC
Created attachment 3205 [details]
Updated patch to avoid double-connect

(In reply to comment #3)
> 
> I believe you are mistaken in how the code works.  That said, it's been
> awhile so it's not coming to me immediately.	I'll have a look and see
> just to make sure.
> 

I've verified that if tie_db_readonly() is called, followed by
tie_db_writable(), and the user has no bayes information yet, two seperate
database connections are performed: once for _initialize_db(0), which
disconnects as soon as _initialize_db(0) "fails", then again when
tie_db_writeable() is called - this is easy enough to test by setting
$ENV{DBI_TRACE} = 1 before SA is loaded, and looking for ->connect's in the
resulting trace debugging.

My previous patch was broken, however, in that it needed to call
_initialize_db(1) if $self->{db_writable_p} is false just before it sets it to
true in tie_db_writable().
Comment 5 Jason Rhinelander 2005-10-25 06:46:23 UTC
Created attachment 3206 [details]
Previous patch, minus debugging line
Comment 6 Angelo Turetta 2006-08-09 17:06:10 UTC
The latest patch works for me on 3.1.3.
The first few messages getting through logged an object unassigned error (sorry,
didn't scribble it down), but after the transient all is fine.

Angelo.
Comment 7 Alexander Davydenko 2006-08-15 06:45:07 UTC
Created attachment 3651 [details]
Unofficial patch to Bayes _initialize_db+_tie_db_...

As I can see from a previous patch, the problem is not resolved. This patch is
my  suggestion on that issue. It trys to follow a *Description* of patched
methods. I tested it with 3.1.1/perl 5.8.8
Comment 8 Alexander Davydenko 2006-08-15 06:48:40 UTC
Created attachment 3652 [details]
Unofficial patch to Bayes _initialize_db+MySQL.pm

It's the same as a SQL.pm patch, but for MySQL.pm module.
Comment 9 Michael Parker 2006-08-15 12:44:08 UTC
-1 on the patches.  There is a reason for not creating the entries.
Comment 10 Alexander Davydenko 2006-08-15 21:40:47 UTC
(In reply to comment #9)
> -1 on the patches.  There is a reason for not creating the entries.

IMHO the only place where we arise in a problem with create an user entry is the
'clear_database()', which trash dbase if user just inserted. I think it can be
resolved with a $self->{some_var} private to SQL class and set/test it when we
need db_readonly from some place.