Bug 5733

Summary: Bayes DBM tie_db_readonly doesn't clear is_locked flag
Product: Spamassassin Reporter: Rob Mueller <apache>
Component: LearnerAssignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED WORKSFORME    
Severity: normal CC: apache
Priority: P5    
Version: 3.2.1   
Target Milestone: Undefined   
Hardware: Other   
OS: All   
Whiteboard:

Description Rob Mueller 2007-11-28 14:52:21 UTC
We're using some custom code to call the spamassassin bayes learning 
routines rather than sa-learn, and have come across a slight bug that meant 
the first messages being learnt were always being lost.

Going through the code here's what I have & what's happening...

-----

    $SpamAssassin = Mail::SpamAssassin->new({
      dont_copy_prefs => 1,
      config_text => <<'EOF',
use_bayes 0
bayes_file_mode 0777
bayes_learn_to_journal 1
bayes_auto_expire 1
bayes_journal_max_size 0
bayes_use_hapaxes 0
bayes_auto_learn 0
EOF
    });

    $SpamAssassin->init();

-----

... later on when I need to learn some messages for a user ...

-----

  # Nasty poke straight into conf object...
  $SpamAssassin->{conf}->{use_bayes} = 1;
  $SpamAssassin->{conf}->{bayes_path} = $LearnDir . "/bayes";
  $SpamAssassin->signal_user_changed({ username => $UnqName });
  $SpamAssassin->init_learner({
    caller_will_untie => 1,
  });

-----

... get the messages and loop over ...

-----

    for-all-messages ... {
      my $Message = $SpamAssassin->parse(\@Lines);
      my $Learner = $SpamAssassin->learn($Message, undef, $IsSpam);
      $Learner->finish();
      $Message->finish();
    }
...

  $SpamAssassin->rebuild_learner_caches();
  $SpamAssassin->finish_learner();

-----

The problem occurs because Mail::SpamAssassin::Bayes::learn() has the code:

-----
    if ($self->{main}->{learn_to_journal}) {
      # If we're going to learn to journal, we'll try going r/o first...
      # If that fails for some reason, let's try going r/w.  This happens
      # if the DB doesn't exist yet.
      $ok = $self->{store}->tie_db_readonly() ||
$self->{store}->tie_db_writable();
    } else {
      $ok = $self->{store}->tie_db_writable();
    }
-----

If no bayes db exists, then
Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly fails, so it does
Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable() which succeeds.

However when it tries to learn the second message, it calls 
Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly() again. This closes 
the db for writable access, and succeeds in opening it for readonly access, 
however it doesn't clear the $self->{is_locked} flag.

This means that it thinks the db is still in writable mode, even though it's
now been tied in read_only mode.

This means that every message learnt attempts to re-tie the db (slow). It
also means that when we rebuild_learner_caches(), it thinks that the db is
in writeable mode, so when it calls
Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable(), it immediately
returns because it thinks the db is writeable, but then all attempts to
write to the db are just lost (no error messages, the write to the hash just
disappears).

It seems the fix is as easy as adding $self->{is_locked} = 0 somewhere in
tie_db_readonly()

I worked around this by forcibly opening the db in write mode on immediately 
after calling init_learner.

-----
     caller_will_untie => 1,
   });

+  # Need to force creation of db
+  my $Scanner = $SpamAssassin->{bayes_scanner};
+  if ($Scanner && $Scanner->{store}) {
+    $Scanner->{store}->tie_db_writable();
+    $Scanner->{store}->untie_db();
+  }
+
   return 1;
 }

-----

Hmmm, there was another problem with locks being left around as well, but I 
can't remember the exact details again now. I just know that I found a work 
around by changing my code to:

-----

+  # Call this first to untie db to release lock
+  $SpamAssassin->finish_learner();
   $SpamAssassin->rebuild_learner_caches();

-----

I'll see if I can track it down again and remind myself what it was...
Comment 1 Henrik Krohns 2022-03-06 16:37:21 UTC
Closing ancient stale bug.