Bug 2370 - removing non-DB_File support from bayes
Summary: removing non-DB_File support from bayes
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: 2.60
Hardware: Other other
: P5 critical
Target Milestone: 2.60
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-08-27 18:47 UTC by Justin Mason
Modified: 2003-08-28 06:46 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
latest ver patch None Justin Mason [HasCLA]
latest ver patch None Justin Mason [HasCLA]
new version patch None Daniel Quinlan [HasCLA]
newer version patch None Daniel Quinlan [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Mason 2003-08-27 18:47:01 UTC
argh.   complicated.  Here's the latest patch.
Comment 1 Justin Mason 2003-08-27 18:48:11 UTC
Created attachment 1286 [details]
latest ver

moved upgrade code to sa-learn as --import switch
Comment 2 Justin Mason 2003-08-27 18:48:51 UTC
Created attachment 1287 [details]
latest ver

moved upgrade code to sa-learn as --import switch; removed support for
non-DB_File modules from BayesStore.
Comment 3 Theo Van Dinter 2003-08-27 19:38:37 UTC
+1
Comment 4 Daniel Quinlan 2003-08-28 05:38:22 UTC
Created attachment 1288 [details]
new version

I can't import 2.55 databases using that patch.  I generated 2.55 databases
for DB_File, GDBM_File, NDBM_File, and SDBM_File and then tried to import
each and none worked.

The biggest problem wwas collisions on the filename between DB_File and
GDBM_File.  GDBM-created databases from 2.55 can (and do) have the same
filename as the DB_File 2.60 databases.  So, the tie_db_writable() call fails
when there's an old GDBM database.  I had about 10 other problems trying to
get this working, this version seems to work for me.

The only nit:

  Somewhere during a run of "sa-learn --import", DB_File tries to die GDBM
  database R/O before the conversion has completed and this produces a warning.


Beyond that, it really needs more testing before I'd trust it at all.

Changes:
- avoid repeating code
- copies files temporarily while reading from them
- use %new = %old to copy hash

(Seems simple now.  Ugh...)
Comment 5 Daniel Quinlan 2003-08-28 05:45:48 UTC
This is the GDBM import nit:

Cannot open bayes databases /home/quinlan/.spamassassin/bayes_* R/O: tie failed: 
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_toks
DB_File: no database of that kind found, nothing copied.
GDBM_File: copied 48162 entries.
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_seen
DB_File: no database of that kind found, nothing copied.
GDBM_File: copied 998 entries.

(Also note that I added DB_File into the fray since I can't tell the difference
between DB_File and GDBM_File dbm files and since you have to have a separate
file to do the conversion, et cetera, et cetera, it seemed easiest/best to
just treat old DB_File dbm files like the others.  I could have tried more
complicated logic, but it was hard enough to get this working.)

There are no warnings for the other formats:

DB:
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_toks
DB_File: copied 48162 entries.
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_seen
DB_File: copied 998 entries.

NDBM:
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_toks
DB_File: no database of that kind found, nothing copied.
GDBM_File: no database of that kind found, nothing copied.
NDBM_File: copied 48162 entries.
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_seen
DB_File: no database of that kind found, nothing copied.
GDBM_File: no database of that kind found, nothing copied.
NDBM_File: copied 998 entries.

SDBM:
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_toks
DB_File: no database of that kind found, nothing copied.
GDBM_File: no database of that kind found, nothing copied.
NDBM_File: no database of that kind found, nothing copied.
SDBM_File: copied 48162 entries.
upgrading to DB_File: /home/quinlan/.spamassassin/old_bayes_seen
DB_File: no database of that kind found, nothing copied.
GDBM_File: no database of that kind found, nothing copied.
NDBM_File: no database of that kind found, nothing copied.
SDBM_File: copied 998 entries.
Comment 6 Malte S. Stretz 2003-08-28 05:48:44 UTC
This looks complicated, can't we get RC3 out and plan an RC4 for the DB_file 
change? 
Comment 7 Justin Mason 2003-08-28 10:13:15 UTC
+1
Comment 8 Malte S. Stretz 2003-08-28 10:55:43 UTC
+1 looks ok. 
 
what is the reason for the big eval{} block in upgrade_old_dbm_files()? a 
comment would be cool :) 
Comment 9 Justin Mason 2003-08-28 11:17:55 UTC
There was one.  I think Dan must have removed it.  Basically since we're using
DB_File directly, and we're allowing users to run sa without DB_File installed
as long as they don't use Bayes, we have to protect use of DB_File.
Comment 10 Malte S. Stretz 2003-08-28 11:56:53 UTC
Yeah, there's a comment for the eval in 
  BayesStore::upgrade_old_dbm_files_trapped 
but not for 
  BayesStore::upgrade_old_dbm_files 
(which calls the one above). And I can't find any command which could die() in 
the latter. Except maybe the use of the File::* modules but they wouldn't need 
such a big eval block. Or can $self->tie_db_writable() die()? 
 
Hmmm... and in BayesStore::upgrade_old_dbm_files_trapped there's an eval 
inside an eval; I think the outer eval isn't really needed, too. 
 
I don't want to stop this patch, if it works it should go in so we can get RC3 
out. It just looks a bit fishy and maybe should be revisited later. 
Comment 11 Daniel Quinlan 2003-08-28 14:02:40 UTC
Created attachment 1289 [details]
newer version

- doesn't remove old files
- one less level of eval (so we can catch success correctly)
- be a bit more informative
- copy seen before toks (faster, so user gets some feedback quicker)
Comment 12 Theo Van Dinter 2003-08-28 14:22:14 UTC
why does the upgrade look for DB_File?  or are we assuming the user wouldn't 
know if they used DB_File, so support it just in case?

otherwise, +1.
Comment 13 Justin Mason 2003-08-28 14:37:03 UTC
+1.
Comment 14 Daniel Quinlan 2003-08-28 14:41:45 UTC
applied
Comment 15 Daniel Quinlan 2003-08-28 14:46:46 UTC
> why does the upgrade look for DB_File?  or are we assuming the user wouldn't 
> know if they used DB_File, so support it just in case?

I already answered this on IRC to Theo's satisfaction, but basically:

- easier logic
- user wouldn't know
- filenames can be same as other DB formats