Bug 2904 - bayes wont expire tokens properly
Summary: bayes wont expire tokens properly
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: 2.60
Hardware: Sun Solaris
Importance: P5 critical
Target Milestone: 2.70
Assignee: Theo Van Dinter
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-01-07 08:07 UTC by Adam Denenberg
Modified: 2004-02-05 08:42 UTC

Description Adam Denenberg 2004-01-07 08:07:23 UTC
It seems that I cannot expire any of my tokens, and I am not sure why.

Here is my sa-learn --dump magic output, followed by the attempt to force-expire with -D:

[root@nydb1 adam]# sa-learn --dump magic
0.000          0          2          0  non-token data: bayes db version
0.000          0      67753          0  non-token data: nspam
0.000          0      24516          0  non-token data: nham
0.000          0    1648100          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0 1103644440          0  non-token data: newest atime
0.000          0 1073474818          0  non-token data: last journal sync atime
0.000          0 1073456071          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       1019          0  non-token data: last expire reduction count
[root@nydb1 adam]# sa-learn -D --force-expire
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/ccs/bin', keeping.
debug: PATH included '/usr/local/mysql/bin', keeping.
debug: PATH included '/usr/ucb', keeping.
debug: Final PATH set to: /usr/bin:/bin:/usr/sbin:/sbin:/usr/bin:/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin:/usr/ccs/bin:/usr/local/mysql/bin:/usr/ucb
debug: using "/usr/local/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/export/home/adam/.spamassassin/user_prefs" for user prefs file
debug: Allowing user rules!
debug: bayes: 8901 tie-ing to DB file R/O /share/spam/bayes_toks
debug: bayes: 8901 tie-ing to DB file R/O /share/spam/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 2 chosen.
debug: Initialising learner
debug: Initialising learner
debug: Syncing Bayes journal and expiring old tokens...
debug: lock: 8901 created /share/spam/bayes.lock.nydb1.mxprecise.com.8901
debug: lock: 8901 trying to get lock on /share/spam/bayes with 0 retries
debug: lock: 8901 link to /share/spam/bayes.lock: link ok
debug: bayes: 8901 tie-ing to DB file R/W /share/spam/bayes_toks
debug: bayes: 8901 tie-ing to DB file R/W /share/spam/bayes_seen
debug: bayes: found bayes db version 2
.
synced Bayes databases from journal in 1 seconds: 1697 unique entries (2408 total entries)
debug: bayes: expiry check keep size, 75% of max: 750000
debug: bayes: token count: 0, final goal reduction size: -750000
debug: bayes: reduction goal of -750000 is under 1,000 tokens.  skipping expire.
debug: Syncing complete.
debug: bayes: 8901 untie-ing
debug: bayes: 8901 untie-ing db_toks
debug: bayes: 8901 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 8901 unlink /share/spam/bayes.lock
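
The atime values in the dump above are Unix timestamps, which are hard to compare by eye. A minimal sketch for decoding them, assuming they are seconds since the epoch in UTC (the helper name to_utc is only for illustration):

```python
from datetime import datetime, timezone

# Values copied from the `sa-learn --dump magic` output above.
MAGIC = {
    "newest atime": 1103644440,
    "last journal sync atime": 1073474818,
    "last expiry atime": 1073456071,
}

def to_utc(epoch_seconds):
    """Convert a Unix timestamp (assumed seconds, UTC) to an aware datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

for name, value in MAGIC.items():
    print(f"{name:>24}: {to_utc(value).isoformat()}")

# "last expire atime delta" is a duration in seconds, not a timestamp.
print("last expire atime delta:", 1382400 / 86400, "days")
```

Decoded this way, the newest atime falls in December 2004, later than the January 2004 journal sync and expiry times, and the last expire atime delta works out to 16 days.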
Comment 1 Theo Van Dinter 2004-01-07 08:56:47 UTC
I asked for the ticket to be opened, so I'm assigning it to myself.
Comment 2 Theo Van Dinter 2004-01-07 12:26:31 UTC
Well, I don't exactly have the same environment, but ...

Took the original DB file, ran it through dump/load to get it converted to a format my box can 
understand (was originally, it looks like, DB4 on sparc, going to DB3 on ix86).  I then had no 
problems doing an expire:

debug: bayes: found bayes db version 2
debug: bayes: expiry check keep size, 75% of max: 375000
debug: bayes: token count: 1648258, final goal reduction size: 1273258
debug: bayes: First pass?  Current: 1073506951, Last: 1073456071, atime: 1382400, count: 1019, newdelta: 1106, ratio: 1249.51717369971
debug: bayes: something fishy, calculating atime (first pass)


Is there a way I can get access to the box showing this problem?  If not, I can try to rig up a similar 
environment on some Suns that I have access to.
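
For reference, the numbers in the two debug logs can be reproduced with simple arithmetic. The sketch below is inferred from the logged values only, not taken from the SpamAssassin source, so the function name, parameter names, and exact formulas are assumptions; only the 1,000-token threshold comes directly from the log message.

```python
def expiry_plan(token_count, keep_size, last_atime_delta,
                last_reduction_count, min_reduction=1000):
    """Reproduce the goal/ratio/newdelta figures seen in the debug output.

    Assumed relationships (they match the logged numbers):
      goal     = token_count - keep_size
      ratio    = goal / last_reduction_count
      newdelta = int(last_atime_delta / ratio)
    """
    goal = token_count - keep_size
    if goal < min_reduction:
        # Mirrors "reduction goal ... is under 1,000 tokens. skipping expire."
        return {"goal": goal, "skip": True}
    ratio = goal / last_reduction_count
    return {
        "goal": goal,
        "skip": False,
        "ratio": ratio,
        "newdelta": int(last_atime_delta / ratio),
    }

# Reporter's run: the token count was read as 0, so the goal is negative.
print(expiry_plan(0, 750000, 1382400, 1019))

# Run on the converted copy of the database: the token count is read correctly.
print(expiry_plan(1648258, 375000, 1382400, 1019))
```

The contrast between the two calls is the point: with a token count of 0 the goal can never reach the threshold, so the expire is skipped regardless of how many tokens the file actually holds.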
Comment 3 Adam Denenberg 2004-01-07 12:33:28 UTC
Unfortunately I can't give access to this machine as it is on an internal
network.  How did you dump and load the data?  Maybe I should try the same
method to see if I get the same results?  Could you share the dump and load tool?

- adam
Comment 4 Theo Van Dinter 2004-01-07 12:55:38 UTC
Subject: Re:  bayes wont expire tokens properly

On Wed, Jan 07, 2004 at 01:28:30PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Unfortunately I can't give access to this machine as it is on an internal
> network.  How did you dump and load the data?  Maybe I should try the same

OK, fair enough.  Just wanted to ask before I try whipping one of the sparcs
at work into doing my bidding.  ;)

> method to see if I get the same results?  Could you share the dump and load tool?

I just used the dump/load from the Berkeley DB distro:
'db_dump bayes_toks.orig | db33_load bayes_toks'

There shouldn't have been any bayes data change due to the dump/load,
just changing the internal berkeley DB format.
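
The same pipeline can be scripted so that a failure in either step is noticed. This is a minimal sketch, assuming db_dump and db_load (or versioned names such as db33_load) are on the PATH; rebuild_db and its parameters are hypothetical names, and it should be run against a copy of the database, not the live file.

```python
import subprocess

def rebuild_db(src, dst, dump_cmd="db_dump", load_cmd="db_load"):
    """Pipe `dump_cmd src` into `load_cmd dst`; raise if either step fails.

    dump_cmd and load_cmd are parameters because the utility names vary
    between installs (for example db33_load).
    """
    dump = subprocess.Popen([dump_cmd, src], stdout=subprocess.PIPE)
    try:
        load_rc = subprocess.run(
            [load_cmd, dst],
            stdin=dump.stdout,
            stdout=subprocess.DEVNULL,
        ).returncode
    finally:
        # Close our copy of the pipe and reap the dump process in all cases.
        dump.stdout.close()
        dump_rc = dump.wait()
    if dump_rc != 0 or load_rc != 0:
        raise RuntimeError(
            f"rebuild failed: {dump_cmd} exited {dump_rc}, {load_cmd} exited {load_rc}"
        )

# Example call, matching the command quoted above:
# rebuild_db("bayes_toks.orig", "bayes_toks", load_cmd="db33_load")
```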

I suppose I could upgrade my personal box to use DB_File linked against
db4 and see what that does... :|

Comment 5 Theo Van Dinter 2004-01-07 18:23:43 UTC
BTW:

$ db_verify bayes_orig 
db_verify: Page 10310: hash page has bad prev_pgno
db_verify: Page 10033: hash page has bad prev_pgno
db_verify: Page 6485: item 286 hashes incorrectly
db_verify: Page 6488: item 312 hashes incorrectly
db_verify: Page 6489: item 330 hashes incorrectly
db_verify: Page 6489: item 332 hashes incorrectly
db_verify: Page 7448: non-empty page in unused hash bucket 5895
db_verify: Page 0: page 2885 encountered a second time on free list
db_verify: DB->verify: bayes_orig: DB_VERIFY_BAD: Database verification failed

bayes_orig is the file you sent up.
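
To get a quick picture of how widespread the damage is, the db_verify output can be grouped by page and by kind of complaint. A small sketch, assuming the "db_verify: Page N: message" line format shown above; the summarize helper is hypothetical.

```python
import re
from collections import Counter

PAGE_RE = re.compile(r"^db_verify: Page (\d+): (.*)$")

# Lines copied from the db_verify run above.
SAMPLE = """\
db_verify: Page 10310: hash page has bad prev_pgno
db_verify: Page 10033: hash page has bad prev_pgno
db_verify: Page 6485: item 286 hashes incorrectly
db_verify: Page 6488: item 312 hashes incorrectly
db_verify: Page 6489: item 330 hashes incorrectly
db_verify: Page 6489: item 332 hashes incorrectly
db_verify: Page 7448: non-empty page in unused hash bucket 5895
db_verify: Page 0: page 2885 encountered a second time on free list
db_verify: DB->verify: bayes_orig: DB_VERIFY_BAD: Database verification failed
""".splitlines()

def summarize(lines):
    """Return (set of affected page numbers, Counter of message kinds)."""
    pages = set()
    kinds = Counter()
    for line in lines:
        match = PAGE_RE.match(line.strip())
        if not match:
            continue  # e.g. the final DB_VERIFY_BAD summary line
        pages.add(int(match.group(1)))
        # Replace digits with N so messages differing only by number group together.
        kinds[re.sub(r"\d+", "N", match.group(2))] += 1
    return pages, kinds

pages, kinds = summarize(SAMPLE)
print(len(pages), "pages affected")
for kind, count in kinds.most_common():
    print(f"{count:3d}  {kind}")
```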
Comment 6 Adam Denenberg 2004-01-14 12:02:26 UTC
Hmm, do you think a db_dump bayes_toks | db_load will fix the hash errors?  It
would appear that there is corruption, but I don't want to lose a database that
took me so much time to build...

Any recommendations here?

Thanks,
adam
Comment 7 Theo Van Dinter 2004-02-05 17:42:28 UTC
You could try the db rebuild, but it seems like you should update your DB_File or libdb libs. :(

But since this doesn't seem like an SA issue, closing as WFM.