Bug 5613 - Bayes expiry first pass should calculate ntokens
Summary: Bayes expiry first pass should calculate ntokens
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 normal
Target Milestone: Future
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2007-08-17 21:18 UTC by Theo Van Dinter
Modified: 2010-01-27 03:16 UTC
CC List: 0 users

Description Theo Van Dinter 2007-08-17 21:18:12 UTC
An issue came up on IRC where an expiry run wasn't working properly.
Looking at the debug output, the only possible reason was that ntokens wasn't
accurate, and sure enough "sa-learn --dump data | wc -l" validated this.

It occurred to me that there's an easy workaround: if we're going to be
doing a first pass anyway, which requires going through all the tokens, we
should just count the tokens while doing so and ignore the stored ntokens.
It's not going to hurt when things are working correctly, and it helps in the
odd invalid ntokens case.
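
A minimal sketch of the idea, in Python rather than SpamAssassin's Perl, and not the project's actual code: the first pass already visits every token to tally how many would expire at each candidate atime delta, so it can count the tokens in the same loop. The function name, the (token, atime) iteration, and the doubling series of candidate deltas are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def first_pass(
    tokens: Iterable[Tuple[str, int]],
    newest_atime: int,
    start_delta: int = 43200,
    max_exponent: int = 9,
) -> Tuple[int, Dict[int, int]]:
    """Hypothetical first pass: count tokens while tallying expiry candidates.

    tokens yields (token, atime) pairs. Returns the number of tokens seen
    and, per candidate delta, how many tokens are older than that delta.
    """
    # Assumed candidate deltas: start_delta * 2**i for i in 0..max_exponent.
    deltas = [start_delta * 2 ** i for i in range(max_exponent + 1)]
    reductions = {delta: 0 for delta in deltas}
    counted = 0

    for _token, atime in tokens:
        counted += 1  # the extra work proposed in this bug: one increment
        age = newest_atime - atime
        for delta in deltas:
            if age <= delta:
                break  # deltas are ascending, so larger ones cannot match
            reductions[delta] += 1

    return counted, reductions
```

The caller would then use the returned count in place of the stored ntokens when working out the goal reduction.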

[13615] dbg: bayes: expiry check keep size, 0.75 * max: 375000
[13615] dbg: bayes: token count: 14729453, final goal reduction size: 14354453
[13615] dbg: bayes: first pass? current: 1187407685, Last: 1166061803, atime: 691200, count: 5927, newdelta: 285, ratio: 2421.87497891007, period: 43200
[13615] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[13615] dbg: bayes: expiry max exponent: 9
[13615] dbg: bayes: atime     token reduction
[13615] dbg: bayes: ========  ===============
[13615] dbg: bayes: 43200     3759141
[13615] dbg: bayes: first pass decided on 43200 for atime delta
[13615] dbg: bayes: token expiration would expire too many tokens, aborting

$ sa-learn --dump data | wc -l
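
The numbers in the debug output above are internally consistent, which is what points at the stored ntokens rather than at the arithmetic. The sketch below, in Python, only replays figures taken from the log. The max size of 500000 is inferred from "0.75 * max: 375000", and the formulas are inferred from the log values rather than quoted from the SpamAssassin source.

```python
# Values copied from the debug output.
keep_size = 375000          # "expiry check keep size, 0.75 * max"
token_count = 14729453      # "token count" (the stored ntokens)
last_atime_delta = 691200   # "atime"
last_expire_count = 5927    # "count"

# Inferred: max size is keep_size / 0.75.
max_db_size = keep_size / 0.75

# Inferred: goal reduction is the stored token count minus the keep size.
goal_reduction = token_count - keep_size

# Inferred: ratio compares the goal with what the last expiry removed,
# and newdelta scales the previous atime delta down by that ratio.
ratio = goal_reduction / last_expire_count
new_delta = int(last_atime_delta / ratio)

print(max_db_size)      # 500000.0
print(goal_reduction)   # 14354453, matches "final goal reduction size"
print(round(ratio, 3))  # 2421.875, matches "ratio" to three decimals
print(new_delta)        # 285, matches "newdelta"
```

A new delta of 285 seconds is far below the 43200-second period, which fits the log giving up on the estimation method and falling back to a first pass.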

My commentary on how to fix it:
so I'd set "use_bayes_rules 0" and "bayes_auto_learn 0",
then "sa-learn --backup > bayes.backup"
move aside the bayes db files
then "sa-learn --restore bayes.backup"
assuming that all goes well, and "sa-learn --dump magic" looks right, remove the
config options above and let bayes go back to work
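
The steps above can be sketched as a small Python wrapper around the same sa-learn commands. This is only an illustration: the function names, the backup filename, and the assumption that the database lives in files matching "bayes_*" in a single directory are not from this report, so adjust them to the local setup. The two config options still have to be set and removed by hand.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def recovery_commands(backup_file: str) -> List[List[str]]:
    """Return the sa-learn invocations from the steps above, in order."""
    return [
        ["sa-learn", "--backup"],                # stdout goes to backup_file
        ["sa-learn", "--restore", backup_file],
        ["sa-learn", "--dump", "magic"],         # inspect ntokens afterwards
    ]


def rebuild_bayes_db(db_dir: Path, backup_file: str) -> None:
    """Back up, move the old db files aside, restore, then dump magic.

    db_dir and the "bayes_*" pattern are assumptions about a file-based
    setup; they are hypothetical and not taken from the bug report.
    """
    backup_cmd, restore_cmd, dump_cmd = recovery_commands(backup_file)

    # sa-learn --backup writes the dump to stdout, so redirect it to a file.
    with open(backup_file, "w") as out:
        subprocess.run(backup_cmd, stdout=out, check=True)

    # Move the existing db files aside instead of deleting them.
    aside = db_dir / "bayes.old"
    aside.mkdir(exist_ok=True)
    for db_file in list(db_dir.glob("bayes_*")):
        shutil.move(str(db_file), str(aside / db_file.name))

    subprocess.run(restore_cmd, check=True)
    subprocess.run(dump_cmd, check=True)
```

With check=True, a failing sa-learn call raises an exception, so the old files are only moved aside after the backup has succeeded.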
Comment 1 Theo Van Dinter 2008-10-06 13:40:42 UTC
this came up again on the users@ list, so I'm pushing this to the 3.3 queue.  it should be really simple to implement. :)
Comment 2 Justin Mason 2010-01-27 02:31:33 UTC
moving some 3.3.0-targeted bugs into the vague Future.  feel free to retarget to 3.3.1 if you think you'll be able to work on them
Comment 3 Justin Mason 2010-01-27 03:16:42 UTC
reassigning, too