SA Bugzilla – Bug 3471
[review] add sanity check so too many tokens don't get expired
Last modified: 2004-06-09 06:02:10 UTC
it's been reported on the -users list that sometimes bayes expiry will get rid of the majority of tokens in the db. the issue seems to occur when most of the tokens have a single atime (like when an upgrade happens) and expiry uses the estimation method. I'm attaching a patch which adds a simple check to the DBM expiry (SQL doesn't do estimation?).
Created attachment 2000 [details] suggested patch
I think this is 3.0 material.
Subject: Re: New: add sanity check so too many tokens don't get expired On Fri, Jun 04, 2004 at 09:50:27AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > > I'm attaching a patch which adds a simple check to the DBM expiry (SQL doesn't do estimation?). > Actually, it does. It determines the optimal atime to expiration and then deletes everything < that. If this is a problem for DBM, then it could happen for SQL as well, but slightly harder to recover from. Michael
Subject: Re: add sanity check so too many tokens don't get expired On Fri, Jun 04, 2004 at 04:38:11PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > Actually, it does. It determines the optimal atime to expiration and > then deletes everything < that. If this is a problem for DBM, then it > could happen for SQL as well, but slightly harder to recover from. Ah. I saw the "delete ... < ..." bit, and got confused (it was a very long night last night...) I guess the way to guard for this in SQL is to do a "select count(??) ..." before the delete, and see how many tokens will be removed. If too many, abort. If not, do the delete call. It adds more overhead, but ... At least for the delete, the appropriate rows would be in cache. :)
Subject: Re: add sanity check so too many tokens don't get expired It's been awhile since I've looked at this code but I wonder if this is just fixing the symptom and not the actual problem. We should be figuring out the optimal atime value that doesn't expire too many tokens. So is that part of the code doing the wrong thing? Michael
Subject: Re: add sanity check so too many tokens don't get expired On Fri, Jun 04, 2004 at 06:35:31PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > We should be figuring out the optimal atime value that doesn't expire > too many tokens. So is that part of the code doing the wrong thing? it's all documented in the sa-learn docs. ;) the expiry works in 2 passes. the first pass is what generates the atime to use, and the second pass actually does the removal. to make expiry go faster, the first pass will try to estimate what the atime value should be based on what the last expiry run found, as long as they are similar enough in what they want to accomplish (ie: if you previously expired 5000 tokens, and you now want to expire 5000 tokens, you can probably use the same atime delta as last time...). this works on the idea that message flow is relatively constant. if estimation shouldn't be used (it's rather picky), the first pass will actually go through all the tokens to figure out what atime value should be used based on an exponential scale (12 hours, 24 hours, up through 256 days). the problem is that doing estimation gives you, well, an estimate. most of the time it's right, and sometimes it's wrong. it's the reason why people don't like the weatherman on tv. so to answer your question, we can always use a better estimation algorithm/equation, but the nature of the beast is that estimations will sometimes be wrong so we should just put a sanity check in the bottom to make sure we're not about to shoot off our own foot.
Subject: Re: add sanity check so too many tokens don't get expired Ahhh, yeah I forgot about the first pass/guess based on previous expires. Makes sense. Michael
Created attachment 2005 [details] new version of implementation this version handles both DBM and SQL
change to review state, taking ticket
+1 ok, that looks good I think
Subject: Re: [review] add sanity check so too many tokens don't get expired I hate to do it, but I'm -1 on the patch as is. There are a couple of problems with the SQL stuff. I'll fix it up in the morning and attach a new version. Michael
Subject: Re: [review] add sanity check so too many tokens don't get expired On Mon, Jun 07, 2004 at 10:35:02PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > I hate to do it, but I'm -1 on the patch as is. There are a couple of > problems with the SQL stuff. Heh. Don't worry about it, as long as we get a good patch in for the issue. For the SQL code, I copied the select section from another part of the module and just changed the query around and added some if/else statements. > I'll fix it up in the morning and attach a new version. :)
Created attachment 2016 [details] Patch File All new version. Pretty much the same as before, however I made some changes to how the SQL version returns on error and cleaned up the SQL stuffs. I don't really like introducing the goto, but I couldn't think of a cleaner way to jump to the end so we could return cleanly. Ideas on other methods welcome.
Subject: Re: [review] add sanity check so too many tokens don't get expired -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >I don't really like introducing the goto, but I couldn't think of a cleaner way >to jump to the end so we could return cleanly. Ideas on other methods welcome. goto's fine for clean error handling IMO -- one for the style guide I suppose ;) - --j. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFAxiGKQTcbUG5Y7woRAkyrAJ9n5urBsGCml5wq8qwGFL9AQ4IIMQCgmxPG I2SxdS+WRDloE+DJn/crlts= =Z7wx -----END PGP SIGNATURE-----
+1, assuming it works ;)
+1 looks fine here
Committed revision 20962.