SA Bugzilla – Bug 1719
sa-learn locking/unlocking for every message
Last modified: 2003-05-09 07:39:18 UTC
sa-learn locks and unlocks the Bayes dbs for *every* message it learns from. This is obviously suboptimal.
Created attachment 851 [details] fix
fix attached.
*** Bug 1673 has been marked as a duplicate of this bug. ***
OKAY: looks good
from IRC: <quinlan> doesnt the 1719 fix mean a learner will have the db locked for a very long time? <jm> an "sa-learn" learner may. <jm> auto-learning etc. will not <jm> one's manual, the other isn't; so UI-wise, a manually-run process is OK to keep it locked <quinlan> jm: I'm concerned about a long-running sa-learn causing thousands of spamd processes to queue up <jm> in that case, we should untie every 50 messages or so -- or use the journal when learning. I'd favour the latter. <jm> OK, we can leave 1719 out, and implement journalled learning in 2.60
so was the decision to not bother with this patch or what? (I want to close it if it was decided to not do this... I think we should lock for the duration of learning, and untie occasionally, but ...)
Subject: Re: [SAdev] sa-learn locking/unlocking for every message > so was the decision to not bother with this patch or what? (I want to > close it if it was decided to not do this... I think we should lock > for the duration of learning, and untie occasionally, but ...) I think Dan wanted to use a different method; perhaps locking & unlocking every 20 msgs or similar.
I have a small request: if this patch is ever implemented, please add an option to disable this functionality (something like: -n <number> -- retie db every <number> messages). I think that this functionality does not buy you that much, and it actualy adds more headache (at least for me). sa-learn is an offline process. I do not see any reason why you should complicate matters with this patch. It can have very adverse affects on servers with autolearning enabled.
Created attachment 910 [details] re-added learn-to-journal
Created attachment 911 [details] re-added learn-to-journal, take 2
OK guys, take a look and tell me what you think... it's journalled learning, so the db isn't locked during an "sa-learn --no-rebuild" op.
actually, scratch that. I'm going to redo this in-RAM, then add to journal at end; more reliable in case of user CTRL-C. but a bit busy right now....
fyi: patch 851 causes things to break badly. it makes finish() check caller_will_untie, which is bad because that's the function that gets called to untie... I opened a new bug (1813) to remind me to go fix that in head.
ISSUE: I have two issues actually. 1) you have to be sure that all updates go through the journal which theoretically would happen, but I can come up with places where that may not be true. for example, someone wants to try it, learns to journal for a little bit, doesn't like that so they disable it before doing a rebuild, then learn some more. when the sync occurs, the db gets "corrupt". the easiest way to solve that is instead of "change ham/spam count to" for both token and db, we should have it be "change ham/spam count by". 2) doing defer_update seems to not include anyway to specify a message id for the seen db. so the end result is not the same as the non-deferred version. a related issue is that you can learn the same mail continuously, then sync and have the mail count multiple times.
Subject: Re: sa-learn locking/unlocking for every message bugzilla-daemon@hughes-family.org said: > 1) you have to be sure that all updates go through the journal which > theoreti cally would happen, but I can come up with places where that > may not be true. for example, someo ne wants to try it, learns to > journal for a little bit, doesn't like that so they disable it befo re > doing a rebuild, then learn some more. when the sync occurs, the db > gets "corrupt". > the easiest way to solve that is instead of "change ham/spam count to" > for bo th token and db, we should have it be "change ham/spam count by". It already does this, actually. > 2) doing defer_update seems to not include anyway to specify a message > id for the seen db. so the end result is not the same as the > non-deferred version. a related issue is that you can learn the same > mail continuously, then sync and have the mail count multiple times. Yes, this is an issue. I'll fix it in the updated patch. --j.
Subject: Re: sa-learn locking/unlocking for every message On Wed, Apr 23, 2003 at 03:07:34PM -0700, bugzilla-daemon@hughes-family.org wrote: > > the easiest way to solve that is instead of "change ham/spam count to" > > for bo th token and db, we should have it be "change ham/spam count by". > > It already does this, actually. Do they now ...? <checks code> Look at that, you're right! I originally thought it was a full update, but now see it's a delta. Well, never mind then. :)
I think this needs to hold off for 2.60. There's still the msgid issue (actually fairly easy to solve, just add a "message-id seen" thing to the journal), and I'd rather leave the "new feature" for the new SA. My goal is to get 2.54 out RSN, then focus all attention on getting 2.60 out in the next month or so.
I really think we should push this to 2.60.
ok, both Dan and myself agree this should wait for 2.60, so I'm reaiming.
this issue will be done with once bug 1881 is closed, as Theo's now implementing the remaining bit (msgids and journal). cheers Theo!
justin already checked in his code, and I finished 1881. :)