Bug 1719 - sa-learn locking/unlocking for every message
Summary: sa-learn locking/unlocking for every message
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner (show other bugs)
Version: 2.52
Hardware: Other other
: P5 major
Target Milestone: 2.60
Assignee: Justin Mason
URL:
Whiteboard:
Keywords: backport
: 1673 (view as bug list)
Depends on: 1881
Blocks:
  Show dependency tree
 
Reported: 2003-03-31 15:44 UTC by Justin Mason
Modified: 2003-05-09 07:39 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status
fix patch None Justin Mason [HasCLA]
re-added learn-to-journal patch None Justin Mason [HasCLA]
re-added learn-to-journal, take 2 patch None Justin Mason [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Mason 2003-03-31 15:44:39 UTC
sa-learn locks and unlocks the Bayes dbs for *every* message it learns from.
This is obviously suboptimal.
Comment 1 Justin Mason 2003-03-31 15:45:20 UTC
Created attachment 851 [details]
fix
Comment 2 Justin Mason 2003-03-31 15:45:48 UTC
fix attached.
Comment 3 Theo Van Dinter 2003-03-31 15:57:57 UTC
*** Bug 1673 has been marked as a duplicate of this bug. ***
Comment 4 Theo Van Dinter 2003-03-31 16:10:48 UTC
OKAY: looks good
Comment 5 Daniel Quinlan 2003-03-31 16:17:49 UTC
from IRC:

<quinlan> doesnt the 1719 fix mean a learner will have the db locked for a very
long time?
<jm> an "sa-learn" learner may.
<jm> auto-learning etc. will not
<jm> one's manual, the other isn't; so UI-wise, a manually-run process is OK to
keep it locked
<quinlan> jm: I'm concerned about a long-running sa-learn causing thousands of
spamd processes to queue up
<jm> in that case, we should untie every 50 messages or so -- or use the journal
when learning.  I'd favour the latter.
<jm> OK, we can leave 1719 out, and implement journalled learning in 2.60
Comment 6 Theo Van Dinter 2003-04-03 13:29:34 UTC
so was the decision to not bother with this patch or what?  (I want to close it
if  it was decided to not do this...  I think we should lock for the duration of
learning, and untie occasionally, but ...)
Comment 7 Antony Mawer 2003-04-03 15:11:11 UTC
Subject: Re: [SAdev]  sa-learn locking/unlocking for every message 


> so was the decision to not bother with this patch or what?  (I want to
> close it if  it was decided to not do this...  I think we should lock
> for the duration of learning, and untie occasionally, but ...)

I think Dan wanted to use a different method; perhaps locking & unlocking
every 20 msgs or similar.

Comment 8 Eugene Miretsky 2003-04-10 10:46:44 UTC
I have a small request: if this patch is ever implemented, please
add an option to disable this functionality (something like: -n <number> -- 
retie db every <number> messages).

I think that this functionality does not buy you that much,
and it actualy adds more headache (at least for me).

sa-learn is an offline process.  I do not see any reason why you should
complicate matters with this patch.  It can have very adverse affects on servers
with autolearning enabled.
Comment 9 Justin Mason 2003-04-22 00:19:55 UTC
Created attachment 910 [details]
re-added learn-to-journal
Comment 10 Justin Mason 2003-04-22 00:30:08 UTC
Created attachment 911 [details]
re-added learn-to-journal, take 2
Comment 11 Justin Mason 2003-04-22 00:30:52 UTC
OK guys, take a look and tell me what you think... it's journalled learning,
so the db isn't locked during an "sa-learn --no-rebuild" op.
Comment 12 Justin Mason 2003-04-22 11:50:29 UTC
actually, scratch that.  I'm going to redo this in-RAM, then add to journal at
end; more reliable in case of user CTRL-C.  but a bit busy right now....
Comment 13 Theo Van Dinter 2003-04-22 11:53:29 UTC
fyi: patch 851 causes things to break badly.   it makes finish() check caller_will_untie, which is bad 
because that's the function that gets called to untie...

I opened a new bug (1813) to remind me to go fix that in head.
Comment 14 Theo Van Dinter 2003-04-23 11:53:10 UTC
ISSUE: I have two issues actually.

1) you have to be sure that all updates go through the journal which theoretically would happen, 
but I can come up with places where that may not be true.  for example, someone wants to try it, 
learns to journal for a little bit, doesn't like that so they disable it before doing a rebuild, then 
learn some more.  when the sync occurs, the db gets "corrupt".

the easiest way to solve that is instead of "change ham/spam count to" for both token and db, we 
should have it be "change ham/spam count by".

2) doing defer_update seems to not include anyway to specify a message id for the seen db.  so 
the end result is not the same as the non-deferred version.  a related issue is that you can learn 
the same mail continuously, then sync and have the mail count multiple times.
Comment 15 Antony Mawer 2003-04-23 15:07:33 UTC
Subject: Re:  sa-learn locking/unlocking for every message 


bugzilla-daemon@hughes-family.org said:

> 1) you have to be sure that all updates go through the journal which
> theoreti cally would happen, but I can come up with places where that
> may not be true.  for example, someo ne wants to try it, learns to
> journal for a little bit, doesn't like that so they disable it befo re
> doing a rebuild, then learn some more.  when the sync occurs, the db
> gets "corrupt".
> the easiest way to solve that is instead of "change ham/spam count to"
> for bo th token and db, we should have it be "change ham/spam count by".

It already does this, actually.

> 2) doing defer_update seems to not include anyway to specify a message
> id for the seen db.  so the end result is not the same as the
> non-deferred version.  a related issue is that you can learn the same
> mail continuously, then sync and have the mail count multiple times.

Yes, this is an issue.  I'll fix it in the updated patch.

--j.

Comment 16 Theo Van Dinter 2003-04-23 17:42:26 UTC
Subject: Re:  sa-learn locking/unlocking for every message

On Wed, Apr 23, 2003 at 03:07:34PM -0700, bugzilla-daemon@hughes-family.org wrote:
> > the easiest way to solve that is instead of "change ham/spam count to"
> > for bo th token and db, we should have it be "change ham/spam count by".
> 
> It already does this, actually.

Do they now ...?   <checks code>

Look at that, you're right!  I originally thought it was a full update,
but now see it's a delta.  Well, never mind then. :)

Comment 17 Theo Van Dinter 2003-05-03 14:57:16 UTC
I think this needs to hold off for 2.60.  There's still the msgid issue 
(actually fairly easy to solve, just add a "message-id seen" thing to the 
journal), and I'd rather leave the "new feature" for the new SA.

My goal is to get 2.54 out RSN, then focus all attention on getting 2.60 out in 
the next month or so.
Comment 18 Daniel Quinlan 2003-05-03 21:03:06 UTC
I really think we should push this to 2.60.
Comment 19 Theo Van Dinter 2003-05-04 04:55:22 UTC
ok, both Dan and myself agree this should wait for 2.60, so I'm reaiming.
Comment 20 Justin Mason 2003-05-09 14:38:26 UTC
this issue will be done with once bug 1881 is closed, as Theo's now implementing
the remaining bit (msgids and journal). cheers Theo!
Comment 21 Theo Van Dinter 2003-05-09 15:39:18 UTC
justin already checked in his code, and I finished 1881.  :)