Bug 1664 - sync_journal() should do better error handling
Summary: sync_journal() should do better error handling
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
: P5 normal
Target Milestone: 2.52
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
: 1665 1675 (view as bug list)
Depends on:
Blocks:
 
Reported: 2003-03-20 15:10 UTC by Theo Van Dinter
Modified: 2003-03-24 04:06 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
suggested patch, apply after bug 1629 patch None Theo Van Dinter [HasCLA]
fix to be applied on top of patch 793 patch None Justin Mason [HasCLA]
lower opportunistic expire contention. patch on top of 793 patch None Theo Van Dinter [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Theo Van Dinter 2003-03-20 15:10:52 UTC
Right now, sync_journal() renames the journal, opens the journal for 
reading, then tries to tie the bayes db r/w.  If the tie fails, we have to 
rename the temp journal back to the original filename (which overwrites a 
new journal which could have been created).

I think the db tie should be tried first as it's the most likely part to fail.  If it 
fails, we don't have to do anything.  If it succeeds, we can then look at 
renaming the journal, etc.
Comment 1 Antony Mawer 2003-03-20 15:30:09 UTC
Subject: Re: [SAdev]  New: sync_journal() should do better error handling 


> Right now, sync_journal() renames the journal, opens the journal for
> reading, then tries to tie the bayes db r/w.  If the tie fails, we have
> to rename the temp journal back to the original filename (which
> overwrites a new journal which could have been created).
> 
> I think the db tie should be tried first as it's the most likely part to
> fail.  If it fails, we don't have to do anything.  If it succeeds, we
> can then look at renaming the journal, etc.

Hearty agreement here.  Very good point.

--j.

Comment 2 Justin Mason 2003-03-20 17:28:15 UTC
also the fix for 1665, and that bug has a better error description which users
would recognise. marking dup.

*** This bug has been marked as a duplicate of 1665 ***
Comment 3 Theo Van Dinter 2003-03-20 18:01:49 UTC
Created attachment 793 [details]
suggested patch, apply after bug 1629
Comment 4 Justin Mason 2003-03-20 18:02:42 UTC
hmm, reopening to mark 1665 a dup of this ;)
Comment 5 Justin Mason 2003-03-20 18:03:15 UTC
*** Bug 1665 has been marked as a duplicate of this bug. ***
Comment 6 Justin Mason 2003-03-20 18:05:59 UTC
OKAY: yes, looks good to me.  now to get the folks to try it out...
Comment 7 Justin Mason 2003-03-20 18:17:57 UTC
Note: patch applies on its own, NOT after patch 792!
Comment 8 Justin Mason 2003-03-21 15:19:02 UTC
OK, there are still issues with that.  It does most of the work, but still
requires all scanner procs to attempt to lock the DB, then eventually find out
that 1 scanner proc had done the expire successfully.  Since 100 scanner procs
will exhaust memory pretty quickly, this is not good.

The following patch fixes this by adding a token to the db, noting that
an expire is in progress (or at least was in progress up to 300 seconds ago).
The sync_journal_trapped() method sets this, and the expire_old_tokens_trapped()
method removes it. The failure cases for these methods do NOT remove the token;
this means that a failure means the same operation will not be attempted for at
least 300 secs, which will help in the face of load spikes and system resource
shortages.

A sync/expire will therefore not take place if the caller determines that a
sync/expire is in progress, or failed up to 300 secs ago.
Comment 9 Justin Mason 2003-03-21 15:19:34 UTC
Created attachment 795 [details]
fix to be applied on top of patch 793
Comment 10 Theo Van Dinter 2003-03-21 15:29:51 UTC
Created attachment 796 [details]
lower opportunistic expire contention.  patch on top of 793
Comment 11 Justin Mason 2003-03-21 15:47:59 UTC
OK, use Theo's one (patch 796), it's better.  only slightly ;)
Comment 12 Theo Van Dinter 2003-03-21 16:14:25 UTC
this'll be in 2.52. :)
Comment 13 Malte S. Stretz 2003-03-21 16:26:06 UTC
OKAY: pretty complay but I can't find a logical error (I don't really know 
that code though) 
Comment 14 Theo Van Dinter 2003-03-22 16:18:32 UTC
just so we don't forget: the second patch needs to be applied to 2.60.  the first should already be applied.
Comment 15 Theo Van Dinter 2003-03-23 10:19:10 UTC
*** Bug 1675 has been marked as a duplicate of this bug. ***
Comment 16 Theo Van Dinter 2003-03-24 13:06:55 UTC
applied to stable branch and 2.6.  closing ticket.