SA Bugzilla – Bug 1664
sync_journal() should do better error handling
Last modified: 2003-03-24 04:06:55 UTC
Right now, sync_journal() renames the journal, opens the journal for reading, then tries to tie the bayes db r/w. If the tie fails, we have to rename the temp journal back to the original filename (which overwrites a new journal which could have been created). I think the db tie should be tried first as it's the most likely part to fail. If it fails, we don't have to do anything. If it succeeds, we can then look at renaming the journal, etc.
Subject: Re: [SAdev] New: sync_journal() should do better error handling > Right now, sync_journal() renames the journal, opens the journal for > reading, then tries to tie the bayes db r/w. If the tie fails, we have > to rename the temp journal back to the original filename (which > overwrites a new journal which could have been created). > > I think the db tie should be tried first as it's the most likely part to > fail. If it fails, we don't have to do anything. If it succeeds, we > can then look at renaming the journal, etc. Hearty agreement here. Very good point. --j.
also the fix for 1665, and that bug has a better error description which users would recognise. marking dup. *** This bug has been marked as a duplicate of 1665 ***
Created attachment 793 [details] suggested patch, apply after bug 1629
hmm, reopening to mark 1665 a dup of this ;)
*** Bug 1665 has been marked as a duplicate of this bug. ***
OKAY: yes, looks good to me. now to get the folks to try it out...
Note: patch applies on its own, NOT after patch 792!
OK, there are still issues with that. It does most of the work, but still requires all scanner procs to attempt to lock the DB, then eventually find out that 1 scanner proc had done the expire successfully. Since 100 scanner procs will exhaust memory pretty quickly, this is not good. The following patch fixes this by adding a token to the db, noting that an expire is in progress (or at least was in progress up to 300 seconds ago). The sync_journal_trapped() method sets this, and the expire_old_tokens_trapped() method removes it. The failure cases for these methods do NOT remove the token; this means that a failure means the same operation will not be attempted for at least 300 secs, which will help in the face of load spikes and system resource shortages. A sync/expire will therefore not take place if the caller determines that a sync/expire is in progress, or failed up to 300 secs ago.
Created attachment 795 [details] fix to be applied on top of patch 793
Created attachment 796 [details] lower opportunistic expire contention. patch on top of 793
OK, use Theo's one (patch 796), it's better. only slightly ;)
this'll be in 2.52. :)
OKAY: pretty complay but I can't find a logical error (I don't really know that code though)
just so we don't forget: the second patch needs to be applied to 2.60. the first should already be applied.
*** Bug 1675 has been marked as a duplicate of this bug. ***
applied to stable branch and 2.6. closing ticket.