1664 – sync_journal() should do better error handling

Bug 1664 - sync_journal() should do better error handling

Summary: sync_journal() should do better error handling

Status:	RESOLVED FIXED

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Libraries (show other bugs)
Version:	SVN Trunk (Latest Devel Version)
Hardware:	Other other

Importance:	P5 normal
Target Milestone:	2.52
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Duplicates (2):	1665 1675 (view as bug list)
Depends on:
Blocks:

Reported:	2003-03-20 15:10 UTC by Theo Van Dinter
Modified:	2003-03-24 04:06 UTC (History)
CC List:	2 users (show)

Attachment	Type	Actions	Submitter/CLA Status
suggested patch, apply after bug 1629	patch	None	Theo Van Dinter
fix to be applied on top of patch 793	patch	None	Justin Mason
lower opportunistic expire contention. patch on top of 793	patch	None	Theo Van Dinter
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Theo Van Dinter 2003-03-20 15:10:52 UTC

Right now, sync_journal() renames the journal, opens the journal for 
reading, then tries to tie the bayes db r/w.  If the tie fails, we have to 
rename the temp journal back to the original filename (which overwrites a 
new journal which could have been created).

I think the db tie should be tried first as it's the most likely part to fail.  If it 
fails, we don't have to do anything.  If it succeeds, we can then look at 
renaming the journal, etc.

Comment 1 Antony Mawer 2003-03-20 15:30:09 UTC

Subject: Re: [SAdev]  New: sync_journal() should do better error handling 


> Right now, sync_journal() renames the journal, opens the journal for
> reading, then tries to tie the bayes db r/w.  If the tie fails, we have
> to rename the temp journal back to the original filename (which
> overwrites a new journal which could have been created).
> 
> I think the db tie should be tried first as it's the most likely part to
> fail.  If it fails, we don't have to do anything.  If it succeeds, we
> can then look at renaming the journal, etc.

Hearty agreement here.  Very good point.

--j.

Comment 2 Justin Mason 2003-03-20 17:28:15 UTC

also the fix for 1665, and that bug has a better error description which users
would recognise. marking dup.

*** This bug has been marked as a duplicate of 1665 ***

Comment 3 Theo Van Dinter 2003-03-20 18:01:49 UTC

Created attachment 793 [details]
suggested patch, apply after bug 1629

Comment 4 Justin Mason 2003-03-20 18:02:42 UTC

hmm, reopening to mark 1665 a dup of this ;)

Comment 5 Justin Mason 2003-03-20 18:03:15 UTC

*** Bug 1665 has been marked as a duplicate of this bug. ***

Comment 6 Justin Mason 2003-03-20 18:05:59 UTC

OKAY: yes, looks good to me.  now to get the folks to try it out...

Comment 7 Justin Mason 2003-03-20 18:17:57 UTC

Note: patch applies on its own, NOT after patch 792!

Comment 8 Justin Mason 2003-03-21 15:19:02 UTC

OK, there are still issues with that.  It does most of the work, but still
requires all scanner procs to attempt to lock the DB, then eventually find out
that 1 scanner proc had done the expire successfully.  Since 100 scanner procs
will exhaust memory pretty quickly, this is not good.

The following patch fixes this by adding a token to the db, noting that
an expire is in progress (or at least was in progress up to 300 seconds ago).
The sync_journal_trapped() method sets this, and the expire_old_tokens_trapped()
method removes it. The failure cases for these methods do NOT remove the token;
this means that a failure means the same operation will not be attempted for at
least 300 secs, which will help in the face of load spikes and system resource
shortages.

A sync/expire will therefore not take place if the caller determines that a
sync/expire is in progress, or failed up to 300 secs ago.

Comment 9 Justin Mason 2003-03-21 15:19:34 UTC

Created attachment 795 [details]
fix to be applied on top of patch 793

Comment 10 Theo Van Dinter 2003-03-21 15:29:51 UTC

Created attachment 796 [details]
lower opportunistic expire contention.  patch on top of 793

Comment 11 Justin Mason 2003-03-21 15:47:59 UTC

OK, use Theo's one (patch 796), it's better.  only slightly ;)

Comment 12 Theo Van Dinter 2003-03-21 16:14:25 UTC

this'll be in 2.52. :)

Comment 13 Malte S. Stretz 2003-03-21 16:26:06 UTC

OKAY: pretty complay but I can't find a logical error (I don't really know 
that code though)

Comment 14 Theo Van Dinter 2003-03-22 16:18:32 UTC

just so we don't forget: the second patch needs to be applied to 2.60.  the first should already be applied.

Comment 15 Theo Van Dinter 2003-03-23 10:19:10 UTC

*** Bug 1675 has been marked as a duplicate of this bug. ***

Comment 16 Theo Van Dinter 2003-03-24 13:06:55 UTC

applied to stable branch and 2.6.  closing ticket.