SA Bugzilla – Bug 1776
Upgrade Bayes DB format to v1
Last modified: 2003-04-23 13:59:22 UTC
The 2.50 Bayes DB format (v0) has several issues, including:

- token atime is an unsigned short, so it overflows
- token values are endian-dependent
- magic tokens could be found in messages
- no db version magic token exists

So for 2.60, we should come up with a new format that addresses these issues.
- message ids may not exist and may not be unique
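For illustration, the first two issues can be reproduced in a few lines. This is a minimal Python sketch, not SpamAssassin's code: the helper name `wrap_ushort` is hypothetical, and it only assumes that atime is kept in a 16-bit unsigned field and that values are written in the host's native byte order.

```python
import struct

USHORT_MAX = 0xFFFF  # largest value an unsigned 16-bit field can hold


def wrap_ushort(value: int) -> int:
    """Simulate storing a counter in an unsigned 16-bit field (hypothetical helper)."""
    return value & USHORT_MAX


# Overflow: once the counter passes 65535 it wraps around to zero,
# so a recently seen token suddenly looks like the oldest one.
last_ok = wrap_ushort(65535)
wrapped = wrap_ushort(65536)

# Endian dependence: the same value has a different byte layout
# depending on which kind of machine wrote it.
written_big = struct.pack(">H", 1)     # bytes as a big-endian host would write them
written_little = struct.pack("<H", 1)  # bytes as a little-endian host would write them

# Reading big-endian bytes on a little-endian host yields the wrong number.
misread = struct.unpack("<H", written_big)[0]
```

A database file copied between machines of different byte order would therefore return wrong counts unless the byte order is fixed by the format.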
ok, my initial version is committed to head.

- atime is now an unsigned int (32-bit) so it shouldn't overflow for quite some time
- token values are forced VAX (little endian) order
- magic tokens were modified to have a prefix of ^M^A^G^I^C instead of ** so that there's no possibility of a message corrupting the magic tokens (control chars are automatically stripped when tokenizing)
- db version magic token added and code added to set it when the db is initially created

I left the message id issue alone for now. It's bug 1588. We should probably come up with something for that too, but that's not really part of the token db format change.

all that's left is to figure out what atime should be. one camp (myself included) thinks message count should stay, another thinks timestamp (some variation of time_t into 2 bytes (ie: instead of 1 second, do 6-18 hour blocks)).

since the format isn't set in stone, but the code is committed, I disabled bayes r/w access in 2.60 until we decide on the format.
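The changes described above could be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the committed code: the function names, the `DB_VERSION` key name, and the tokenizer are all hypothetical. Only the 32-bit little-endian atime, the ^M^A^G^I^C prefix, and the stripping of control characters come from the comment.

```python
import struct

# ^M ^A ^G ^I ^C written as control characters (0x0D 0x01 0x07 0x09 0x03).
MAGIC_PREFIX = "\r\x01\x07\t\x03"

# Hypothetical key name for the db version magic token.
DB_VERSION_KEY = MAGIC_PREFIX + "DB_VERSION"


def pack_atime(atime: int) -> bytes:
    """Store atime as a 32-bit unsigned int in fixed little-endian order."""
    return struct.pack("<I", atime)


def unpack_atime(raw: bytes) -> int:
    """Read atime back; the result is the same on any host byte order."""
    return struct.unpack("<I", raw)[0]


def tokenize(text: str) -> list:
    """Toy tokenizer: split on whitespace, then drop control characters."""
    tokens = []
    for word in text.split():
        cleaned = "".join(ch for ch in word if ch >= " " and ch != "\x7f")
        if cleaned:
            tokens.append(cleaned)
    return tokens


def span_years(block_hours: int) -> float:
    """Years covered by a 2-byte timestamp counted in blocks of block_hours."""
    return 65536 * block_hours / 24 / 365.25


# A message that tries to smuggle in a magic token loses the prefix.
hostile = tokenize(MAGIC_PREFIX + "DB_VERSION hello")
```

Because the tokenizer never emits control characters, no message token can collide with a key that starts with `MAGIC_PREFIX`. For the competing timestamp proposal, `span_years(6)` gives a little under 45 years of range and `span_years(18)` roughly 135 years.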
Subject: Re: [SAdev] Upgrade Bayes DB format to v1

> - magic tokens were modified to have a prefix of ^M^A^G^I^C instead of ** so
> that there's no possibility of a message corrupting the magic tokens (control
> chars are automatically stripped when tokenizing)

How about magic tokens simply starting with \000 instead of something so complicated?
Subject: Re: [SAdev] Upgrade Bayes DB format to v1

> > - magic tokens were modified to have a prefix of ^M^A^G^I^C instead of ** so
> > that there's no possibility of a message corrupting the magic tokens (control
> > chars are automatically stripped when tokenizing)
>
> How about magic tokens simply starting with \000 instead of something
> so complicated?

I think \000 is probably best avoided -- at least because db manipulation may become impossible in some languages in that case.

--j.
Subject: Re: Upgrade Bayes DB format to v1

On Tue, Apr 22, 2003 at 08:53:00PM -0700, bugzilla-daemon@hughes-family.org wrote:
> How about magic tokens simply starting with \000 instead of something
> so complicated?

Yeah, what Justin said. ;) I wanted to specifically avoid null. I thought about just using a single control char, but decided it would be better to be more complex for the magic tokens since they're so important. It's also not wasting a ton of space since there's only ~7 magic tokens. So if we used 1 instead of 5, that's 4 bytes * 7 tokens or 28 bytes saved.
ok, I've only heard from Justin WRT atime, and he voted for msgcount as well. so msgcount it is. I'll re-enable bayes r/w in 2.60 and close the ticket. :)