Bug 6663 - Use unsigned database fields for int
Summary: Use unsigned database fields for int
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Documentation (show other bugs)
Version: unspecified
Hardware: All All
: P5 enhancement
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2011-09-21 19:43 UTC by Michiel Hazelhof
Modified: 2011-10-08 11:29 UTC (History)
1 user (show)

Attachment Type Modified Status Actions Submitter/CLA Status
Test for the sql tables application/x-sql None Michiel Hazelhof [NoCLA]
New revision #1 application/x-sql None Michiel Hazelhof [NoCLA]
bayes_mysql #1 application/x-sql None Michiel Hazelhof [NoCLA]
userpref_mysql #1 application/x-sql None Michiel Hazelhof [NoCLA]
awl_mysql #1 application/x-sql None Michiel Hazelhof [NoCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Michiel Hazelhof 2011-09-21 19:43:39 UTC
Using unsigned integers saves some database size and allows for integers to grow bigger.

My patch includes awl_mysql.sql, bayes_mysql.sql and userpref_mysql.sql.

As a secondary question should UTF-8 or another? This way it is determined via the default MySQL? (Latin might be smaller).
Comment 1 Michiel Hazelhof 2011-09-21 19:47:19 UTC
Not sure wether `last_atime_delta` and `last_expire_reduce` could be unsigned, as I haven't got enough test data.
Comment 2 Michiel Hazelhof 2011-09-21 20:21:21 UTC
Latin saves about 1.2% (database copy) but I can't verify whether the database is still intact or not,
Comment 3 Michiel Hazelhof 2011-09-21 20:37:09 UTC
Correction, didn't take the correct measurement regarding the utf8 vs latin, latin costs about 1/3 compared to utf8.
Comment 4 Mark Martinec 2011-09-21 22:22:17 UTC
> My patch includes awl_mysql.sql, bayes_mysql.sql and userpref_mysql.sql

Where is the patch? Should probably be attached here.

> As a secondary question should UTF-8 or another?

Which fields?

A fix for a MySQL data type of bayes_token.token is already in trunk,
it should be binary (just as it already is for PostgreSQL) as it
contains octets which are not associated with any character set.
See Bug 6625.

As for other fields, like username or IP address or preference
or a value of a preference, I don't know - these should probably
also not be associated with any particular character set so a
varbinary would probably suit them best, but ascii (or Latin-1)
is just as good as they are not supposed to contain any characters
outside of the printable ascii set.
Comment 5 Michiel Hazelhof 2011-09-22 08:04:55 UTC
Created attachment 4965 [details]
Test for the sql tables
Comment 6 Michiel Hazelhof 2011-09-22 08:26:57 UTC
Sorry I forgot to add the sql file.
Setting fields like bayes_token.id to unsigned , allows them to contain higher numbers and potentially avoid some problems.

All fields which never contain negative integers should be unsigned (unsigned: 0 to 4,294,967,295 and signed: -2,147,438,648 to 2,147,438,647).

I am not sure about the following fields regarding to unsigned:

As for UTF8 it reduces memory usage a bit (I guess, just my measurements) and is capable of holding nearly all possible characters (that should be used anyway). Also defining a default set should help with debugging purposes as well since everybody should be having the same set.
Comment 7 Michiel Hazelhof 2011-09-22 08:27:15 UTC
Created attachment 4966 [details]
New revision #1
Comment 8 Michiel Hazelhof 2011-09-22 08:27:32 UTC
Created attachment 4967 [details]
bayes_mysql #1
Comment 9 Michiel Hazelhof 2011-09-22 08:27:50 UTC
Created attachment 4968 [details]
userpref_mysql #1
Comment 10 Michiel Hazelhof 2011-09-22 08:28:08 UTC
Created attachment 4969 [details]
awl_mysql #1
Comment 11 Michiel Hazelhof 2011-10-08 11:29:53 UTC
Determine whether the following fields can be negative: bayes_expire.runtime, bayes_vars.last_atime_delta, bayes_vars.last_expire_reduce

Add Upgrade notice for table changes (UTF-8 + unsigned), also for Bug 6625 I guess (needed to empty my table for this one).