Issue 99961 - Modifying table design causes the db to grow (old data might be kept in file)
Summary: Modifying table design causes the db to grow (old data might be kept in file)
Status: CONFIRMED
Alias: None
Product: Base
Classification: Application
Component: code (show other issues)
Version: OOo 3.0.1
Hardware: All All
: P3 Trivial (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-06 10:44 UTC by wimbaetens
Modified: 2013-01-29 21:47 UTC (History)
3 users (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
sample db for reproduction (including instructions) (41.24 KB, application/x-compressed)
2009-05-25 12:56 UTC, wimbaetens
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description wimbaetens 2009-03-06 10:44:59 UTC
DB type=HSQLDB

When repeatedly modifying the table design (e.g. during a normalisation), I have
the feeling that a lot of (old) garbage data is still kept in the db-file. My
suspicion is confirmed when I create a totally new DB and copy table by table,
query by query to the new, empty db-file. The difference can be quite big: the
new copy can be 2MB, where the original db-file reaches sometimes 10x as much
(20MB!), which makes it a real pain to use that db. Is the old data somehow
staying behind in the db-file or am I doing something really stupid? 

I think that the following example may reproduce the phenomenon.

I started out with non-normalised tables, which I started to normalise, for
example (tblNames):
ID;Name;Type
"0";"name1";"some very long description containing a lot of text duplication"
"1";"name2";"some very long description containing a lot of text duplication"
"2";"name3";"some very long description containing a lot of text duplication"
"3";"name4";"another long description"

where 
ID=integer (primary key, autovalue)
Name=VARCHAR 10
Type=VARCHAR 255

For these four records, I have maximum 4x (integer+Varchar10+Varchar255) bytes =
4x (10+10+255) = 1100

The example used here contains only a few records. In reality, I had a table
with hundreds of records and only about three distinct values for 'Type'.

Created a second table (tblDescription):
ID;TXT
"0";"some very long description containing a lot of text duplication"
"1";"another long description"

tblNames gets a new column:
ID;Name;Type;D_ID
where tblNames.D_ID refers to tblDescription.ID

Eventually, column tblNames.Type is deleted.
At this point, I have the following estimate:
tblNames: 4x (integer+Varchar10+integer) = 4x (10+10+10) = 120
tblDescription: 2x (integer+Varchar255) = 2x (10+255) = 530
TOTAL: max 650 bytes

In practice, the db-file keeps growing, as if the original tblNames.type column
is still kept somewhere. It's true that this is just a small example, but as
stated above, a db can grow from 2 to 20MB using similar operations.
To get the data from the beginning, I use a query or table view that
dereferences tblNames.D_ID to tblDescription.TXT.

If necessary, I can come up with a larger example db-file (however, it will cost
me quite some time to repeat the process)
Comment 1 andreschnabel 2009-03-15 13:02:52 UTC
confirmed on XP and Linux.

wimbaetens: can you please attach an example document?

I'll try to provide an example as well.
Comment 2 wimbaetens 2009-05-25 12:56:22 UTC
Created attachment 62498 [details]
sample db for reproduction (including instructions)