Apache OpenOffice (AOO) Bugzilla – Issue 99961
Modifying table design causes the db to grow (old data might be kept in file)
Last modified: 2013-01-29 21:47:32 UTC
DB type=HSQLDB When repeatedly modifying the table design (e.g. during a normalisation), I have the feeling that a lot of (old) garbage data is still kept in the db-file. My suspicion is confirmed when I create a totally new DB and copy table by table, query by query to the new, empty db-file. The difference can be quite big: the new copy can be 2MB, where the original db-file reaches sometimes 10x as much (20MB!), which makes it a real pain to use that db. Is the old data somehow staying behind in the db-file or am I doing something really stupid? I think that the following example may reproduce the phenomenon. I started out with non-normalised tables, which I started to normalise, for example (tblNames): ID;Name;Type "0";"name1";"some very long description containing a lot of text duplication" "1";"name2";"some very long description containing a lot of text duplication" "2";"name3";"some very long description containing a lot of text duplication" "3";"name4";"another long description" where ID=integer (primary key, autovalue) Name=VARCHAR 10 Type=VARCHAR 255 For these four records, I have maximum 4x (integer+Varchar10+Varchar255) bytes = 4x (10+10+255) = 1100 The example used here contains only a few records. In reality, I had a table with hundreds of records and only about three distinct values for 'Type'. Created a second table (tblDescription): ID;TXT "0";"some very long description containing a lot of text duplication" "1";"another long description" tblNames gets a new column: ID;Name;Type;D_ID where tblNames.D_ID refers to tblDescription.ID Eventually, column tblNames.Type is deleted. At this point, I have the following estimate: tblNames: 4x (integer+Varchar10+integer) = 4x (10+10+10) = 120 tblDescription: 2x (integer+Varchar255) = 2x (10+255) = 530 TOTAL: max 650 bytes In practice, the db-file keeps growing, as if the original tblNames.type column is still kept somewhere. It's true that this is just a small example, but as stated above, a db can grow from 2 to 20MB using similar operations. To get the data from the beginning, I use a query or table view that dereferences tblNames.D_ID to tblDescription.TXT. If necessary, I can come up with a larger example db-file (however, it will cost me quite some time to repeat the process)
confirmed on XP and Linux. wimbaetens: can you please attach an example document? I'll try to provide an example as well.
Created attachment 62498 [details] sample db for reproduction (including instructions)