Issue 80139 - Avoid ERRORS in duplicate Spreadsheets and Data
Summary: Avoid ERRORS in duplicate Spreadsheets and Data
Alias: None
Product: Calc
Classification: Application
Component: code (show other issues)
Version: OOo 2.2.1
Hardware: All All
: P3 Trivial (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
Keywords: needmoreinfo
Depends on: 3740 34213 66817 80325 8811
  Show dependency tree
Reported: 2007-07-28 18:24 UTC by discoleo
Modified: 2017-05-20 09:55 UTC (History)
3 users (show)

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Note You need to log in before you can comment on or make changes to this issue.
Description discoleo 2007-07-28 18:24:38 UTC
This issue makes it in my TOP 5 of major design flaws of existing spreadsheet




*Duplicate Data* perpetuates ERRORS!

One encounters in current practice dozens of copies of the same spreadsheet
because different people need to work with it, or because one needs only a
subset of the original data (which is copied/saved into a new table/
spreadsheet). [This is NOT a duplicate of issue 8811.]

One of my first tasks - as I started work in a big department - was to oversee
the allocation of revenues to various contractors based on some specific
characteristics. The first issue I stumbled upon, was the *lack of a central
repository* where those characteristics were stored.

Instead, various departments/employees had their own copy of the spreadsheet and
everybody was updating his own copy.

BUT no one had the spreadsheet up to date. There was NO way to get a fully *up
to date* spreadsheet, and there was NO way to *track the changes*.

As a second example, I will describe now a scenario that happened while I was
doing some research. I was the only person involved in analysing the
spreadsheet. However, due to the complexity of the spreadsheet, I created
various (sub-) tables containing only relevant portions of the data.

In the case I detected one error in the data, there was NO method to correct the
error *only ONCE*, and have the correction applied in all sheets/data. The
errors were doomed to persist in the various duplicate data.

Existing spreadsheets offer very little mechanisms to avoid problems of
duplicate data and the errors that arise because of this.

While many will argue that in such cases a database should be used, common
practice shows that most work is still done with spreadsheets, because of the
following reasons:
 a.) easy to create (actually NO effort at all)
 b.) needs NOT a rigid structure (unlike a DB), therefore
     great flexibility in the beginning, when the data may be largely unknown
 c.) you see what you do, and you already see the results
     (unlike for DBs - you need a dedicated software to compute the results)
 d.) NO special coding skills needed


As mentioned, a DB might be tentative, BUT NOT in real practice. Some
spreadsheets offer some (incomplete) solutions (like Excel for issue 8811), BUT
they will all fail on a global scale.

  Spreadsheets should allow collaboration between people. This is described in
greater detail in issue 8811. Features to be implemented:
 - allow simultaneous editing by different users
 - allow various locking mechanisms

  Often, one needs only subsets of the original data to further process. One may
filter this data out and then copy/paste it to a new spreadsheet.

However, what is lacking is to paste NOT the actual data, but a link to the
original data, like a *HARD-LINK* on Unix-like OSs. This is described in greater
detail in issue 66817 (see

A *HARD-LINK* would implement a mechanism through which a correction done in a
copied cell is propagated back in the original cell.

Another issue in this series is issue 34213 that involves pasting references to
the original cells (see
[Though the hard-links would be more powerful.]

  A very useful feature for such collaborative efforts is to track the changes.
This is even more important, as numerous simultaneous changes may slow down the
computer due to the re-calculations.

Therefore, a mechanism should be in place to disable automatic recalculations
BUT to show which cells have potentially changed (and need be updated IF one
needs those values).

Some more brainstorming is here really indicated.

  Another problem of these spreadsheets arises directly due to the frequent
changes. Previous spreadsheets might well be needed (because e.g. of legal
reasons), so one has to save the various versions of the spreadsheet.

I like in this respect especially the versioning scheme described in FORTRESS
(see the Fortress specification,, where new
versions of the same program are saved alongside the old version.

This Versioning-mechanism should be able to store snapshots of the spreadsheet
for future use. [This is definitely more difficult than the case in FORTRESS, so
some more brainstorming is warranted.]

I have already mentioned the relevant issues. There are surely more relevant
issues (to come). In brief:
 - issue 8811: Allow multiple users to edit the same spreadsheet...
 - issue 66817: Paste as HARD-LINK (Unix style)
 - issue 34213: Paste as reference
 - still others to come ...
Comment 1 discoleo 2007-08-16 20:46:49 UTC

I have filed issue 80325 which deals with more advanced features in the
track-changes category (see also issues blocking that one). Please note, this is
not the simple track-changes that is lacking.

The problem with a simple track changes is:
 1.) nobody can track all the changes in a big-spreadsheet
 (consider a 100x1,000 = 100,000 cell spreadsheet, one needs years to review all
changes in a multi-user environment)
 2.) one wants to detect functional areas - track only some specific cells/ranges
 3.) currently, IF a formula-calculated results changes, it is marked as changed
(NOT the changing of the formula itself, BUT the computed value)
 4.) when one detects a suspicious value, one wants to track the changes in
upstream cells that resulted in this suspicious value
 5.) a better history of changes - organized depending on the functional area
Comment 2 discoleo 2007-08-16 20:52:24 UTC
Sorry for a slight ERROR in my previous post:
>  3.) currently, IF a formula-calculated results changes,
>      it is >>marked<< as changed (NOT the changing of the formula itself,
>      BUT the computed value)

3.) currently, IF a formula-calculated result changes,
    it is *NOT marked* as changed (NOT IF changing the formula itself,
    BUT when only the computed value changes)
Comment 3 discoleo 2007-09-16 15:17:59 UTC
Another way to avoid duplicate data is to reference external data using the
existing mechanisms:
 a.) directly referencing data in a different spreadsheet
     [though hard-links would be more powerful]
 b.) using a DDE-Link
     [I will discuss this in a new post]

Unfortunately, both methods show substantial weaknesses.

As described in issue 3740:
Named ranges in one spreadsheet cannot be accessed from another spreadsheet.

Why are *named ranges* such important?

Well, IF the user wrote the master-spreadsheet he may be able to recall in which
rows/columns he did store the relevant data. BUT how well will he remember these
design details in 2 weeks? And in 2 month?

And what are the chances, that a different user will reference the correct range?

Another even worse problem arises when the initial data range changes: say, one
has to add another data row, so *ALL END USERS* will have to correct their
spreadsheets that link to this master-spreadsheet.

This is a formidable task and one that generates numerous *hard-to-trace errors*.

These are the main reasons, why it is desirable to link to a named range!
Comment 4 discoleo 2008-02-05 00:07:02 UTC
An interesting project dealing with more advanced 'Track Changes' and
'Collaborative Editing' for spreadsheets (specifically for Calc) can be found on

see also the following presentation and article on this topic:
Comment 5 oc 2008-07-15 10:43:07 UTC
reassigning features and enhancements to user which
will be the default owner for those tasks (was introduced some time ago)
Comment 6 Edwin Sharp 2014-01-12 12:33:48 UTC
Please attach real life example.
Comment 7 Edwin Sharp 2014-01-26 07:18:01 UTC
No info from author.