Issue 79924

Summary: Strong Typing in Formulas
Product: Calc Reporter: discoleo <discoleo>
Component: codeAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: issues
Version: OOo 2.2.1   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: FEATURE Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 58309    

Description discoleo 2007-07-23 19:23:45 UTC
*This is a killer feature*!


= *INTRO* =

Every decent programing language implements a typing mechanism of variables to
reduce the rate of errors.

One advantage of C++ over C was exactly the stronger typing mechanism (in
addition to namespaces; classes and templates contributed strongly to the more
robust typing). However, spreadsheets have failed to introduce typing, although
a great number of errors could be prevented IF such a mechanism was built in.

I consider this a *major design flaw* of existing spreadsheet applications, and
one that makes it in my personal *TOP 5* of fundamental design flaws.

I estimate (blindly) that almost one third of significant spreadsheet errors
could be prevented by such a mechanism. Please note, that I have a vast
experience in working with spreadsheets, overseeing 100+ employees that work
extensively with spreadsheets.

This issue has been partially discussed on the OASIS mailing list (see
http://lists.oasis-open.org/archives/office-comment/200706/msg00005.html and the
ensuing discussion). Though the aforementioned thread discussed units on a more
broader scale, I wish to focus here entirely on currency.


= *CURRENCY* =

Calc should implement a strong typing system. In a first phase, I advocate
implementing a general typing system for currency (a generic currency type),
later extending it to further units (probably 'time-units' being next and
followed by other units).

= *OPERATIONS* =

Valid operations with this unit:
1. (currency) + (currency) => (currency)
2. (currency) * scalar => (currency)
3. (currency) / scalar => (currency)
4. (currency) / (currency) => scalar

Invalid operations:
1. (currency) + scalar !!! (many errors)
2. (currency) * (currency) (see statistics comment; maybe allowing this
operation, BUT IF the cell formatting has UNIT<currency>, raise an ERROR)
3. (currency)^(x), where x != 1
[statistics should be possible to compute with currencies, although that would
involve terms of form (currency)^2; this should be solved somehow]

= *FURTHER DEVELOPMENT* =

Various subtypes of currency should be possible to create in later versions of
the implementation, e.g.
1. Euro as type <currency>
2. US$ as type <currency>
3. various other currencies (custom currencies)

Still later, there should exist the possibility to define conversions between
these currencies subtypes. There would be 2 types of conversions:

1. fixed:
 - defined once and stored with the document, e.g. 1 Euro = US$ 1.33
 - every conversion between Euro and US$ would be accomplished using
   this explicit conversion factor

2. dynamic:
 - allow the conversion factor to be retrieved dynamically from
   a web-site, server, some other place, ...
 - conversions are done dynamically, either at:
   -- opening the document
   -- some pre-specified time point
   -- x-times per day
   -- some other condition
[This would be especially interesting for international enterprises and the
financial market.]


= *ADVANTAGES* =

 - many spreadsheet ERRORS would be timely prevented
  (many at design time)
 - significant financial losses averted
 - a clear advantage over competing products
 - would set the standard in the field, not merely copying MS Excel
Comment 1 discoleo 2007-07-24 20:33:36 UTC
I add here some other useful comments from the OASIS mailing list:

1. http://lists.oasis-open.org/archives/office-comment/200706/msg00028.html
2. http://lists.oasis-open.org/archives/office-comment/200706/msg00030.html
3. http://lists.oasis-open.org/archives/office-comment/200706/msg00032.html
4. http://lists.oasis-open.org/archives/office-comment/200706/msg00033.html
5. http://lists.oasis-open.org/archives/office-comment/200706/msg00036.html

Thats a brief discussion of problems and possible solutions. I hope that the
developers recognise the real potential of data typing and - ultimately - decide
to implement this feature.
Comment 2 discoleo 2007-08-09 09:40:24 UTC
I found some more interesting data on the OOo web site:

Issue 58309 (http://qa.openoffice.org/issues/show_bug.cgi?id=58309) is a similar
request for units of length (basically for feet-type units).

Also, there was a similar request on the OOo forums some years ago, see
http://www.oooforum.org/forum/viewtopic.phtml?t=14674 for a detailed discussion.
Comment 3 frank 2007-08-09 10:21:10 UTC
Hi,

IMHO this would confuse most of the users especially those who came from Excel.
But let's requirements decide.

Frank
Comment 4 discoleo 2007-08-09 13:06:16 UTC
discoleo->fst

Please have a look at the following article:
http://portal.acm.org/citation.cfm?id=1140346

To quote from the abstract:

   Since the error rate in spreadsheets is very high and since those errors have
   significant impact, methods and tools that can help detect and remove errors
   from spreadsheets are very much needed. Type systems have traditionally
   played a strong role in detecting errors in programming languages, and it is
   therefore reasonable to ask whether type systems could not be helpful in
   improving the current situation of spreadsheet programming.

No, it won't confuse users, it will really help them to avoid errors. A lot of
errors. The article I posted is from 2006. Indeed, there is recently much work
in this area.
Comment 5 discoleo 2007-08-09 20:01:00 UTC
discoleo->fst
discoleo->all

Please apologise posting again. I noticed that the article posted previously is
not available to non-registered users (it must be purchased).

Here is a free link to that article:
http://web.engr.oregonstate.edu/~erwig/papers/TypeInf_PPDP06.pdf.

It contains a lot of useful information. Please read it as it is real research
on spreadsheets.
Comment 6 jbasc19921jbkas 2008-03-20 20:09:55 UTC
Do we really need the formula to be blocked from working at all? What about
presenting a dialogue box at the time the formula is entered (e.g. "This formula
may contain an error as it multiplies the contents of two cells which are
formatted as currency. Edit formula / Ignore warning / Help"), or putting on an
error warning note in the same way that Excel does for e.g. inconsistent formulas?

There is a flaw in saying that e.g. currency * currency is always invalid.

currency * currency / currency ought to be valid (it could be used e.g. in
various rescaling calculations) so therefore by extension the (yucky!) (currency
* currency) / currency ought to be as well.

Moreover currency * currency / 1.48 ought to be valid if the 1.48 really refers
to a currency. That would be bad spreadsheet design (the 1.48 ought to entered
into a cell formatted as currency and then referenced in the formula) but for
example students who are just starting to learn how to use Spreadsheets (I teach
a lot of them!) often progress gradually from entering formulas without
references (e.g. "=2.12*3.57/1.48") to formulas that consist entirely of
references, via formulas that contain partly numbers and partly cell references.

Using data typing (especially of currencies) to IDENTIFY potential errors is a
fantastic idea!! But while it should warn users of the strong possibility of an
error I do not believe the formula should be invalidated.
Comment 7 discoleo 2008-04-27 22:25:53 UTC
> currency * currency / currency ought to be valid (it could be used
> e.g. in various rescaling calculations) so therefore by extension the
> (yucky!) (currency * currency) / currency ought to be as well

While "currency * currency / currency" is valid as it returns currency, this
does NOT justify "currency * currency". During my years of Auditing / Quality
Control involving spreadsheets I found way too many errors to be comfortable
with such formulas.

As said, IF the whole formula was written within one cell, NO error would be
raised. Strong typing would still found that the result is "currency", so NO
error. [Although I do not encourage such usage.]

IF however, one cell contains "currency * currency", then this is at least bad
design. IF the student needs a conversion factor (please note, I do teach
spreadsheets, too), then the correct way is to define it somewhere like:
 = currency / currency which is a scalar

[
 I may add, that this definition should proceed in a special area of
 the spreadsheet, for which I posted a new feature request:
 http://www.openoffice.org/issues/show_bug.cgi?id=67499
]

Multiplying with this scalar is OK and never would raise an error. Also, I
advocate introducing beyond the generic currency, more specific currencies and
handle conversions on a higher - more structure - level. [see the FURTHER
DEVELOPMENT in my first post]

I was also NOT very explicit about raising errors for valid operations, even IF
the result is a bogus unit:
 > 2. (currency) * (currency) (see statistics comment; maybe allowing
 >    this operation, BUT IF the cell formatting has UNIT<currency>,
 >    raise an ERROR)

However, in the meantime, I am even less convinced that "currency * currency"
should be accepted without raising some error/warning. The only argument against
is the variance, which has a valid unit of "currency * currency" in this example.