Bug 51710 - getCellFormula() returns incorrect result
Summary: getCellFormula() returns incorrect result
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.8-dev
Hardware: PC All
: P2 major (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2011-08-23 15:34 UTC by Antti Koskimäki
Modified: 2012-02-27 12:21 UTC (History)
0 users

Excel plus unit-test (27.96 KB, application/x-zip-compressed)
2011-08-23 15:34 UTC, Antti Koskimäki
sheet1.xml in readable form (40.00 KB, application/x-zip-compressed)
2011-08-23 20:54 UTC, Antti Koskimäki
graphical explanation of the problem (260.31 KB, image/png)
2012-02-27 12:14 UTC, Yegor Kozlov
graphical explanation of the problem (251.66 KB, image/png)
2012-02-27 12:19 UTC, Yegor Kozlov

Note You need to log in before you can comment on or make changes to this bug.
Description Antti Koskimäki 2011-08-23 15:34:33 UTC
Created attachment 27425 [details]
Excel plus unit-test

I came across to situation where XSSFCell.getCellFormula() returned incorrect results. Excel-file is created with Excel-2007. 

Tested on r1160172. Attached test-excel plus unit-test.
Comment 1 Nick Burch 2011-08-23 15:51:06 UTC
Does your file use shared formulas that don't start on the first cell by any chance? (There's a current discussion on the dev list about those)
Comment 2 Antti Koskimäki 2011-08-23 19:13:47 UTC
You ment this discussion ?


Yes, this seems to relate to the same issue. First of all, as far as I can recall, I created the example file with simple routine; first made one formula with "semi-static" references, then copied it from left to right for a total row, then copy for row, paste for area downwards. So the use case is "shared formula" case.

I looked into sheet1.xml, and there is this weird syntax in the exact cell where the problem starts to show, E60. Until that, the row 60 is OK with si="2" but ref-field in E60 is weird no matter how you think, and then for every cell with si="3" POI returns incorrect formula.


<row r="60" spans="1:13">
<c r="A60" s="1">
<c r="B60">
<f t="shared" si="2"/>
<c r="C60">
<f t="shared" si="2"/>
<c r="D60">
<f t="shared" si="2"/>
<c r="E60">
<f t="shared" ref="C60:M83" si="3">$A60*E$2</f>
<c r="F60">
<f t="shared" si="3"/>
<c r="G60">
<f t="shared" si="3"/>
Comment 3 Antti Koskimäki 2011-08-23 20:52:18 UTC
I stared the xml for a while and then noticed, that only thing that doesn't make sense is "ref" attribute of formula. The declared shared formula is always correct relative to the containing cell. The ref-attribute on the other hand is almost random, and does not in any way reflect to the formula copying process. What it seems to define is boundaries where the shared instance is in use.

When I count the references, it appears that there's familiar magic numbers involved :=)

 176 si="0"
 255 si="1"
 255 si="2"
 255 si="3"
  64 si="4"
 255 si="5"
 255 si="6"
  64 si="7"

So, it seems to me that 
A) one formula "instance" is shared maximum 255 times
B) distribution of these shared instances is somewhat random (mystical optimization?)
C) ref-attribute really does only define the "boundaries", nothing else

I also peeked the POI code, and found out that there's calculations on ss-level. It raised a question if HSSF-heritage is misleading the XSSF-implementation here ?

P.S. document that made me understand this shared-thing:  http://www.codeproject.com/KB/XML/ooxml_is_defective.aspx (see chapter 3)
Comment 4 Antti Koskimäki 2011-08-23 20:54:15 UTC
Created attachment 27427 [details]
sheet1.xml in readable form

Attached the sheet1.xml (from example xlsx) in readable form (with linefeeds etc)
Comment 5 Yegor Kozlov 2012-02-27 12:14:00 UTC
Created attachment 28391 [details]
graphical explanation of the problem
Comment 6 Yegor Kozlov 2012-02-27 12:19:25 UTC
Created attachment 28392 [details]
graphical explanation of the problem
Comment 7 Yegor Kozlov 2012-02-27 12:21:19 UTC
Should be fixed in r1294127, your test file included in the POI test collection.

It appeares that if a shared formula range preceeds its master cell then the preceding part is discarded.
For example, if the cell is E60 and the shared formula range is C60:M85 then the effective range is E60:M85 and the part C60:M85 is ignored. 

See the graphical explanation of the issue.