Bug 54636 - getNumericCellValue return different result for visually equal numbers
Summary: getNumericCellValue return different result for visually equal numbers
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.9-FINAL
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2013-03-05 03:33 UTC by Yujun Liang
Modified: 2013-03-07 07:27 UTC (History)
0 users

File to illustrate the bug, please read it with POI and debug the Cell. (30.50 KB, application/vnd.ms-excel)
2013-03-05 03:33 UTC, Yujun Liang
Modified file. (31.00 KB, application/vnd.ms-excel)
2013-03-05 19:14 UTC, Yujun Liang
Test file converted into OOXML format. (10.65 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-03-06 15:29 UTC, Mark B

Note You need to log in before you can comment on or make changes to this bug.
Description Yujun Liang 2013-03-05 03:33:37 UTC
Created attachment 30017 [details]
File to illustrate the bug, please read it with POI and debug the Cell.

Please see attached spreadsheet, 

Here is the content of the spreadsheet,

1	97.916%	
2	97.916%	

Cell A1 was generated by earlier version of jasper library, A2 was typed in by myself,

when reading it using POI-3.8 3.9, here are the values returned,

Cell A1 : 0.97915999999999

Cell A2 : 0.97916

Any idea?
Comment 1 Mark B 2013-03-05 09:39:36 UTC
I could be wrong but do doubt that this is a bug with the API because it is the format applied to the cells that determines what the user sees when viewing the workbook using Excel. Please take a look at the HSSFDataFormatter class as it contains methods that will allow you to use POI to format the cells contents appropriately.
Comment 2 Nick Burch 2013-03-05 12:25:54 UTC
This is to be expected. Excel stores the values in the file as floating point numbers, and applies formatting rules to control how they look to the end user. You should either use something like DataFormatter to get strings based on the format rules, or specify the number of significant digits when comparing the numbers.
Comment 3 Yujun Liang 2013-03-05 12:38:51 UTC
Can you explain so I understand why it returns different result?
Comment 4 Nick Burch 2013-03-05 12:42:46 UTC
I'd strongly suggest you go and read the wikipedia page on floating point numbers and binary representations, and possibly review some tutorials / CS101 talks on the same topic. Come back if after all that you still have queries on how it works!
Comment 5 Yujun Liang 2013-03-05 12:51:02 UTC
I understand floating point and I understand it return the first value as 0.97915999999999, what puzzled me was why it doesn't do it for cell A2 which is exactly the same number? Can you explain that part? Have you opened the attachment?
Comment 6 Yujun Liang 2013-03-05 19:14:17 UTC
Created attachment 30020 [details]
Modified file.

I added one more cell A3, value = 97.9162345%, format = "#,##0.000%"

So the file contents becomes,

           A             Display value with focus   Reading with POI
1     97.916%             97.916%                   0.97915999999999   
2     97.916%             97.916%                   0.97916 
3     97.916%             97.9162345%               0.979162345

Any idea why cell A2 is not 0.97915999999999 by POI?
Comment 7 Mark B 2013-03-06 15:28:14 UTC
Now I understand the question completely - I thought you were asking why POI returned a numeric value for one of these cells rather than a nicely formatted sting similar to the one the user sees when viewing the workbook using Excel. Nick's response should, I think, be accorded greater weight. To my mind, this problem is caused by the difficulties encountered representing a floating point number and I will explain why I do think this.
The older binary file format is quite hard to interrogate without using a tool like POI. As a result, I used Excel to convert your test file into OOXML and saved it with the extension.xlsx. These are simply zipped archives of various xml files and it is quite trivial to look at their contents directly.
Opening the OOXML based Excel archive and drilling down to the xml markup for the first sheet reveals this fragment;
   <row r="1" spans="1:3">
      <c r="A1" s="1"><v>0.97915999999999992</v></c>
      <c r="B1" t="s"><v>0</v></c>
      <c r="C1" s="2"/>
   <row r="2" spans="1:3">
      <c r="A2" s="2"><v>0.97916000000000003</v></c>
      <c r="B2" t="s"><v>1</v></c>
      <c r="C2" s="2"/>
      <row r="3" spans="1:3">
      <c r="A3" s="2"><v>0.97916234499999999</v></c>
      <c r="B3" t="s"><v>2</v></c>

As you can no dobt intuit, this fragment contains the data for the sheet arranged into rows. Each row contains child elements that define the markup for the cells on the row and the key here are the numerical values contained within the v - value - elements. As you can see, these are all flaoting point values with a significant number of digits following the decimal point but all display, when the sheet is viewed using Excel, exactly as you originally described (I will attach the converted .xlsx workbook for you to see for yourself).
This suggests - to me at least - that you are seeing some sort of interaction between the way Excel itself stores floating point values and the way they are handled/represented internally. Consequently, I do not regard this as being a bug within POI as it is simply reporting the values it finds within the workbook files. All in my opinion of course and I could be wrong.
Comment 8 Mark B 2013-03-06 15:29:15 UTC
Created attachment 30022 [details]
Test file converted into OOXML format.
Comment 9 Yujun Liang 2013-03-06 16:42:56 UTC
(In reply to comment #8)
> Created attachment 30022 [details]
> Test file converted into OOXML format.

Thank you Mark. You did answer my question, as well as taught me to save as xlsx format to examine the older spreadsheet. If I had known that before I wouldn't have created this bug report. 

I agree this is not a POI bug. jxl uses native api so it can handle this situation, I understand POI can't do that.

I am new to POI so I didn't understand the subtle difference in the storage, your example well illustrates it for me. Thanks.

I wrote some POI wrapper so it allows me to do this,

workbook.sheet("sheet1").row("1") // row("1") is getRow(0)

and workbook1.equals(workbook2) will compare sheet by sheet, cell by cell. That's how I found out the spreadsheet generated by older jasper library is not same as the one generated by the later jasper which using newer POI. 

Do you want to incorporate this API into POI library?
Comment 10 Mark B 2013-03-07 07:27:36 UTC
Using Excel to convert the file from Binary to OOXML is not of course infallible but I do find it useful as it makes it so simple to examine the files contents. Also, I think that the data will be preserved quite well, problems and all, as the process relies entirely upon Excel itself.

On a similar note, will you have to support both file formats - binary and OOXML? If so, it is possible to accomplish this using a single codebase as there is a common parent class for both the HSSF and XSSF streams. It is becoming more common for developers to write code that uses the classes in the ss.usermodel package - Workbook, Sheet, Row, Cell, etc - safe in the knowledge that it will handle either file format successfully.

Finally, I cannot really comment on including your code into the API. The better place to ask this question would be the dev list.