Bug 46689 - Numeric cell value problem
Summary: Numeric cell value problem
Status: RESOLVED INVALID
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: unspecified
Hardware: PC Windows XP
: P2 major (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-10 14:17 UTC by Attila Szabo
Modified: 2009-02-10 23:40 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Attila Szabo 2009-02-10 14:17:42 UTC
I understand that floating point conversions are not the responsibility of the POI project (bug #30565). Altough I have a related issue: I have a large number in an Excel file, that exceeds the a Java 'int' 's capability (e.g. 95842303093988300). (It is an id of a contract). As HSSFCell.getNumericCellValue() returns a double (no matter that the value is an integer) I have the same problem as in #30565 ('extra digit'), but I just can't leave it that way: 95842303093988300 or 95842303093988304 is a relevant difference...
Is there any workaround to avoid floating point arithmetic when reading integer (long) values from a cell?
Comment 1 Josh Micich 2009-02-10 16:33:30 UTC
The IEEE 'double precision' data type provides around 15.95 decimal digits of precision.  Any arbitrary number (assuming reasonable exponent/scale) can be represented accurately to the first 15 digits, with the 16th digit being nearly right.

If you use numeric cells there is no way to avoid using IEEE doubles when accessing the values.  Given the above limitations you should not rely on more than 15 digits of accuracy when using numeric cells in Excel(/POI).

Looking at your example from the inside out, the decimal number 95842303093988300 requires 57 bits to express in binary form:
101010100100000000001000100011110010111110101101111001100
Doubles have a 53 bit fraction (1 implicit + 52 explicit), so this number is 4 bits too large. In converting to double, the bottom 4 bits are rounded (up) with carry into the next bit resulting in an error of (10000b-01100b) = +4.  This error appears when converting back to a long: 95842303093988304. 

Since Excel performs this rounding silently, it may not be immediately apparent where the discrepancy has originated.


Have you considered using text cells to store your contract IDs?  Text values are still comparable, but don't support more complex mathematical operations.  This is usually not a concern for PK/ID values.  I'd also suggest adding a standard non-digit prefix (e.g. "#" or "ID-") to prevent accidental conversion to numeric cell type.



Comment 2 Attila Szabo 2009-02-10 23:40:37 UTC
Unfortunately, the excel file that contains the problematic numbers are out of my control as it is an input for the program. I do store ids as strings inside my system, but no idea how to access a numeric cell input other than using HSSFCell.getNumericCellValue() to achieve the desired 'what you see is what you get'. (I tried to access the cell as a String but it only supported what was in the javadoc: got an exception.)

My sweetest dream is a HSSFCell.getLongCellValue() that returns a long (maybe rounds floating point numbers to avoid further problems). While you're working on it ;) I will contvert the id column on the problematic rows to string by hand in Excel (probably by exporting it to CSV to enforce string conversion, I don't know yet. I've already tried to change the cell's type from number to string - well let's say that Excel doesn't do the job for me).

Thanks for your answer!