Bug 57738

Summary: XSSFSheetXmlHandler returns incorrect values for numeric cells
Product: POI Reporter: Tim Slatcher <tim.slatcher>
Component: XSSFAssignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.11-FINAL   
Target Milestone: ---   
Hardware: Macintosh   
OS: Mac OS X 10.4   

Description Tim Slatcher 2015-03-22 21:08:21 UTC
When reading a spreadsheet containing certain numbers (for example 8.2), the XSSFSheetXmlHandler will output the incorrect value. 

I believe this is due to an error in the NUMBER case of the method endElement, where formatted cells are parsed as doubles, but unformatted cells simply return the internal value, which for reasons I don't understand, is 8.1999... rather than 8.2. It appears that simply converting these into Doubles will return the correct value. 

We don't currently have a work around - since this needs to happen only when the value is a number, in order to distinguish strings like '012' from numbers like '12' we can't simply attempt to parse all strings passed to output.cell. 

I'm unfamiliar with how to use SVN and create patches, but the following code seems to fix the problem:

case NUMBER:
    String n = value.toString();
if (this.formatString != null) {
    thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
} else {
    DecimalFormat df = new DecimalFormat("0");
    thisStr = df.format(Double.parseDouble(n));
Comment 1 Nick Burch 2015-03-22 22:52:19 UTC
Both Excel's .xls and .xlsx use doubles to store numbers, and formatting rules to specify how numbers should be shown to the user. If you want POI to give you a string of "what Excel shows for a cell", you need to use DataFormatter

If you look at org.apache.poi.xssf.eventusermodel.XLSX2CSV you'll see an example of doing this.
Comment 2 Tim Slatcher 2015-03-22 23:09:18 UTC
I'm not sure I follow. Are you saying I need to reimplement XSSFSheetXmlHandler to do what I want? Given that it outputs as strings, I see no reason it wouldn't give you the exactly content the user sees, is that class supposed to do something else?

From looking at the source code to XLSX2CSV it has the exact same bug that I'm referencing here - put the number 8.2 in an XLSX, feed it through that class and the CSV will have different numbers in the cells. 

Also, 8.2 is representable as a double, so this doesn't explain why this number would appear as 8.199... unless I'm miss-understand double representation.
Comment 3 Tim Slatcher 2015-04-10 09:00:55 UTC
Any response to my comment below? I'd like to re-open this unless you disagree with my statement that the XLSX2CSV converter also has this bug.
Comment 4 Nick Burch 2015-04-24 01:35:32 UTC
Are you able to produce a very small .xlsx file which shows the problem with XLSX2CSV? Say, one with just a few cells completed, some of which POI is getting right, and some it isn't.

If so, please attach the file here, along with a screenshot of what Excel shows for the file, and details of what cells fail with XLSX2CSV + why (eg A4 should be 8.21 but is being output as 8.2099)

If you can reproduce the problem with XLSX2CSV and produce those things, it should make it quite quick to unit test + hopefully fix!