Bug 60354 - DataFormatter rounding issues
Summary: DataFormatter rounding issues
Alias: None
Product: POI
Classification: Unclassified
Component: SL Common (show other bugs)
Version: 3.15-FINAL
Hardware: PC Mac OS X 10.1
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2016-11-08 17:03 UTC by PJ Fanning
Modified: 2016-11-08 20:45 UTC (History)
0 users


Note You need to log in before you can comment on or make changes to this bug.
Description PJ Fanning 2016-11-08 17:03:20 UTC
I've run into some issues where converting numbers into Double leads to rounding issues. I would like to subclass org.apache.poi.ss.usermodel.DataFormatter to avoid converting numbers to Double. A lot of DataFormatter functionality is private. Would it be possible to change some methods to be protected? getFormattedDateString and getFormattedNumberString are examples of methods that it would be nice to have as protected so that I don't have to copy/paste the code into my subclass.
Comment 1 Nick Burch 2016-11-08 17:37:17 UTC
If DataFormatter is rounding things incorrectly compared to Excel, it'd be better to fix the bug, rather than requiring everyone to subclass it to do their own fixes!

Can you produce a small sample file and junit unit test, where DataFormatter formats a cell with different rounding to what Excel does?
Comment 2 PJ Fanning 2016-11-08 20:02:40 UTC
This issue may be down to my own confusion. We used to have a streaming parser based on the FromHowTo sample in POI and this code ran into rounding problems. This appears to be because the worksheet XML actually has the numeric data already stored as a badly rounded value.
This code does not use the DataFormatter.
For a cell that my workbook displays 9.20 for, the XML in the worksheet has:
<c r="C1" s="2"><v>9.1999999999999993</v></c>

We have now switched to an approach more like the XLSX2CSV POI sample but I've been wary about equivalent rounding issues and this is why I was interested in writing a custom DataFormatter to avoid using floating point precision. But it would appear that the Double logic is actually necessary to handle the fact that the Excel sheet data doesn't have the actual '9.20' value but an approximation of it.