Bug 60044 - Regression in currency formatting for at least some xls in trunk
Summary: Regression in currency formatting for at least some xls in trunk
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SS Common (show other bugs)
Version: 3.15-dev
Hardware: PC Windows NT
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-25 11:58 UTC by Tim Allison
Modified: 2016-09-09 18:42 UTC (History)
0 users



Attachments
example file (35.00 KB, application/vnd.ms-excel)
2016-08-25 11:58 UTC, Tim Allison
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tim Allison 2016-08-25 11:58:55 UTC
Created attachment 34173 [details]
example file

On https://issues.apache.org/jira/browse/TIKA-2013, I noticed a regression in currency formatting in at least some xls files when comparing 3.15-beta1 with 3.15-beta3-rc1.

I confirmed that this is a poi-level issue, not something that we're botching in Tika.

XLS2CSVmra results with trunk:

,"Apple Day",$963.01,$7,61,$12,96,$8,99,$6,71,$8,09,$4,27,$3,98,$2,80,
,"Bingo",$1,184.00,$8,67,$18,69,,,,,,,
Comment 1 Javen O'Neal 2016-08-25 15:22:37 UTC
Could you provide a diff of the beta1 and beta3-rc1 output or a unit test to make it easier to find the breaking change?

There were some changes to DataFormatter between these releases that improved POI's locale-specific number formatting (comma is used as decimal separator rather than a period).

My guess is that this commit [1] caused the regression. Can't quote the numbers as that would give a string. Either change Xls2Csv to set the locale to format with a period separator when the document's locale or the machine's locale use comma decimal separator, or escape the commas and hope the CSV consumer knows how to unescape it.

[1] https://svn.apache.org/viewvc?view=revision&revision=1747139
Comment 2 Javen O'Neal 2016-08-25 15:26:04 UTC
> $1,184.00

Looks like thousands separator commas need to be striped as well.

Question for folks who are used to seeing numbers with comma decimal separators: how are CSV files usually read and written? Are numbers represented with period decimal separators?
Comment 3 PJ Fanning 2016-08-29 16:23:40 UTC
For CSV output, wouldn't it be better not to use thousands separators at all? That is, $1,184.00 could be written as $1184.00.
Otherwise, to keep the comma separator for the thousands would require the value to appear in double quotes ("$1,184.00").
Comment 4 Tim Allison 2016-09-09 18:16:08 UTC
Looks like this is caused by a typo in 

http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java?r1=1721901&r2=1747325

-putFormat(m, 7, "\"$\"#,##0.00_);(\"$\"#,##0.00)");
+"\"$\"#,##,00_);(\"$\"#,##0.00)",
Comment 5 Tim Allison 2016-09-09 18:42:28 UTC
r1760102  

I didn't bother adding a unit test because this seems like such a small issue.  Please reopen if we need a test.