Bug 45678 - HSSFDataFormatter returns some incorrect formatted strings
Summary: HSSFDataFormatter returns some incorrect formatted strings
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.0-dev
Hardware: PC Windows Vista
: P2 minor (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-23 05:24 UTC by Pablo Torres
Modified: 2012-01-13 16:04 UTC (History)
2 users (show)



Attachments
Repro xls file for issue #3 (18.00 KB, application/octet-stream)
2008-11-03 21:22 UTC, Vinu Kumar
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Pablo Torres 2008-08-23 05:24:03 UTC
When reading values from number cells, for some type of formatted numbers, HSSFDataFormatter returns incorrect values:

1. Simple numbers: adds an underscore after the number in the formatted string, using default number format, eg. a cell with the number 4455667788 is formatted as a string like '4455667788_'.

2. Account numbers are formatted with different precision than in Excel, eg. the number 0,896665 in excel, with 2 digits precision, is shown as 0,90. HSSFDataFormatter returns 0,896665. The same number, as currency or number is formatted as it appears in Excel (number, as reported above, adds an underscore in the end of the string, it returns '0,90_'.

3. Fraction numbers always returns the string '1 ??/??'.

The XLS document was generated using Excel 2007 / portuguese, in Excel 97/2003 format.

Other kind of number formats work fine, like currency, date, hour, percentage, scientific and phone numbers. 

Brazilian zip codes (ddddd-ddd) are formatted as dddddddd- and taxpayer registry numbers (ddddddddd-dd) as ddddddddddd-
Comment 1 James 2008-08-25 09:29:58 UTC
Brazilian zip codes (ddddd-ddd) and taxpayer registry numbers (ddddddddd-dd) will require custom formats which you can easily add via addFormat(String, java.text.Format). See docs and source for examples of SSN, phone-number, etc. 

As far as the other problems, can you attach a sample xls that demonstrates the behavior?

(In reply to comment #0)
> When reading values from number cells, for some type of formatted numbers,
> HSSFDataFormatter returns incorrect values:
> 1. Simple numbers: adds an underscore after the number in the formatted string,
> using default number format, eg. a cell with the number 4455667788 is formatted
> as a string like '4455667788_'.
> 2. Account numbers are formatted with different precision than in Excel, eg.
> the number 0,896665 in excel, with 2 digits precision, is shown as 0,90.
> HSSFDataFormatter returns 0,896665. The same number, as currency or number is
> formatted as it appears in Excel (number, as reported above, adds an underscore
> in the end of the string, it returns '0,90_'.
> 3. Fraction numbers always returns the string '1 ??/??'.
> The XLS document was generated using Excel 2007 / portuguese, in Excel 97/2003
> format.
> Other kind of number formats work fine, like currency, date, hour, percentage,
> scientific and phone numbers. 
> Brazilian zip codes (ddddd-ddd) are formatted as dddddddd- and taxpayer
> registry numbers (ddddddddd-dd) as ddddddddddd-

Comment 2 Vinu Kumar 2008-11-03 21:22:14 UTC
Created attachment 22825 [details]
Repro xls file for issue #3

Attaching repro xls file for Issue no:3 for fractions. This is reproduced in 3.1 and 3.5 as well.
Comment 3 James 2008-11-04 08:18:24 UTC
Internally this class uses java.text.DecimalFormat and java.text.SimpleDateFormat. Fraction formatting support was not included in HSSFDataFormatter because it would require a dependency on org.apache.commons.math.fraction.FractionFormat or a similar custom Format class. Does the POI development team have a recommendation for this matter?
Comment 4 Nick Burch 2008-11-10 04:45:55 UTC
Could we dynamically load commons-math at runtime if it's there? If it is, you get fractions, if not, you don't (but also don't need another jar as a dependency)
Comment 5 James 2008-11-10 08:54:16 UTC
Nick, your suggestion to dynamically load commons-math at runtime sounds like a good compromise. Unfortunately with my current project load, I have no bandwidth to devote to this right now. I can look at this when my schedule lightens up if a patch is not submitted in the meantime.
Comment 6 Nick Burch 2008-11-10 09:05:09 UTC
I'd suggest we leave the bug open for now then. If someone else fancies adding this new code, they can do and attach the patch to the bug. If not, please do work on it yourself once you have a spare moment :)
Comment 7 Dave Meikle 2010-01-05 13:17:55 UTC
I have been implementing some code in Apache Tika (part of the Lucene community) to improve our number format support as part of a JIRA Ticket.

As part of this is related to fractions I would like to progress the support of fractions with POI, therefore I have made changes to support dynamic usage of Commons Math.

Still have a few tweaks to make and then will attach.

Cheers,
Dave
Comment 8 Nick Burch 2010-01-06 03:03:37 UTC
Great news, we'll look forward to seeing the patch :)
Comment 9 Nick Burch 2010-06-01 11:37:36 UTC
I've just made some improvements to how POI handles excel formatting, in relation to bug #48872. I think this will help with a few of your problems. Fractions still remains as a todo though.

I feel I've a better handle on how it all works now, which should make reviewing and applying a patch easier, so if we could get a patch for fractions that'd be great!
Comment 10 Nick Burch 2012-01-13 16:04:04 UTC
As of r1231144, fractions (including the one in this file) are correctly formatted by DataFormatter, as part of the merging of the code from TEXT