Bug 46890

Summary: Text-formatted cells need to be recognized as CELL_TYPE_STRING - regardless from their content
Product: POI Reporter: alesinskyy
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED INVALID    
Severity: normal CC: alesinskyy
Priority: P2    
Version: 3.2-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   

Description alesinskyy 2009-03-23 06:55:07 UTC
It seems that if cell contains valid number then POI always consider it type as NUMERIC - even if the cell carries text format.
Comment 1 Nick Burch 2009-03-23 07:16:09 UTC
POI will give you the cell type that excel set for the cell

You should check the cell formatting rules to see how your numeric cells are formatted
Comment 2 alesinskyy 2009-03-23 07:59:17 UTC
Yes, probably it gets, but it causes a lot of problems.
BTW, older versions of POI (e.g. 2.5.1) used to return string in such cases.

This change of behavior has broken DBUnit XLSDataSet, for example. 

And even worse - I have not found any clear reference to this (rather breakin) change neither in the documentation, nor in the changelog. 

It is relatively clear, that if cell is formatted as "text", then it contents has to be treated text, not number. That means that cell content may be taken "as-is".
Comment 3 Nick Burch 2009-03-23 08:19:03 UTC
Pretty sure that POI has always returned NUMERIC for cells that excel has stored as a floating point number, and only returns STRING for cells where the additional string record is used

If a cell is really numeric, then excel stores a float. You need to fetch back that float, then apply string formatting rules to make a string. You can't fetch "a string", as there isn't one in the file, all there is is the float + a formatting string.

If you'd care to upload a unit test that shows this differently from poi 2.5 to now, we'll take a look, but I'm fairly sure this is still an INVALID
Comment 4 alesinskyy 2009-03-24 05:51:14 UTC
Hi,

I was wrong - POI 2.5.1 has behave practically the same. The real problem lies in DBUnit as such, not in POI.
Comment 5 alesinskyy 2009-03-24 07:47:31 UTC
Sorry, but I want to reopen this issue once more - the problem is that Excel (2003 and, likely, other versions as well) exhibits somewhat odd behavior - if cell containing valid numeric value is formated as text Excel still recognizes it as numeric till it would be edited once more, at least with "change-nothing" editing (in case if "Number stored as text" error checking rule is activated "Ignore error" should be confirmed for each affected cell as well).

It would be very nice to allow obtain string value (exactly as it is shown by Excel) of the such cell (which has numeric type, but text formatting) - now it throws exception. I'm not sure if it is attainable or this restriction comes directly from Excel.

Regards,
Oleksandr
Comment 6 Nick Burch 2009-03-24 08:35:14 UTC
For a numeric cell, the "text as shown by excel" isn't stored anywhere. Please follow standard POI practice, get the floating point number + format string, and format it yourself