Bug 40128 - HSSFDateUtil.isCellDateFormatted(cell) returns false when the cell is formatted as a date
Summary: HSSFDateUtil.isCellDateFormatted(cell) returns false when the cell is formatt...
Status: RESOLVED LATER
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.0-dev
Hardware: Other other
: P2 major with 2 votes (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords: NeedsReleaseNote
Depends on:
Blocks:
 
Reported: 2006-07-27 20:23 UTC by Leo
Modified: 2012-10-30 08:55 UTC (History)
4 users (show)



Attachments
The excel sheet with dates formatted to each excel date format. (17.50 KB, application/vnd.ms-excel)
2006-07-28 13:29 UTC, Leo
Details
Proposed patch of HSSFUtil.Java (10.19 KB, patch)
2006-10-13 10:14 UTC, Paul H
Details | Diff
date cell resolution patch (3.09 KB, patch)
2006-11-03 12:30 UTC, Miroslav
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Leo 2006-07-27 20:23:20 UTC
After making sure that the cell is in numeric format:

When calling the HSSFDateUtil.isCellDateFormatted(cell) method, it returns false
for all excel date formats other than M/DD/YYYY.
  
If you use a custom date format it works but not the actual excel date format.

I believe this is because it calls the
 
isInternalDateFormat(int format) method which checks against these hex values:
 
                case 0x0e:
                case 0x0f:
                case 0x10:
                case 0x11:
                case 0x12:
                case 0x13:
                case 0x14:
                case 0x15:
                case 0x16:
                case 0x2d:
                case 0x2e:
                case 0x2f:

however these values are only for the custom date formats, not actual date formats.

Adding the following should recognize the actual date formats instead of only
custom ones:

Values were found through experimentation with excell's date formatting.
They correspond to the HSSFCell.getCellStyle().getDataFormat() values for excel
dates.

The format they represent is shown next to them.

      case 0xA6: // Monday, July 24, 2006
      case 0xA7: // 7/24

      case 0xA8: // 7/24/06

      case 0xA9: // 07/24/06

      case 0xAA: // 24-Jul

      case 0xAB: // 24-Jul-06


      case 0xAD: // 24-Jul-06

      case 0xAE: // Jul-06

      case 0xAF: // July-06

      case 0xB0: // July 24, 2006

      case 0xB1: // 7/24/06 12:00 AM

      case 0xB2: // 7/24/06 0:00

      case 0xB3: // J

      case 0xB4: // J-06

      case 0xB5: // 7/24/2006

      case 0xB6: // 24-Jul-2006
Comment 1 Jason Height 2006-07-28 00:49:15 UTC
Dont suppose that you could attach an excel file with the first row of the first
sheet containing cells which include a date, with each cell having the different
format as you indicate below.

We can then include it in a unit test and get it fixed.

Maybe you could even make the appropriate changes to isInternalDateFormat,
include a test case and attach a patch to this bug

Jason
Comment 2 Leo 2006-07-28 13:29:39 UTC
Created attachment 18660 [details]
The excel sheet with dates formatted to each excel date format.

I have attached a spreadsheet with the 17 date formats that excel currently
uses.
Comment 3 Indra Polak 2006-08-24 13:45:46 UTC
I believe also the value 164 (equals 0xA4 i think?) should be added to this list.
In my OpenOffice browser it shows up in the format

dd-MM-yyyy

while the double value is something like 38898.0

After adding this value to the accepted list, I could get the proper date 
of this value as 
30-6-2006 (In a dd-MM-yyyy format :).

What is the definitive word on this issue? Should the 

// Internal Date Formats as described on page 427 in

// Microsoft Excel Dev's Kit...

comment be read as 

// Some of the Internal Date Formats as described on page 427 in
// Microsoft Excel Dev's Kit...

If you come across a complete list (or the correct version of the
isInternalDateFormat function) please mail it to me at 
indra@feeddex.nl

For now, I just add some additional constants when I encounter them...
Maybe we can ask the guys from OpenOffice since they apparently know how to do
it. (Version 2.0)

Isn't that program open source as well? I'll look it up there...(In reply to
comment #2)
> Created an attachment (id=18660) [edit]
> The excel sheet with dates formatted to each excel date format.
> 
> I have attached a spreadsheet with the 17 date formats that excel currently
> uses.
Comment 4 Paul H 2006-10-13 10:14:50 UTC
Created attachment 19003 [details]
Proposed patch of HSSFUtil.Java

Additional values in isInternalDateFormat() for better recognition of date
formatted cells.
Comment 5 Paul H 2006-10-13 10:23:44 UTC
Comment on attachment 19003 [details]
Proposed patch of HSSFUtil.Java

I did some experimenting, too and I came up with some of the same values as
Jason did. I found some additional ones too. 

I also noticed that the same date format can have different format codes. (I
copy/pasted a date cell from one sheet to another and it had a different hex
value there.)

I'm attaching a proposed patch that checks for the union of Jason's and my
values hoping that most (all?) date formatted cells are recognized that way.
Comment 6 Paul H 2006-10-16 10:42:53 UTC
I guess that was Leo, not Jason. Sorry about that.
Comment 7 Miroslav 2006-11-02 01:47:41 UTC
I'm not sure this approach is correct. I'm not an MS Excel format expert but if
they say that there are only several formats described as date/time formats in
Microsoft Excel Dev's Kit then only those formats should be guarantied to be
date/time.

I bring this up because I have an Excel file created in OpenOffice that has
"standard number" format for some numbers (data format 164 in my case) and this
format is evaluated by the patch as being date cell (which it definitely is not
in my case). On the opposite, there are cells formated as date (Czech format,
data format 165) and this format is evaluated as not being date cell (which it
definitely is in my case).

I thougth about a more general approach that would determine whether the cell is
date/time or not using the format string specified in the data format and not an
index which seems it really doesn't work across different Excel files. So simply
said, if the formatting string contains date/time formatters then the cell would
be evaluated as date cell, otherwise it is a numeric cell.

I went through the API looking for a way to get HSSFDataFormat for given cell
but found only that I can create instance of it using Workbook object which I
don't know how to get from HSSFWorkbook object. I would like to verify that this
approach will work and eventually write the patch.
Comment 8 Leo 2006-11-02 07:09:20 UTC
The problem is differentiating actual excel "date" formats from "custom" formats
that are normally dates.  For instance, dd-MM-yyyy is not an actual supported
"date" format however there are "custom" formats available (in excel and open
office) that define the format dd-MM-yyyy as a date.

Perhaps there should be two separate methods, one for actual "date" formats and
one for formats that usually look like dates.
Comment 9 Miroslav 2006-11-02 07:22:50 UTC
There are also localized date formats that are not English date formats nor ISO
date formats so "guessing" the date format using date time formatters would be
IMO appropriate 'cos users want date values to be processed as dates no matter
whether it is Excel "native" or custom format. When reading Excel file, I do not
really care whether the date format is Excel "native" or it is a custom format.
I just want to read the value from cell as date if the data format says that the
cell value should be formatted as date.

Any suggestion how to implement this? I have idea how to implement it but I
cannot get HSSFDataFormat object for a cell as noted in comment #7.
Comment 10 Leo 2006-11-02 07:33:38 UTC
Since custom formats may be specified it is difficult to gaurantee what a
certain format may represent.  

The custom format 0xA4 may represent dd-MM-yyyy or something else if it was
defined differently.  

However, a native date format like 0xA6 should always represent a date since it
is not customized.
Comment 11 Miroslav 2006-11-02 07:51:02 UTC
(In reply to comment #10)
> Since custom formats may be specified it is difficult to gaurantee what a
> certain format may represent.  
> 
> The custom format 0xA4 may represent dd-MM-yyyy or something else if it was
> defined differently.  
> 
> However, a native date format like 0xA6 should always represent a date since it
> is not customized.

And that's the reason why it think that using index to determine whether cell
contains date or not is not safe and appropriate. Using the formatting string
would be better approach I think.
Comment 12 Miroslav 2006-11-03 12:30:00 UTC
Created attachment 19084 [details]
date cell resolution patch

I have created patch that determines more safely whether the cell is a date or
not. Because I need to get format specification to determine the cell format, I
did not patch HSSFDateUtil but used HSSFWorkbook and Workbook to implement it.
The code uses same approach as is used in JExcelApi.
Comment 13 Miroslav 2006-12-13 04:54:47 UTC
Is there anything new about this bug?
Comment 14 Danny Nip 2007-08-16 12:48:46 UTC
I was trying out the proposed code below,
 
Proposed patch of HSSFUtil.Java patch 2006-10-13 10:14 10.19 KB Edit
 
And it failed when I used the default format of:
12/31/06 (English US)
31/12/06 (English Canada)
 
I fixed this by adding
 
case 0xa5:
 
Note, I did save my .xls (97/2000/XP) Spreadsheet in OpenOffice in Vista.
Also I have not read the Microsoft SDK, I just did a
HSSFCellStyle.getDataFormat() to find the hex value.

PS. vote for this bug!!!
Comment 15 Nick Burch 2007-08-17 03:16:38 UTC
There has been quite a lot of work on cell dates in trunk over the last month or
so. This might well have been fixed, could you try it with SVN trunk and see if
it now works for you?
Comment 16 Danny Nip 2007-08-17 10:22:33 UTC
(In reply to comment #15)

Which SVN trunk are you talking about?

Okay, I have confirmed that the hex values for OpenOffice .xls files are
different.  For a Spreadsheet created in Office XP, default format is "case
0x0e:" (which is fixed in the latest POI 3.0.1 version).

case 0xa5: is the default Date value in OpenOffice.
case 0xa4: is NOT a date format in OpenOffice.  (2006-10-13 proposed fix
includes this as a Date)

So it's Spreadsheets saved in OpenOffice that may potentially cause a ruckus. 
Custom date formats are also suspect.  Anyone test with Office on a Mac?
Comment 17 Nick Burch 2007-08-20 02:10:08 UTC
(In reply to comment #16)
> Which SVN trunk are you talking about?

The POI svn trunk. If you can't do an svn checkout, there are nightly builds at 
http://encore.torchbox.com/poi-svn-build/
Comment 18 Sakari A. Maaranen 2007-10-30 04:23:10 UTC
POI should check if any date-formatting tokens are used in a custom format or if
the format is some locale-specific (e.g. Finnish dates), or of course any
standard or usual date format.  It doesn't matter which date format is in use as
long as the cell contains a date.  POI should detect all formats that are dates,
custom, localized, standard, everything.

We had this problem in our project (with Finnish dates) and our developers had
to implement a local solution for us.  They did not do it well enough to be
published here, but the principle was that if any date or time formatting tokens
were used, the cell was interpreted as being a date.  Worked well enough for us.

Please fix this asap. Localized and custom dates processing is a very basic
feature that badly needs support from POI!
Comment 19 Nick Burch 2007-11-06 04:35:07 UTC
(In reply to comment #18)
> POI should detect all formats that are dates,
> custom, localized, standard, everything.

Please feel free to suggest a patch to do this. Unfortunately, excel doesn't
actually store dates as such. Instead, it just stores a number, then uses the
formatting rules to decide if it's really a number or a date. It sucks, but
that's just what they did.

For the standard date formats, everything is easy. For custom date formats, we
have to try and spot if the formatting rules seem to be numeric, or date based,
and guess from that.

Patches to allow POI to detect other date formats are gladly received :)
Comment 20 Nick Burch 2010-06-03 12:42:39 UTC
This bug references a very old version of POI. As no new comments have been added in a long time, and lots of work on this area has taken place in POI in the mean time, I'm assuming that this bug has now been fixed

If the bug still exists with the latest version of POI, please re-open the bug and add a comment indicating this, ideally also with a failing unit test