After making sure that the cell is in numeric format: When calling the HSSFDateUtil.isCellDateFormatted(cell) method, it returns false for all excel date formats other than M/DD/YYYY. If you use a custom date format it works but not the actual excel date format. I believe this is because it calls the isInternalDateFormat(int format) method which checks against these hex values: case 0x0e: case 0x0f: case 0x10: case 0x11: case 0x12: case 0x13: case 0x14: case 0x15: case 0x16: case 0x2d: case 0x2e: case 0x2f: however these values are only for the custom date formats, not actual date formats. Adding the following should recognize the actual date formats instead of only custom ones: Values were found through experimentation with excell's date formatting. They correspond to the HSSFCell.getCellStyle().getDataFormat() values for excel dates. The format they represent is shown next to them. case 0xA6: // Monday, July 24, 2006 case 0xA7: // 7/24 case 0xA8: // 7/24/06 case 0xA9: // 07/24/06 case 0xAA: // 24-Jul case 0xAB: // 24-Jul-06 case 0xAD: // 24-Jul-06 case 0xAE: // Jul-06 case 0xAF: // July-06 case 0xB0: // July 24, 2006 case 0xB1: // 7/24/06 12:00 AM case 0xB2: // 7/24/06 0:00 case 0xB3: // J case 0xB4: // J-06 case 0xB5: // 7/24/2006 case 0xB6: // 24-Jul-2006
Dont suppose that you could attach an excel file with the first row of the first sheet containing cells which include a date, with each cell having the different format as you indicate below. We can then include it in a unit test and get it fixed. Maybe you could even make the appropriate changes to isInternalDateFormat, include a test case and attach a patch to this bug Jason
Created attachment 18660 [details] The excel sheet with dates formatted to each excel date format. I have attached a spreadsheet with the 17 date formats that excel currently uses.
I believe also the value 164 (equals 0xA4 i think?) should be added to this list. In my OpenOffice browser it shows up in the format dd-MM-yyyy while the double value is something like 38898.0 After adding this value to the accepted list, I could get the proper date of this value as 30-6-2006 (In a dd-MM-yyyy format :). What is the definitive word on this issue? Should the // Internal Date Formats as described on page 427 in // Microsoft Excel Dev's Kit... comment be read as // Some of the Internal Date Formats as described on page 427 in // Microsoft Excel Dev's Kit... If you come across a complete list (or the correct version of the isInternalDateFormat function) please mail it to me at indra@feeddex.nl For now, I just add some additional constants when I encounter them... Maybe we can ask the guys from OpenOffice since they apparently know how to do it. (Version 2.0) Isn't that program open source as well? I'll look it up there...(In reply to comment #2) > Created an attachment (id=18660) [edit] > The excel sheet with dates formatted to each excel date format. > > I have attached a spreadsheet with the 17 date formats that excel currently > uses.
Created attachment 19003 [details] Proposed patch of HSSFUtil.Java Additional values in isInternalDateFormat() for better recognition of date formatted cells.
Comment on attachment 19003 [details] Proposed patch of HSSFUtil.Java I did some experimenting, too and I came up with some of the same values as Jason did. I found some additional ones too. I also noticed that the same date format can have different format codes. (I copy/pasted a date cell from one sheet to another and it had a different hex value there.) I'm attaching a proposed patch that checks for the union of Jason's and my values hoping that most (all?) date formatted cells are recognized that way.
I guess that was Leo, not Jason. Sorry about that.
I'm not sure this approach is correct. I'm not an MS Excel format expert but if they say that there are only several formats described as date/time formats in Microsoft Excel Dev's Kit then only those formats should be guarantied to be date/time. I bring this up because I have an Excel file created in OpenOffice that has "standard number" format for some numbers (data format 164 in my case) and this format is evaluated by the patch as being date cell (which it definitely is not in my case). On the opposite, there are cells formated as date (Czech format, data format 165) and this format is evaluated as not being date cell (which it definitely is in my case). I thougth about a more general approach that would determine whether the cell is date/time or not using the format string specified in the data format and not an index which seems it really doesn't work across different Excel files. So simply said, if the formatting string contains date/time formatters then the cell would be evaluated as date cell, otherwise it is a numeric cell. I went through the API looking for a way to get HSSFDataFormat for given cell but found only that I can create instance of it using Workbook object which I don't know how to get from HSSFWorkbook object. I would like to verify that this approach will work and eventually write the patch.
The problem is differentiating actual excel "date" formats from "custom" formats that are normally dates. For instance, dd-MM-yyyy is not an actual supported "date" format however there are "custom" formats available (in excel and open office) that define the format dd-MM-yyyy as a date. Perhaps there should be two separate methods, one for actual "date" formats and one for formats that usually look like dates.
There are also localized date formats that are not English date formats nor ISO date formats so "guessing" the date format using date time formatters would be IMO appropriate 'cos users want date values to be processed as dates no matter whether it is Excel "native" or custom format. When reading Excel file, I do not really care whether the date format is Excel "native" or it is a custom format. I just want to read the value from cell as date if the data format says that the cell value should be formatted as date. Any suggestion how to implement this? I have idea how to implement it but I cannot get HSSFDataFormat object for a cell as noted in comment #7.
Since custom formats may be specified it is difficult to gaurantee what a certain format may represent. The custom format 0xA4 may represent dd-MM-yyyy or something else if it was defined differently. However, a native date format like 0xA6 should always represent a date since it is not customized.
(In reply to comment #10) > Since custom formats may be specified it is difficult to gaurantee what a > certain format may represent. > > The custom format 0xA4 may represent dd-MM-yyyy or something else if it was > defined differently. > > However, a native date format like 0xA6 should always represent a date since it > is not customized. And that's the reason why it think that using index to determine whether cell contains date or not is not safe and appropriate. Using the formatting string would be better approach I think.
Created attachment 19084 [details] date cell resolution patch I have created patch that determines more safely whether the cell is a date or not. Because I need to get format specification to determine the cell format, I did not patch HSSFDateUtil but used HSSFWorkbook and Workbook to implement it. The code uses same approach as is used in JExcelApi.
Is there anything new about this bug?
I was trying out the proposed code below, Proposed patch of HSSFUtil.Java patch 2006-10-13 10:14 10.19 KB Edit And it failed when I used the default format of: 12/31/06 (English US) 31/12/06 (English Canada) I fixed this by adding case 0xa5: Note, I did save my .xls (97/2000/XP) Spreadsheet in OpenOffice in Vista. Also I have not read the Microsoft SDK, I just did a HSSFCellStyle.getDataFormat() to find the hex value. PS. vote for this bug!!!
There has been quite a lot of work on cell dates in trunk over the last month or so. This might well have been fixed, could you try it with SVN trunk and see if it now works for you?
(In reply to comment #15) Which SVN trunk are you talking about? Okay, I have confirmed that the hex values for OpenOffice .xls files are different. For a Spreadsheet created in Office XP, default format is "case 0x0e:" (which is fixed in the latest POI 3.0.1 version). case 0xa5: is the default Date value in OpenOffice. case 0xa4: is NOT a date format in OpenOffice. (2006-10-13 proposed fix includes this as a Date) So it's Spreadsheets saved in OpenOffice that may potentially cause a ruckus. Custom date formats are also suspect. Anyone test with Office on a Mac?
(In reply to comment #16) > Which SVN trunk are you talking about? The POI svn trunk. If you can't do an svn checkout, there are nightly builds at http://encore.torchbox.com/poi-svn-build/
POI should check if any date-formatting tokens are used in a custom format or if the format is some locale-specific (e.g. Finnish dates), or of course any standard or usual date format. It doesn't matter which date format is in use as long as the cell contains a date. POI should detect all formats that are dates, custom, localized, standard, everything. We had this problem in our project (with Finnish dates) and our developers had to implement a local solution for us. They did not do it well enough to be published here, but the principle was that if any date or time formatting tokens were used, the cell was interpreted as being a date. Worked well enough for us. Please fix this asap. Localized and custom dates processing is a very basic feature that badly needs support from POI!
(In reply to comment #18) > POI should detect all formats that are dates, > custom, localized, standard, everything. Please feel free to suggest a patch to do this. Unfortunately, excel doesn't actually store dates as such. Instead, it just stores a number, then uses the formatting rules to decide if it's really a number or a date. It sucks, but that's just what they did. For the standard date formats, everything is easy. For custom date formats, we have to try and spot if the formatting rules seem to be numeric, or date based, and guess from that. Patches to allow POI to detect other date formats are gladly received :)
This bug references a very old version of POI. As no new comments have been added in a long time, and lots of work on this area has taken place in POI in the mean time, I'm assuming that this bug has now been fixed If the bug still exists with the latest version of POI, please re-open the bug and add a comment indicating this, ideally also with a failing unit test