Summary: | The data format string is incorrect for Accounting style | ||
---|---|---|---|
Product: | POI | Reporter: | jimmy422 |
Component: | HSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P3 | ||
Version: | 3.0-FINAL | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Windows XP | ||
Attachments: | This is a sample java program. Create a test.xls having a cell value with Accounting format |
Description
jimmy422
2009-05-28 21:23:52 UTC
There clearly is a difference as you have pointed out. Excel displays: _($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_) new HSSFWorkbook().createDataFormat().getFormat((short)0x2c); returns: _($*#,##0.00_);_($*(#,##0.00);_($*"-"??_);_(@_) However, it is not clear how this causes a problem. My understanding was that the 'built-in' formats are referenced by their index (in this case 0x2c) and not by their textual representation. I took a look at Excel's internal representation of built-in format 0x2c and it is different again: _("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_) It has the spaces after the asterisks, but it also has quotes around the currency and backslashes before some round brackets. I've compared the formats produced in a new file created by Excel, and one created by POI. They are the same. You can see POI's code for creating format 0x2c on line 1269 of the following: http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/hssf/model/Workbook.java?annotate=764203 As far as I can tell, this discrepancy won't cause POI to read or write xls files incorrectly. Perhaps you have a different use case which relies on the value returned by HSSFDataFormat.getFormat(short). If so, can you please describe *that* in more detail? This will help write a junit that verifies the need for this code change. There is a difference, and I think POI is wrong. The format string returned by POI is _($*#,##0.00_);_($*(#,##0.00);_($*\"-\"??_);_(@_) which is wrong. Excel will not accept that string if you try to set that value as a custom format in Excel. This format string is syntactically incorrect. Excel displays _($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_) which is the correct format. Note the space character after the asterisk. The asterisk means use the following character to pad the formatted value, and repeat the pad character enough times to fill the cell. That's why the format string from POI is wrong. The "*#" in the format string would mean use "#" to pad the formatted value of the cell, and format the number with ",##0.00" (which is not defined). Josh reports seeing the value _("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_) This is again different. The quotes around the $ sign mean that the cell value will always use "$" as the currency symbol. Without the quotes, Excel will use the currency symbol from the current Locale. I'm not sure how Josh saw this third value of the format string. I see the value of the format string by 1) Format | Cells, choose Accounting, and leave the other choices at their default values. Click OK 2) Format | Cells, change the selection from Accounting to Custom, and look at what shows in the field labelled Type. I think Micheal is correct. I googled "excel record 0x2c" and got a link to this page from Microsoft: http://support.microsoft.com/kb/147942 It includes their definition of the formats: 0x00 General 0x01 0 0x02 0.00 0x03 #,##0 0x04 #,##0.00 0x05 ($#,##0_);($#,##0) 0x06 ($#,##0_);[Red]($#,##0) 0x07 ($#,##0.00_);($#,##0.00) 0x08 ($#,##0.00_);[Red]($#,##0.00) 0x09 0% 0x0a 0.00% 0x0b 0.00E+00 0x0c # ?/? 0x0d # ??/?? 0x0e m/d/yy 0x0f d-mmm-yy 0x10 d-mmm 0x11 mmm-yy 0x12 h:mm AM/PM 0x13 h:mm:ss AM/PM 0x14 h:mm 0x15 h:mm:ss 0x16 m/d/yy h:mm 0x25 (#,##0_);(#,##0) 0x26 (#,##0_);[Red](#,##0) 0x27 (#,##0.00_);(#,##0.00) 0x28 (#,##0.00_);[Red](#,##0.00) 0x29 _(* #,##0_);_(* (#,##0);_(* "-"_);_(@_) 0x2a _($* #,##0_);_($* (#,##0);_($* "-"_);_(@_) 0x2b _(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(@_) 0x2c _($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_) 0x2d mm:ss 0x2e [h]:mm:ss 0x2f mm:ss.0 0x30 ##0.0E+0 0x31 @ Seems authoritative to me. Josh? Sorry for the ambiguity in my earlier comment - I wasn't suggesting that POI was correct. POI's external text representation of format 0x2c is missing some spaces, and this should be fixed. However, I always like to accompany each fix with something (junit/comment) that says "this is what will go wrong without the fix". I have tried to write some code that will exploit this incorrect value returned by HSSFDataFormat.getFormat(short). As far as I can tell, the only place in POI that uses this string value is HSSFDataFormat.getFormat(String). That method just translates the offending format text back to 0x2c, which should be OK. I guess that Vijayendra's problem involves processing of these format string values *outside* of POI. Vijayendra, do you have specific details of how POI's text value for data format 0x2c causes problems for you? (In reply to comment #2) > > The format string returned by POI is > _($*#,##0.00_);_($*(#,##0.00);_($*\"-\"??_);_(@_) > which is wrong. Excel will not accept that string if you try to set that value > as a custom format in Excel. This format string is syntactically incorrect. It's probably not critical to the resolution of this bug, but I have different observations. My Excel(2007) accepts that string without complaint, and seems to interpret it OK too. It's not likely that anyone would want this exact format, but there is no problem with the syntax. The '*#' means 'right pad with #'. The comma is then left without a preceding '0' or '#', so it is interpreted as a literal comma instead of a thousands separator. I have noticed another (perhaps related problem) that POI doesn't have logic for *parsing* the text representations of data formats. For example, there is currently nothing that relates line 1265 of Workbook (internal representation) to line 118 of BuiltinFormats (Excel GUI representation): http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/hssf/model/Workbook.java?annotate=782398 http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java?annotate=782398 ...though these two strings logically refer to the same thing. If someone tried to create their own custom format string (using the Excel GUI syntax), HSSFDataFormat.getFormat(String) won't convert it to the internal format that Excel seems to need. The fix for this should be not too difficult. This page provides a useful description: http://www.ozgrid.com/Excel/CustomFormats.htm BTW - the internal representation of data formats is also visible in the output of BiffViewer. For example: Offset=0x000004E7(1255) recno=51 sid=0x041E size=0x003F(63) [FORMAT] .indexcode = 0x002C .isUnicode = false .formatstring = _("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_) [/FORMAT] Sorry for replying late ... The parsing logic is written as per the Excel formats which has the space. As Michael Zalewski (https://issues.apache.org/bugzilla/show_bug.cgi?id=47282#c2) pointed correctly, after parsing as per current format string in POI, it gives the value as #0.00 at my end. If the space is removed while debugging, it parses the currencies as perfectly. The standard formats does have space after * and I think POI should follow the same. (In reply to comment #7) > The parsing logic is written as per the Excel formats which has the space. Which parsing logic? Excel's? POI's? Have you written your own parsing logic? If so, is it easy enough to share a small part of that code which exposes the bug? > As Michael Zalewski > (https://issues.apache.org/bugzilla/show_bug.cgi?id=47282#c2) > pointed correctly, after parsing as per current format string in POI, > it gives the value as #0.00 at my end. 'it gives' - which app are you talking about? Probably not Excel, because Excel would actually give a result of #,0.00. > If the space is removed while debugging, it parses the currencies as perfectly. I guess you mean 'If the space is *added* while debugging...' > The standard formats does have space after * and I think POI should follow the > same. POI *will* be changed to match Excel. That was never in question. Instead, please help us understand the reasoning behind the change. (In reply to comment #8) > (In reply to comment #7) > > The parsing logic is written as per the Excel formats which has the space. > > Which parsing logic? Excel's? POI's? Have you written your own parsing logic? > If so, is it easy enough to share a small part of that code which exposes the > bug? Vijayendra - The parsing logic is written externally. No Excel's or POI's parsing logic is used. I cannot disclose the code, since it is complex and huge. > > > > > As Michael Zalewski > > (https://issues.apache.org/bugzilla/show_bug.cgi?id=47282#c2) > > pointed correctly, after parsing as per current format string in POI, > > it gives the value as #0.00 at my end. > > 'it gives' - which app are you talking about? Probably not Excel, because > Excel would actually give a result of #,0.00. Vijayendra - The custom application > > > > > If the space is removed while debugging, it parses the currencies as perfectly. > > I guess you mean 'If the space is *added* while debugging...' Vijayendra - Yeah. Sorry. If space *added*. > > > > > The standard formats does have space after * and I think POI should follow the > > same. > > POI *will* be changed to match Excel. That was never in question. Instead, > please help us understand the reasoning behind the change. To summarize the logic, it parse the format string and tries to collect various formats (represented by custom java classes) in sequence, which later helps in formatting the value. The logic is written as per the standard formats. So cannot change it for a single value. So while parsing the string, if the space is not found, it makes the formatting messy. The issue with missing blanks in the formats seems to have been fixed via bug 55730, I could not see what else is missing in this old issue. Therefore I am closing it for now, please report new bugs for any things that are still not solved here. |