The method formatCellValue(cell) is returning the cell value with a leading "*" character. I tested the same spreadsheet with version 3.7 and did not have that problem. If I create the DataFormatter with the 'emulateCSV' option, that fixes the problem.
I should clarify that this happens for cells that have type NUMERIC and that were formatted in Excel as Currency with a dollar sign.
Are you able to create a simple junit unit test that demonstrates the problem?
Here is a test case. I created a worksheet and entered some numbers. A:1 = 36,353,425 B:1 = 456,789 and formatted the cells as "User Defined", format (1,1234) which means that negative numbers are displayed parentheses The cell style format string is _(* #,##0_);_(* \(#,##0\);_(* "-"_);_(@_) public class TestDataFormatter extends TestCase { private Workbook workbook; private DataFormatter formatter; protected void setUp(){ try { File file = new File("contrib/numbers.xls"); this.workbook = WorkbookFactory.create(file); this.formatter = new DataFormatter(); // (true); } catch (Exception e){ e.printStackTrace(); } } public void testDataTypes(){ Sheet sheet = this.workbook.getSheetAt(0); System.out.println("Sheet: " + sheet.getSheetName()); int lastRowNum = sheet.getLastRowNum(); for(int j = 0; j <= lastRowNum; j++) { Row row = sheet.getRow(j); parseRow(row); } } private void parseRow(Row row){ int lastCellNum = row.getLastCellNum(); for(int i = 0; i <= lastCellNum; i++) { Cell cell = row.getCell(i); if (cell != null){ String val = (this.formatter.formatCellValue(cell)); System.out.println(row.getRowNum() + ":" + cell.getColumnIndex() + " " + val); if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC){ assertTrue (val.indexOf("*") < 0); } } } } }
I may be missing something, but your format contains a leading * character, so isn't having a * character in the formatted value correct?
Yes, that is interesting. I hadn't noticed that before. This formula was in a spreadsheet supplied by the client. However, the asterisk does not display in Excel. I guess I'll just stick with version 3.7 for now, since that DataFormatter does not output the "*"
We could change it to exclude the asterisk, but first we'd want to know why! Are you able to find any documentation on what it does?
If the format string is something similar to a RegEx expression, then I think the "*" would just indicate one or more of any character. Similary, in the format string, the "#" character indicates a number.
The asterisk character is used to indicate that the contents of the cell should be padded to length - that is should fill the cell completely. The pad character is the character that follows the asterisk in the formatting string. Take a look here http://office.microsoft.com/en-us/excel-help/number-format-codes-HP005198679.aspx - under the Text and Spacing heading there is a sub-heading 'Repeating Characters', expand this and you should see an explanation of how the asterisk is used.
private String cleanFormatForNumber(String formatStr) { StringBuffer sb = new StringBuffer(formatStr); if (emulateCsv) { // Requested spacers with "_" are replaced by a single space. // Full-column-width padding "*" are removed. // Not processing fractions at this time. Replace ? with space. // This matches CSV output. for (int i = 0; i < sb.length(); i++) { char c = sb.charAt(i); if (c == '_' || c == '*' || c == '?') { if (i > 0 && sb.charAt((i - 1)) == '\\') { // It's escaped, don't worry continue; } if (c == '?') { sb.setCharAt(i, ' '); } else if (i < sb.length() - 1) { // Remove the character we're supposed // to match the space of / pad to the // column width with if (c == '_') { sb.setCharAt(i + 1, ' '); } else { sb.deleteCharAt(i + 1); } // Remove the character too sb.deleteCharAt(i); i--; // !!!!!!!!!!!!!!! LOST !!!!!!!!!!!!!!!!!! } } } } else { // If they requested spacers, with "_", // remove those as we don't do spacing // If they requested full-column-width // padding, with "*", remove those too for (int i = 0; i < sb.length(); i++) { char c = sb.charAt(i); if (c == '_' || c == '*') { if (i > 0 && sb.charAt((i - 1)) == '\\') { // It's escaped, don't worry continue; } if (i < sb.length() - 1) { // Remove the character we're supposed // to match the space of / pad to the // column width with sb.deleteCharAt(i + 1); } // Remove the _ too sb.deleteCharAt(i); i--; // !!!!!!!!!!!!!!! LOST !!!!!!!!!!!!!!!!!! } } } // Now, handle the other aspects like // quoting and scientific notation for(int i = 0; i < sb.length(); i++) { char c = sb.charAt(i); // remove quotes and back slashes if (c == '\\' || c == '"') { sb.deleteCharAt(i); i--; // for scientific/engineering notation } else if (c == '+' && i > 0 && sb.charAt(i - 1) == 'E') { sb.deleteCharAt(i); i--; } } return sb.toString(); }
now for format string _(* after checking _ removed two first chars and i++, so in next step asterisk skipped. after sb.deleteCharAt(i); need add i--; Error in the current version of http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/ss/usermodel/DataFormatter.java
Created attachment 30630 [details] comparison with Microsoft CSV space replaced with ·
(In reply to Evgeniy Buyanov from comment #10) > Error in the current version of > http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/ss/usermodel/DataFormatter.java 1. wrong split format string for p;n;z;t by ';' - quotes for text is ignored. 2. mask for zero number and text part is ignored. 3. special symbols must not work inside text in quotes 4. question mark '?' is not replaced on space ' ' in java.text.Format.format 5. '{' transfer to java.text.Format.format without quotes "'" 6. single quotes "'" need to be doubled 7. can not work for "# ##0,0 " (automatic divide by 1000 in Excel) 8. can not work for "# ####/####"
There has been quite a few fixes in this area since the bug was raised, including after 3.13 was released Would it be possible for someone to retest this with a recent nightly build / recent svn checkout build / 3.14 beta 1 once released? And if it remains, a unit test showing the issue would be most helpful for getting it fixed!