Bug 52592 - DataFormatter appends asterisk to currency data
Summary: DataFormatter appends asterisk to currency data
Status: NEEDINFO
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.10-dev
Hardware: All All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks: 55265
  Show dependency tree
 
Reported: 2012-02-03 16:02 UTC by bscartine
Modified: 2015-11-04 17:44 UTC (History)
1 user (show)



Attachments
comparison with Microsoft CSV (1.97 KB, text/html)
2013-07-26 08:13 UTC, Evgeniy Buyanov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description bscartine 2012-02-03 16:02:48 UTC
The method formatCellValue(cell) is returning the cell value with a leading "*" character.

I tested the same spreadsheet with version 3.7 and did not have that problem.

If I create the DataFormatter with the 'emulateCSV' option, that fixes the problem.
Comment 1 bscartine 2012-02-03 16:03:43 UTC
I should clarify that this happens for cells that have type NUMERIC and that were formatted in Excel as Currency with a dollar sign.
Comment 2 Nick Burch 2012-02-03 16:13:20 UTC
Are you able to create a simple junit unit test that demonstrates the problem?
Comment 3 bscartine 2012-02-03 18:03:44 UTC
Here is a test case.  I created a worksheet and entered some numbers.
A:1 =  36,353,425 
B:1 =  456,789 

and formatted the cells as "User Defined", format (1,1234)

which means that negative numbers are displayed parentheses


The cell style format string is

_(* #,##0_);_(* \(#,##0\);_(* "-"_);_(@_)


public class TestDataFormatter extends TestCase  {

    private Workbook workbook;
    private DataFormatter formatter;

    protected void setUp(){
        try {
            File file = new File("contrib/numbers.xls");
            this.workbook = WorkbookFactory.create(file);
            this.formatter = new DataFormatter(); // (true);
        }
        catch (Exception e){
            e.printStackTrace();
        }
    }
    
    public void testDataTypes(){
        Sheet sheet = this.workbook.getSheetAt(0);
        System.out.println("Sheet: " + sheet.getSheetName());
        int lastRowNum = sheet.getLastRowNum();
        for(int j = 0; j <= lastRowNum; j++) {
            Row row = sheet.getRow(j);
            parseRow(row);
        }
    }
    
    private void parseRow(Row row){
        int lastCellNum = row.getLastCellNum();
        for(int i = 0; i <= lastCellNum; i++) {
            Cell cell = row.getCell(i);
            if (cell != null){
                String val = (this.formatter.formatCellValue(cell));
                System.out.println(row.getRowNum() + ":" + cell.getColumnIndex() + " " + val);
                if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC){
                    assertTrue (val.indexOf("*") < 0);
                }
            }
        }
    }
    
}
Comment 4 Nick Burch 2012-02-03 18:24:16 UTC
I may be missing something, but your format contains a leading * character, so isn't having a * character in the formatted value correct?
Comment 5 bscartine 2012-02-03 19:08:26 UTC
Yes, that is interesting.  I hadn't noticed that before.
This formula was in a spreadsheet supplied by the client.
However, the asterisk does not display in Excel.

I guess I'll just stick with version 3.7 for now, since that DataFormatter does not output the "*"
Comment 6 Nick Burch 2012-02-04 13:36:12 UTC
We could change it to exclude the asterisk, but first we'd want to know why! Are you able to find any documentation on what it does?
Comment 7 bscartine 2012-02-06 22:07:07 UTC
If the format string is something similar to a RegEx expression, then I think the "*" would just indicate one or more of any character.  Similary, in the format string, the "#" character indicates a number.
Comment 8 Mark B 2012-02-07 12:32:25 UTC
The asterisk character is used to indicate that the contents of the cell should be padded to length - that is should fill the cell completely. The pad character is the character that follows the asterisk in the formatting string.

Take a look here http://office.microsoft.com/en-us/excel-help/number-format-codes-HP005198679.aspx - under the Text and Spacing heading there is a sub-heading 'Repeating Characters', expand this and you should see an explanation of how the asterisk is used.
Comment 9 Evgeniy Buyanov 2013-07-23 23:37:49 UTC
    private String cleanFormatForNumber(String formatStr) {
        StringBuffer sb = new StringBuffer(formatStr);

        if (emulateCsv) {
            // Requested spacers with "_" are replaced by a single space.
            // Full-column-width padding "*" are removed.
            // Not processing fractions at this time. Replace ? with space.
            // This matches CSV output.
            for (int i = 0; i < sb.length(); i++) {
                char c = sb.charAt(i);
                if (c == '_' || c == '*' || c == '?') {
                    if (i > 0 && sb.charAt((i - 1)) == '\\') {
                        // It's escaped, don't worry
                        continue;
                    }
                    if (c == '?') {
                        sb.setCharAt(i, ' ');
                    } else if (i < sb.length() - 1) {
                        // Remove the character we're supposed
                        //  to match the space of / pad to the
                        //  column width with
                        if (c == '_') {
                            sb.setCharAt(i + 1, ' ');
                        } else {
                            sb.deleteCharAt(i + 1);
                        }
                        // Remove the character too
                        sb.deleteCharAt(i); 
                        i--; // !!!!!!!!!!!!!!! LOST !!!!!!!!!!!!!!!!!!
                    }
                }
            }
        } else {
            // If they requested spacers, with "_",
            //  remove those as we don't do spacing
            // If they requested full-column-width
            //  padding, with "*", remove those too
            for (int i = 0; i < sb.length(); i++) {
                char c = sb.charAt(i);
                if (c == '_' || c == '*') {
                    if (i > 0 && sb.charAt((i - 1)) == '\\') {
                        // It's escaped, don't worry
                        continue;
                    }
                    if (i < sb.length() - 1) {
                        // Remove the character we're supposed
                        //  to match the space of / pad to the
                        //  column width with
                        sb.deleteCharAt(i + 1);
                    }
                    // Remove the _ too
                    sb.deleteCharAt(i);
                    i--; // !!!!!!!!!!!!!!! LOST !!!!!!!!!!!!!!!!!!
                }
            }
        }

        // Now, handle the other aspects like 
        //  quoting and scientific notation
        for(int i = 0; i < sb.length(); i++) {
           char c = sb.charAt(i);
            // remove quotes and back slashes
            if (c == '\\' || c == '"') {
                sb.deleteCharAt(i);
                i--;

            // for scientific/engineering notation
            } else if (c == '+' && i > 0 && sb.charAt(i - 1) == 'E') {
                sb.deleteCharAt(i);
                i--;
            }
        }

        return sb.toString();
    }
Comment 10 Evgeniy Buyanov 2013-07-23 23:53:45 UTC
now for format string _(* after checking _ removed two first chars and i++, so in next step asterisk skipped.
after sb.deleteCharAt(i); need add i--;

Error in the current version of http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/ss/usermodel/DataFormatter.java
Comment 11 Evgeniy Buyanov 2013-07-26 08:13:51 UTC
Created attachment 30630 [details]
comparison with Microsoft CSV

space replaced with &middot;
Comment 12 Evgeniy Buyanov 2013-07-26 08:18:26 UTC
(In reply to Evgeniy Buyanov from comment #10)
> Error in the current version of
> http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/ss/usermodel/DataFormatter.java

1. wrong split format string for p;n;z;t by ';' - quotes for text is ignored.
2. mask for zero number and text part is ignored.
3. special symbols must not work inside text in quotes
4. question mark '?' is not replaced on space ' ' in java.text.Format.format
5. '{' transfer to java.text.Format.format without quotes "'"
6. single quotes "'" need to be doubled
7. can not work for "# ##0,0 " (automatic divide by 1000 in Excel)
8. can not work for "# ####/####"
Comment 13 Nick Burch 2015-11-04 17:44:36 UTC
There has been quite a few fixes in this area since the bug was raised, including after 3.13 was released

Would it be possible for someone to retest this with a recent nightly build / recent svn checkout build / 3.14 beta 1 once released? And if it remains, a unit test showing the issue would be most helpful for getting it fixed!