Bug 62307 - XSSFCell.getNumericCellValue() behaves differently when formula is set or not set
Summary: XSSFCell.getNumericCellValue() behaves differently when formula is set or not...
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 4.0.x-dev
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-17 09:11 UTC by gallon.fizik@gmail.com
Modified: 2019-01-03 00:16 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description gallon.fizik@gmail.com 2018-04-17 09:11:39 UTC
Javadoc for Cell.getNumericValue() says:
* For strings we throw an exception. For blank cells we return a 0.
* For formulas or error cells we return the precalculated value;
* </p>
* @return the value of the cell as a number
* @throws IllegalStateException if the cell type returned by {@link #getCellType()} is {@link CellType#STRING}

E.g., javadoc doesn't specify a behavior for boolean cells.

@Override
public double getNumericCellValue() {
    CellType cellType = getCellType();
    switch(cellType) {
        case BLANK:
            return 0.0;
        case FORMULA:
            // fall-through
        case NUMERIC:
            if(_cell.isSetV()) {
               String v = _cell.getV();
               if (v.isEmpty()) {
                   return 0.0;
               }
               try {
                  return Double.parseDouble(v);
               } catch(NumberFormatException e) {
                  throw typeMismatch(CellType.NUMERIC, CellType.STRING, false);
               }
            } else {
               return 0.0;
            }
        default:
            throw typeMismatch(CellType.NUMERIC, cellType, false);
    }
}

Consequences:
1. BLANKs always return 0, regardless of the formula.
2. If formula is set, ANY value type is parsed as double, i.e. numerics, booleans and errors. If the string value representation is not parsable as double, NumberFormatException is thrown. 
3. If formula is not set, then string, boolean and error types result in IllegalStateException.

Although this logic corresponds to javadoc (except for booleans), it seems really weird that it behaves in such different ways depending on the formula. If this is really intended behavior, then soory to bother you. Otherwise, I propose a fix:

@Override
    public double getNumericCellValue() {
        CellType valueType = getCellType() == CellType.FORMULA ? getCachedFormulaResultType() : getCellType();

        switch(valueType) {
            case BLANK:
                return 0.0;
            case STRING:
                throw typeMismatch(CellType.NUMERIC, CellType.STRING, false);
            case ERROR:
                // Errors are stored as strings, so we cannot parse them directly.
                // We could return an error code, but it's confusing.
                throw typeMismatch(CellType.NUMERIC, CellType.ERROR, false);
            case BOOLEAN:
                // fall-through
            case NUMERIC:
                if(_cell.isSetV()) {
                    String v = _cell.getV();
                    if (v.isEmpty()) {
                        return 0.0;
                    }
                    // 'Cannot get NUMERIC from NUMERIC' seems weird. I propose to replace it with
                    // IllegalStateException("Invalid string representation of a numeric: " + v)
                    // Or let the exception propagate. Anyway, if the internal value representation is broken, means 
                    // that something is really wrong in POI code.
                  
                    return Double.parseDouble(v);
               
                } else {
                    return 0.0;
                }
            default: // accounts for _NONE and FORMULA
                throw new IllegalStateException("this should never happen");
        }
    }

I will be able to provide a set of test cases later, if necessary.
Comment 1 gallon.fizik@gmail.com 2019-01-01 16:25:23 UTC
Bump.

Abstract Cell javadoc:
    /**
     * Get the value of the cell as a number.
     * <p>
     * For strings we throw an exception. For blank cells we return a 0.
     * For formulas or error cells we return the precalculated value;
     * </p>
     * @return the value of the cell as a number
     * @throws IllegalStateException if the cell type returned by {@link #getCellType()} is {@link CellType#STRING}
     * @exception NumberFormatException if the cell value isn't a parsable <code>double</code>.
     * @see DataFormatter for turning this number into a string similar to that which Excel would render this number as.
     */
    double getNumericCellValue();

The javadoc is missing the following cases:
1. BLANK: implemented in code, returns zero.
2. ERROR: implemented in code, throws ISE (IllegalStateException).
3. BOOLEAN missing from both javadoc and behaves differently depending on the implementation and whether formula is set:

           
formula set | HSSF | XSSF | SXSSF |
===================================
   no       | ISE  | ISE  |  ISE  |
   yes      | ISE  | cast |  ISE  |

What's the criterion that a cellType is convertible to numeric? These methods are not used by the evaluator (the latter has format-independent conversion logic) but may be used as a part of public API... but for what?

I guess the least invasive and most consistent way is to alter XSSF so that calling getNumericCellValue on a cell with BOOLEAN cached result type produced ISE.

Maybe it makes sense to prohibit *any* implicit typecasts in Cell#get*Value in the future.

Any comments?
Comment 2 PJ Fanning 2019-01-01 18:47:27 UTC
I think we need to be pragmatic - breaking compatibility with previous versions has its consequences.
Comment 3 gallon.fizik@gmail.com 2019-01-02 20:47:30 UTC
Fixed via r1850207.
Comment 4 Greg Woolsey 2019-01-03 00:16:58 UTC
Thanks for this, I hadn't realized this was one reason I have some custom evaluation code (formatting requirements are another, but unrelated).  I'll test eventually, but with this I may be able to remove some code in my downstream project.