Bug 59791

Summary: Convert Cell Type to an enum
Product: POI Reporter: Javen O'Neal <onealj>
Component: SS CommonAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: P2    
Version: 3.15-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Bug Depends on:    
Bug Blocks: 59836    
Attachments: Improve backwards compatibility

Description Javen O'Neal 2016-07-04 09:27:28 UTC
Methods that require a cell type use Cell's CELL_TYPE_* integer constants. Classes that use these constants are untyped, making it possible to have latent failures for invalid constants or bloating the code with boilerplate value checking.

Cell types should be stored in an enum for type safety and make method signatures more meaningful.
Comment 1 Javen O'Neal 2016-07-04 09:45:54 UTC
This is a pretty hefty change. It breaks binary compatibility as the getCellType-like methods return o.a.p.ss.usermodel.CellType enum rather than int, and I didn't want to bloat the classes with getCellTypeType()-like methods.
Cell#setCellType(int) is one of the few methods that was able to maintain backwards compatibility. Nonetheless, if users used Cell.CELL_TYPE_* constants rather than literals, their code will likely work without modification.

It should be pretty straight-forward to replace all "Cell.CELL_TYPE_" strings with "CellType." in the user's code.

Applied in r1751237.
Comment 2 Nick Burch 2016-07-04 09:55:15 UTC
Can we do this in a way that won't break backwards-compatibility? The use of those int constants is something that goes back to the very earliest days of Apache POI, and will be used by a huge number of programs out there. They'll be used for both getters and setters, and in some cases I've spotted (on the mailing list and on stackoverflow, amongst others), people have actually hard-coded the int values like 2 and 3 in their code! :/

If we break source compatibility on this, we risk lots of people not being able to upgrade. If there's a source-breaking change, I'd suggest we wait for 4.0 when we're breaking CT stuff as well
Comment 3 Javen O'Neal 2016-07-04 10:16:37 UTC
Replaced all references to Cell#CELL_TYPE_* with CellType.* in r1751240
Comment 4 Javen O'Neal 2016-07-04 10:37:36 UTC
(In reply to Nick Burch from comment #2)
> Can we do this in a way that won't break backwards-compatibility?

Most of the changes were internal to POI. The user-facing changes were mostly limited to [Cell|HSSFCell|XSSFCell|SXSSFCell|EvaluationCell].[constructor|getCellType|setCellType|getBaseCellType].
There's not much backwards-compatibility breakage as far as the API goes, it's just likely that these changes will impact nearly every POI user because of code that looks like:
switch(cell.getCellType()) {
    case NUMERIC:
        return cell.getNumericCellValue();
    case STRING:
        return cell.getStringCellValue();
    ...
}

I'm afraid that we're still a couple years away from seeing POI 4.0 and don't think this change is worth putting off that long. I'm always cautious to use literals in my code, and I would hope that most POI users who are interested in writing forward-compatible code would use the constants we provide rather than hard-coding literals.

If the user-facing changes are as limited as I think they are, we could revert the few public-facing methods to have signatures that operate on ints, but use CellType anywhere internally, then make the final change when we begin work on POI 4.0.

Perhaps a discussion on the users/dev list would help us figure out whether moving forward with int->CellTypes is preferred by the community.
Comment 5 Nick Burch 2016-07-04 10:58:49 UTC
You're not the standard of coder we need to worry about! ;-)

Can we do this in a way where we can use the enums internally in POI, allow users who want to switch to use enums, but hold-off breaking changes for everyone else until 4?

(My view is that we're likely to have a 3.x and a 4.x branch in parallel for some time, as I can't see us being able to make the switch that quickly and we don't want to stop all other development for many many months!)
Comment 6 Javen O'Neal 2016-07-04 12:01:39 UTC
(In reply to Javen O'Neal from comment #4)
> If the user-facing changes are as limited as I think they are, we could
> revert the few public-facing methods to have signatures that operate on
> ints, but use CellType anywhere internally.

Reverted Cell and EvaluationCell in r1751256.

This preserves binary backwards-compatibility with previous versions of POI. Hopefully by having a few versions where the int getters and setters are deprecated we can complete the transition before POI 4.0.

Currently the code looks like this:

interface Cell or EvaluationCell {
    int getCellType();

    @deprecated will be deleted when we make the CellType enum transition
    @Internal
    CellType getCellTypeEnum();

    int getCachedFormulaResultType();

    @deprecated will be deleted when we make the CellType enum transition
    @Internal
    CellType getCachedFormulaResultTypeEnum();

    @deprecated
    void setCellType(int);

    void setCellType(CellType);
}
Comment 7 Javen O'Neal 2016-07-04 12:36:14 UTC
Similar changes to FormulaEvaluator interface for backwards compatibility in r1751261 and r1751264.

interface FormulaEvaluator {
    int evaluateFormulaCell(Cell);

    @deprecated will be deleted when we make the CellType enum transition
    @Internal
    CellType evaluateFormulaCellEnum(Cell);
}
Comment 8 Javen O'Neal 2016-07-04 13:15:48 UTC
Should we add some kind of warning or decoration to methods that will change their return value so that users can get a heads up before the signature changes?
@deprecation implies the method is going away, but has warnings that integrate well with IDEs and the project can be built to treat deprecation warnings

Searching for all public methods using CellType
setters:
$ grep -nr --exclude-dir=".svn" -P "public [^\(]+\([^\)]*CellType [^\)]+\)" src
getters:
$ grep -nr --exclude-dir=".svn" -P "public [^\(]*CellType [^[(]*\([^\)]*\)" src

With r1751273, I believe I have reverted all public constructors and methods to be backwards compatible (at the binary level). There were some protected methods or inner classes that seemed unlikely to be used in projects where changing the return type would be a concern.
Comment 9 Nick Burch 2016-07-04 13:44:53 UTC
You could try asking the Commons folks - they're the most likely to have gone through something similar amongst the Apache Java communities!
Comment 10 Javen O'Neal 2016-08-22 17:53:08 UTC
Created attachment 34170 [details]
Improve backwards compatibility

This patch improves backwards compatibility (so that Cell.CELL_TYPE_* still returns an int, works in a switch statement in Java 6, and has the same data type as Cell.getCellType()).

Old usage:
int Cell.CELL_TYPE_*
int Cell.getCellType()
Cell.setCellType(int)

New usage:
CellType CellType.*
CellType Cell.getCellTypeEnum()
Cell.setCellType(CellType)

Both usages are supported with this patch.
Comment 11 Javen O'Neal 2016-08-22 17:58:28 UTC
Committed attachment 34170 [details] from comment #10 in r1757235.
Comment 12 Javen O'Neal 2016-08-22 18:15:25 UTC
Added test for hard-coded int literals in r1757237
Comment 13 Javen O'Neal 2016-09-13 23:26:16 UTC
Reverted CellValue#getCellType() to return an int as in POI 3.15 beta 2 and prior releases. This is to maintain backwards compatibility between 3.15 and 3.14. r1760607.