Bug 58348

Summary: Add support for copying rows
Product: POI Reporter: Javen O'Neal <onealj>
Component: SS CommonAssignee: POI Developers List <dev>
Status: NEW ---    
Severity: enhancement CC: fleming
Priority: P2 Keywords: PatchAvailable
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Bug Depends on: 58350, 58439, 58441, 58442, 58572    
Bug Blocks:    
Attachments: Test case workbook
Progress made so far on adding copyRows to XSSF
Progress made so far on adding copyRows to XSSF
[PATCH] copyRows implemented for XSSFWorkbooks

Description Javen O'Neal 2015-09-08 23:05:23 UTC
I am currently working adding the ability to copy rows within POI. I've seen a number of partial solutions online[1][2], but none shift formula references, which is what I need.

I figured this would be a good opportunity to write this into the POI API so others don't have to reinvent the wheel.

Here's the features I'm looking for:
* copies cell values
* copies cell styles
* copies cell formulas, offsetting relative cell references by the distance of the row copy
* copies array formulas
* copies merged cell regions
* copies row height
* copies conditional formatting
* copies tables, pivot tables
* copies hyperlinks
* copies cell comments

For all these options, if an option isn't copied, need to determine if the destination cell maintains its option (for example, cell style), or if the cell is reset to the default value for that option (no style).

Features that we probably don't want/need from shiftRows:
* update Named Regions
* updates formula references that referred to the source range to refer to the target range

The API probably needs to allow the user to choose what gets copied (paste as value only, paste formulas only, paste styles only, paste without styles, etc).

Considerations:
* Needs to have consistent API between HSSFWorkbook, XSSFWorkbook, and SXSSFWorkbook.
* API should allow copying a continuous or discontinuous list of rows from a different sheet or different workbook into existing sheet.
* What is the desired behavior for copying discontinuous rows? Would destination rows be continuous, or would discontinuities be copied over. If we disallow discontinuous row copies, developer would need to make multiple copyRows calls. This seems acceptable.
* What is the desired behavior for how to copy non-monotonic discontinuous rows? Is it okay to not support this?

//shiftRows-like API, doesn't meet the inter-sheet copying criteria
Interface Sheet {
    public void copyRows(int startRow, int endRow, int n);
}

//Better API:
Interface Sheet {
    public void copyRows(List<Rows> srcRows, int startRow, CellCopyOptions options);
    //helper method needed to make first argument of copyRows less painful.
    public List<Row> getRows(int firstRow, int lastRow); //similar to List.subList
}

// Where should this class live? Does a similar class already exist?
SomeUtility {
    class CellCopyOptions {
        boolean cellValues;
        boolean cellStyles;
        boolean cellFormulas;
        boolean mergedRegions;
        boolean rowHeight;
        boolean tables;
        ...
    }
}

How should getRows work for blank rows? If copying all rows on one sheet to another sheet, this could mean creating a list of 1 million rows if the List has null entries to represent blank rows. Seems wasteful if the sheet only has content on the first and last row. If it returns a List with blank rows removed, the caller would need to rely on row.getRowNum() to determine this. We could return a SortedMap<Int, Row> or other sparse structure which is convenient to subdivide, but this could cause problems if a row's row number changes after the SortedMap is created. I'm leaning towards a List that excludes nulls, which would make behavior match Sheet.rowIterator, and rely on row.getRowNum() to determine row number.
Until we decide if copyRows will work on discontinuous or non-monotonic rows, we could check the input and throw an exception. What is the missing row policy for copyRows? Consolidate rows in destination, or leave room for blank rows (related to how getRows is implemented).


[1] http://stackoverflow.com/questions/5785724/how-to-insert-a-row-between-two-rows-in-an-existing-excel-with-hssf-apache-poi
[2] http://www.zachhunter.com/2010/05/npoi-copy-row-helper/
Comment 1 Javen O'Neal 2015-09-13 12:13:24 UTC
Created attachment 33102 [details]
Test case workbook

Adding workbook to use for unit tests
Comment 2 Javen O'Neal 2015-09-14 07:29:53 UTC
Created attachment 33105 [details]
Progress made so far on adding copyRows to XSSF

HSSF and SXSSF implementations will follow once I have XSSF row/cell copy done.
Comment 3 Javen O'Neal 2015-09-22 03:59:31 UTC
Created attachment 33130 [details]
Progress made so far on adding copyRows to XSSF

* Moved several pre-requisites out into their own bugs, marked as blockers for this bug.
* Rebased to r1704452
* still only implemented for XSSFWorkbooks, with placeholders for SXSSFWorkbooks and HSSFWorkbooks.

I am waiting on blockers being resolved before continuing to develop this copyRows feature.
Comment 4 Javen O'Neal 2015-09-23 18:31:12 UTC
Created attachment 33138 [details]
[PATCH] copyRows implemented for XSSFWorkbooks

I think this patch is ready to be deployed for testing with XSSFWorkbooks. copyRows support has been stubbed out for HSSFWorkbooks and SXSSFWorkbooks and will be implemented in a future patch if someone has motivation to write them.
I have annotated most of the new methods with @Beta to indicate the API may change based on your feedback.

Limitations:
* does not evaluate copied formulas or cache the result
* does not copy array formulas
* does not copy PivotTables or Tables
* I have not tested copying rows from an external workbook

After patching with the blocking bugs, this patch has a merge conflict with bug 58439 that will need to be resolved manually (or I can rebase this patch once bug 58439 is fixed).
Comment 5 Javen O'Neal 2015-10-06 19:51:12 UTC
TODO: add unit test for shifting or copying a formula that contains an unregistered user-defined function (bug 58452).
Comment 6 Javen O'Neal 2015-10-11 09:49:53 UTC
TODO: Test for XmlDisconnectedValue errors (learn from shiftRows implementation with bug 53798)
Comment 7 Javen O'Neal 2015-11-02 03:15:36 UTC
Row copy is now ready for testing in XSSFWorkbooks!

Applied changes for row copy support for XSSFWorkbooks in r1711864, r1711866, r1711879, and r1711885 to trunk.
Site docs updated in r1711888.

Several minor supporting changes from attachment 33138 [details] were made in their own commits. When the code for HSSFWorkbooks is written, the code and tests will be moved up to the Sheet/Row/Cell Interface and BaseTestSheet/BaseTestRow/BaseTestCell level.
Comment 8 Javen O'Neal 2015-11-02 03:23:14 UTC
In this first release, the following are implemented and tested:
* copies cell values (all Excel-primitive data types)
* copies cell styles
* copies cell formulas, offsetting relative cell references by the distance of the row copy
* copies merged cell regions

The following are implemented but not tested:
* copies row height

The following are not implemented or tested:
* copies cell comments
* copies array formulas
* copies conditional formatting
* copies tables, pivot tables
* copies hyperlinks
Comment 9 Javen O'Neal 2015-11-02 10:00:54 UTC
Added hyperlink copying and merging (only copies a hyperlink from source if source has a hyperlink) for XSSFWorkbooks in r1711926.