Bug 61474

Summary: Adjusting of formulas in context of column shifting
Product: POI Reporter: Dragan Jovanović <drjovanovic>
Component: SS CommonAssignee: POI Developers List <dev>
Status: NEW ---    
Severity: enhancement CC: drjovanovic
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: new class CompleteFormulaShifter, and two minor changes, in patch format
partial test case for CompleteFormulaShifter
patch281
patch282
patch283
patch284
patch285
Whole patch for current state of column shifting
whole patch for current state of branch

Description Dragan Jovanović 2017-08-31 12:37:55 UTC
In POI there is class FormulaShifter which deals with problem of adjusting cell references inside formulas in case of shifting or copying of rows. This works fine, but only for rows. 
There is completely same need for reference-adjusting in the case of shifting or copying of columns, and it is not implemented.
Algorithm for formula adjusting is pretty complex, with a lot of use cases, so it would be good to use existing algorithm for columns, and not create an other one.

Solution :
I've created class CompleteFormulaShifter, as an improved version of FormulaShifter, with following enhancements :
- it takes formula string as input, not a list of Ptgs;
- it can work in both column and row modes.
Column mode is implemented in a simple way : 
1. transpose cell references (switch column and row coordinates);
2. adjust references using existing algorithm for shifting of rows;
3. transpose new cell references back.
Comment 1 Dragan Jovanović 2017-08-31 12:40:09 UTC
In order to make this work, I have made two minor changes in some old classes. In FormulaShifter,  private static enum ShiftMode became public.
In HSSFCell,  protected CellValueRecordInterface getCellValueRecord() became public.
Comment 2 PJ Fanning 2017-08-31 18:53:05 UTC
Dragan - there appears to be no patch attached. If you are still working on this, could you add your patch as a Pull Request in github. It makes it easier to review the patch. https://github.com/apache/poi
Comment 3 Dragan Jovanović 2017-09-04 12:39:29 UTC
Created attachment 35287 [details]
new class CompleteFormulaShifter, and two minor changes, in patch format
Comment 4 Dragan Jovanović 2017-09-04 12:45:54 UTC
I could not create pull request (I tried to push changes, and got refused), so I am sending a patch with changes.
Comment 5 PJ Fanning 2017-09-04 13:51:51 UTC
Dragan - could you add unit test coverage?
Comment 6 Dragan Jovanović 2017-09-05 09:50:31 UTC
Created attachment 35294 [details]
partial test case for CompleteFormulaShifter

Added junit test for CompleteFormulaShifter. 
It's work in progress and it's not very detailed at the moment, but it should be good enough to help you conclude is CompleteFormulaShifter conceptually good.
Comment 7 Javen O'Neal 2017-09-19 02:56:42 UTC
(In reply to Dragan Jovanović from comment #3)
> Created attachment 35287 [details]
> new class CompleteFormulaShifter, and two minor changes, in patch format

CompleteFormulaShifter appears to duplicate what some of the existing code already does. I'd like to see this contribution have a neutral or a net negative SLoC on the codebase.

I agree that I'd like to see column shifting (moving, copying, inserting, deleting) implemented eventually. There are still quite a few open issues with row shifting. Besides the open bugs, the API is weak and should be improved (rewritten). I think it'd be wise to resolve the duplicated row-shifting code in HSSF and XSSF and get a solid row-shifting API implemented before moving on to column shifting, unless we can do so without increasing the amount of buggy code with a poor API. Column operations are an order of magnitude more difficult in POI than row operations due to the row-major storage.

Open bugs with "shift" in title:
bug 46742: Tracker: Remaining functionality for Sheet.shiftRows()
bug 56454: XSSFSheet.shiftRows(...) and HSSFSheet.shiftRows(...) incorrectly handle merged regions that do not contain column 0.
bug 59733: shiftRows() causes org.apache.xmlbeans.impl.values.XmlValueDisconnectedException
bug 57423: shiftRows() produces a corrupted xlsx file
bug 60072: Sheet shiftRows doesn't trigger charts position updating
bug 59731: Consolidate duplicated code for row shifting
bug 53320: FormulaShifter doesn't take care to absolute references
bug 58221: Shifting rows while trying to sort causes XmlValueDisconnectedException
bug 59306: Shifting rows does not shift chart data series references
bug 56123: ShiftRows, Bug in POI 3.10 Beta 2: "Could not find 'internal references' EXTERNALBOOK"
bug 54509: Cells in shifting rows are not used in column-referencing formulas.
bug 54533: Shifting rows does not shift the Zero Height flag
Comment 8 Dragan Jovanović 2017-09-19 10:39:58 UTC
Thanks for comment. You're right that there is duplication in the code. I would be willing to work on this to eliminate the duplication but that would probably not result in net neutral SLoC because there is additional functionality. Column shifting certainly is useful functionality which  provides value and which we and others need. We will definitely be using it for our application context. 

I have taken a look at the issues that you have listed to see which issues are really relevant. 46742 is certainly relevant as it sketches additional functionality that would be useful. But this is really independent from column shifting. Column shifting does not make these items much easier or harder to fix. Many of the other bugs do not seem to really be very relevant (i.e. 53320, 54509). Unfortunately my employer is not very likely to support work on these issues. 

So overall I see this as your call. I can do work on eliminating the duplications.  

Maybe I would be willing to fix some of the open issues on my own time. But if you want to consider column shifting only after all the earlier bugs have been resolved, that would make little sense to me.
Comment 9 Javen O'Neal 2017-09-19 15:52:22 UTC
Let's rewind to what signature shifting columns should have.

Sheet.shiftRows(int startRow, int endRow, int n[, boolean copyRowHeight])

I wrote copyRows with a CellCopyPolicy object to avoid having a long signature. Do we want something similar for a new shiftRows and shiftColumns method?

Usually when I use shiftRows, I have to shift from some insertion point to the last row in the workbook so I don't shift rows onto existing rows. Can shiftRows/shiftColumns be used with n>0 and endRow!=last row number?

Can we make shiftRows/shiftColumns behave as close to Excel as possible to make it as intuitive to use as possible? I favor calling it insert rows instead of shift rows since it's clearer that it isn't copying or cutting (though we should provide that ability).

If all you have time for now is a clone of shiftRows behavior, we'll commit that and incrementally improve it later if there's interest. It just means breaking the API at a later date.
Comment 10 Dragan Jovanović 2017-09-21 11:38:34 UTC
I can work on Sheet.shiftColumns() method now.

At the moment I am not interested in dealing with CellCopyPolicy details, so I’d just implement Sheet.shiftColumns(int startColumn, int endColumn, int n), and leave CellCopyPolicy for later.
I see there is much more to shift rows/couimns than just adjust formulas, so I’ll have to take care of whole RowShifter class(es). I’ll try to make one XSSF class which will work as both row shifter and column shifter, and if I run into complications, maybe I’ll make completely new ColumnShifter. My focus is on XSSF now, and maybe later we can think about HSSF variation. If you have some hints about possible hidden issues which I have to take care of while working on this, please let me know.

I would like to keep that endRow/endColumn parameter just for the case (though I am not aware of any scenarios where it is needed). I hope it won’t make us a big complication.

I agree that people will often look for an insertRows/insertColumns method rather than a shiftRow/shiftColumn method. But shifting can be made in both directions because it is also used for deletion. A negative value for an insert method step parameter is not very intuitive, so I would prefer keeping the names as they are. 

Also, if you don’t mind, I would like to merge some methods from my CompleteFormulaShifter class content into FormulaShifter, because, in its essence, CompleteFormulaShifter is an upgraded version od FormulaShifter, I just did not want to touch existing class at the very begining. In this way, POI clients may use continue using 
FormulaShifter class, and avoid thinking about improvements.


So my plan now could be :
- merge CompleteFormulaShifter class into FormulaShifter; 
- upgrade XSSFRowShifter to make it work with both rows and columns; 
- maybe move some code from  XSSFSheet.shiftRows() method to XSSFRowShifter class, just to make code more clear;
- implement XSSFSheet.shiftColumns(int startColumn, int endColumn, final int n) method.

Does that sound ok ?
Comment 11 Dragan Jovanović 2017-09-27 11:42:07 UTC
Started working on this.

Moved some logic from CompleteFormulaShifter to FormulaShifter, so now FormulaShifter itsself deals with both rows and columns, and transposing.
Removed parsing logic which was in CompleteFormulaShifter, now again HSSF and XSSF pass ptg[] objects to FormulaShifter, so there is no more duplication of code.
Introduced class XSSFColumnShifter which should work analog to XSSFRowShifter.
Created new class XSSFFormulaShiftingManager. Moved all formula-shifting related methods from XSSFRowShifter to XSSFFormulaShiftingManager. Now both row shifter and column shifter use XSSFFormulaShiftingManager for formula shifting (logic is perfectly same in both cases).
Implemented cell-shifting logic in XSSFColumnShifter. 
Introduced XSSFSheet.shiftColumns() method, similar to XSSFSheet.shiftRow(). At the moment, shiftColumns() can shift cell and formulas. I also have a brief unit test for this.

Now I have to deal with all other stuf which exists in shiftRow() method – removing of overwritten items, updateNamedRanges, shiftMergedRegions, updateConditionalFormatting...
I see there are no unit tests which cover those subjects in shiftRow(). If you are aware of some other existing tests which can help me test this logic for column shifting, please let me know.
Comment 12 Javen O'Neal 2017-09-27 15:32:38 UTC
(In reply to Dragan Jovanović from comment #10)
> So my plan now could be :
> - merge CompleteFormulaShifter class into FormulaShifter; 
> - upgrade XSSFRowShifter to make it work with both rows and columns; 
> - maybe move some code from  XSSFSheet.shiftRows() method to XSSFRowShifter
> class, just to make code more clear;
> - implement XSSFSheet.shiftColumns(int startColumn, int endColumn, final int
> n) method.
> 
> Does that sound ok ?

Sounds good
Comment 13 Javen O'Neal 2017-09-27 15:35:15 UTC
(In reply to Dragan Jovanović from comment #11)
> Started working on this.

Do you have a work-in-progress patch that is shareable yet?
Comment 14 Javen O'Neal 2017-09-27 16:23:52 UTC
(In reply to Dragan Jovanović from comment #11)
> I see there are no unit tests which cover those subjects [updateNamedRanges, shiftMergedRegions, updateConditionalFormatting] in shiftRow(). If
> you are aware of some other existing tests which can help me test this logic
> for column shifting, please let me know.

There are a few spots...

$ trunk grep -l -r shiftRows src/testcases src/ooxml/testcases  
src/testcases/org/apache/poi/ss/usermodel/BaseTestSheetUpdateArrayFormulas.java
src/testcases/org/apache/poi/ss/usermodel/BaseTestSheetShiftRows.java
src/testcases/org/apache/poi/ss/usermodel/BaseTestConditionalFormatting.java
src/testcases/org/apache/poi/ss/usermodel/BaseTestSheet.java
src/testcases/org/apache/poi/hssf/usermodel/TestBugs.java
src/testcases/org/apache/poi/hssf/usermodel/TestHSSFHyperlink.java
src/ooxml/testcases/org/apache/poi/xssf/usermodel/TestXSSFSheetShiftRows.java
src/ooxml/testcases/org/apache/poi/xssf/usermodel/TestXSSFBugs.java
src/ooxml/testcases/org/apache/poi/xssf/usermodel/TestXSSFWorkbook.java
src/ooxml/testcases/org/apache/poi/xssf/usermodel/TestUnfixedBugs.java
Comment 15 Dragan Jovanović 2017-09-28 13:45:59 UTC
Created attachment 35382 [details]
patch281
Comment 16 Dragan Jovanović 2017-09-28 13:46:38 UTC
Created attachment 35383 [details]
patch282
Comment 17 Dragan Jovanović 2017-09-28 13:47:42 UTC
Created attachment 35384 [details]
patch283
Comment 18 Dragan Jovanović 2017-09-28 13:48:14 UTC
Created attachment 35385 [details]
patch284
Comment 19 Dragan Jovanović 2017-09-28 13:48:36 UTC
Created attachment 35386 [details]
patch285
Comment 20 Dragan Jovanović 2017-09-28 13:56:50 UTC
I have just uploaded 5 patch files so you can apply them. I had some complications with git while working on this, so I am not 100% this will work immediately. If you face problems, you can also find me at skype, as zmau3012.


Meanwhile I have moved updateConditionalFormatting() and updateHyperlinks() to other class, so they can be used for both rows and columns. As far as I can see, they should work perfectly same for column shifting, as for row shifting.
Processing of comments is work in progress, just ignore it now.

Thanks for TestXSSFSheetShiftRows, I really needed it. It works the same way as it did before my refactorings.
Comment 21 Dragan Jovanović 2017-10-13 12:32:24 UTC
Created attachment 35417 [details]
Whole patch for current state of column shifting
Comment 22 Dragan Jovanović 2017-10-13 12:41:27 UTC
I have just added a complete patch for column shifting, so, if you applied my previous patches, please revert them. 
I have merged changes that took place in project meanwhile, so this patch should not make problems with that.
This version should cover all basic shifting issues which we mentioned before.
I would like to flag some methods as deprecated, but I did not flag them (before we agree on that), I have just left code comments about that.


After successful apply and build, execute class XSSFColumnShifterTest and make sure all tests pass.
Comment 23 Dragan Jovanović 2017-10-24 14:06:45 UTC
Created attachment 35454 [details]
whole patch for current state of branch

Meanwhile I have not received any feedback, but I have continued working on this occasionally. Attached is a full patch against the Poi Trunk so that you don't have to apply multiple patches when you look at the code. Please give me some feedback once you take a look at this.
Also, it may be easier if I had some write access to poi repository, so I could just push my changes to my branch.
Comment 24 Javen O'Neal 2017-10-24 22:13:26 UTC
(In reply to Dragan Jovanović from comment #23)
> Meanwhile I have not received any feedback, but I have continued working on
> this occasionally.

I will look through your patch when I have free time. This is a volunteer project that I work on outside my day job.

> Attached is a full patch against the Poi Trunk so that
> you don't have to apply multiple patches when you look at the code. Please
> give me some feedback once you take a look at this.
> Also, it may be easier if I had some write access to poi repository, so I
> could just push my changes to my branch.

In the mean time, if you want to be able to submit patches that depend on each other without making each patch cumulative, you could create a fork in GitHub (GitHub.com/Apache/poi), commit to that, and create a pull request as needed.

Github's tools for inline review are pretty good, and it'd make it easier on me and other potential reviewers to give you the feedback you're seeking.
Comment 25 Dragan Jovanović 2017-10-25 08:30:50 UTC
Thanks for feedback, Javen. Forking is exactly what I need. Gonna make it these days, and will send you pull request.
Comment 26 Dragan Jovanović 2017-11-01 14:54:53 UTC
pull request :
https://github.com/apache/poi/pull/81
Comment 27 Dragan Jovanović 2017-12-11 15:12:19 UTC
Let us sum up what's done, and what's left to do :
- Javen implemented adjusting for columns in FormulaShifter class. Those changes are on trunk.
- I have imlpemented XSSFSheet.shiftColumns() method which moves columns, and also processes other things (updateFormulas, shiftMergedRegions, updateConditionalFormatting, updateHyperlinks...) in the same way as XSSFSheet.shiftRows(). Those changes are on fork, and wait to be moved onto trunk.
There are junit tests covering various parts of this algorithm. I have also checked old unit tests which cover shiftRows(), and they behave the same way as they did before.

- In my project I also have an algorithm which copies range of cells to other location. If you're interested, I can think about embedding  that algorithm into POI.

What is the plan for code which waits on fork, on pull request https://github.com/apache/poi/pull/81 ?
Comment 28 Dragan Jovanović 2018-01-22 13:40:51 UTC
Comment on attachment 35454 [details]
whole patch for current state of branch

There is a pull request for this : https://github.com/apache/poi/pull/81, so this patch is not needed. It is also out of date.
Comment 29 PJ Fanning 2018-04-02 13:13:22 UTC
https://github.com/apache/poi/pull/81 was merged