Bug 59853

Summary: [PATCH] XSSFSheet PivotTable support doesn't support Structured Reference (table) sources
Product: POI Reporter: Greg Woolsey <greg.woolsey>
Component: XSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: enhancement Keywords: PatchAvailable
Priority: P2    
Version: 3.15-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Additional pivot table base data options - Named Range and Table
Updated patch for table/named range based pivot tables

Description Greg Woolsey 2016-07-13 22:12:10 UTC
The current helper methods to create a PivotTable in XSSFSheet have quite a bit of useful logic, dealing with setting up the proper CT* objects and references to
However, these only handle pivot tables based on AreaReferences.  Excel allows PivotTables based on defined names and tables.

These are stored in a different field than the area reference, in CTWorksheetSource.

Areas go in the "ref" attribute, as "A1:D10" style values.

Named Ranges and Table names go instead in the "name" attribute.  As far as I can tell, they overload this field, and you have to check names and tables to find the match, similar to how tables and ranges are both listed in Excel in the Manage Names dialog.

How exactly to do this in POI is the question, and probably merits some discussion. I'm willing to do the work and submit a patch, of course, but the solution should have some level of agreement first to avoid unnecessary churn.

We could augment the current methods with similar ones that take a String name, or add new objects that represent a named range or table.

Then we can refactor the XSSFSheet code a bit to isolate the CTWorksheetSource element creation and avoid duplicating the rest of the setup logic.

Another option could be to just expose XSSFSheet.createPivotTable(), which hooks up a new one to the sheet properly.  Callers would likely then want to call XSSFPivotTable.setDefaultPivotTableDefinition() themselves to avoid problems with incomplete definitions.

I may find other gaps/improvements for XSSFPivotTable as I move into my next phase of investigation :)
Comment 1 Javen O'Neal 2016-07-14 08:48:34 UTC
I prefer methods that use higher level classes (AreaReference) over  lower level classes and primitives (String). There are already too many classes that (incorrectly) implement address/reference parsing when it's not the class's job.

What method signatures are you proposing?
Comment 2 Greg Woolsey 2016-07-14 19:16:17 UTC
Currently XSSFSheet has:

public XSSFPivotTable createPivotTable(AreaReference source, CellReference position, Sheet sourceSheet);
public XSSFPivotTable createPivotTable(AreaReference source, CellReference position);

I think these would work as additional methods, with some refactoring of the private and protected stuff these call:

public XSSFPivotTable createPivotTable(Name source, CellReference position, Sheet sourceSheet);
public XSSFPivotTable createPivotTable(Name source, CellReference position);
public XSSFPivotTable createPivotTable(Table source, CellReference position, Sheet sourceSheet);
public XSSFPivotTable createPivotTable(Table source, CellReference position);

That keeps it using Objects instead of names, even if the underlying structure winds up just using the name.

I think the supporting code can be rearranged to set the reference or name attribute independent of the other setup steps, so those can be common and only the final source association is distinct for the 3 flavors.
Comment 3 Javen O'Neal 2016-07-17 03:23:26 UTC
Your additional createPivotTable(Name, ...) sounds good to me. The fact that OOXML overloads the name field is an implementation detail that users of POI should not need to know.

* Do Tables have an entry in the Name Manager or elsewhere?
* Is there any way to reference a table without having the table object besides creating a new class (TableReference) or falling back to a String for the table name and the Workbook that contains a table with that name?

If not, having createPivotTable(Table, ...) is fine.
Comment 4 Greg Woolsey 2016-07-25 17:33:14 UTC
(In reply to Javen O'Neal from comment #3)
> * Do Tables have an entry in the Name Manager or elsewhere?

Looking for a Table the same way you look for a named range returns no matches, so I don't think they are referenced in any similar fashion in the OOXML or POI.  Excel appears to do its own thing combining them into the same Edit Names dialog - my guess it was an expedient hack to reuse existing UI code.

> * Is there any way to reference a table without having the table object
> besides creating a new class (TableReference) or falling back to a String
> for the table name and the Workbook that contains a table with that name?

Not that I can see.

> If not, having createPivotTable(Table, ...) is fine.

I'll work up a patch with the suggestions I made in this issue.  Been on vacation, have another one coming up, we'll see how much I get done this week on this front and others.
Comment 5 Greg Woolsey 2016-08-08 04:24:43 UTC
Created attachment 34110 [details]
Additional pivot table base data options - Named Range and Table

After more vacations than I thought we could pull off in one summer, I finally got this ready.  There were some interesting wrinkles down in the weeds, but nothing that affected a broad area of the API.

I've refactored the TestXSSFPivotTable class into an abstract base class with the tests, and two concrete subclasses with the setup.  Each class sets up the pivot tables differently, one with area references as before, and one with names (named ranges).  Both pass with no changes, as they should.

I'm open to refactoring ideas for XSSFPivotTable.createSourceReferences() - it now has two mutually exclusive parameters, not my favorite pattern.  However, it's protected, and called from exactly one place, which is also overloaded that way, a private method, XSSFSheet.createPivotTable(...). 

Since those are internal, I was OK with it.
Comment 6 Greg Woolsey 2016-08-14 21:07:07 UTC
(In reply to Greg Woolsey from comment #5)
> Created attachment 34110 [details]
> Additional pivot table base data options - Named Range and Table
> 
Anyone have time yet to look over this patch and apply it if it looks good, or let me know what I missed?
Comment 7 Javen O'Neal 2016-08-14 22:04:42 UTC
We're in a code freeze for 3.15 beta 3 release right now, but I'll take a look once the release is out.
Comment 8 Javen O'Neal 2016-08-14 23:15:22 UTC
> +++ src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFPivotCacheDefinition.java
> table.getName().equals(name)
Are table names case insensitive or could table.getName() ever have different capitalizaion than wsSource.getName() (such as a sheet rename after the cache is built up?) Would equalsIgnoreCase be better here?
Review each String.equals(String) usage to see if it should actually be String.equalsIgnoreCase(String), such as in createPivotTable where sheet names are compared.

I agree with you that a method with mutually exclusive parameters is clunky. Can 
> protected void createSourceReferences(AreaReference sourceRef, String sourceName, CellReference position, Sheet sourceSheet)
be broken into 2 functions? Can sourceName be an XSSFName instead of a String?

Also, could you upgrade the unit tests to junit3 instead of junit4 (example: https://svn.apache.org/viewvc/poi/trunk/src/ooxml/testcases/org/apache/poi/xssf/usermodel/TestXSSFTable.java?view=markup)? If not, I will upgrade them when I commit them.

Otherwise, looks pretty good.
Comment 9 Greg Woolsey 2016-08-15 00:12:17 UTC
(In reply to Javen O'Neal from comment #8)
> > +++ src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFPivotCacheDefinition.java
> > table.getName().equals(name)
> Are table names case insensitive

Oops, yes, names are insensitive, like range names.  I'll find those and fix them.

> 
> I agree with you that a method with mutually exclusive parameters is clunky.
> Can 
> > protected void createSourceReferences(AreaReference sourceRef, String sourceName, CellReference position, Sheet sourceSheet)
> be broken into 2 functions? Can sourceName be an XSSFName instead of a
> String?

It would have to be 3 functions, like the createPivotTable methods in XSSFSheet, with duplicated code.  The XML overloads the same "name" attribute to reference an XSSFName and an XSSFTable.  We could do the 3 object variants, and a functional interface to configure the CTWorksheetSource as needed, passing inline anonymous classes to do that bit of the logic. 

That would maintain Java6 compatibility and let createSourceReferences not care whether it referenced a name or area.
 
> Also, could you upgrade the unit tests to junit3 instead of junit4 (example:
> https://svn.apache.org/viewvc/poi/trunk/src/ooxml/testcases/org/apache/poi/
> xssf/usermodel/TestXSSFTable.java?view=markup)? If not, I will upgrade them
> when I commit them.

I kept the test the same as it was when I refactored, but I could convert to the older format if that's the preferred standard.
Comment 10 Javen O'Neal 2016-08-15 00:42:56 UTC
(In reply to Greg Woolsey from comment #9)
> It would have to be 3 functions, like the createPivotTable methods in
> XSSFSheet, with duplicated code.  The XML overloads the same "name"
> attribute to reference an XSSFName and an XSSFTable.  We could do the 3
> object variants, and a functional interface to configure the
> CTWorksheetSource as needed, passing inline anonymous classes to do that bit
> of the logic. 
What subclasses XSSFPivotTable that would require this method to be overridden or use anonymous classes? I haven't looked that closely at the code yet. Leave it alone for now if you can't find a more complicated solution. As you said, it doesn't matter as much being a protected method.
 
> > Also, could you upgrade the unit tests to junit3 instead of junit4
I mis-typed. We would like to upgrade from junit3 (junit.*, "extends TestCase") to junit4 (org.junit.*, "@Test"). On second thought, I'll take care of the upgrade myself. It'll keep your diff smaller and easier to review.
Comment 11 Greg Woolsey 2016-08-15 19:34:41 UTC
Created attachment 34153 [details]
Updated patch for table/named range based pivot tables

Here is an updated patch for your review.  I refactored some more to get rid of the ugly method signatures with mutually exclusive parameters.  I used a simple functional interface, but with Java 6 compatible anonymous class implementations, not Java 8 Lambda syntax.  Should be easy to read and simplifies things.

I didn't touch the unit tests, per your preference.
Comment 12 Javen O'Neal 2016-09-20 08:03:02 UTC
Applied in r1761537. Will be included in POI 3.16 beta 1. Thanks for the patch!

I left a couple comments in the code which need clarification:
In XSSFPivotCacheDefinition:
Is the pivot table name case sensitive?
http://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFPivotCacheDefinition.java?annotate=1761537&pathrev=1761537#l148

In XSSFSheet:
Can we further simplify this to AreaReference.formatAsString() to avoid duplicating cell reference formatting and parsing code?
http://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFSheet.java?annotate=1761537&pathrev=1761537#l4180