Bug 59677

Summary: HSSFWorkbook#setSheetOrder corrupts cross-sheet references when there are complicated external sheet indices
Product: POI Reporter: Matthew Wightman <msww-asfbugs>
Component: HSSFAssignee: POI Developers List <dev>
Status: NEW ---    
Severity: major    
Priority: P2    
Version: 3.14-FINAL   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Matthew Wightman 2016-06-09 08:57:39 UTC
As mentioned in Bug 59673, HSSFWorkbook#setSheetOrder is passing internal sheet
indexes in to FormulaShifter's constructor, but these numbers are used to
update the indices of referenced external sheet records. This can corrupt
references to sheets when the EXTERNSHEET records do not exactly correspond
with the internal sheets, such as if there has been a multiple-sheet range.

For example:

===========
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFCell;

public class MakeBad {

  public static void main(String ... args) throws Exception {
    HSSFWorkbook workbook = new HSSFWorkbook();
    HSSFSheet originalA = workbook.createSheet("A");
    workbook.createSheet("B");
    workbook.createSheet("C");

    originalA.createRow(0).createCell(0).setCellFormula("SUM(B:C!A1:B5)");

    workbook.removeSheetAt(0);

    HSSFSheet newA = workbook.createSheet("A");

    HSSFCell cell = newA.createRow(0).createCell(0);
    cell.setCellFormula("C!A2");
    System.out.println("Before sheet move:");
    System.out.println("  cell formula=" + cell.getCellFormula());
    describeExternalIndices(workbook);

    workbook.setSheetOrder("A", 1);

    System.out.println("After sheet move:");
    System.out.println("  cell formula=" + cell.getCellFormula());
    describeExternalIndices(workbook);
  }

  private static void describeExternalIndices(HSSFWorkbook workbook) {
      System.out.printf("  Sheet %s has internal index %d and external index %d%n", workbook.getSheetName(i), i, workbook.getExternalSheetIndex(i));
    }
  }
}
===========

outputs

===========
Before sheet move:
  cell formula=C!A2
  Sheet B has internal index 0 and external index 1
  Sheet C has internal index 1 and external index 2
  Sheet A has internal index 2 and external index 4
After sheet move:
  cell formula=B!A2
  Sheet B has internal index 0 and external index 1
  Sheet A has internal index 1 and external index 2
  Sheet C has internal index 2 and external index 4
===========

This is because the external index in the formula before the move (2) is the
same as the internal index (1) of the moved sheet, so
FormulaShifter#adjustPtgDueToSheetMove has changed the external index to match
the new internal index of the moved sheet (1). Note also that the external
index records have not been reordered or changed, so the mapping of internal
index to external index remains unchanged.

I think the way to fix this is to change how sheet reorders are handled just in
the case that the external index records are not just the trivial n ascending
internal indexes. When this happens, it is not possible to correct the
references by updating all the Area/Ref3DPtgs as the records for multiple sheet
ranges would be pointing at the wrong set of sheets. Instead, rather than
updating those indices, the external sheet records would need updating with the
new internal sheet ids.

While this approach could also be followed in the case that the external index
records are the trivial case, that would result in the indices getting out of
sync under circumstances that currently both POI and popular spreadsheet
software does not do so.

A complication here is that such a sheet reorder can change the meaning of a
reference. For example, consider a workbook with sheets A, B, and C in that
order, with a formula "=SUM(A:B!A1:B5)" on C. Moving sheet C between A and B
will result in that range including C, while previously it did not. However, I
think this would be the least surprising behaviour POI could have in this case,
as it would maintain the text of the cell formula unchanged. This is also the
behaviour of a couple of common spreadsheet programs that I have checked while
investigating this.
Comment 1 Nick Burch 2016-06-09 18:12:57 UTC
For a Workbook with sheets A, B and C, if someone creates a formula covering a range of sheets A and Bm then re-orders C before B, then having it cover A+C+B seems logical to me

I guess that setSheetOrder should do something for the external sheet records too. Would the delete sheet case need to do so as well, or is that already covered?

As a first step, maybe it would be good to go through the code you've been looking at, and identify any places where internal and external sheet indexes get incorrectly treated as equivalent, then add TODOs / Warnings there. Also in methods where there's a risk of confusion, rename the variables to things like internalSheetIndex and externalSheetIndex to make it clearer