Bug 53314

Summary: Invalid Formula Cell Reference for named ranges referring to whole columns
Product: POI Reporter: Pete Maher <peter.maher>
Component: XSSFAssignee: POI Developers List <dev>
Severity: critical CC: onealj
Priority: P2    
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Sample workbook with whole column named range

Description Pete Maher 2012-05-29 06:53:54 UTC
Created attachment 28847 [details]
Sample workbook with whole column named range

Where a workbook has a whole column cell reference (EG Sheet1!$A:$A) then creating an area reference will fail with the message "Invalid Formula cell reference: 'Sheet1!$A'"

The following code will fail for the sample attached:

        FileInputStream fis = new FileInputStream(sampleSpreadsheet2.getFile().getAbsolutePath());
        XSSFWorkbook wb = new XSSFWorkbook(fis);
        int namedCellIdx = wb.getNameIndex("MyColumn");
        Name aNamedRange = wb.getNameAt(namedCellIdx);
        AreaReference aref = new AreaReference(aNamedRange.getRefersToFormula());
Comment 1 Pete Maher 2012-05-29 07:18:13 UTC
Also in the code snippet previously given, the following would fail after the assignment to aNamedRange:

String sheetName = aNamedRange.getSheetName();
Comment 2 Mark B 2012-05-29 14:29:32 UTC
Just realised you posted this as a bug. Really, this is a user question and it should have been posted to the user list.

Anyway, to repeat what I said, I do not think that it is possible to do what you want with Excel and, if it is an illegal operation, POI will not sidestep rules Excel enforces.

I tried, using Excel, to create a named range using the formulae Sheet1!G and Sheet1!$G and in both cases, Excel generated an error. My advice would be to find an Excel users forum and ask how to create a named range that applies to a complete column there. If they know and can provide a formula that works, then we should be able to make POI do the same.

I am going to change the status of the bug to NEEDINFO until you can find out how to perform this operation in Excel.
Comment 3 Pete Maher 2012-05-29 15:42:21 UTC
Hi Mark B

I think you have inadvertently honed in on the area of the problem in assuming that the range reference string which I was using was that which is contained in the exception - the salient point is that it isn't, POI is confused. (OK - brave statement since I don't know POI particularly well but I'd be happy to wager a small sum that this is a bug and not a user question!).

With a whole column range aNamedRange.getRefersToFormula() correctly returns the String "Sheet1!$A:$A" - The exception in new AreaReference(aNamedRange.getRefersToFormula()) says it has a problem with "'Sheet1!$A'" which highlights the mismatching that POI appears to be doing.

Here is what to do in Excel (and the sample I uploaded was created in exactly this manner):

* New Blank Workbook
* Click on the header of column A so that the whole column is highlighted
* Define a Name for this range called "MyColumn" (This should be for example Sheet1!$A:$A)
* Put some data somewhere in column A so that the resultant AreaReference will be expected to contain something
* Save as whatever.xlsx

Amend my earlier code to refer to your saved workbook.  Execute code, bingo.

Also worth noting that multiple whole column range reference have the same problem (EG Sheet1!$A:$B).  Funnily enough whole row named ranges seem to work fine (EG =Sheet1!$5:$5), albeit I've only taken a cursory glance at those.

Comment 4 Mark B 2012-05-30 06:48:20 UTC
Thanks for the examples Peter.

Should have the chance to play this afternoon to see if I cam make POI behave as Excel does. Will post if I make any progress.
Comment 5 Pete Maher 2012-05-30 10:11:28 UTC
That's great Mark, thanks.

Whole rows also have problems although not the same exception at the same point:

XSSFWorkbook wb = new XSSFWorkbook(fis);
int namedCellIdx = wb.getNameIndex("MyRow"); //EG Sheet1!$5:$5
Name aNamedRange = wb.getNameAt(namedCellIdx);
String refersToFormula = aNamedRange.getRefersToFormula(); //Correct value returned
AreaReference aref = new AreaReference(refersToFormula);
//At this point aref.isSingleCell() == true, no matter how many cells on MyRow are populated
//And first cell and last cell both have a formatAsString() value of "Sheet1!$5"

I can raise a second bug for this if you think it is a separate issue, let me know.
Comment 6 Mark B 2012-05-30 14:16:21 UTC
Need to do some more digging, but I think the problem might lie in the constructor of the CellReference class as, and this is at first glance, it expects a cell reference to include both a column letter and row number. As far as it is concerned,only the sheet name is optional.

Re the problem with rows and the AreaReference class it sounds as though that might be a separate issue but I will need to dig around further. Must admit that I am surprised that, and this is again at first glance, the same problem does not occur. Need to look at this more though.

Looking like a good job I did not accept that wager!
Comment 7 Pete Maher 2012-05-30 15:13:11 UTC

.... you could look on it that the stake on the wager is the time it takes to fix the bug ;-)
Comment 8 Mark B 2012-05-31 08:20:06 UTC

Do not know if I am missing something very obvious and would like to ask you to run a quick test for me please.

When you call the getRefersToFormula() method on the Name object, then you would expect to get back a String that looks something like this - 'Sheet1!$A:$A' - if the whole of Column A on Sheet1 had been 'named' by the user. Then, if you pass this String to the constructor of the AreaReference class, you expect to see an exception thrown at that point.

Can I ask you to run your code again but with a single addition? Strip the sheet name away from the reference before passing that String to the AreaReference class constructor. So, rather than pass this String 'Sheet1!$A:$A', make your code pass just this please '$A:$A'. My tests indicate that it is the presence of the sheet part that causes the exception to be thrown and I will need to dig around further to nail the problem down. For now, you might be able to run your application by stripping off the sheet part - for both rows and columns hopefully as I have not tested the former yet.
Comment 9 Pete Maher 2012-05-31 10:13:07 UTC
Hi Mark

Yes your suggestion works ... with a limitation! (although this bit isn't critical for me currently as the workbooks I am working with are fairly small)

If you get all referenced cells of the new AreaReference - IE
CellReference[] crefs = aref.getAllReferencedCells();
Then crefs.length = 65536.  I was expecting 1048576.

Also crefs[i].sheetName == null should be expected given that reference passed in didn't include a sheet name, no big deal.

So your workaround will is fine so long as there is not data beyond row 65536.

I've coded my own workaround for the timebeing - a bit of  a hack but basically getting the first and last actual rows of the Sheet so I change "Sheet1!$A:$A" to "Sheet1!$A1:$A2345" (Basically I didn't like that getAllReferencedCells() would return a huge array whilst I was only using about 1300 rows within the named column range).

ASIDE:  if aref.getAllReferencedCells() were to be used on a sheet with values in the first and last rows only it would reach an out of memory pretty soon if fixed for the 65536 limit (especially if the range contained multiple columns) - Therefore could probably do with a  new AP method to get all cells *actually* in use in an area rather than referenced by the area.  This would involve iterating over all the cells in the range so would be slower than getAllReferencedCells() but for sparsely populated ranges (typically whole column and whole row ranges) is a better option than running out of resources.
Comment 10 Mark B 2012-05-31 15:53:47 UTC
Hello again Peter and thanks for your testing.

The reason why there are 'only' 65 odd thousand rows in a column is that the support for the OOXML format is not complete yet - at least the part that relates to the sort of operation you wish to perform is not. Yes, also the assumption is that you want ALL of the cells in the row or column and not just the ones that are or might be populated with data and this does have some attendant risks. Perhaps, it might be better to adopt the approach taken if you get an Iterator for the rows on a sheet or the cells on a row. There, as far as I am aware, you will only get the non null objects returned to you.

Must admit that I got a hack very similar to your working but did not post it. Just as you did, I 'modified' the forumla returned by the call to getRefersToFormula() so that it pointed to the first cell on a row/column and the last cell on the row.column. The reason I did not post it was simple, the AreaReference could be created but it would return false if I called either the isWholeColumn() or isWholeRow() method. Comments in the source indicated that passing -1 as the index of the first cell on the row or column would trigger the AreaReference to 'see' the object as a whole row or column but I could not make that work at all.

As AreaReference can understand that A:A indicates a whole row and that only the inclusion of the sheet name seems to throw it, that is where I will concentrate for now. Will leave the bug open and post if I make any further progress.
Comment 11 Dominik Stadler 2016-07-28 07:27:33 UTC
I tried to reproduce some of the issues discussed here but couldn't. Also the discussions are quite old and thus based on rather outdated versions of POI. Therefore I am closing this as WORKSFORME, please create new bug reports for anything that is still missing/incorrect now.