Bug 30635

Summary: HSSFRow.getLastCellNum() is out by one
Product: POI Reporter: Bob Kemp <apache>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED WONTFIX    
Severity: major CC: cpuidle, lukaszr, urs.frei
Priority: P3    
Version: 2.5-FINAL   
Target Milestone: ---   
Hardware: All   
OS: other   
Attachments: Example of a XLS file who returns not enough rows

Description Bob Kemp 2004-08-12 20:44:32 UTC
I'm just starting with POI and wrote a simple program to read a spreadsheet.
The attached program displays incorrect values for HSSFRow.getLastCellNum().
It is too large by 1.  The problem appears to be in RowRecord where 
field_3_last_col has a "// +1" comment by it but that is too deep for me.

The spreadsheet came from US gov statistics and can be supplied upon request, 
however it is several megabytes.

(BTW, you don't have NetBSD as OS in bugzilla)

Bob

import java.io.*;

import org.apache.poi.hssf.usermodel.*;

public class EG {

    public static void main(String[] args)
                            throws IOException {
        FileInputStream fis = new FileInputStream(args[0]);

        HSSFWorkbook wb = new HSSFWorkbook(fis);

        System.out.println("#sheets = " + wb.getNumberOfSheets());

        HSSFSheet sheet = wb.getSheetAt(1);

        System.out.println("#rows = " + (1+sheet.getLastRowNum()));

        HSSFRow row = sheet.getRow(9);
        short firstCellNum = row.getFirstCellNum();
        short lastCellNum = row.getLastCellNum();

        System.out.println("first col = " + firstCellNum);
        System.out.println("last col  = " + lastCellNum);

        System.out.println();

        for (short j=firstCellNum; j < lastCellNum; j++) {
            HSSFCell cell = row.getCell(j);
            String str = null;

            switch (cell.getCellType()) {

                case HSSFCell.CELL_TYPE_STRING:
                    str = cell.getStringCellValue().trim();
                    break;

                case HSSFCell.CELL_TYPE_NUMERIC:
                    str = String.valueOf(cell.getNumericCellValue());
                    break;

            }

            System.out.println("cell[" + j + "] = " + str);
        }

    }
}
Comment 1 Bob Kemp 2004-08-17 12:10:04 UTC
Never report a bug when you're tired :-)
Sorry.
Comment 2 Bob Kemp 2004-08-18 11:13:37 UTC
Nor should one cancel a bug report when tired.
Changing the line

        for (short j=firstCellNum; j < lastCellNum; j++) {

into

        for (short j=firstCellNum; j <= lastCellNum; j++) {

causes a NullPointerException on the last iteration at line 33, ie 

        switch (cell.getCellType()) {

The final value printed out (on the penultimate iteration)
is actually the final value in the row.

Thus HSSFRow.getLastCellNum() returns a value that is out by one.

Bob
Comment 3 MKA 2004-09-20 08:16:42 UTC
This bug was set to invalid twice. Can anybody please explain why? I have 
currently the same problems and would like to know how I should handle this.

Calling row.getCell(row.getLastRowNum()) returns null. And my test sheet has 
only one row with three columns. But the method returns 3 instead of the 
expected 2 (as the javadoc says that getLastRowNum() is 0-based).
Comment 4 Bob Kemp 2004-09-20 09:17:24 UTC
The day after submitting the report, I looked at it again and thought I had been 
wrong.  Later, when I started using the POI code again, it became clear that the 
bug was real and so I reopened it and added further explanation to show how to 
definitively prove the bug's existence.

Sorry for the confusion this caused.  Tired at the terminal :-)
Comment 5 Andre Schild 2004-12-13 16:10:41 UTC
Created attachment 13742 [details]
Example of a XLS file who returns not enough rows
Comment 6 Andre Schild 2004-12-13 16:11:15 UTC
We too see this behaviour on SOME xls files. We have attached such a problematic
xls file.
Comment 8 Andreas Goetz 2006-06-13 16:35:47 UTC
I can confirm this issue- just experienced myself. Is there any progress on this?
Comment 9 Sanjay Madhavan 2006-07-26 12:06:01 UTC
We are seeing the same problem with getLastCellNum() being off by one when the
file is created by POI. If we open and save the file in excel the count is then
correct.

/sanjay
Comment 10 Andreas Goetz 2006-09-05 13:08:02 UTC
The issue has been open for more than two years now and still present in latest
3.0 prereleases. Is there any hope for an answer/ fix?
Comment 11 Andreas Goetz 2006-09-05 13:10:20 UTC
I believe bugs 30671 and 23954 may be duplicates of this.
Comment 12 Andreas Goetz 2006-09-05 13:15:36 UTC
*** Bug 30671 has been marked as a duplicate of this bug. ***
Comment 13 Andreas Goetz 2006-09-05 13:17:14 UTC
As workaround, you could use the row iterator to loop over the rows in a
spreadsheet. The iterator does not seem to experience the issue.
Comment 14 Andreas Goetz 2006-09-05 13:26:50 UTC
*** Bug 39921 has been marked as a duplicate of this bug. ***
Comment 15 Jason Height 2006-09-11 11:31:30 UTC
This is/was a documentation bug. The return from getLastCellNum actually returns
the last cell number + 1. Basically this is returning exactly what is in the
internal excel record structure.

I have corrected SVN to include a more appropriate javadoc comment.

Feel free to repoen if i have missed something.

Jason 
Comment 16 Andreas Goetz 2006-09-11 13:16:48 UTC
Jason,

I think there is indeed more. We're seeing to different kinds of files. Please
check bug 39921, too.

If just the XLS record structure is returned, the error may be in creating the
files. I remember reading in one of the other bugs that POI-created files
indicate wrong max row/cell numbers.

Maybe this is the issue we're seing here?
Comment 17 Jason Height 2006-09-11 21:42:42 UTC
quite possibly poi has been writing them incorrectly. Ill look into it. Do you
know the bug id where the last row/col numbers were written incorrectly.

Jason
Comment 18 Andreas Goetz 2006-09-12 07:01:17 UTC
I had previously marked that as duplicate- it's Bug 30671.

Please let me/us know,
Andi
Comment 19 Andreas Goetz 2006-10-04 01:48:45 UTC
I wrote a small test program to analyze the issue (see attachment) and found 
the following (unexpected) result:
0) when XLS file has 0 lines, both getFirst- and getLastRowNum return 0. This 
would seem to indicate that for reading them you'd need to loop from 
frist..last-1.
1) when XLS file has 1 lines, both getFirst- and getLastRowNum still return 0. 
This is confusing- as it would seem to indicate that for reading them you'd 
need to loop from frist..last instead of last-1.
2) when XLS file has 2 lines, both getFirstRowNum still returns 0 while 
getLastRowNum return 1. This is consistent with the finding in 1).

These results are the same when test is executed against files produced by poi 
or Excel 2003 either. I did not validate the bahaviour for columns but suspect 
they might be the same.

The only sensible conclusion I can draw is that getLastRowNum on XLS files is 
not a reliable indicator if the last row really exists. Applications need to 
read up to getLastRowNum and verify if row!=null to handle the case of zero 
total rows.
If this is true, then hte bug- if any- might be in the documentation of the 
getLastRowNum function's behaviour?
Comment 20 Andreas Goetz 2006-10-19 02:16:52 UTC
Jason, any idea?

Thanks,
Andi
Comment 21 Andreas Goetz 2006-12-08 08:00:16 UTC
Any chance to look at this? Thanks!
Comment 22 Andy Oliver 2006-12-09 01:57:49 UTC
this has actually been explained many many times.  The +1 behavior is from the
file format itself.  It is SUPPOSED to be +1.
Comment 23 Andreas Goetz 2006-12-09 02:49:32 UTC
Andy,
I know the explanation, thank you. What I don't understand how, in this case, it
is possible to distinguish between a file with 0 and with 1 line filled. In both
cases the function will return 0?
As for being persistent on this, I was referring to comment 17.
Comment 24 Alex Jacoby 2007-09-29 11:16:52 UTC
New POI user here.  I'm hitting the same problem: I get one result when the file
was last written by POI, and another result if the file was last written by
Excel.  Reading the comments below it looks like the workaround is to not use
the getLastCellNum method, in which case a reasonable fix would involve at the
very least mentioning the problem in the javadocs, and perhaps deprecating it
until the problem is fixed.
Comment 25 Andreas Goetz 2007-09-29 13:25:07 UTC
I agree. Unfortunately nobody is supporting the non-POI-intern users to really
understand the "correct way of doing things"...
Comment 26 Andreas Goetz 2008-06-16 12:24:44 UTC
Regarding commment 22- if you read comment 19 carefully you'll realize that the behaviour is inconsistent, regardless if the behaviour of getLastRomNum() is +1 or +0. 
Comment 27 Nick Burch 2008-07-10 15:40:04 UTC
There appears to be two things here that people are debating:

HSSFRow.getLastCellNum()

HSSFSheet.getLastRowNum()

For those confused about HSSFSheet.getLastRowNum(), the trick when you get zero back is to also check getPhysicalNumberOfRows. If this is zero, you have no rows on the sheet. If this is one, you have a single row at position zero.

I have updated the javadocs to make this clearer
Comment 28 Nick Burch 2008-07-10 15:45:13 UTC
For those asking about HSSFRow.getLastCellNum(), this is also working just as described in the javadocs. I have added a new test to svn trunk, hssf.usermodel.TestBugs#test30635() . This runs through adding rows, checking row numbers, adding cells, checking cell numbers. In all cases, the answers returned by POI exactly match what the javadocs say you will get. So, everything seems to be working correctly with POI
Comment 29 Andreas Goetz 2008-07-10 23:38:36 UTC
Thank you Nick- getPhysicalNumberOfRows was the missing piece of the puzzle. I really appreciate your explanation!

Wouldn't it be nice to have a getLogicalNumerOfRows that has the logic you've descibed. Or we'll need to start using row iterators which work in all cases.