Bug 57494

Summary: Discrepancy between using XSSFWorkbook.getSheetAt(0).getRowIterator() and XSSFWorkbook.getSheetAt(0).getRow(#)
Product: POI Reporter: Hugo Tagle <hugo.tagle>
Component: POI OverallAssignee: POI Developers List <dev>
Status: RESOLVED INVALID    
Severity: normal    
Priority: P2    
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: 10 row spreadsheet. Second row missing data
Four top rows contain data. 158 empty ones at the bottom

Description Hugo Tagle 2015-01-26 01:50:45 UTC
Created attachment 32396 [details]
10 row spreadsheet. Second row missing data

The spreadsheet attached is missing the first row of data after the header row. Creating a XSSFWorkbook(FileInputStream fis) object eliminates the empty row. The sheet rowsIterator contains only the header and the rest of valid rows in the spreadsheet. However, trying to access the rows using the sheet getRow(#) method throws a NullPointerException when trying getSheetAt(0).getRow(1).

Used a testcase to quickly show the issue:

        FileInputStream fis = new FileInputStream(new File("2nd_Row_Missing.xlsx"));
        
        Workbook wb = new XSSFWorkbook(fis);                

        logger.debug("rowIterator()");

        Iterator x = wb.getSheetAt(0).rowIterator();
        int i = -1;
        while (x.hasNext()) {
            i++;
            Row row = (Row) x.next();

            Iterator y = row.cellIterator();
            
            y.next(); // don't care for first numeric cell
            Cell cell = (Cell) y.next(); // I want the second one (string)
            
            logger.debug("row = " + i + ", cell = 1, value =  " + cell.getStringCellValue());
        }

        logger.debug("getRow()");
        logger.debug("row = 0, cell = 1, value = " + wb.getSheetAt(0).getRow(0).getCell(1).toString());
        logger.debug("row = 1, cell = 1, value = null; throws NullPointerException");
        logger.debug("row = 2, cell = 1, value = " + wb.getSheetAt(0).getRow(2).getCell(1).toString());
        logger.debug("row = 3, cell = 1, value = " + wb.getSheetAt(0).getRow(3).getCell(1).toString());
        logger.debug("row = 4, cell = 1, value = " + wb.getSheetAt(0).getRow(4).getCell(1).toString());
        logger.debug("row = 5, cell = 1, value = " + wb.getSheetAt(0).getRow(5).getCell(1).toString());
        logger.debug("row = 6, cell = 1, value = " + wb.getSheetAt(0).getRow(6).getCell(1).toString());
        logger.debug("row = 7, cell = 1, value = " + wb.getSheetAt(0).getRow(7).getCell(1).toString());
        logger.debug("row = 8, cell = 1, value = " + wb.getSheetAt(0).getRow(8).getCell(1).toString());
        logger.debug("row = 9, cell = 1, value = " + wb.getSheetAt(0).getRow(9).getCell(1).toString());


==Standard Output==
rowIterator()
row = 0, cell = 1, value =  ADDR
row = 1, cell = 1, value =  1220 STATE CAPITOL
row = 2, cell = 1, value =  121 S 3RD ST
row = 3, cell = 1, value =  1419 ADAMS BLVD
row = 4, cell = 1, value =  408 NORFOLK AVE
row = 5, cell = 1, value =  704 PROSPECT AVE
row = 6, cell = 1, value =  U S HWY 30 WEST
row = 7, cell = 1, value =  1600 GREAT PLAINS CENTRE
row = 8, cell = 1, value =  1623 HARNEY ST
getRow()
row = 0, cell = 1, value = ADDR
row = 1, cell = 1, value = null; throws NullPointerException
row = 2, cell = 1, value = 1220 STATE CAPITOL
row = 3, cell = 1, value = 121 S 3RD ST
row = 4, cell = 1, value = 1419 ADAMS BLVD
row = 5, cell = 1, value = 408 NORFOLK AVE
row = 6, cell = 1, value = 704 PROSPECT AVE
row = 7, cell = 1, value = U S HWY 30 WEST
row = 8, cell = 1, value = 1600 GREAT PLAINS CENTRE
row = 9, cell = 1, value = 1623 HARNEY ST
Comment 1 Andreas Beeker 2015-01-26 08:33:16 UTC
This is by design ... please have a look at the API:
http://poi.apache.org/apidocs/org/apache/poi/xssf/usermodel/XSSFSheet.html#rowIterator()

And the NullPointerException is thrown because ... well getRow(1) is null, as you mentioned it, so getCell(1) fails
Comment 2 Hugo Tagle 2015-01-26 14:45:30 UTC
Created attachment 32398 [details]
Four top rows contain data. 158 empty ones at the bottom
Comment 3 Hugo Tagle 2015-01-26 14:46:08 UTC
Great. Thx for the quick resolution.

As I understand then, the attached spreadsheet is 9 physical rows which is the result of 10 logical rows minus the empty one.

Why would a second spreadsheet with 4 physical rows and 158 empty ones at the bottom return 162? Should the 158 empty ones at the bottom NOT be considered physical ones as in the case where the empty one appeared in the second logical row?

wb.getSheetAt(0).getPhysicalNumberOfRows() = 162

I attached the second spreadsheet I'm referring to just in case.
Comment 4 Nick Burch 2015-01-26 16:26:44 UTC
POI tells you the values that Excel wrote into the file, you'd need to ask Microsoft if you're struggling to make sense of what POI is telling you the file claims to have...
Comment 5 Hugo Tagle 2015-01-28 14:53:58 UTC
Ok. That's fair. 

I assumed, from POI perspective, that if an empty row in the middle of a spreadsheet was not considered a physical row, empty rows at the end would not be either.

Nothing else from me about this bug.

Best regards.