Bug 54228

Summary: sheet.rowIterator() not grabbing content out of xlsx 2010 file
Product: POI Reporter: James <james.clendennen>
Component: XSSFAssignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Bug Depends on:    
Bug Blocks: 53672    
Attachments: File that won't convert correctly

Description James 2012-11-30 20:34:09 UTC
Created attachment 29658 [details]
File that won't convert correctly


When trying to use the POI class to grab content from the attached xlsx sheet, the sheet.rowIterator() method is returning rows with no content.

For example:


Row      row        = itrRow.next();

When iterating through the cells of "row", everything is blank and it seems to be coming from the sheet.rowIterator returning a row without the content.  Thanks
Comment 1 Mark B 2012-12-01 07:41:59 UTC
I have just had a look ay your file and would like to ask which rows are reporting they contain nothing? I suspect that these will be the styled rows that do not contain any cells. The reason why you are seeing POI report these is that Excel will still create an entry for that row in the xml markup for the sheet so that it can record the style information for it.

Can you check this please by printing out the numbers of the 'problematic' rows and then comparing them to your source file.
Comment 2 Yegor Kozlov 2012-12-04 10:31:20 UTC
Please try with the latest build form trunk. POI-3.7 is two years old and many bugs have been fixed since then.
Comment 3 James 2012-12-04 15:24:58 UTC

Thanks for the assistance.  I made a mistake before and we are already using 3.8.  Sorry for any confusion.  The only row it returns anything for in "midcon-pjm" sheet is the one with the date on it.  It doesn't return anything for rows 10 - 87.  Let me know if you need any more information and thanks again.
Comment 4 Mark B 2012-12-05 07:49:42 UTC
Ah, I understand now after assuming that you were asking why the Iterator would returned rows that you did not expect to see; those otherwise blank styled rows on your sheet. Instead, it seems to be pretty much the whole of the populated portion of the sheet that is not returned to you. Will take a look later.
Comment 5 Yegor Kozlov 2012-12-05 12:23:48 UTC
The cells that are not read are missing the R attribute (A1-style cell reference). When POI reads worksheet data it collects cells into a map keyed by columnIndex and the column index is derived from the R attribute. If R is missing then columnIndex is 0 and all such cells are put into the same map entry.

Here is sample xml:

    <c r="A1">

The first cell is read and the second is not. Worse, it overwrites the previosly read A1. 

Excel always writes R so POI assumed it is always present. It appears it is not so. Excel resolves cells with missing R relative to the leftmost cell. That is, if R is missing and the previously read cell is A2 then the R is A3, etc.
In the example above the cell is resolved as A2 because the previous cell is A1. 

I committed the fix in r1417379, you are welcome to try with the latest build from trunk. A link to daily builds is on the POI web site .