using last night's build reading an xls file, row.getLastCellNum() returns one over. For example, if there is only one cell, the first will be 0, and the last will be 1, which makes 2 alltogether. If it is by design, it should be documented. with this particular xls file (attached), it also reads non-existent rows. Code follows: public static final void main(String[] args) { if(args==null || args.length==0) { System.out.println("enter name of excel file"); return; } String filename = args[0]; try { HSSFWorkbook w = new HSSFWorkbook(new FileInputStream(filename)); for (int sheet = 0; sheet < w.getNumberOfSheets (); sheet++) { String sheetName; HSSFSheet s = w.getSheetAt(sheet); // get the sheet from the workbook sheetName = w.getSheetName(sheet); // get the name of the sheet int firstRowNumber= s.getFirstRowNum(); int lastRowNumber= s.getLastRowNum(); int rowCount = s.getPhysicalNumberOfRows(); System.out.println ("sheet "+sheetName+" first row "+ firstRowNumber+", last row "+ lastRowNumber); System.out.println ("sheet "+sheetName+" has "+ rowCount+" rows"); if (rowCount==0) { continue; } for (int row=firstRowNumber; row<=lastRowNumber; row++) { HSSFRow r = s.getRow(row); if (r==null) // if there is nothing in a row, it comes back as null { continue; } HSSFCell c = null; int firstCellNumber= r.getFirstCellNum(); int lastCellNumber= r.getLastCellNum(); int colCount= lastCellNumber - firstCellNumber+1; System.out.println ("row "+row+": first cell number: "+firstCellNumber+ ", lastCellNumber: "+lastCellNumber + ", column count: "+colCount); String namesArray[] = new String[colCount]; for (int cell=firstCellNumber; cell<=lastCellNumber; cell++) { //System.out.println("row "+row+", cell "+cell); String cellValue = null; int nameIndex = lastCellNumber-cell; namesArray [nameIndex] = makeName(cell); String cTitle = namesArray[nameIndex]; c = r.getCell ((short)cell); if (c!=null) // if there is nothing in a cell, it comes back as null { //trace (XDState.DEBUG,"Excel Preparser: row "+row+", cell "+cell+": "+c+",type "+c.getCellType()); switch (c.getCellType()) { case (HSSFCell.CELL_TYPE_NUMERIC): { double d = c.getNumericCellValue(); DecimalFormat formatter = new DecimalFormat(); // default rounds to 3 decimal digits cellValue = formatter.format(d); break; } case (HSSFCell.CELL_TYPE_BOOLEAN): { cellValue = new Boolean(c.getBooleanCellValue()).toString(); break; } case (HSSFCell.CELL_TYPE_STRING): { cellValue = c.getStringCellValue(); break; } case (HSSFCell.CELL_TYPE_FORMULA): { double d = c.getNumericCellValue(); DecimalFormat formatter = new DecimalFormat(); // default rounds to 3 decimal digits cellValue = formatter.format(d); break; } case (HSSFCell.CELL_TYPE_ERROR): { cellValue = Byte.toString(c.getErrorCellValue()); break; } case (HSSFCell.CELL_TYPE_BLANK): { cellValue = c.getStringCellValue(); break; } default: { cellValue = "Unknown cell type:"+c.getCellType(); } } System.out.println("cell at row "+row+", col "+cell+", value "+cellValue); } } } } } catch(Exception e) { System.out.println(e+"; "+e.getMessage()); e.printStackTrace(); } } Thanks,
Created attachment 2945 [details] excel file
cool can you also attach the code?
Wow this was a good documentation catch. I looked through and was baffled at first. As it turns out this is the intended behavior, but we should consider changing it. I had to look it up in my well worn copy of the "Microsoft Excel Developer's Kit" on page 303 with some scribbling of mine in the margins. Why is it like this? Well originally these values were provided from the DimensionsRecord which has two fields that use "microsoft math" (off by one like their buffer checking in the C# JIT ;-)) meaning they are one greater than what they should be. As it stands now it looks like we are calculating this field. It would be good if you could raise this on the dev list (I'll provide the reference and history if they miss this). Basically: should we just document this or is it worth breaking some backward compatibility (in a fairly non obvious way) to make this more intuitive. I'm not sure. I'm open to the thoughts of the developers and users. This is a fairly minor issue but probably a pretty irritating usability one.
Created attachment 2952 [details] test case for excel file previously attached. picks up cells at rows 9,10,11
IS ANYBODY WORKING ON THIS? I understand that the doc has to change for row.getLastCellNum() to indicate that the number returned is the index of the NEXT UNUSED cell. The second part of this problem report, however, seems to have escaped everybody's attention, and that is: for the attached excel file, extra (empty, unused) rows are detected. Thank you.
Patience! This is a project of volunteers and consultants like me who give paying customers first crack (mortgage to pay and mouths to feed). Anyhow this part isn't a bug: recordid = 0x203, size =14 [NUMBER] .row = 8 .col = 3 .xfindex = 19 .value = 0.6666666666666666 [/NUMBER] ============================================ Offset 0xef6 (3830) recordid = 0x201, size =6 [BLANK] row = 9 col = 2 xf = 18 [/BLANK] ============================================ Offset 0xf00 (3840) recordid = 0x201, size =6 [BLANK] row = a col = 2 xf = 18 [/BLANK] ============================================ Offset 0xf0a (3850) recordid = 0x201, size =6 [BLANK] row = b col = 2 xf = 18 [/BLANK] See those? They're cells. You perceve them to be blank in the GUI but "BLANK" is a *real* cell with a *real* row. BLANKS sometimes have styles attached to them that look different than the surrounding style (for instance a cell with a border but no value). To see what I mean do java org.apache.poi.hssf.dev.BiffViewer myfile.xls It will dump at text representation of the binary structure to sysout. You might want to redirect the output to a file. hope that helps.
thanks, i understand about blank cells, i get the same results when i use hssf to read these cells, however, when i open the file in excel, these cells don't look any different to me than other ones which are really empty(unused). i cannot detect any styles attached to them or anything. in fact, when the excel file was created, there was nothing done to those cells. could it be related to the ungodly format of the corresponding cells in the preceeding row? unfortunately, i am not familiar with the record format, what is xf in your dump? my problem (or my qa's problem) is that as far as common knowledge, there is NOTHING in these cells, but when read with HSSF, they show blank data.
Right so you're saying the correct behavior is to ignore the ones you can't see visibly in Excel? Why excel created these? Who knows. However they are there and HSSF shows you what is really in the file NOT what you see in Excel. My suggestion is to create a utility function that ignores rows with only blank cells and wrap HSSF. Regardless its not a bug in HSSF to report these rows with blank cells. That is the correct behavior. HSSF goal is to abstract you from the nasty details of the file format as much as possible while giving you access to all the data. However its also intent to match the ACTUAL structure and not the perceived structure of the file. Sometimes the two differ.