When using HSSFRow.cellIterator() to traverse a document, the cells in each row are returned in reverse column order. For example, when iterating through a document with data in two rows and three columns, the cells come back in this (row, column) order: (0,2),(0,1),(0,0),(1,2),(1,1),(1,0). HSSFSheet.rowIterator() correctly iterates through the rows in forward sequential order. I reproduced this bug in the 1.5 release and in the 1.6 build.
There is no contract guaranteeing the order. Furthermore, they can appear in any order in the underlying file format.
If there is an implied ordering of the cells (a number that can be retrieved from getCellNum()), why wouldn't the cellIterator() method return the cells in that order? It seems inconsistent at best, since rowIterator() does return the rows (at least in my example) in the order in which they exist in the spreadsheet. The documentation should reflect the fact that the *Iterator methods return their results in an unspecified order.
The implied ordering is "whatever is in the file" or some variant of "whatever was most efficient to store". This is where the rubber meets the road. While I realize it can be inconvenient for the user to reorder the cells, it's far more efficient than having us sort them into a particular order for everyone. If they come back precisely reversed because of something we're doing, feel free to submit a patch, but I'm against enforcing any contract as to the order. Your point about the documentation is well taken; please submit a patch and I'll apply it against the head (2.0).
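Until an ordering is guaranteed, the reordering can be done on the caller's side. The sketch below is a minimal illustration, not POI code: it assumes a hypothetical `Cell` record standing in for `HSSFCell`, exposing only a column number (as `getCellNum()` would) and a value. It drains an iterator that yields cells in arbitrary order and sorts them by column:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class ReorderCells {
    // Hypothetical stand-in for HSSFCell: just a column number and a value.
    record Cell(int cellNum, String value) {}

    // Collect every cell the iterator yields, then sort ascending by column.
    static List<Cell> inColumnOrder(Iterator<Cell> it) {
        List<Cell> cells = new ArrayList<>();
        it.forEachRemaining(cells::add);
        cells.sort(Comparator.comparingInt(Cell::cellNum));
        return cells;
    }

    public static void main(String[] args) {
        // Simulate the reversed order reported above for row 0: (0,2),(0,1),(0,0).
        List<Cell> reversed = List.of(
                new Cell(2, "C"), new Cell(1, "B"), new Cell(0, "A"));
        for (Cell c : inColumnOrder(reversed.iterator())) {
            System.out.println(c.cellNum() + " " + c.value());
        }
        // prints:
        // 0 A
        // 1 B
        // 2 C
    }
}
```

Sorting a copy costs O(n log n) per row, which is the trade-off described above: the caller pays it only when it actually needs the order, rather than the library paying it on every iteration.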
(if patch is provided please reopen)
I respectfully disagree with the decision to close this bug. It just makes sense to have the cellIterator() return the Iterator in the correct forward order. This method could be very convenient, but if the programmer has to reorder it, it's pretty much useless. I believe this is happening because HashMap was used. Couldn't a different data structure be used instead? Can we please keep this one open for a while and let some folks vote on it? thanks, Barry
Sure. You can leave it open, and please feel free to vote (if enough people feel that way and I think they are making an INFORMED vote, I or other committers may change my/our mind). I'm retargeting to 2.0 because there is NO way we're backporting such changes into 1.5.1 (they are behavioral/feature-oriented, etc.). However, our use of a HashMap will change in 3.0, and we'll probably return the cells in the order you suggest simply because of HOW we'll be storing them. I just don't want to guarantee order in this interface, because it could change and the file format itself might affect it. Personally, I think you're suffering from file-format API versus VBA-style API confusion. The HSSF usermodel gives you access to the file format without exposing you to certain nasty details (such as the fact that rows are completely unrelated to cells, and all the little records and intricacies). VBA and Formula 1 make it look like you're using Excel (one interfaces with Excel single-threadedly, and the other is more or less a full implementation of Excel in Java...to the tune of 10k). It's the difference between abstracting the file format for you and creating an implementation of Excel. We make this decision for performance reasons and simplicity. (The Formula 1 and VBA APIs are simpler to conceive but harder to master because there are just so freaking many of them...10 different ways to do EVERYTHING. HSSF seeks greater conceptual simplicity. Also, "convenience functions" are deferred until a later release by [apparent] community consensus -- we're all infected with eXtremeProgramming-style thought.) Besides, just because you need the cells or rows in order doesn't mean everyone does. Depending on what you're doing, the reactor pattern (in your own code) might help you here regardless of whether you're using the eventmodel: http://www.freeroller.net/page/acoliver/20021215#the_reactor_pattern_in_reading
Hi, I'm new to this, so please excuse me if I do anything incorrectly. I've voted for this to be changed for the following reasons: -> While no contract on order exists, there is certainly a logical expectation of sequence, because HSSFSheet.rowIterator() delivers its results ordered from low to high, so why not HSSFRow.cellIterator()? -> It appears easy to do: I got an ordered sequence simply by changing the HashMap cells to a TreeMap (and removing the constructor's initial capacity) in HSSFRow.java, only 3 lines. By the way, this would make it consistent with the TreeMap rows defined in HSSFSheet.java. If the change is declined, perhaps a compromise method could be added (e.g. HSSFRow.orderedCellIterator(), which converts the HashMap to a TreeMap). Cheers, Sean
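Both suggestions above can be sketched with a hypothetical `OrderedRow` class. It is not `HSSFRow`; the class, field, and method names are illustrative assumptions. Cells are keyed by column number in a `HashMap`, and the suggested `orderedCellIterator()` copies them into a `TreeMap`, whose `values()` iterate in ascending key order:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class OrderedRow {
    // Hypothetical simplified row: column number -> cell value.
    // Declaring this as a TreeMap instead would make cellIterator()
    // itself ascending, which is the small change suggested above.
    private final Map<Integer, String> cells = new HashMap<>();

    void createCell(int column, String value) {
        cells.put(column, value);
    }

    // Order is whatever the HashMap happens to yield; nothing is promised.
    Iterator<String> cellIterator() {
        return cells.values().iterator();
    }

    // Hypothetical compromise method: copy into a TreeMap, which iterates
    // its keys in ascending order. Costs O(n log n) on every call.
    Iterator<String> orderedCellIterator() {
        return new TreeMap<>(cells).values().iterator();
    }

    public static void main(String[] args) {
        OrderedRow row = new OrderedRow();
        row.createCell(2, "C");
        row.createCell(0, "A");
        row.createCell(1, "B");
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = row.orderedCellIterator();
        while (it.hasNext()) {
            sb.append(it.next());
        }
        System.out.println(sb); // prints ABC
    }
}
```

The copy keeps the existing storage untouched, so callers who don't need ordering pay nothing extra; switching the field to a `TreeMap` instead moves the O(log n) cost onto every insert and lookup.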
As a result of the recent performance change, the storage of HSSFCell objects was changed from a TreeMap implementation to an array-based one. This has the beneficial side effect that cellIterator() now returns cells in cell order. This change is available in SVN. Jason
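Array-based storage yields cell order for free because the column number doubles as the array index. The following is a minimal sketch assuming a hypothetical `ArrayRow` class that holds one value per column and skips empty slots while iterating; POI's actual implementation may differ:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ArrayRow {
    // Hypothetical array-backed row: slot i holds the cell in column i, or null.
    private String[] cells = new String[4];

    void createCell(int column, String value) {
        if (column >= cells.length) {
            // Grow to fit the column, at least doubling to keep growth cheap.
            cells = Arrays.copyOf(cells, Math.max(column + 1, cells.length * 2));
        }
        cells[column] = value;
    }

    // Walks the array from index 0 upward, skipping empty slots,
    // so cells come back in column order without any sorting.
    Iterator<String> cellIterator() {
        return new Iterator<>() {
            private int next = advance(0);

            // Returns the first occupied index at or after 'from',
            // or cells.length if there is none.
            private int advance(int from) {
                while (from < cells.length && cells[from] == null) {
                    from++;
                }
                return from;
            }

            @Override
            public boolean hasNext() {
                return next < cells.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String value = cells[next];
                next = advance(next + 1);
                return value;
            }
        };
    }

    public static void main(String[] args) {
        ArrayRow row = new ArrayRow();
        row.createCell(5, "F");
        row.createCell(0, "A");
        row.createCell(2, "C");
        StringBuilder sb = new StringBuilder();
        row.cellIterator().forEachRemaining(sb::append);
        System.out.println(sb); // prints ACF
    }
}
```

Lookup by column is O(1) and iteration is linear in the highest column used, at the cost of some wasted slots when a row is very sparse.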