Bug 45672 - LastCellOfRowDummyRecord is returned multiple times per row
Summary: LastCellOfRowDummyRecord is returned multiple times per row
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: PC Windows XP
Importance: P2 normal with 6 votes
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2008-08-22 06:33 UTC by Ian Beaumont
Modified: 2009-11-08 06:48 UTC
CC List: 0 users

Spreadsheet containing issue (20.50 KB, application/octet-stream)
2008-08-22 06:33 UTC, Ian Beaumont
Spreadsheet containing issue (17.00 KB, application/x-msexcel)
2008-08-22 09:38 UTC, Ian Beaumont
Proposed patch to fix issue (1.91 KB, patch)
2008-08-22 09:46 UTC, Ian Beaumont
execution trace of my code applied to Chris's .xls file (83.39 KB, application/octet-stream)
2009-06-02 17:18 UTC, Adam Pingel
a reworked XLS2CSVmra that I'm using to create the trace (11.69 KB, application/octet-stream)
2009-06-02 17:19 UTC, Adam Pingel
Two-row spreadsheet that triggers LastCellOfRowDummyRecord failure (24.50 KB, application/vnd.ms-excel)
2009-11-08 06:45 UTC, Chris Lott

Description Ian Beaumont 2008-08-22 06:33:22 UTC
Created attachment 22473 [details]
Spreadsheet containing issue

I'm trying to convert an Excel spreadsheet to CSV using code similar to the example XLS2CSVmra.java.

This is using the Event API. For some reason, on certain rows of my spreadsheet I get multiple LastCellOfRowDummyRecord records per row. They arrive at different points during the processing of the row.
For example row 16 returns LastCellOfRowDummyRecord after processing column 12, column 18 and column 31.

Problem Spreadsheet attached.

Tested in 3.1 and 3.5 beta 1
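
The symptom described above (several end-of-row markers for a single row) can be checked mechanically. The following is a minimal sketch only: the `Event` type and the `EndOfRowAudit` class are hypothetical stand-ins for the event stream and are not POI classes. It simply counts end-of-row markers per row and reports the rows that received more than one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the event stream; these are NOT POI classes.
class EndOfRowAudit {

    /** Simplified event: either a cell at (row, column) or an end-of-row marker. */
    static final class Event {
        final int row;
        final int column;       // -1 for end-of-row markers
        final boolean endOfRow;

        private Event(int row, int column, boolean endOfRow) {
            this.row = row;
            this.column = column;
            this.endOfRow = endOfRow;
        }

        static Event cell(int row, int column) { return new Event(row, column, false); }
        static Event endOfRow(int row) { return new Event(row, -1, true); }
    }

    /** Returns the rows that received more than one end-of-row marker, in first-seen order. */
    static List<Integer> rowsWithDuplicateMarkers(List<Event> events) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Event e : events) {
            if (e.endOfRow) {
                counts.merge(e.row, 1, Integer::sum);
            }
        }
        List<Integer> duplicates = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Mirrors the report: row 16 gets a marker after columns 12, 18 and 31.
        List<Event> events = Arrays.asList(
                Event.cell(15, 0), Event.endOfRow(15),
                Event.cell(16, 12), Event.endOfRow(16),
                Event.cell(16, 18), Event.endOfRow(16),
                Event.cell(16, 31), Event.endOfRow(16));
        System.out.println(rowsWithDuplicateMarkers(events)); // prints [16]
    }
}
```

In a real listener the same counting could be done on the row number carried by each end-of-row record, which would make it easy to attach a list of affected rows to a report like this one.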
Comment 1 Ian Beaumont 2008-08-22 09:38:50 UTC
Created attachment 22475 [details]
Spreadsheet containing issue

This is a smaller/simpler spreadsheet showing the problem
Comment 2 Ian Beaumont 2008-08-22 09:46:48 UTC
Created attachment 22476 [details]
Proposed patch to fix issue

On investigating the problem, the cause seems to be that the spreadsheet contains SharedFormulaRecord records, and the MissingRecordAwareHSSFListener generates a LastCellOfRowDummyRecord for the row every time it hits one of them.

I've included a patch which updates the MissingRecordAwareHSSFListener to ignore any SharedFormulaRecord. This seems to fix the issue, and the spreadsheet is now processed correctly. However, I don't have a great understanding of POI, so I'm not sure whether what I've done is valid. It has certainly fixed my issue.
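
To illustrate the idea described in this comment, here is a minimal sketch under stated assumptions. It is not the attached patch and not the change that was later committed; `RowEndTracker`, `Rec` and `Kind` are hypothetical names, and the row-tracking logic is only one plausible way such a listener could work. The point it shows is that a record with no cell position of its own is passed through without disturbing the "current row" state, so it cannot trigger an early end-of-row marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of an event listener; none of these types are POI classes.
class RowEndTracker {
    enum Kind { CELL, SHARED_FORMULA, OTHER }

    static final class Rec {
        final Kind kind;
        final int row; // only meaningful when kind == CELL

        private Rec(Kind kind, int row) {
            this.kind = kind;
            this.row = row;
        }

        static Rec cell(int row) { return new Rec(Kind.CELL, row); }
        static Rec sharedFormula() { return new Rec(Kind.SHARED_FORMULA, -1); }
        static Rec other() { return new Rec(Kind.OTHER, -1); }
    }

    private int lastCellRow = -1;
    private final List<String> output = new ArrayList<>();

    void process(Rec r) {
        if (r.kind == Kind.SHARED_FORMULA) {
            // Pass the record through, but leave the row state untouched so that
            // it does not look like the end of the current run of cells.
            output.add("shared-formula");
            return;
        }
        int thisRow = (r.kind == Kind.CELL) ? r.row : -1;
        if (lastCellRow != -1 && thisRow != lastCellRow) {
            // The run of cells for lastCellRow has ended.
            output.add("end-of-row " + lastCellRow);
        }
        if (r.kind == Kind.CELL) {
            output.add("cell in row " + r.row);
        }
        lastCellRow = thisRow;
    }

    List<String> output() { return output; }

    public static void main(String[] args) {
        RowEndTracker tracker = new RowEndTracker();
        for (Rec r : Arrays.asList(Rec.cell(16), Rec.sharedFormula(), Rec.cell(16), Rec.other())) {
            tracker.process(r);
        }
        // prints [cell in row 16, shared-formula, cell in row 16, end-of-row 16]
        System.out.println(tracker.output());
    }
}
```

Without the early return for `SHARED_FORMULA`, the same input would produce an end-of-row marker in the middle of row 16, which is the behaviour reported here.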
Comment 3 Josh Micich 2008-08-23 15:58:01 UTC
I think it is not quite correct to completely ignore the SharedFormulaRecord.

Applied patch with some simplifications in svn r688426

Added junit
Comment 4 Chris Lott 2009-06-02 12:33:39 UTC
I am using the XLS2CSVmra example with POI 3.5beta5 and today I hit an XLS (2003 format) spreadsheet that triggers this exact problem.  The sheet is very wide but only has 2 rows.  The first row has headers and is read/reported perfectly.  The second row has data and becomes *9* rows in the output.  I am using Excel to look at the data in the cells that become the last item in each output row.  One is a date, one is a number stored as a string; neither appears to be a formula.  Right now the data is proprietary so I cannot upload it.  I am happy to investigate further, for example with the debugger, but don't know what evidence you need.
Comment 5 Adam Pingel 2009-06-02 15:48:38 UTC
Hello.  I noticed Chris's earlier note to the user mailing list and then followed the bug here.  I'm seeing the same issue.  I attached a .java and a trace of the program as applied to Chris's sample .xls file to that message.
Comment 6 David Fisher 2009-06-02 16:03:38 UTC
Hi Adam,

Thanks. If possible, try a recent nightly build first.

BTW - your attachments didn't make it through to the list with your email. Please attach the files to this issue. Hopefully someone will look into it in the near future.

Comment 7 Adam Pingel 2009-06-02 17:15:51 UTC
I tried again with poi-3.5-beta7-20090602.jar and poi-ooxml-3.5-beta7-20090602.jar.  I will attach my .java and the resulting log.
Comment 8 Adam Pingel 2009-06-02 17:18:06 UTC
Created attachment 23744 [details]
execution trace of my code applied to Chris's .xls file
Comment 9 Adam Pingel 2009-06-02 17:19:22 UTC
Created attachment 23745 [details]
a reworked XLS2CSVmra that I'm using to create the trace
Comment 10 Chris Lott 2009-06-23 04:58:01 UTC
I reopened this bug -- I hope that's acceptable to the POI committers.  Adam and I have provided fairly convincing evidence and attached it to this bug report.  I hope this isn't too hard to fix, and that maybe the fix will be delivered in time for the 3.5 final release.  Thanks for listening.
Comment 11 kirklib 2009-10-19 07:16:43 UTC
I was very happy to discover this library, and then quickly disappointed by this bug, since correct end-of-row detection is fairly essential for such a program.

I have detailed the problem I have on the mailing list:

An event signalling the end of a row is fired whenever the listener encounters certain blank cells in my data.
Comment 12 Nick Burch 2009-11-03 14:48:43 UTC
Ah, I'd forgotten that you can have records that cover multiple cells (MulBlankRecord and MulRKRecord). MissingRecordAwareHSSFListener wasn't handling these properly, and probably neither was anyone else's code...

I've updated MissingRecordAwareHSSFListener to expand these out into individual records, so hopefully it'll behave itself better now.
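
As a rough illustration of what expanding a multi-cell record means, here is a minimal sketch. It is not the committed change: `MultiBlank` and `Cell` are hypothetical stand-ins rather than POI classes, and the fields assume that a multi-cell record carries a row, a first column and a column count. Each covered column becomes its own single-cell entry, so downstream code sees one record per cell.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for multi-cell and single-cell records; NOT POI classes.
class MultiCellExpander {

    /** One record covering numColumns consecutive blank cells in a row. */
    static final class MultiBlank {
        final int row;
        final int firstColumn;
        final int numColumns;

        MultiBlank(int row, int firstColumn, int numColumns) {
            this.row = row;
            this.firstColumn = firstColumn;
            this.numColumns = numColumns;
        }
    }

    /** A single cell position. */
    static final class Cell {
        final int row;
        final int column;

        Cell(int row, int column) {
            this.row = row;
            this.column = column;
        }

        @Override
        public String toString() {
            return "(" + row + "," + column + ")";
        }
    }

    /** Expands one multi-cell record into one entry per covered column, left to right. */
    static List<Cell> expand(MultiBlank mb) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < mb.numColumns; i++) {
            cells.add(new Cell(mb.row, mb.firstColumn + i));
        }
        return cells;
    }

    public static void main(String[] args) {
        // A record covering columns 3..6 of row 1.
        System.out.println(expand(new MultiBlank(1, 3, 4))); // prints [(1,3), (1,4), (1,5), (1,6)]
    }
}
```

Code that tracks the last column seen in a row can then treat every expanded entry like any other cell, instead of jumping from the first column of the multi-cell record straight to whatever record follows it.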
Comment 13 Chris Lott 2009-11-08 06:45:06 UTC
Created attachment 24504 [details]
Two-row spreadsheet that triggers LastCellOfRowDummyRecord failure

I sent this spreadsheet to Adam Pingel but forgot to attach it to this bug.  Running XLS2CSVmra + POI 3.5 FINAL on this input yields 3 rows (expected 2).
Comment 14 Chris Lott 2009-11-08 06:48:52 UTC
Nick Burch, thank you very much for the patch.  It works for me!  I tested the nightly build POI-3.6-beta1-20091108.jar from encore.torchbox.com together with the XLS2CSVmra program on the two-row spreadsheet that is attached to this bug.  The failure no longer happens.  POI 3.5 final yields 3 rows, POI 3.6 beta yields 2 rows, which is what I expect.