Bug 47757 - Example code XLSX2CSV using event API
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.5-dev
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2009-08-28 04:15 UTC by Chris Lott
Modified: 2009-11-16 09:31 UTC (History)

Zip archive with 2 java classes (7.00 KB, application/zip)
2009-08-28 04:15 UTC, Chris Lott
Excel 2007 sheet with tiny shared-strings table that triggers failure (9.47 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2009-11-13 05:01 UTC, Chris Lott
Patch that incorporates Eric Smith's fix to the problem he discovered. (4.39 KB, patch)
2009-11-13 05:36 UTC, Chris Lott

Description Chris Lott 2009-08-28 04:15:33 UTC
Created attachment 24183
Zip archive with 2 java classes

I would like to offer a program that converts an XLSX workbook to CSV for
possible inclusion in the POI examples area.  It uses a SAX parser, so it should
be able to convert very large files.  The program depends on POI 3.5 beta 6.
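
For readers without the attachment, here is a minimal sketch of the streaming idea. It is not the attached program and does not use POI; it assumes only the JDK's SAX parser and a simplified sheet XML in which every cell holds a literal value in a "v" element. The class name SheetToCsvSketch and its convert method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: streams simplified sheet XML and emits one CSV line per row.
// Assumption: cell values are literals in <v>; shared strings, styles, and
// gaps between cells are deliberately ignored here.
class SheetToCsvSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final List<String> row = new ArrayList<>();
    private final StringBuilder value = new StringBuilder();
    private boolean inValue;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("row".equals(qName)) {
            row.clear();                 // start collecting a new row
        } else if ("v".equals(qName)) {
            inValue = true;              // start collecting a cell value
            value.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append rather than assign.
        if (inValue) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            inValue = false;
            row.add(quote(value.toString()));
        } else if ("row".equals(qName)) {
            out.append(String.join(",", row)).append('\n');
        }
    }

    // Quote a field only when it contains a comma, a double quote, or a newline.
    private static String quote(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Parses the given sheet XML and returns the CSV text.
    static String convert(String sheetXml) {
        SheetToCsvSketch handler = new SheetToCsvSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return handler.out.toString();
    }
}
```

Because the handler keeps only the current row in memory, memory use depends on the widest row rather than on the size of the sheet, which is why a SAX approach suits very large files.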

The source is available under the Apache license.  I listed myself as the
copyright owner, but I'm ok with donating it if that is what's required.

Please see the attachment.  Comments and improvements are very welcome.  Thanks
for considering it.
Comment 1 Yegor Kozlov 2009-09-06 05:07:40 UTC
Added to org.apache.poi.xssf.eventusermodel.* in r811816:


I made a few tweaks:

 - used the correct ASF header; all files in the POI project use the same license header.
 - removed log4j initializers; they are no longer needed.
 - POI examples are self-contained, so I turned ReadonlySharedStringsTable into an inner class in XLSX2CSV.

Other than that, very cool.

Comment 2 Chris Lott 2009-11-13 05:01:52 UTC
Created attachment 24528
Excel 2007 sheet with tiny shared-strings table that triggers failure

Eric Smith reports a bug in my ReadonlySharedStrings class that is revealed by the attached spreadsheet.   He writes:

Eric Smith> Please see attached. The file has two separate merged areas;
Eric Smith> the one which has formatted text exposes the bug while the one
Eric Smith> that contains just plain text works fine with the original code.

The input has a single text cell that displays one word in bold and two words in plain text.  In the XML, that shared-string entry contains two "t" elements.  The initial symptom of the failure is an ArrayIndexOutOfBoundsException, triggered when my code treats the two "t" elements as separate entries; it needs to merge them into a single string instead.
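
To illustrate the merge, here is a minimal sketch. It is not the attached patch; it assumes only the JDK's SAX parser, and the class name SharedStringsSketch and its parse method are hypothetical. It builds one table entry per "si" element by concatenating the text of every "t" element inside it, so a rich-text string with several runs still occupies a single index.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: reads shared-strings XML into a list with one entry per <si>.
class SharedStringsSketch extends DefaultHandler {
    private final List<String> strings = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inText;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("si".equals(qName)) {
            current.setLength(0);        // a new string item begins
        } else if ("t".equals(qName)) {
            inText = true;               // text of one run (or of a plain string)
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append, so that text from several <t> elements accumulates in one buffer.
        if (inText) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(qName)) {
            inText = false;              // do NOT add an entry here; that was the bug
        } else if ("si".equals(qName)) {
            strings.add(current.toString());  // one entry per <si>, runs merged
        }
    }

    // Parses the given shared-strings XML and returns the merged entries in order.
    static List<String> parse(String xml) {
        SharedStringsSketch handler = new SharedStringsSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse shared strings XML", e);
        }
        return handler.strings;
    }
}
```

Adding the entry at the end of "si" rather than at the end of "t" keeps the list indexes aligned with the indexes that sheet cells use to reference shared strings.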
Comment 3 Chris Lott 2009-11-13 05:36:08 UTC
Created attachment 24529
Patch that incorporates Eric Smith's fix to the problem he discovered.
Comment 4 Chris Lott 2009-11-13 05:38:26 UTC
I am reopening this bug so that it pops up in Yegor's queue.  Please test the patch I attached for inclusion in the example program.  Kudos to Eric Smith for finding the problem and providing a fix.
Comment 5 Yegor Kozlov 2009-11-16 09:31:44 UTC
Patch applied in r880864.