Bug 47757

Summary: Example code XLSX2CSV using event API
Product: POI
Reporter: Chris Lott <apache7>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P3    
Version: 3.5-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments:
 - Zip archive with 2 java classes
 - Excel 2007 sheet with tiny shared-strings table that triggers failure
 - Patch that incorporates Eric Smith's fix to the problem he discovered.

Description Chris Lott 2009-08-28 04:15:33 UTC
Created attachment 24183 [details]
Zip archive with 2 java classes

I would like to offer a program that converts an XLSX workbook to CSV for
possible inclusion in the POI examples area.  It uses a SAX parser, so it should
be able to convert very large files.  The program depends on POI 3.5 beta 6.
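
To illustrate why an event-driven parser suits large files, here is a minimal sketch of streaming sheet XML to CSV. It is not the attached XLSX2CSV code: it uses only the JDK's SAX API, and the class name SheetToCsvSketch, the convert helper, and the pre-loaded sharedStrings list are assumptions made for illustration. It also ignores empty cells and cell styles, which a real converter would have to handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: streams sheet XML and emits CSV without building a DOM. */
class SheetToCsvSketch extends DefaultHandler {
    // Assumption: the shared-strings table has already been loaded elsewhere.
    private final List<String> sharedStrings;
    private final StringBuilder csv = new StringBuilder();
    private final StringBuilder value = new StringBuilder();
    private boolean inValue;
    private boolean cellIsSharedString;
    private boolean firstCellInRow;

    SheetToCsvSketch(List<String> sharedStrings) {
        this.sharedStrings = sharedStrings;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("row".equals(qName)) {
            firstCellInRow = true;
        } else if ("c".equals(qName)) {
            // t="s" marks a cell whose <v> holds an index into the shared-strings table.
            cellIsSharedString = "s".equals(attrs.getValue("t"));
        } else if ("v".equals(qName)) {
            inValue = true;
            value.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver text in several chunks, so accumulate rather than assign.
        if (inValue) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            inValue = false;
            String text = cellIsSharedString
                    ? sharedStrings.get(Integer.parseInt(value.toString()))
                    : value.toString();
            if (!firstCellInRow) {
                csv.append(',');
            }
            csv.append(quote(text));
            firstCellInRow = false;
        } else if ("row".equals(qName)) {
            csv.append('\n');
        }
    }

    /** Quotes a field only when it contains a comma, a double quote, or a line break. */
    private static String quote(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    static String convert(String sheetXml, List<String> sharedStrings) throws Exception {
        SheetToCsvSketch handler = new SheetToCsvSketch(sharedStrings);
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.csv.toString();
    }
}
```

The handler keeps only the current cell value in memory, so memory use does not grow with the number of rows. For brevity the sketch collects the CSV in a StringBuilder; writing each row to an output stream as it completes would keep the output side bounded as well.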

The source is available under the Apache license.  I listed myself as the
copyright owner, but I'm ok with donating it if that is what's required.

Please see the attachment.  Comments and improvements are very welcome.  Thanks
for considering it.
Comment 1 Yegor Kozlov 2009-09-06 05:07:40 UTC
Added to org.apache.poi.xssf.eventusermodel.* in r811816:

http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java

I made a few tweaks:

 - used the correct ASF header; all files in the POI project use the same license header.
 - removed the log4j initializers; they are no longer needed.
 - POI examples are self-contained, so I turned ReadonlySharedStringsTable into an inner class in XLSX2CSV.

Other than that, very cool.

Thanks,  
Yegor
Comment 2 Chris Lott 2009-11-13 05:01:52 UTC
Created attachment 24528 [details]
Excel 2007 sheet with tiny shared-strings table that triggers failure

Eric Smith reports a bug in my ReadonlySharedStringsTable class that is revealed by the attached spreadsheet.  He writes:

Eric Smith> Please see attached. The file has two separate merged areas; 
Eric Smith> the one which has formatted text exposes the bug while the one 
Eric Smith> that contains just plain text works fine with the original code. 

The input has a single text cell that shows as one word in bold and two words plain.  In the XML, that one shared-string item contains two "t" elements, one per formatting run.  The initial symptom of the failure is an ArrayIndexOutOfBoundsException.  The exception is triggered when my code treats the two "t" elements as separate entries; it needs to merge them into a single entry instead.
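
The merge described above can be sketched as follows. This is a minimal illustration using only the JDK's SAX API, not the attached patch; the class name MergedSharedStringsSketch and the parse helper are hypothetical. The idea is to add one table entry per "si" element and to concatenate the text of every "t" element found inside it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: one table entry per <si>, however many <t> runs it holds. */
class MergedSharedStringsSketch extends DefaultHandler {
    private final List<String> strings = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean insideT;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("si".equals(qName)) {
            // A new string item starts: reset the accumulator.
            current.setLength(0);
        } else if ("t".equals(qName)) {
            insideT = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of storing: text from every <t> in this <si> is merged.
        if (insideT) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(qName)) {
            insideT = false;
        } else if ("si".equals(qName)) {
            // Only the end of <si> produces an entry, so indexes stay aligned.
            strings.add(current.toString());
        }
    }

    static List<String> parse(String sharedStringsXml) throws Exception {
        MergedSharedStringsSketch handler = new MergedSharedStringsSketch();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sharedStringsXml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return handler.strings;
    }
}
```

Adding an entry at the end of each "t" element instead would produce more entries than the sheet's cell indexes expect and would shift every later index, which is consistent with the failure reported here.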
Comment 3 Chris Lott 2009-11-13 05:36:08 UTC
Created attachment 24529 [details]
Patch that incorporates Eric Smith's fix to the problem he discovered.
Comment 4 Chris Lott 2009-11-13 05:38:26 UTC
I am reopening this bug so that it pops up in Yegor's queue.  Please test the patch I attached for inclusion in the example program.  Kudos to Eric Smith for finding the problem and providing a fix.
Comment 5 Yegor Kozlov 2009-11-16 09:31:44 UTC
Patch applied in r880864.

Thanks,
Yegor