Bug 52835

Summary: bug in parsing shared string table
Product: POI
Reporter: Wu, Fan <zjuwufan>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows Server 2003   

Description Wu, Fan 2012-03-06 04:27:53 UTC
Given the following shared string table,

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"/>

XSSF won't work.

It seems that the parser assumes the 'count' and 'uniqueCount' attributes are always present on the <sst> element.

However, when 'count' and 'uniqueCount' are missing, the Excel file cannot be read successfully.

File attached.
Comment 1 Yegor Kozlov 2012-03-06 11:02:11 UTC
You forgot to attach the file.

Yegor
Comment 2 Wu, Fan 2012-03-06 12:18:40 UTC
The file is about 5MB in size, and I failed to upload it. Is there another way to share it with you?
Comment 3 Wu, Fan 2012-03-06 12:26:14 UTC
Please download the file from the following link.

http://dl.dropbox.com/u/29681633/cant_upload.xlsx
Comment 4 Nick Burch 2012-03-06 15:30:38 UTC
How was this file generated?
Comment 5 Wu, Fan 2012-03-06 15:54:35 UTC
We are building a product based on POI. One of our users is trying to use the attached file, but it fails.

The source is unknown. Do you think it's an issue?
Comment 6 Yegor Kozlov 2012-03-06 18:19:50 UTC
(In reply to comment #0)
> Given the following shared string table,
> 
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"/>
> 
> XSSF won't work.
> 


Please explain what is wrong with XSSF: does it throw an exception, return wrong data, or something else?

The latest build from trunk can read the referenced file, navigate over data and save it back to file. I don't see anything wrong on the POI side.

Yegor
Comment 7 Wu, Fan 2012-03-07 02:21:38 UTC
An exception is thrown by the following code.

import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.xml.sax.SAXException;


public class Test {
	public static void main(String[] args)
	{
		try {
			// Open the workbook package in read-only mode.
			OPCPackage pkg = OPCPackage.open("cant_upload.xlsx", PackageAccess.READ);
			// The constructor parses the shared string table; the exception is thrown here.
			ReadOnlySharedStringsTable table = new ReadOnlySharedStringsTable(pkg);
		} catch (InvalidFormatException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} catch (SAXException e) {
			e.printStackTrace();
		}
		
	}
}
Comment 8 Wu, Fan 2012-03-07 02:26:32 UTC
The call stack looks like this:

at java.lang.Integer.parseInt(Unknown Source)
	at java.lang.Integer.parseInt(Unknown Source)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.startElement(ReadOnlySharedStringsTable.java:190)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.emptyElement(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$ContentDriver.scanRootElementHook(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.readFrom(ReadOnlySharedStringsTable.java:141)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.<init>(ReadOnlySharedStringsTable.java:110)
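
The trace above shows Integer.parseInt being called from a SAX startElement callback. A minimal sketch of how that can fail on an attribute-less <sst/> element follows. StrictSstDemo and StrictHandler are hypothetical names and this is not POI's source; the sketch only assumes that a handler parses 'count' and 'uniqueCount' without checking whether they are present.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StrictSstDemo {
    // Hypothetical handler (not POI's code): assumes both attributes exist.
    static class StrictHandler extends DefaultHandler {
        int count;
        int uniqueCount;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("sst".equals(qName)) {
                // getValue returns null for an absent attribute,
                // and Integer.parseInt(null) throws NumberFormatException.
                count = Integer.parseInt(attrs.getValue("count"));
                uniqueCount = Integer.parseInt(attrs.getValue("uniqueCount"));
            }
        }
    }

    // Returns the simple name of the exception raised while parsing, or "none".
    static String parseStrict(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new StrictHandler());
            return "none";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String empty =
            "<sst xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"/>";
        System.out.println(parseStrict(empty)); // prints "NumberFormatException"
    }
}
```

The checked exceptions caught in the test program (InvalidFormatException, IOException, SAXException) would not intercept this, because NumberFormatException is unchecked.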
Comment 9 Yegor Kozlov 2012-03-11 07:21:39 UTC
I relaxed this constraint in r1299338.

With this fix, POI's eventusermodel supports parsing an SST with missing 'count' and 'uniqueCount' attributes. Please try the latest build from trunk.

Yegor
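
For readers hitting the same problem in their own SAX handlers, here is a rough sketch of what treating the two attributes as optional can look like. It is not the change committed in r1299338; LenientSstDemo, LenientHandler and the -1 "absent" marker are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LenientSstDemo {
    // Hypothetical handler: a missing attribute is recorded as -1 instead of failing.
    static class LenientHandler extends DefaultHandler {
        int count = -1;
        int uniqueCount = -1;
        int itemsSeen = 0; // number of <si> elements actually encountered

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("sst".equals(qName)) {
                count = parseOptional(attrs.getValue("count"));
                uniqueCount = parseOptional(attrs.getValue("uniqueCount"));
            } else if ("si".equals(qName)) {
                itemsSeen++;
            }
        }

        // Only call Integer.parseInt when the attribute is present.
        static int parseOptional(String value) {
            return value == null ? -1 : Integer.parseInt(value);
        }
    }

    // Returns {count, uniqueCount, itemsSeen}; absent attributes come back as -1.
    static int[] readCounts(String xml) {
        LenientHandler handler = new LenientHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse shared string table", e);
        }
        return new int[] {handler.count, handler.uniqueCount, handler.itemsSeen};
    }

    public static void main(String[] args) {
        String empty =
            "<sst xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"/>";
        System.out.println(Arrays.toString(readCounts(empty))); // prints "[-1, -1, 0]"
    }
}
```

Counting the <si> elements as they arrive means a caller does not have to rely on the declared counts at all when they are absent.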