Bug 54211

Summary:	OutofMemory Exception while parsing large xlsx
Product:	POI	Reporter:	Sairam <sairampareek>
Component:	XSSF	Assignee:	POI Developers List <dev>
Status:	RESOLVED WORKSFORME
Severity:	normal
Priority:	P2
Version:	3.8-FINAL
Target Milestone:	---
Hardware:	PC
OS:	All

Description Sairam 2012-11-27 10:06:03 UTC

I have a 6.33 MB xlsx file which I am trying to read. A single file is read successfully, but when I try to read multiple (4) files concurrently I get "OutOfMemoryError: Java heap space".

Is there any workaround for this?

OPCPackage pkg = OPCPackage.open(fileNameOnDisk, PackageAccess.READ);
// the OutOfMemoryError is thrown by the next line
ReadOnlySharedStringsTable sst = new ReadOnlySharedStringsTable(pkg);
XSSFReader r = new XSSFReader(pkg);
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");


java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3209)
	at java.lang.String.<init>(String.java:215)
	at java.lang.StringBuffer.toString(StringBuffer.java:585)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.endElement(ReadOnlySharedStringsTable.java:211)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.readFrom(ReadOnlySharedStringsTable.java:143)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.<init>(ReadOnlySharedStringsTable.java:112)

Comment 1 Nick Burch 2012-11-27 11:54:53 UTC

You need to increase your heap size; the default JVM heap is very small.
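The heap limit is normally raised with the JVM's -Xmx option (for example, java -Xmx1024m ...). As a quick check of what limit the process is actually running with, a minimal sketch like the following can help; the class name HeapCheck is made up for illustration and is not part of POI.

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run once with default settings and once with -Xmx to compare.
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

If the reported value is small compared with the combined size of the shared strings tables of all files being read concurrently, the error above is to be expected.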

If you really can't do that, you'll need to process the shared strings table differently. The code you're using buffers the shared strings table into memory for quick access when processing the sheets. It's not usually too big, but it can be a noticeable part of the file size. If you can't hold it all in RAM, you'll need to process it in a streaming manner and store the id -> string lookup elsewhere (e.g. on disk, or on another box in a KV store / cache) for use when handling the sheets.
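A rough sketch of the streaming approach, using only the JDK's SAX parser: each shared string is passed to a callback as soon as its <si> element closes, so nothing accumulates on the heap. The class StreamingSharedStrings and the StringSink interface are hypothetical names for illustration, not POI API. The sketch concatenates the <t> runs inside each <si> and skips phonetic runs (<rPh>); it has not been tested against real workbooks.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingSharedStrings {

    /** Hypothetical callback: receives each shared string with its index. */
    public interface StringSink {
        void accept(int index, String value) throws Exception;
    }

    /** Streams sharedStrings.xml to the sink; returns the number of strings seen. */
    public static int stream(InputStream sharedStringsXml, StringSink sink) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        final int[] count = {0};
        factory.newSAXParser().parse(sharedStringsXml, new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inText;
            private boolean inPhonetic;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("si".equals(localName)) {
                    text.setLength(0);          // start of a new shared string
                } else if ("rPh".equals(localName)) {
                    inPhonetic = true;          // phonetic text is not part of the value
                } else if ("t".equals(localName)) {
                    inText = !inPhonetic;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if ("t".equals(localName)) {
                    inText = false;
                } else if ("rPh".equals(localName)) {
                    inPhonetic = false;
                } else if ("si".equals(localName)) {
                    try {
                        // Hand the string off immediately; only one is held at a time.
                        sink.accept(count[0]++, text.toString());
                    } catch (Exception e) {
                        throw new SAXException(e);
                    }
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sst xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<si><t>alpha</t></si>"
                + "<si><r><t>be</t></r><r><t>ta</t></r></si>"
                + "</sst>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // A real sink would write to disk or a KV store instead of printing.
        stream(in, (index, value) -> System.out.println(index + " -> " + value));
    }
}
```

With POI, the input stream would presumably come from XSSFReader.getSharedStringsData() rather than an in-memory string, and the sheet handler would then look each string index up in whatever store the sink wrote to instead of in ReadOnlySharedStringsTable.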