Bug 54211 - OutofMemory Exception while parsing large xlsx
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.8-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2012-11-27 10:06 UTC by Sairam
Modified: 2012-11-27 11:54 UTC
Description Sairam 2012-11-27 10:06:03 UTC
I have a 6.33 MB xlsx file which I am trying to read. A single file is read successfully, but when I try to read multiple (4) files concurrently I get "OutOfMemoryError: Java heap space".

Is there any workaround for this?

OPCPackage pkg = OPCPackage.open(fileNameOnDisk, PackageAccess.READ);
// The OutOfMemoryError is thrown by the next line
ReadOnlySharedStringsTable sst = new ReadOnlySharedStringsTable(pkg);
XSSFReader r = new XSSFReader(pkg);
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");


java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3209)
	at java.lang.String.<init>(String.java:215)
	at java.lang.StringBuffer.toString(StringBuffer.java:585)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.endElement(ReadOnlySharedStringsTable.java:211)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.readFrom(ReadOnlySharedStringsTable.java:143)
	at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.<init>(ReadOnlySharedStringsTable.java:112)
Comment 1 Nick Burch 2012-11-27 11:54:53 UTC
You need to increase your heap size; the default JVM heap is very small.
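
To confirm which heap limit the JVM is actually running with (for example before and after adding an option such as -Xmx1g to the java command line), a small check like the following can help. It is only a sketch; the class name HeapCheck is made up for illustration.

```java
// Hypothetical helper (not part of POI): reports the maximum heap the JVM will use.
class HeapCheck {
    // Runtime.maxMemory() returns the upper heap limit in bytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it once with the default settings and once with a larger -Xmx value shows whether the new limit is being picked up.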

If you really can't do that, you'll need to process the shared strings table differently. The code you're using buffers the shared strings table into memory for quick access when processing the sheets. It's not usually too big, but it can be a noticeable part of the file size. If you can't hold it all in RAM, you'll need to process it in a streaming manner and store the id -> string lookup elsewhere (e.g. on disk, or on another box in a KV store / cache) for use when handling the sheets.
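
A minimal sketch of that streaming approach, using only the JDK's SAX parser. The class StreamingSharedStrings and its stream method are hypothetical, not part of POI. The sketch assumes the input stream is the sharedStrings.xml part of the package, and it hands each (index, string) pair to a caller-supplied sink instead of keeping the whole table on the heap. Rich-text runs are concatenated, and phonetic runs are not treated specially.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of POI): emits shared strings one at a time.
class StreamingSharedStrings extends DefaultHandler {
    private final BiConsumer<Integer, String> sink; // e.g. writes to disk or a KV store
    private final StringBuilder current = new StringBuilder();
    private boolean inText;
    private int index;

    StreamingSharedStrings(BiConsumer<Integer, String> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("si".equals(qName)) {
            current.setLength(0);      // start of a new shared string item
        } else if ("t".equals(qName)) {
            inText = true;             // text content follows
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(qName)) {
            inText = false;
        } else if ("si".equals(qName)) {
            // Hand the entry off immediately; only one string is buffered at a time.
            sink.accept(index++, current.toString());
        }
    }

    // Parses the stream and forwards every (index, string) pair to the sink.
    static void stream(InputStream in, BiConsumer<Integer, String> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new StreamingSharedStrings(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sst><si><t>Name</t></si><si><t>Total</t></si></sst>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // A real sink would persist the pair outside the heap; here it just prints.
        stream(in, (id, s) -> System.out.println(id + " -> " + s));
    }
}
```

When handling the sheets, a cell's shared-string index is then looked up in whatever store the sink wrote to, rather than in an in-memory table.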