I have a 6.33 MB xlsx file which I'm trying to read. One file is read successfully, but when I try to read multiple (4) files concurrently I get "OutOfMemoryError: Java heap space". Is there any workaround for this?

    OPCPackage pkg = OPCPackage.open(fileNameOnDisk, PackageAccess.READ);
    // the OutOfMemoryError is thrown here (see stack trace below)
    ReadOnlySharedStringsTable sst = new ReadOnlySharedStringsTable(pkg);
    XSSFReader r = new XSSFReader(pkg);
    XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

The stack trace:

    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:215)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.endElement(ReadOnlySharedStringsTable.java:211)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.readFrom(ReadOnlySharedStringsTable.java:143)
        at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.<init>(ReadOnlySharedStringsTable.java:112)
You need to increase your heap size; the default JVM heap is very small (for example, pass -Xmx512m or higher on the java command line).

If you really can't do that, you'll need to process the shared strings table differently. The code you're using buffers the whole shared strings table in memory for quick access while processing the sheets. It's not usually too big, but it can be a noticeable part of the file size. If you can't hold it all in RAM, you'll need to process it in a streaming manner and store the id -> string lookup elsewhere (e.g. on disk, or on another box in a KV store / cache) for use when handling the sheets.
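If you do go the streaming route, something along these lines could work. This is a rough sketch, not part of the POI API: the DiskBackedSharedStrings class name, the scratch-file/offset-index storage and the getEntryAt helper are all my own illustration (swap the storage for a real KV store if you have one). It parses sharedStrings.xml once with SAX, spills each string to a scratch file, and keeps only one file offset per string on the heap:

    import java.io.Closeable;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    import javax.xml.parsers.SAXParserFactory;

    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.openxml4j.opc.PackagePart;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    /**
     * Streams sharedStrings.xml with SAX and spills the strings to a scratch file,
     * keeping only a file offset per string on the heap. Illustration only - this
     * class and its storage format are not part of POI.
     */
    public class DiskBackedSharedStrings extends DefaultHandler implements Closeable {

        private final RandomAccessFile store;                  // the strings live here, not on the heap
        private final List<Long> offsets = new ArrayList<>();  // string index -> offset in the scratch file

        private final StringBuilder current = new StringBuilder();
        private boolean inText;

        public DiskBackedSharedStrings(OPCPackage pkg, File scratch) throws Exception {
            store = new RandomAccessFile(scratch, "rw");
            store.setLength(0); // start from a clean scratch file
            // Same content type that ReadOnlySharedStringsTable looks up internally
            PackagePart part = pkg.getPartsByContentType(
                    "application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml").get(0);
            try (InputStream is = part.getInputStream()) {
                SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(is), this);
            }
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("si".equals(qName)) current.setLength(0);   // new shared string entry
            if ("t".equals(qName))  inText = true;          // text run (plain or rich-text)
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if ("t".equals(qName)) inText = false;
            if ("si".equals(qName)) {
                try {
                    offsets.add(store.getFilePointer());
                    byte[] bytes = current.toString().getBytes(StandardCharsets.UTF_8);
                    store.writeInt(bytes.length);           // length-prefixed record
                    store.write(bytes);
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        }

        /** Fetch the string for the index found in a sheet's "s"-type cells. */
        public String getEntryAt(int idx) throws IOException {
            store.seek(offsets.get(idx));
            byte[] bytes = new byte[store.readInt()];
            store.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }

        @Override
        public void close() throws IOException {
            store.close();
        }
    }

You'd construct it right after OPCPackage.open(...), in place of the ReadOnlySharedStringsTable (give it a File from File.createTempFile and close it when you're done with the workbook), then call getEntryAt(index) from your sheet ContentHandler wherever you currently resolve a shared string from sst. Note it simply concatenates every <t> inside an <si>, so rich-text formatting is dropped and phonetic runs aren't filtered out, which is usually fine if all you need is the cell text.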