I tried to read an .xlsx file with 1 million rows using the usermodel API. The file size is 11.2 MB, and the JVM throws this exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:115)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:55)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:82)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:220)
    at SXSSFWorkbookTest.test2(SXSSFWorkbookTest.java:51)
    at SXSSFWorkbookTest.main(SXSSFWorkbookTest.java:37)

To work around the issue I switched to the eventmodel API, and that works fine. But I would like to know why the usermodel API takes so much memory when parsing a file with a huge number of rows.
(In reply to comment #0)
> I tried to read an .xlsx file with 1 million rows using the usermodel API.
> The file size is 11.2 MB, and the JVM throws this exception:

code:

InputStream inp = new FileInputStream("100w.xlsx");
Workbook wb = WorkbookFactory.create(inp);

exception:

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2786)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>     at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:115)
>     at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:55)
>     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:82)
>     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:220)
>     at SXSSFWorkbookTest.test2(SXSSFWorkbookTest.java:51)
>     at SXSSFWorkbookTest.main(SXSSFWorkbookTest.java:37)
> To work around the issue I switched to the eventmodel API, and that works fine.
> But I would like to know why the usermodel API takes so much memory when
> parsing a file with a huge number of rows.
As discussed many times on the list, the usermodel loads the whole workbook into memory, so you need enough heap available to hold all of it. The event model only processes one small piece of the file at a time, so it has a much lower memory footprint (but you can't do random access).
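To illustrate the difference, here is a rough sketch of the "one little bit at a time" idea using only the JDK (java.util.zip plus StAX). It is not POI's event API. The class name StreamingRowCount and both methods are made up for this example, and sampleXlsx builds a toy archive containing only a worksheet part, not a complete workbook. The point is that only the current XML event is held in memory, so heap use stays flat regardless of row count, and the price is a single forward pass with no random access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Apache POI.
class StreamingRowCount {

    // Counts <row> elements in the worksheet parts without keeping any row in memory.
    static long countRows(InputStream xlsx) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        long rows = 0;
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Parts under this prefix that are not sheets simply contain no <row>.
                if (!entry.getName().startsWith("xl/worksheets/")) {
                    continue;
                }
                // Shield the zip stream so the XML parser cannot close it between entries.
                InputStream part = new FilterInputStream(zip) {
                    @Override public void close() { }
                };
                XMLStreamReader reader = factory.createXMLStreamReader(part);
                while (reader.hasNext()) {
                    // Each event is inspected and then discarded.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(reader.getLocalName())) {
                        rows++;
                    }
                }
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed worksheet XML", e);
        }
        return rows;
    }

    // Builds a toy archive with a single worksheet part so the sketch can run on its own.
    static byte[] sampleXlsx(int rowCount) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
                .append("<worksheet xmlns=\"http://schemas.openxmlformats.org/")
                .append("spreadsheetml/2006/main\"><sheetData>");
        for (int r = 1; r <= rowCount; r++) {
            xml.append("<row r=\"").append(r).append("\"><c r=\"A").append(r)
               .append("\"><v>").append(r).append("</v></c></row>");
        }
        xml.append("</sheetData></worksheet>");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = sampleXlsx(5);
        System.out.println("rows: " + countRows(new ByteArrayInputStream(archive)));
    }
}
```

The usermodel, by contrast, builds objects for every row and cell before handing back the Workbook, which is why a file that is only 11.2 MB on disk can need far more heap once loaded.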