Summary: | XSSFReader caused OutOfMemoryError when reading a large excel file in HDFS as inputStream | ||
---|---|---|---|
Product: | POI | Reporter: | Dejian Tu <dejian.tu> |
Component: | XSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED INVALID | ||
Severity: | enhancement | CC: | dev, pcllau |
Priority: | P2 | ||
Version: | 3.15-FINAL | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All |
Description
Dejian Tu
2017-01-10 00:20:06 UTC
1,000,000 rows is massive. That's nearly the maximum number of rows allowed per the file format specification. 140 MB file size is massive. Keep in mind that this is zipped XML files, and I would expect 90-95% compression for these files. Unzip this on your hard drive to see how much disk space is consumed when you expand it. It should be in the neighborhood of 1-3 GB. You're also opening the file via an input stream, which has some memory overhead. Therefore, 3.25 GB of memory consumption is reasonable in this case, considering input stream overhead, memory alignment, garbage collection, temporary files for unzipping, maintaining references to files in the unzipped directory structure, creating XML trees for the minimum files needed for XSSFReader. If you have any suggestions and could contribute a patch towards lowering XSSFReader's memory footprint, we'd greatly appreciate the help. (In reply to Dejian Tu from comment #0) > AFAIK, reading from a String path or File uses much less memory than reading > from inputStream for XSSFReader. But for my case, because the excel file is > in HDFS file system, we could not pass the HDFS path to XSSFReader directly. This sounds like a point for discussion on the mailing list (and perhaps Stack Overflow or other community to get suggestions on how to write a program that can deal with data stored on a distributed file system), and is not a bug without convincing evidence and a patch. I encounter the same question. 700,000 rows, size:139M I find use OPCPackage.open(inputStream) will get OutOfMemoryError,But use OPCPackage.open(filePath) run very well. |