Bug 48085 - "OutOfMemoryError: Java heap space" while parsing defect XLS file
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.5-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-29 08:24 UTC by Leonhard Wimmer
Modified: 2009-11-03 11:05 UTC

Attachments
POI throws OutOfMemoryError while parsing this short file (1.00 KB, application/vnd.ms-excel)
2009-10-29 08:24 UTC, Leonhard Wimmer
screenshot of Excel 2008 for Mac error message when opening err.xls (26.75 KB, image/png)
2009-10-29 08:57 UTC, David Fisher

Description Leonhard Wimmer 2009-10-29 08:24:42 UTC
Created attachment 24444
POI throws OutOfMemoryError while parsing this short file

An OutOfMemoryError is thrown while parsing a very short (1024 bytes long),
defective XLS file with
WorkbookFactory.create(inputStream);
(see the attached file).

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java:82)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:164)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:316)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:297)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:60)
Comment 1 Nick Burch 2009-10-29 08:27:07 UTC
Try increasing your Java heap size - the default is very small.

Excel files aren't very memory-friendly to process. If you really do have memory issues, you'll need to use the eventmodel (see the docs for details).
Comment 2 Leonhard Wimmer 2009-10-29 08:34:14 UTC
(In reply to comment #1)
> Try increasing your java heap size - the default is very small
> Excel files aren't very memory friendly to process, if you really do have
> memory issues, then you'll need to use the eventmodel (see the docs for
> details)

Increasing the heap size to 2 GB doesn't help, and the file is only 1024 bytes long! I don't think the HSSF parser should need more than 2 GB for a defective 1024-byte file!
Comment 3 David Fisher 2009-10-29 08:57:11 UTC
Created attachment 24445
screenshot of Excel 2008 for Mac error message when opening err.xls

This looks like a classic case of a malformed file causing an infinite recursion error. I have some concerns that this is an example of a DoS attack against any POI-based spiders.

I do wonder how this file was created, was it a damaged file from a bad disk or a crash?

I think this is a bug, but I don't think it has a high priority. I suggest that the OP put a try-catch around opening the workbook. I know that this is not ideal.
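
A minimal sketch of the try-catch workaround suggested above, not taken from POI itself. The class name GuardedOpen and the method open are hypothetical, and the loader is passed in as a Callable so the sketch compiles without POI on the classpath. Catching OutOfMemoryError is only a stopgap: the JVM may already have disturbed other threads by the time the error is caught.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of POI.
final class GuardedOpen {

    /**
     * Runs the given loader and converts an OutOfMemoryError into an
     * IOException so the caller can treat the file as unreadable
     * instead of letting the whole application die.
     */
    static <T> T open(Callable<T> loader) throws IOException {
        try {
            return loader.call();
        } catch (OutOfMemoryError e) {
            // Assumption: an OOM while opening means a corrupt file, not a real shortage.
            throw new IOException("Parser ran out of heap; file is probably corrupt", e);
        } catch (IOException e) {
            throw e; // already the exception type callers expect
        } catch (Exception e) {
            throw new IOException("Could not open file", e);
        }
    }
}
```

With POI available, a call would look something like GuardedOpen.open(() -> WorkbookFactory.create(inputStream)) on a JVM with lambda support; on older JVMs an anonymous Callable does the same job.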
Comment 4 Leonhard Wimmer 2009-10-29 09:14:22 UTC
(In reply to comment #3)
> This looks like a classic case of a malformed file causing an infinite
> recursion error. I have some concerns that this is an example of a DOS attack
> against any POI based spiders.

> I do wonder how this file was created, was it a damaged file from a bad disk or
> a crash?

I don't know how this file was generated (I got it as a bug report for a POI-enabled application). I don't think that it was maliciously generated. It looks like an XLS header with all null bytes replaced with spaces. It's clearly invalid, but Apache POI normally throws a "normal" exception when used with invalid files and does not need huge amounts of memory.

> I think this is a bug, but I don't think it has a high priority. I suggest that
> the OP put a try-catch around opening the workbook. I know that this is not
> ideal.

Thank you for the tip. This seems to at least avoid the complete crash of the application in this case.
Comment 5 Josh Micich 2009-11-03 11:05:10 UTC
Fixed in svn r832505

junit added

The sample file has a corrupted header block. It looks like many of the bytes have been replaced with spaces. Thus the reported number of allocation table blocks is 0x20202001. I put a restriction that this field must not exceed 0x0000FFFF (which corresponds to a file size of about 4 GB, assuming 512-byte blocks). The largest value found in the existing POI test data is 0x00000059.
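
For illustration only, here is a rough sketch of that kind of check; it is not the code committed in r832505. The class and method names are hypothetical. The field offset 0x2C and the 512-byte block size are assumptions based on the usual OLE2 header layout, and throwing IOException is likewise an assumption about how the error would be reported.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not the actual POI implementation.
final class BatCountCheck {
    static final int BAT_COUNT_OFFSET = 0x2C;   // assumed header offset of the block count
    static final int MAX_BAT_COUNT = 0x0000FFFF; // limit described in the comment above
    static final int BLOCK_SIZE = 512;           // assumed block size
    static final int ENTRIES_PER_BAT_BLOCK = BLOCK_SIZE / 4; // 4-byte entries, 128 per block

    /** Reads the little-endian block count and rejects implausible values. */
    static int readBatCount(byte[] header) throws IOException {
        if (header.length < BAT_COUNT_OFFSET + 4) {
            throw new IOException("Header too short");
        }
        int count = ByteBuffer.wrap(header)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt(BAT_COUNT_OFFSET);
        if (count < 0 || count > MAX_BAT_COUNT) {
            throw new IOException(String.format(
                    "Block count 0x%08X exceeds limit 0x%08X; header is probably corrupt",
                    count, MAX_BAT_COUNT));
        }
        return count;
    }

    /** Data addressable at the limit: 0xFFFF * 128 * 512 bytes, just under 4 GiB. */
    static long maxAddressableBytes() {
        return (long) MAX_BAT_COUNT * ENTRIES_PER_BAT_BLOCK * BLOCK_SIZE;
    }
}
```

A header whose count field holds the bytes 01 20 20 20 decodes to 0x20202001 and is rejected before anything is allocated, while a count such as 0x59 passes. The point of checking the field up front is that the failure becomes an ordinary exception rather than an attempt to size an array from a garbage value.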