Bug 48206 - Another HSSF OOM Problem - Small XLS File
Summary: Another HSSF OOM Problem - Small XLS File
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.5-FINAL
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2009-11-16 21:12 UTC by Jacob Steijn
Modified: 2009-11-18 18:54 UTC
0 users

this zip has a sheet that kills hssf, and the larger file is a spreadsheet with VBA that generates the killer sheet. (24.97 KB, application/x-zip)
2009-11-16 21:12 UTC, Jacob Steijn
Java module that calls hssf, to set context of failure. (10.09 KB, text/plain)
2009-11-17 21:36 UTC, Jacob Steijn

Description Jacob Steijn 2009-11-16 21:12:39 UTC
Created attachment 24545
this zip has a sheet that kills hssf, and the larger file is a spreadsheet with VBA that generates the killer sheet.

I have a spreadsheet generated by Excel 2003 that reliably triggers POI HSSF to
ask for all the memory my JVM has to give and then asks for more, causing a
heap dump etc.  I have not seen this in bugtrack or on the user lists - for
small sheets - and wonder if perhaps this is a unique error.

My spreadsheet is only 22 records, and about 11 columns. All data is general or
text format. My application is running under Java 1.4, a 2GB JVM on a quad
processor server with 8 GB of physical memory. No other applications on the
server. The application is IBM Maximo Asset Manager 6.2 on a WebSphere 6.0.2_27
application server.

Attached are: the XLS that kills HSSF, the XLS workbook with the VBA script
that creates that file, sample log data with stack trace.

This was originally noted on a system using POI 2.5 FINAL. Subsequently tested
on the same system with the POI 3.5 final library in place, no change in

Note, there are twenty to thirty PCs which produce Excel sheets that we access
with the POI API, but only two machines are definitely associated with creating
sheets with this pathological condition. For all I know they are running
defective versions of DAO or some other MS toolkit. The cause is irrelevant. I
would not expect HSSF to be able to read any file we cook up, but I would like
to have a method to protect my application from what is in effect a denial of

Thanks to anyone who can provide a patch, a workaround or any other trick we
can use to protect our system while preserving the functionality.
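
One possible shape for the protection asked for above, offered as a sketch under assumptions rather than as anything POI provides: route every workbook load through a small guard that converts any failure, including OutOfMemoryError, into a single checked exception the caller can handle per file. The names GuardedLoad, Loader and LoadFailedException are hypothetical. The only POI call assumed is the HSSFWorkbook(InputStream) constructor, and it appears only in a comment so the sketch runs without POI. It uses generics and lambdas, so it needs a newer JVM than the Java 1.4 mentioned in this report. Recovery from OutOfMemoryError is best effort: it tends to work when the failed allocation becomes garbage once the constructor unwinds, but the JVM gives no guarantee.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/** Hypothetical guard around a workbook load; not part of POI. */
public class GuardedLoad {

    /** Anything that turns a stream into a parsed object. */
    public interface Loader<T> {
        T load(InputStream in) throws Exception;
    }

    /** Single checked exception reported for any file the parser could not handle. */
    public static class LoadFailedException extends Exception {
        public LoadFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the loader and converts every failure, including OutOfMemoryError,
     * into LoadFailedException so one bad file does not take down the caller.
     */
    public static <T> T load(InputStream in, Loader<T> loader) throws LoadFailedException {
        try {
            return loader.load(in);
        } catch (OutOfMemoryError e) {
            // Best effort only: the heap may or may not be usable after this point.
            throw new LoadFailedException("parser exhausted the heap", e);
        } catch (Exception e) {
            throw new LoadFailedException("parser rejected the file: " + e, e);
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream data = new ByteArrayInputStream(new byte[] {1, 2, 3});
        // With POI on the classpath, the loader would be: in -> new HSSFWorkbook(in)
        Integer firstByte = GuardedLoad.<Integer>load(data, in -> in.read());
        System.out.println("first byte: " + firstByte);

        try {
            // Simulates a parser that runs out of memory on a pathological file.
            GuardedLoad.<Object>load(data, in -> { throw new OutOfMemoryError("simulated"); });
        } catch (LoadFailedException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The guard runs the loader on the calling thread, so it does nothing to bound how long a bad file takes to fail. A stricter setup would parse in a separate process with its own heap limit.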

Stack trace:
[11/11/09 6:50:04:243 EST] 00000038  O UOW=  source=SystemOut org=IBM
prod=WebSphere component=Application Server
          2009-11-11 06:50:04,227   ERROR   maximo.crontaskmanager - 
    at java.util.ArrayList.add(ArrayList.java(Compiled Code))
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:210)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:191)
    at psdi.server.CronTaskManager$CronThread.run(CronTaskManager.java:1297)
Comment 1 Josh Micich 2009-11-17 12:47:18 UTC
I experimented a little with the smaller XLS file from your zip
(name: "Stine Nov. 9  lk.xls", md5sum: bba472f37e3df4a8fb9a83459b54bec6).
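
To confirm that a local copy is the same file as the one identified by the md5sum above, the digest can be recomputed with the JDK alone. This is a minimal sketch; the class name Md5Hex is hypothetical and the file path is taken from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that prints the MD5 digest of a file as lowercase hex. */
public class Md5Hex {

    /** Returns the MD5 digest of the given bytes as a 32-character hex string. */
    public static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Md5Hex <file>");
            return;
        }
        System.out.println(md5Hex(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```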

It appears that this file is definitely broken.  Excel 2003 and 2007 both report data loss (several times) while opening this file.  So while it is probably OK for POI to throw some sort of exception, I agree that OutOfMemoryError is not the right one.

Having said that, it appears that the latest POI (svn trunk) does *not* throw OOME.  The line numbers in the stack trace you supplied don't correspond to any recent version of POI.  This leads me to believe that the bug has been fixed since whatever version of POI you are running.  I have marked this bug as 'fixed', but please re-open if you can show the OOME on a more recent version of POI.

For the record, here is the exception I get:
java.lang.RuntimeException: Buffer underrun - requested 2 bytes but 1 was available
	at org.apache.poi.poifs.filesystem.DocumentInputStream.checkAvaliable(DocumentInputStream.java:202)
	at org.apache.poi.poifs.filesystem.DocumentInputStream.readUShort(DocumentInputStream.java:300)
	at org.apache.poi.poifs.filesystem.DocumentInputStream.readShort(DocumentInputStream.java:220)
	at org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:234)
	at org.apache.poi.hssf.record.PrintSetupRecord.<init>(PrintSetupRecord.java:81)
	... 10 more
Comment 2 David Fisher 2009-11-17 12:57:06 UTC
Could have been fixed with this one - https://issues.apache.org/bugzilla/show_bug.cgi?id=48085
Comment 3 Jacob Steijn 2009-11-17 21:36:01 UTC
Created attachment 24555
Java module that calls hssf, to set context of failure.
Comment 4 Jacob Steijn 2009-11-17 21:45:31 UTC
Just re-reviewed the stack trace I just referenced and compared to the original stack trace. The line numbers are the same. That throws me, because I absolutely positively removed the POI 2.5 jar and replaced it with the POI 3.5 jar, both in my Eclipse installation (and referenced external jars) and in the target system library. 

At this time on the east coast I am mentally running on empty, but I've got another local resource who can give me a second set of eyes on the setup to see if he can spot something I've missed. I would think that rebuilding with the newer jar would at least give us new line numbers, if not a better result.
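
When line numbers in a stack trace do not change after a jar swap, one quick check is to ask the running JVM where it actually loaded the class from. The sketch below uses only the standard Class.getProtectionDomain() and CodeSource.getLocation() calls. The class name WhichJar is hypothetical, and the Class<?> signature assumes Java 5 or later. On the affected system the class to pass would be org.apache.poi.hssf.usermodel.HSSFWorkbook.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper; not part of POI. */
public class WhichJar {

    /** Returns the location a class was loaded from, or a marker when none is recorded. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Typical for classes loaded by the bootstrap class loader.
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to this class so the sketch runs without POI on the classpath.
        String name = args.length > 0 ? args[0] : WhichJar.class.getName();
        System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    }
}
```

Run inside the application server's own class loader context (for example from the cron task itself), this would show whether an older jar earlier on the classpath is still shadowing the new one.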
Comment 5 Jacob Steijn 2009-11-18 18:54:43 UTC
The issue occurs in POI 2.5; the functions used are not present in 3.5. Time to update the code and test again.